OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Josef Bacik	a79050434b	blk-rq-qos: refactor out common elements of blk-wbt blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	2ecbf45635	blk-stat: export helpers for modifying blk_rq_stat We need to use blk_rq_stat in the blkcg qos stuff, so export some of these helpers so they can be used by other things. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	d09d8df3a2	blkcg: add generic throttling mechanism Since IO can be issued from literally anywhere it's almost impossible to do throttling without having some sort of adverse effect somewhere else in the system because of locking or other dependencies. The best way to solve this is to do the throttling when we know we aren't holding any other kernel resources. Do this by tracking throttling in a per-blkg basis, and if we require throttling flag the task that it needs to check before it returns to user space and possibly sleep there. This is to address the case where a process is doing work that is generating IO that can't be throttled, whether that is directly with a lot of REQ_META IO, or indirectly by allocating so much memory that it is swamping the disk with REQ_SWAP. We can't use task_add_work as we don't want to induce a memory allocation in the IO path, so simply saving the request queue in the task and flagging it to do the notify_resume thing achieves the same result without the overhead of a memory allocation. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Tejun Heo	0d3bd88d54	swap,blkcg: issue swap io with the appropriate context For backcharging we need to know who the page belongs to when swapping it out. We don't worry about things that do ->rw_page (zram etc) at the moment, we're only worried about pages that actually go to a block device. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	903d23f0a3	blk-cgroup: allow controllers to output their own stats blk-iolatency has a few stats that it would like to print out, and instead of adding a bunch of crap to the generic code just provide a helper so that controllers can add stuff to the stat line if they want to. Hide it behind a boot option since it changes the output of io.stat from normal, and these stats are only interesting to developers. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	08e18eab0c	block: add bi_blkg to the bio for cgroups Currently io.low uses a bi_cg_private to stash its private data for the blkg, however other blkcg policies may want to use this as well. Since we can get the private data out of the blkg, move this to bi_blkg in the bio and make it generic, then we can use bio_associate_blkg() to attach the blkg to the bio. Theoretically we could simply replace the bi_css with this since we can get to all the same information from the blkg, however you have to lookup the blkg, so for example wbc_init_bio() would have to lookup and possibly allocate the blkg for the css it was trying to attach to the bio. This could be problematic and result in us either not attaching the css at all to the bio, or falling back to the root blkcg if we are unable to allocate the corresponding blkg. So for now do this, and in the future if possible we could just replace the bi_css with bi_blkg and update the helpers to do the correct translation. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	6e76871730	blk-mq: dequeue request one by one from sw queue if hctx is busy It won't be efficient to dequeue request one by one from sw queue, but we have to do that when queue is busy for better merge performance. This patch takes the Exponential Weighted Moving Average(EWMA) to figure out if queue is busy, then only dequeue request one by one from sw queue when queue is busy. Fixes: `b347689ffb` ("blk-mq-sched: improve dispatching from sw queue") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	b04f50ab8a	blk-mq: only attempt to merge bio if there is rq in sw queue Only attempt to merge bio iff the ctx->rq_list isn't empty, because: 1) for high-performance SSD, most of times dispatch may succeed, then there may be nothing left in ctx->rq_list, so don't try to merge over sw queue if it is empty, then we can save one acquiring of ctx->lock 2) we can't expect good merge performance on per-cpu sw queue, and missing one merge on sw queue won't be a big deal since tasks can be scheduled from one CPU to another. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Ming Lei	3f0cedc7e9	blk-mq: use list_splice_tail_init() to insert requests list_splice_tail_init() is much more faster than inserting each request one by one, given all requets in 'list' belong to same sw queue and ctx->lock is required to insert requests. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Minwoo Im	c018c84fdb	blk-mq: fix typo in a function comment Fix typo in a function blk_mq_alloc_tag_set() comment. if if it too large -> if it's too large. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Minwoo Im	0da73d00ca	blk-mq: code clean-up by adding an API to clear set->mq_map set->mq_map is now currently cleared if something goes wrong when establishing a queue map in blk-mq-pci.c. It's also cleared before updating a queue map in blk_mq_update_queue_map(). This patch provides an API to clear set->mq_map to make it clear. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Colin Ian King	e84422cdf3	partitions/ldm: remove redundant pointer dgrp Pointer dgrp is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'dgrp' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Liu Bo	43ada78781	Block: blk-throttle: set low_valid immediately once one cgroup has io.low configured Once one cgroup has io.low configured, @low_valid becomes true and other cgroups won't switch it back whatsoever. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	1954e9a998	block: Document how blk_update_request() handles RQF_SPECIAL_PAYLOAD requests The payload of struct request is stored in the request.bio chain if the RQF_SPECIAL_PAYLOAD flag is not set and in request.special_vec if RQF_SPECIAL_PAYLOAD has been set. However, blk_update_request() iterates over req->bio whether or not RQF_SPECIAL_PAYLOAD has been set. Additionally, the RQF_SPECIAL_PAYLOAD flag is ignored by blk_rq_bytes() which means that the value returned by that function is incorrect if the RQF_SPECIAL_PAYLOAD flag has been set. It is not clear to me whether this is an oversight or whether this happened on purpose. Anyway, document that it is known that both functions ignore RQF_SPECIAL_PAYLOAD. See also commit `f9d03f96b9` ("block: improve handling of the magic discard payload"). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	1311326cf4	blk-mq: avoid to synchronize rcu inside blk_cleanup_queue() SCSI probing may synchronously create and destroy a lot of request_queues for non-existent devices. Any synchronize_rcu() in queue creation or destroy path may introduce long latency during booting, see detailed description in comment of blk_register_queue(). This patch removes one synchronize_rcu() inside blk_cleanup_queue() for this case, commit c2856ae2f315d75(blk-mq: quiesce queue before freeing queue) needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but when queue isn't initialized, it isn't necessary to do that since only pass-through requests are involved, no original issue in scsi_execute() at all. Without this patch and previous one, it may take more 20+ seconds for virtio-scsi to complete disk probe. With the two patches, the time becomes less than 100ms. Fixes: `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") Reported-by: Andrew Jones <drjones@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	97889f9ac2	blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() We have to remove synchronize_rcu() from blk_queue_cleanup(), otherwise long delay can be caused during lun probe. For removing it, we have to avoid to iterate the set->tag_list in IO path, eg, blk_mq_sched_restart(). This patch reverts 5b79413946d (Revert "blk-mq: don't handle TAG_SHARED in restart"). Given we have fixed enough IO hang issue, and there isn't any reason to restart all queues in one tags any more, see the following reasons: 1) blk-mq core can deal with shared-tags case well via blk_mq_get_driver_tag(), which can wake up queues waiting for driver tag. 2) SCSI is a bit special because it may return BLK_STS_RESOURCE if queue, target or host is ready, but SCSI built-in restart can cover all these well, see scsi_end_request(), queue will be rerun after any request initiated from this host/target is completed. In my test on scsi_debug(8 luns), this patch may improve IOPS by 20% ~ 30% when running I/O on these 8 luns concurrently. Fixes: `705cda97ee` ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reported-by: Andrew Jones <drjones@redhat.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	5815839b3c	blk-mq: introduce new lock for protecting hctx->dispatch_wait Now hctx->lock is only acquired when adding hctx->dispatch_wait to one wait queue, but not held when removing it from the wait queue. IO hang can be observed easily if SCHED RESTART is disabled, that means now RESTART exits just for fixing the issue in blk_mq_mark_tag_wait(). This patch fixes the issue by introducing hctx->dispatch_wait_lock and holding it for removing hctx->dispatch_wait in blk_mq_dispatch_wake(), since we need to avoid acquiring hctx->lock in irq context. Fixes: `eb619fdb2d` ("blk-mq: fix issue with shared tag queue re-running") Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	2278d69f03	blk-mq: don't pass hctx to blk_mq_mark_tag_wait() 'hctx' won't be changed at all, so not necessary to pass 'hctx' to blk_mq_mark_tag_wait(). Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	8ab6bb9ee8	blk-mq: cleanup blk_mq_get_driver_tag() We never pass 'wait' as true to blk_mq_get_driver_tag(), and hence we never change '**hctx' as well. The last use of these went away with the flush cleanup, commit `0c2a6fe4dc`. So cleanup the usage and remove the two extra parameters. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Paolo Valente	277a4a9b56	block, bfq: give a better name to bfq_bfqq_may_idle The actual goal of the function bfq_bfqq_may_idle is to tell whether it is better to perform device idling (more precisely: I/O-dispatch plugging) for the input bfq_queue, either to boost throughput or to preserve service guarantees. This commit improves the name of the function accordingly. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Paolo Valente	9fae8dd59f	block, bfq: fix service being wrongly set to zero in case of preemption If - a bfq_queue Q preempts another queue, because one request of Q arrives in time, - but, after this preemption, Q is not the queue that is set in service, then Q->entity.service is set to 0 when Q is eventually set in service. But Q should have continued receiving service with its old budget (which is why preemption has occurred) and its old service. This commit addresses this issue by resetting service on queue real expiration. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Paolo Valente	4420b095cc	block, bfq: do not expire a queue that will deserve dispatch plugging For some bfq_queues, BFQ plugs I/O dispatching when the queue becomes idle, and keeps the plug until a new request of the queue arrives, or a timeout fires. BFQ does so either to boost throughput or to preserve service guarantees for the queue. More precisely, for such a queue, plugging starts when the queue happens to have either no request enqueued, or no request in flight, that is, no request already dispatched but not yet completed. On the opposite end, BFQ may happen to expire a queue with no request enqueued, without doing any plugging, if the queue still has some request in flight. Unfortunately, such a premature expiration causes the queue to lose its chance to enjoy dispatch plugging a moment later, i.e., when its in-flight requests finally get completed. This breaks service guarantees for the queue. This commit prevents BFQ from expiring an empty queue if the latter still has in-flight requests. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Paolo Valente	0471559c2f	block, bfq: add/remove entity weights correctly To keep I/O throughput high as often as possible, BFQ performs I/O-dispatch plugging (aka device idling) only when beneficial exactly for throughput, or when needed for service guarantees (low latency, fairness). An important case where the latter condition holds is when the scenario is 'asymmetric' in terms of weights: i.e., when some bfq_queue or whole group of queues has a higher weight, and thus has to receive more service, than other queues or groups. Without dispatch plugging, lower-weight queues/groups may unjustly steal bandwidth to higher-weight queues/groups. To detect asymmetric scenarios, BFQ checks some sufficient conditions. One of these conditions is that active groups have different weights. BFQ controls this condition by maintaining a special set of unique weights of active groups (group_weights_tree). To this purpose, in the function bfq_active_insert/bfq_active_extract BFQ adds/removes the weight of a group to/from this set. Unfortunately, the function bfq_active_extract may happen to be invoked also for a group that is still active (to preserve the correct update of the next queue to serve, see comments in function bfq_no_longer_next_in_service() for details). In this case, removing the weight of the group makes the set group_weights_tree inconsistent. Service-guarantee violations follow. This commit addresses this issue by moving group_weights_tree insertions from their previous location (in bfq_active_insert) into the function __bfq_activate_entity, and by moving group_weights_tree extractions from bfq_active_extract to when the entity that represents a group remains throughly idle, i.e., with no request either enqueued or dispatched. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	6a5ac98465	block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n Exclude zoned block device members from struct request_queue for CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building the code that uses these struct request_queue members if CONFIG_BLK_DEV_ZONED != n. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	7c8542b798	block: Inline blk_queue_nr_zones() Since the implementation of blk_queue_nr_zones() is trivial and since it only has a single caller, inline this function. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	f441108fa0	block: Remove a superfluous cast from blkdev_report_zones() No cast is necessary when assigning a non-void pointer to a void pointer. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Linus Torvalds	e6e5bec43c	for-linus-20180629 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAls2oVcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpr89D/9PaJCtLpCEU1HdaohIDXDyJKBdtCllIsqJ YAJTlUkFRkMvsbdEA3U43rVRVlu7zxP3aRg79VRI+kdZ8y8XSiisAF/QUbG/Bwd3 NuFqKj4Hm5xFTtLMKIUlumR2/exD4ZpPHKnWmD4idp1PZRoTUwxeIjqVv77L6Nfy c4/GVlz2t9buWxH/5PSa//rsT49xM6CLpyIwO2lFsNEk2HYE/uAWya5KZVJ1YKVw mSjUdyFtJ//lFK9lVU4YkdNF0d5w2PcEoCfNyPe0n9F43iITACo2om7rkSkTgciS izPAs1DkqNdWPta2twULt656WlUoGlwnWyFchU34N/I/pDiww4kdCQFfNIOi6qBW 7rPQ4xpywiw/U93C2GLEeE09++T8+yO4KuELhIcN5+D3D2VGRIuaGcf4xeY0CX3A a335EZIxBGOfxyejknBDg3BsoUcLvejbANKD8ltis3zciGf/QyLt+FFqx25WCzkN N028ppml8WPSiqhSlFDMyfscO1dx/WSYh1q7ANrBiPPaEjFYmakzEDT0cZW4JsUh av4jqYt3cqrFIXyhRHJM2wjFymQv6aVnmLa4zOKCFbwm9KzMWUUFjc4R962T/sQK 0g+eCSFGidii4JZ4ghOAQk2dqCTIZqV72pmLlpJBGu+SAIQxkh+29h0LOi5U3v1Y FnTtJ1JhCg== =0uqV -----END PGP SIGNATURE----- Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Small set of fixes for this series. Mostly just minor fixes, the only oddball in here is the sg change. The sg change came out of the stall fix for NVMe, where we added a mempool and limited us to a single page allocation. CONFIG_SG_DEBUG sort-of ruins that, since we'd need to account for that. That's actually a generic problem, since lots of drivers need to allocate SG lists. So this just removes support for CONFIG_SG_DEBUG, which I added back in 2007 and to my knowledge it was never useful. Anyway, outside of that, this pull contains: - clone of request with special payload fix (Bart) - drbd discard handling fix (Bart) - SATA blk-mq stall fix (me) - chunk size fix (Keith) - double free nvme rdma fix (Sagi)" * tag 'for-linus-20180629' of git://git.kernel.dk/linux-block: sg: remove ->sg_magic member drbd: Fix drbd_request_prepare() discard handling blk-mq: don't queue more if we get a busy return block: Fix cloning of requests with a special payload nvme-rdma: fix possible double free of controller async event buffer block: Fix transfer when chunk sectors exceeds max	2018-06-30 10:47:46 -07:00
Jens Axboe	1f57f8d442	blk-mq: don't queue more if we get a busy return Some devices have different queue limits depending on the type of IO. A classic case is SATA NCQ, where some commands can queue, but others cannot. If we have NCQ commands inflight and encounter a non-queueable command, the driver returns busy. Currently we attempt to dispatch more from the scheduler, if we were able to queue some commands. But for the case where we ended up stopping due to BUSY, we should not attempt to retrieve more from the scheduler. If we do, we can get into a situation where we attempt to queue a non-queueable command, get BUSY, then successfully retrieve more commands from that scheduler and queue those. This can repeat forever, starving the non-queuable command indefinitely. Fix this by NOT attempting to pull more commands from the scheduler, if we get a BUSY return. This should also be more optimal in terms of letting requests stay in the scheduler for as long as possible, if we get a BUSY due to the regular out-of-tags condition. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-29 07:52:31 -06:00
Bart Van Assche	297ba57dcd	block: Fix cloning of requests with a special payload This patch avoids that removing a path controlled by the dm-mpath driver while mkfs is running triggers the following kernel bug: kernel BUG at block/blk-core.c:3347! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2 RIP: 0010:blk_end_request_all+0x68/0x70 Call Trace: <IRQ> dm_softirq_done+0x326/0x3d0 [dm_mod] blk_done_softirq+0x19b/0x1e0 __do_softirq+0x128/0x60d irq_exit+0x100/0x110 smp_call_function_single_interrupt+0x90/0x330 call_function_single_interrupt+0xf/0x20 </IRQ> Fixes: `f9d03f96b9` ("block: improve handling of the magic discard payload") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-28 09:51:30 -06:00
Linus Torvalds	77072ca59f	for-linus-20180623 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlsudKcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgphYXEAC0G+fJcLgue64LwLNjXksDUsoldqNWjTDZ e3szwseN3Woe7t+Q9DwEUO48T421WhDYe3UY50EZF3Fz5yYLi0ggTeq4qsKgpaXx 80dUY9/STxFZUV/tKrYMz5xPjM3k5x+m8FTeGlFgyR77fbOZE6/8w9XPkpmQN/B2 ayyEPMgAR9aZKhtOU4TJY45O278b2Qdfd0s6XUsxDTwBoR1mTQqKhNnY38PrIA5s wkhtKz5YRdq5tNhKt2IwYABpJ4HJ1Hz/FE6Fo1RNx+6vfxfEhsk2/DZBn0rzzGDj kvSaVlPeQEu9SZdJGtb2sx26elVWdt5sqMORQ0zb1kVCochZiT1IO3Z81XCy2Y3u 5WVvOlgsGni9BWr/K+iiN/dc+cKoRPy0g/ib77XUiJHgIC+urUIr3XgcIiuP9GZc cgCfF+6+GvRKLatSsDZSbns5cpgueJLNogZUyhs7oUouRIECo0DD9NnDAlHmaZYB 957bRd/ykE6QOvtxEsG0TWc3tgyHjJFQQ9ukb45VmzYVB8mITKOGLpceJaqzju9c AnGyQDddK4fqOct1pgFsSA5C3V6OOVXDpUiFn3DTNoQZcu5HN2hMwEA/g/u5TnGD N2sEwcu2WCoUjzsMEWFLmX8R4w8laZUDwkNSRC5tYzQEZpVDfmAsRtRcUHJdNLl/ zPj8QMAidA== =hCgr -----END PGP SIGNATURE----- Merge tag 'for-linus-20180623' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Further timeout fixes. We aren't quite there yet, so expect another round of fixes for that to completely close some of the IRQ vs completion races. (Christoph/Bart) - Set of NVMe fixes from the usual suspects, mostly error handling - Two off-by-one fixes (Dan) - Another bdi race fix (Jan) - Fix nbd reconfigure with NBD_DISCONNECT_ON_CLOSE (Doron) * tag 'for-linus-20180623' of git://git.kernel.dk/linux-block: blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE bdi: Fix another oops in wb_workfn() lightnvm: Remove depends on HAS_DMA in case of platform dependency nvme-pci: limit max IO size and segments to avoid high order allocations nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl nvme-fc: release io queues to allow fast fail nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag. block: sed-opal: Fix a couple off by one bugs blk-mq-debugfs: Off by one in blk_mq_rq_state_name() nvmet: reset keep alive timer in controller enable nvme-rdma: don't override opts->queue_size nvme-rdma: Fix command completion race at error recovery nvme-rdma: fix possible free of a non-allocated async event buffer nvme-rdma: fix possible double free condition when failing to create a controller Revert "block: Add warning for bi_next not NULL in bio_endio()" block: fix timeout changes for legacy request drivers	2018-06-24 06:33:54 +08:00
Bart Van Assche	f5e350f021	blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE Make sure that RQF_TIMED_OUT is cleared when a request is reused after a block driver timeout handler has returned BLK_EH_DONE. Fixes: `da66126739` ("blk-mq: don't time out requests again that are in the timeout handler") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Andrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-23 10:25:45 -06:00
Dan Carpenter	ce042c183b	block: sed-opal: Fix a couple off by one bugs resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data. Fixes: `455a7b238c` ("block: Add Sed-opal library") Reviewed-by: Scott Bauer <scott.bauer@intel.com> Tested-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-20 12:04:06 -06:00
Dan Carpenter	a1e7918862	blk-mq-debugfs: Off by one in blk_mq_rq_state_name() If rq_state == ARRAY_SIZE() then we read one element beyond the end of the blk_mq_rq_state_name_array[] array. Fixes: `ec6dcf63c5` ("blk-mq-debugfs: Show more request state information") Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-20 11:26:04 -06:00
Bart Van Assche	9c24c10a2c	Revert "block: Add warning for bi_next not NULL in bio_endio()" Commit `0ba99ca483` ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit `0ba99ca483` clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit `0ba99ca483`. This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab\|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet <kent.overstreet@gmail.com> Fixes: `0ba99ca483` ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-19 11:59:47 -06:00
Christoph Hellwig	0cc61e64e2	block: fix timeout changes for legacy request drivers blk_mq_complete_request can only be called for blk-mq drivers, but when removing the BLK_EH_HANDLED return value, two legacy request timeout methods incorrectly got switched to call blk_mq_complete_request. Call __blk_complete_request instead to reinstance the previous behavior. For that __blk_complete_request needs to be exported. Fixes: `1fc2b62e` ("scsi_transport_fc: complete requests from ->timeout") Fixes: `0df0bb08` ("null_blk: complete requests from ->timeout") Reported-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-19 11:27:18 -06:00
Linus Torvalds	265c5596da	for-linus-20180616 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlslJkoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpi3READZgtKEaixC4EMLNH/7WqVLC5XxsIpuV3N7 kxaV2KoJ/zJbBPqE+WIW/UasuTRkIFQpmXp3aYw879FZfCq1sN7K9yyzY2i7qvuG l8WRPWFc2SMfMrXb3pctGiNyYhTIZrOOzPmKAwlNinyyr7tlan2Ha4z5XHOJon9O uVk252kTlBqvTEX4nZKN2x6UiI9RaVegbOV6lthm3lQZF+25C6kN/8emdkhDIkjG mOI8MaHp6iDbpIBWBziEpkvGxvxO4PAaESldyJH6Fg+uzmNEDEwp6jvFBmlCiQIE wdt5Swsl2zKk8g/tBHeDSDK0inHnP5xho4tC68rDtLrgxyLeBuVjyLMItkYJdTqh s8B5gdCs3SS41aXMk+w3BnLBIRRMrByGiY2Nx23Si6KZU8pT5sM0dGGOTGCma1dy NNitIRbePyrajV2p/xqnMeiGvxqnCckRRYr0uV2KPJzqwwMoZvNsnS+XL2KqdgCY ndK5Nda5oRb+yDDlPKlRvY8cKh1N8GWdoFIGbfrMIYJcGiUXBoX7UoA/DKKbnY2o p+TE3t9RGz9UPwxHqSMTXRV+xs4X3lgWf4sA/ciGGPeuAgwdOoGJnfn4kY7AIPq8 LYXj8K4rx3cRA+l7Y4DCppPQEl8bxKXhFFccKsXToFITW+PNdCuM7FDugOHdzwQE QlesAkLV+w== =gyaY -----END PGP SIGNATURE----- Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A collection of fixes that should go into -rc1. This contains: - bsg_open vs bsg_unregister race fix (Anatoliy) - NVMe pull request from Christoph, with fixes for regressions in this window, FC connect/reconnect path code unification, and a trace point addition. - timeout fix (Christoph) - remove a few unused functions (Christoph) - blk-mq tag_set reinit fix (Roman)" * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block: bsg: fix race of bsg_open and bsg_unregister block: remov blk_queue_invalidate_tags nvme-fabrics: fix and refine state checks in __nvmf_check_ready nvme-fabrics: handle the admin-only case properly in nvmf_check_ready nvme-fabrics: refactor queue ready check blk-mq: remove blk_mq_tagset_iter nvme: remove nvme_reinit_tagset nvme-fc: fix nulling of queue data on reconnect nvme-fc: remove reinit_request routine blk-mq: don't time out requests again that are in the timeout handler nvme-fc: change controllers first connect to use reconnect path nvme: don't rely on the changed namespace list log nvmet: free smart-log buffer after use nvme-rdma: fix error flow during mapping request data nvme: add bio remapping tracepoint nvme: fix NULL pointer dereference in nvme_init_subsystem blk-mq: reinit q->tag_set_list entry only after grace period	2018-06-17 05:37:55 +09:00
Mauro Carvalho Chehab	5fb94e9ca3	docs: Fix some broken references As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>	2018-06-15 18:10:01 -03:00
Anatoliy Glagolev	d6c73964f1	bsg: fix race of bsg_open and bsg_unregister The existing implementation allows races between bsg_unregister and bsg_open paths. bsg_unregister and request_queue cleanup and deletion may start and complete right after bsg_get_device (in bsg_open path) retrieves bsg_class_device and releases the mutex. Then bsg_open path touches freed memory of bsg_class_device and request_queue. One possible fix is to hold the mutex all the way through bsg_get_device instead of releasing it after bsg_class_device retrieval. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-Off-By: Anatoliy Glagolev <glagolig@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-15 08:15:37 -06:00
Christoph Hellwig	be7f99c536	block: remov blk_queue_invalidate_tags This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-15 08:13:35 -06:00
Jens Axboe	95c7c09f4c	Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linus Pull NVMe fixes from Christoph: "Fix various little regressions introduced in this merge window, plus a rework of the fibre channel connect and reconnect path to share the code instead of having separate sets of bugs. Last but not least a trivial trace point addition from Hannes." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme-fabrics: fix and refine state checks in __nvmf_check_ready nvme-fabrics: handle the admin-only case properly in nvmf_check_ready nvme-fabrics: refactor queue ready check blk-mq: remove blk_mq_tagset_iter nvme: remove nvme_reinit_tagset nvme-fc: fix nulling of queue data on reconnect nvme-fc: remove reinit_request routine nvme-fc: change controllers first connect to use reconnect path nvme: don't rely on the changed namespace list log nvmet: free smart-log buffer after use nvme-rdma: fix error flow during mapping request data nvme: add bio remapping tracepoint nvme: fix NULL pointer dereference in nvme_init_subsystem	2018-06-15 08:11:05 -06:00
Christoph Hellwig	e6c3456aa8	blk-mq: remove blk_mq_tagset_iter Unused now that nvme stopped using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>	2018-06-14 17:01:45 +02:00
Christoph Hellwig	da66126739	blk-mq: don't time out requests again that are in the timeout handler We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag. Fixes: `12f5b931` ("blk-mq: Remove generation seqeunce") Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com> Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-14 08:39:46 -06:00
Kees Cook	fad953ce0b	treewide: Use array_size() in vzalloc() The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| vzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| vzalloc( - sizeof(char) * COUNT + COUNT , ...) \| vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) \| vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) \| vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	344476e16a	treewide: kvmalloc() -> kvmalloc_array() The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This patch replaces cases of: kvmalloc(a * b, gfp) with: kvmalloc_array(a * b, gfp) as well as handling cases of: kvmalloc(a * b * c, gfp) with: kvmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kvmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kvmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kvmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kvmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kvmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kvmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kvmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kvmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvmalloc + kvmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kvmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kvmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kvmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kvmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kvmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kvmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kvmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kvmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kvmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kvmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvmalloc(C1 * C2 * C3, ...) \| kvmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kvmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kvmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kvmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvmalloc(sizeof(THING) * C2, ...) \| kvmalloc(sizeof(TYPE) * C2, ...) \| kvmalloc(C1 * C2 * C3, ...) \| kvmalloc(C1 * C2, ...) \| - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kvmalloc + kvmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kvmalloc + kvmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kvmalloc + kvmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kvmalloc + kvmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	590b5b7d86	treewide: kzalloc_node() -> kcalloc_node() The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This patch replaces cases of: kzalloc_node(a * b, gfp, node) with: kcalloc_node(a * b, gfp, node) as well as handling cases of: kzalloc_node(a * b * c, gfp, node) with: kzalloc_node(array3_size(a, b, c), gfp, node) as it's slightly less ugly than: kcalloc_node(array_size(a, b), c, gfp, node) This does, however, attempt to ignore constant size factors like: kzalloc_node(4 * 1024, gfp, node) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc_node( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc_node( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc_node( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc_node + kcalloc_node ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc_node( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc_node( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc_node( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc_node( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc_node(C1 * C2 * C3, ...) \| kzalloc_node( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc_node(sizeof(THING) * C2, ...) \| kzalloc_node(sizeof(TYPE) * C2, ...) \| kzalloc_node(C1 * C2 * C3, ...) \| kzalloc_node(C1 * C2, ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc_node + kcalloc_node ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc_node + kcalloc_node ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	6396bb2215	treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) \| kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) \| kzalloc(sizeof(TYPE) * C2, ...) \| kzalloc(C1 * C2 * C3, ...) \| kzalloc(C1 * C2, ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	6da2ec5605	treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) \| kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) \| kmalloc(sizeof(TYPE) * C2, ...) \| kmalloc(C1 * C2 * C3, ...) \| kmalloc(C1 * C2, ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Roman Pen	a347c7ad8e	blk-mq: reinit q->tag_set_list entry only after grace period It is not allowed to reinit q->tag_set_list list entry while RCU grace period has not completed yet, otherwise the following soft lockup in blk_mq_sched_restart() happens: [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270] [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000 [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150 [ 1064.256510] Call Trace: [ 1064.256664] <IRQ> [ 1064.256824] blk_mq_free_request+0xea/0x100 [ 1064.256987] msg_io_conf+0x59/0xd0 [ibnbd_client] [ 1064.257175] complete_rdma_req+0xf2/0x230 [ibtrs_client] [ 1064.257340] ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core] [ 1064.257502] ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client] [ 1064.257669] ib_create_qp+0x321/0x380 [ib_core] [ 1064.257841] ib_process_cq_direct+0xbd/0x120 [ib_core] [ 1064.258007] irq_poll_softirq+0xb7/0xe0 [ 1064.258165] __do_softirq+0x106/0x2a2 [ 1064.258328] irq_exit+0x92/0xa0 [ 1064.258509] do_IRQ+0x4a/0xd0 [ 1064.258660] common_interrupt+0x7a/0x7a [ 1064.258818] </IRQ> Meanwhile another context frees other queue but with the same set of shared tags: [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds. [ 1288.201833] bash D 0 5910 5820 0x00000000 [ 1288.202016] Call Trace: [ 1288.202315] schedule+0x32/0x80 [ 1288.202462] schedule_timeout+0x1e5/0x380 [ 1288.203838] wait_for_completion+0xb0/0x120 [ 1288.204137] __wait_rcu_gp+0x125/0x160 [ 1288.204287] synchronize_sched+0x6e/0x80 [ 1288.204770] blk_mq_free_queue+0x74/0xe0 [ 1288.204922] blk_cleanup_queue+0xc7/0x110 [ 1288.205073] ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client] [ 1288.205389] ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client] [ 1288.205548] kernfs_fop_write+0x109/0x180 [ 1288.206328] vfs_write+0xb3/0x1a0 [ 1288.206476] SyS_write+0x52/0xc0 [ 1288.206624] do_syscall_64+0x68/0x1d0 [ 1288.206774] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 What happened is the following: 1. There are several MQ queues with shared tags. 2. One queue is about to be freed and now task is in blk_mq_del_queue_tag_set(). 3. Other CPU is in blk_mq_sched_restart() and loops over all queues in tag list in order to find hctx to restart. Because linked list entry was modified in blk_mq_del_queue_tag_set() without proper waiting for a grace period, blk_mq_sched_restart() never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup. Fix is simple: reinit list entry after an RCU grace period elapsed. Fixes: Fixes: `705cda97ee` ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: stable@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-block@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-11 08:13:41 -06:00
Linus Torvalds	bbaa101303	for-linus-20180610 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlsdUjYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplguD/963DhUe/u3mKGYa/iQxXRrbR/pnBvm/Uaf Xj9ikyKXgenz2cmvBvVbCAXfaugn3i6NbtBboaNWVSoUHPK7rbG682RxqeZOUOYk qKLAAMZefHbYyoKWsClfVgbO6DlLTHjBJ/uaxR0npV/ZsQ2HjNN4lCdODiR0/0Px oJNPdALJs1eO/u4hmhMbsSYdg5QVaYqv5p+Ssk9cIxdUTwkgjdWRyKJm4aZsfedp oB7hHtkB6SEO5KA7CSzruXhKWBT1hNBKzrvLBVXZUEn35d2SeqCrUZ3VaL1yLDg7 MCWN7xtAcu3fF6tWRGKMngki+lbf447bqcmB/lCr2Zmv0nF7bMgo0r5Ik8bl3gtp SwV4EDZWDaFibs/UGE8IHG2QYEb1ohaSEQKJpBEa9aZd38lhiKAMfH07b97//PmA BckguoPrwmUAjiG4av1eOqhNptRRXFmAMFvyYZcn+7T5Mp0QQd19P6Sk9ZZYUbbd v/8J1oqkbt5NS0LS3HQ2jnQRnfSvkIm2v59mpKJBFISzPfzjsVD73Ua/z9qPneOT EwLPxI91Pgu3ZuMZUMosC11RsB36xU+fU83RRytwXvFAK3REy+KpWf2Tr0pcwGoU jaYN+TlXl9BX18vNSV3X4PXUpTeEqRi6HhvTEjNM/2dKkoRrvcVdKwxrZ6y4JzUM uw5iJfp4Bg== =5ZQ2 -----END PGP SIGNATURE----- Merge tag 'for-linus-20180610' of git://git.kernel.dk/linux-block Pull block flush handling fix from Jens Axboe: "Single fix that we should merge now, fixing a regression in queuing flush request, accessing request flags after calling the end_request handler" * tag 'for-linus-20180610' of git://git.kernel.dk/linux-block: block: fix use-after-free in block flush handling	2018-06-10 10:50:41 -07:00
Jens Axboe	190b02ed79	block: fix use-after-free in block flush handling A recent commit reused the original request flags for the flush queue handling. However, for some of the kick flush cases, the original request was already completed. This caused a use after free, if blk-mq wasn't used. Fixes: `84fca1b0c4` ("block: pass failfast and driver-specific flags to flush requests") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-09 06:37:14 -06:00

1 2 3 4 5 ...

3926 Commits