OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chaitanya Kulkarni	e274832590	null_blk: return error for invalid zone size In null_init_zoned_dev(), check whether the zone size is larger than the device capacity and return an error if so. This also fixes the following oops: null_blk: changed the number of conventional zones to 4294967295 BUG: kernel NULL pointer dereference, address: 0000000000000010 PGD 7d76c5067 P4D 7d76c5067 PUD 7d240c067 PMD 0 Oops: 0002 [#1] SMP NOPTI CPU: 4 PID: 5508 Comm: nullbtests.sh Tainted: G OE 5.7.0-rc4lblk-fnext0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e4 RIP: 0010:null_init_zoned_dev+0x17a/0x27f [null_blk] RSP: 0018:ffffc90007007e00 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8887fb3f3c00 RCX: 0000000000000007 RDX: 0000000000000000 RSI: ffff8887ca09d688 RDI: ffff888810fea510 RBP: 0000000000000010 R08: ffff8887ca09d688 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887c26e8000 R13: ffffffffa05e9390 R14: 0000000000000000 R15: 0000000000000001 FS: 00007fcb5256f740(0000) GS:ffff888810e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000081e8fe000 CR4: 00000000003406e0 Call Trace: null_add_dev+0x534/0x71b [null_blk] nullb_device_power_store.cold.41+0x8/0x2e [null_blk] configfs_write_file+0xe6/0x150 vfs_write+0xba/0x1e0 ksys_write+0x5f/0xe0 do_syscall_64+0x60/0x250 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fcb51c71840 Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:47:28 -06:00
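The following is a minimal userspace sketch of the validation described above. It assumes sizes are tracked in MiB, as null_blk's configfs attributes are. The struct and function names are hypothetical and do not come from the driver, and plain division stands in for the driver's shift-based zone count.

Without the check, a zone larger than the device yields zero zones. Computing `nr_zones - 1` for the conventional-zone limit then wraps to 4294967295, which matches the message at the start of the oops.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical mirror of the relevant null_blk settings, in MiB. */
struct zoned_cfg {
	unsigned long long size_mb;      /* device capacity */
	unsigned long long zone_size_mb; /* size of one zone */
	unsigned int nr_zones;           /* derived below */
	unsigned int zone_nr_conv;       /* requested conventional zones */
};

/* Sketch: reject a zone larger than the device before deriving
 * nr_zones, so nr_zones is always at least 1 afterwards. */
static int init_zoned_cfg(struct zoned_cfg *cfg)
{
	if (cfg->zone_size_mb == 0) /* avoid division by zero in this sketch */
		return -EINVAL;
	if (cfg->zone_size_mb > cfg->size_mb) {
		fprintf(stderr, "zone size larger than device capacity\n");
		return -EINVAL;
	}

	cfg->nr_zones = (unsigned int)(cfg->size_mb / cfg->zone_size_mb);

	/* Keep at least one sequential zone. With nr_zones == 0 this
	 * subtraction would wrap to 4294967295. */
	if (cfg->zone_nr_conv >= cfg->nr_zones)
		cfg->zone_nr_conv = cfg->nr_zones - 1;
	return 0;
}
```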
Martijn Coenen	3448914e8c	loop: Add LOOP_CONFIGURE ioctl This allows userspace to completely set up a loop device with a single ioctl, removing the in-between state where the device can be partially configured, e.g. the loop device has a backing file associated with it but is reading from the wrong offset. Besides removing the intermediate state, another big benefit of this ioctl is that LOOP_SET_STATUS can be slow; the main reason for this slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to freeze the associated queue; this requires waiting for RCU synchronization, which I've measured can take about 15-20 ms on this device on average. In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can also be used to: - Set the correct block size immediately by setting loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY in loop_config.info.lo_flags Here's setting up ~70 regular loop devices with an offset on an x86 Android device, using LOOP_SET_FD and LOOP_SET_STATUS: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m03.40s real 0m00.02s user 0m00.03s system Here's configuring ~70 devices in the same way, but using a modified losetup that uses the new LOOP_CONFIGURE ioctl: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m01.94s real 0m00.01s user 0m00.01s system Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:35 -06:00
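The benchmark loop `seq 30 100` configures 71 devices, so the reported wall-clock times can be turned into per-device costs. The sketch below does only that arithmetic on the numbers quoted in the commit. The saving of roughly 20 ms per device is in the same range as the 15-20 ms RCU synchronization cost the commit attributes to `blk_mq_freeze_queue()`.

```c
#include <stdio.h>

/* Number of values printed by `seq first last` (inclusive range). */
static int seq_count(int first, int last)
{
	return last - first + 1;
}

/* Average per-device setup time in milliseconds. */
static double per_device_ms(double total_seconds, int devices)
{
	return total_seconds * 1000.0 / devices;
}

/* Print the per-device comparison for the two runs in the commit. */
void report(void)
{
	int n = seq_count(30, 100);             /* 71 devices */
	double old_ms = per_device_ms(3.40, n); /* LOOP_SET_FD + LOOP_SET_STATUS */
	double new_ms = per_device_ms(1.94, n); /* LOOP_CONFIGURE */

	printf("devices: %d\n", n);
	printf("old: %.1f ms/device, new: %.1f ms/device\n", old_ms, new_ms);
	printf("saved: %.1f ms/device (%.2fx faster)\n",
	       old_ms - new_ms, old_ms / new_ms);
}
```

Calling `report()` prints about 47.9 ms versus 27.3 ms per device, which is a saving of about 20.6 ms per device (1.75x).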
Martijn Coenen	faf1d25440	loop: Clean up LOOP_SET_STATUS lo_flags handling LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this explicit by updating the UAPI to include the flags that can be set/cleared using this ioctl. The implementation can then blindly take over the passed in flags, and use the previous flags for those flags that can't be set / cleared using LOOP_SET_STATUS. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:35 -06:00
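The flag merge described above can be sketched in plain C. The flag values mirror `include/uapi/linux/loop.h`:

- `LO_FLAGS_READ_ONLY` = 1
- `LO_FLAGS_AUTOCLEAR` = 4
- `LO_FLAGS_PARTSCAN` = 8
- `LO_FLAGS_DIRECT_IO` = 16

The `SK_`-prefixed names and `merge_status_flags()` are hypothetical and chosen so the sketch does not clash with the real header. As the commit states, AUTOCLEAR can be set and cleared, while PARTSCAN can only be set.

```c
/* Values mirror include/uapi/linux/loop.h. */
enum {
	SK_LO_FLAGS_READ_ONLY = 1,
	SK_LO_FLAGS_AUTOCLEAR = 4,
	SK_LO_FLAGS_PARTSCAN  = 8,
	SK_LO_FLAGS_DIRECT_IO = 16,
};

/* Flags LOOP_SET_STATUS may set, and the subset it may also clear. */
#define SK_SETTABLE_FLAGS  (SK_LO_FLAGS_AUTOCLEAR | SK_LO_FLAGS_PARTSCAN)
#define SK_CLEARABLE_FLAGS (SK_LO_FLAGS_AUTOCLEAR)

static unsigned int merge_status_flags(unsigned int prev, unsigned int passed)
{
	/* Take over only the settable bits from userspace. */
	unsigned int flags = passed & SK_SETTABLE_FLAGS;

	/* Bits userspace cannot set keep their previous value. */
	flags |= prev & ~SK_SETTABLE_FLAGS;

	/* Bits userspace cannot clear (PARTSCAN) stay set once set. */
	flags |= prev & ~SK_CLEARABLE_FLAGS;
	return flags;
}
```

For example, passing 0 clears a previously set AUTOCLEAR but leaves READ_ONLY and PARTSCAN untouched. A READ_ONLY bit passed in by userspace is ignored.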
Martijn Coenen	571fae6e29	loop: Rework lo_ioctl() __user argument casting In preparation for a new ioctl that needs to copy_from_user(); makes the code easier to read as well. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:35 -06:00
Martijn Coenen	62ab466ca8	loop: Move loop_set_status_from_info() and friends up So we can use it without forward declaration. This is a separate commit to make it easier to verify that this is just a move, without functional modifications. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:35 -06:00
Martijn Coenen	0c3796c244	loop: Factor out configuring loop from status Factor out this code into a separate function, so it can be reused by other code more easily. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Martijn Coenen	0a6ed1b5ff	loop: Remove figure_loop_size() This function was now only used by loop_set_capacity(). Just open code the remaining code in the caller instead. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Martijn Coenen	b0bd158dd6	loop: Refactor loop_set_status() size calculation figure_loop_size() calculates the loop size based on the passed-in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only needed by one of the two callers of this function: loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
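The size calculation being moved around here can be sketched in userspace. It subtracts the offset from the backing file size, clamps the result to `sizelimit` when one is set, and converts bytes to 512-byte sectors. The function name and the use of a plain byte count, instead of the backing file's inode size, are assumptions made for illustration.

```c
#include <stdint.h>

#define SK_SECTOR_SHIFT 9 /* 512-byte sectors */

/* Hypothetical helper: loop device size in sectors for a backing file of
 * file_bytes bytes, starting at offset, limited to sizelimit (0 = none). */
static uint64_t loop_size_sectors(int64_t file_bytes, int64_t offset,
				  int64_t sizelimit)
{
	int64_t size = file_bytes;

	if (offset > 0)
		size -= offset;
	if (size < 0)          /* offset lies beyond the end of the file */
		return 0;
	if (sizelimit > 0 && sizelimit < size)
		size = sizelimit;
	return (uint64_t)size >> SK_SECTOR_SHIFT;
}
```

For a 1 MiB file with `-o 4096`, as in the losetup benchmark of the LOOP_CONFIGURE commit, this gives (1048576 - 4096) / 512 = 2040 sectors.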
Martijn Coenen	716ad0986c	loop: Switch to set_capacity_revalidate_and_notify() This was recently added to block/genhd.c, and takes care of both updating the capacity and notifying userspace of the new size. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Martijn Coenen	5795b6f560	loop: Factor out setting loop device size This code is used repeatedly. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Martijn Coenen	083a6a5078	loop: Remove sector_t truncation checks sector_t is now always u64, so we don't need to check for truncation. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Martijn Coenen	7c5014b098	loop: Call loop_config_discard() only after new config is applied loop_set_status() calls loop_config_discard() to configure discard for the loop device; however, the discard configuration depends on whether the loop device uses encryption, and when we call it the encryption configuration has not been updated yet. Move the call down so we apply the correct discard configuration based on the new configuration. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-21 08:20:34 -06:00
Danil Kipnis	d6ea395072	rnbd/rtrs: Pass max segment size from blk user to the rdma library When Block Device Layer is disabled, BLK_MAX_SEGMENT_SIZE is undefined. The rtrs is a transport library and should compile independently of the block layer. The desired max segment size should be passed down by the user. Introduce max_segment_size parameter for the rtrs_clt_open() call. Fixes: `f7a7a5c228` ("block/rnbd: client: main functionality") Fixes: `6a98d71dae` ("RDMA/rtrs: client: main functionality") Fixes: `cb80329c94` ("RDMA/rtrs: client: private header with client structs and functions") Fixes: `b5c27cdb09` ("RDMA/rtrs: public interface header to establish RDMA connections") Link: https://lore.kernel.org/r/20200519111419.924170-1-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-19 20:43:26 -03:00
Bart Van Assche	cecbc9ce80	null_blk: Zero-initialize read buffers in non-memory-backed mode This patch suppresses an uninteresting KMSAN complaint without affecting performance of the null_blk driver if CONFIG_KMSAN is disabled. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Alexander Potapenko <glider@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-19 09:40:29 -06:00
Emmanuel Nicolet	720bc31669	ps3disk: use the default segment boundary Since commit `dcebd75592` ("block: use bio_for_each_bvec() to compute multi-page bvec count"), the kernel hits a BUG_ON() on the PS3 because bio_split() is called with sectors == 0: kernel BUG at block/bio.c:1853! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3 Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1 NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000 REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4) MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 44008240 XER: 20000000 IRQMASK: 0 GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300 GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001 GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004 GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080 GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0 GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000 NIP [c00000000027d0d0] .bio_split+0x28/0xac LR [c00000000027d0b0] .bio_split+0x8/0xac Call Trace: [c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable) [c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c [c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4 [c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294 [c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174 [c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c [c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184 [c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38 [c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8 [c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184 [c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8 [c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830 [c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c [c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0 [c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124 [c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8 [c00000000135be20] [c00000000000a524] system_call+0x5c/0x70 Instruction dump: 7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0 7c7d1b78 79290fe0 7cc53378 69290001 <0b090000> 81230028 7bca0020 7929ba62 [ end trace 313fec760f30aa1f ]--- The problem originates from setting the segment boundary of the request queue to -1UL. This makes get_max_segment_size() return zero when offset is zero, whatever the max segment size. The test with BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows to zero in the return statement. Not setting the segment boundary and using the default value (BLK_SEG_BOUNDARY_MASK) fixes the problem. Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org	2020-05-19 00:10:35 +10:00
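The overflow can be reproduced in userspace. The sketch below follows the logic the commit message describes for `get_max_segment_size()`, using 64-bit arithmetic as on the PS3. It assumes `BLK_SEG_BOUNDARY_MASK` is `0xFFFFFFFF`, as in kernel headers of that period, and it simplifies the function signature.

With a mask of `-1UL`, the default-mask early return is skipped, and `mask - (mask & 0) + 1` wraps to 0.

```c
#include <stdint.h>

#define SK_BLK_SEG_BOUNDARY_MASK 0xFFFFFFFFULL /* default boundary mask */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Simplified sketch of the old get_max_segment_size() logic. */
static uint64_t max_segment_size(uint64_t boundary_mask, uint64_t offset,
				 uint64_t queue_max_seg)
{
	/* The default mask means "no boundary limit". */
	if (boundary_mask == SK_BLK_SEG_BOUNDARY_MASK)
		return queue_max_seg;

	/* Bytes left before the next boundary. This wraps to 0 when
	 * boundary_mask == UINT64_MAX and offset == 0. */
	return min_u64(boundary_mask - (boundary_mask & offset) + 1,
		       queue_max_seg);
}
```

A zero maximum segment size is what leads to bio_split() being called with sectors == 0 in the trace. Leaving the mask at its default takes the early-return path instead.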
Jack Wang	aa4d16e44f	block/rnbd: a bit of documentation README with description of major sysfs entries, sysfs documentation are moved to ABI dir as Bart suggested. Link: https://lore.kernel.org/r/20200511135131.27580-25-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:17 -03:00
Jack Wang	bc01885342	block/rnbd: include client and server modules into kernel compilation Add rnbd Makefile, Kconfig and also corresponding lines into upper block layer files. Link: https://lore.kernel.org/r/20200511135131.27580-24-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:17 -03:00
Jack Wang	8cee532f46	block/rnbd: server: sysfs interface functions This is the sysfs interface to rnbd mapped devices on the server side: /sys/class/rnbd-server/ctl/devices/<device_name>/ |- block_dev | * link pointing to the corresponding block device sysfs entry | |- sessions/<session-name>/ | * sessions directory | |- read_only | * whether the device is mapped read-only | |- mapping_path * relative device path provided by the client during mapping Link: https://lore.kernel.org/r/20200511135131.27580-23-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:17 -03:00
Jack Wang	f0aad9baad	block/rnbd: server: functionality for IO submitting to block dev This provides helper functions for IO submitting to block dev. Link: https://lore.kernel.org/r/20200511135131.27580-22-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:17 -03:00
Jack Wang	2de6c8de19	block/rnbd: server: main functionality This is the main functionality of the rnbd-server module, which handles RTRS events and rnbd protocol requests, such as mapping (opening) or unmapping (closing) a device. The server side is also responsible for processing incoming RTRS IO requests and forwarding them to locally mapped devices. Link: https://lore.kernel.org/r/20200511135131.27580-21-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:16 -03:00
Jack Wang	d4c6957dd0	block/rnbd: server: private header with server structs and functions This header describes the main structs and functions used by the rnbd-server module, namely structs for managing sessions from different clients and mapped (opened) devices. Link: https://lore.kernel.org/r/20200511135131.27580-20-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:16 -03:00
Jack Wang	1eb54f8f5d	block/rnbd: client: sysfs interface functions This is the sysfs interface to rnbd block devices on the client side: /sys/class/rnbd-client/ctl/ |- map_device | * maps remote device | |- devices/ * all mapped devices /sys/block/rnbd<N>/rnbd/ |- unmap_device | * unmaps device | |- state | * device state | |- session | * session name | |- mapping_path * path of the dev that was mapped on server Link: https://lore.kernel.org/r/20200511135131.27580-19-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:16 -03:00
Jack Wang	f7a7a5c228	block/rnbd: client: main functionality This is the main functionality of the rnbd-client module, which provides an interface to map a remote device as a local block device /dev/rnbd<N> and feeds RTRS with IO requests. Link: https://lore.kernel.org/r/20200511135131.27580-18-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:16 -03:00
Jack Wang	90426e89f5	block/rnbd: client: private header with client structs and functions This header describes the main structs and functions used by the rnbd-client module, mainly for managing RNBD sessions and mapped block devices and for creating and destroying sysfs entries. Link: https://lore.kernel.org/r/20200511135131.27580-17-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:16 -03:00
Jack Wang	219ace6077	block/rnbd: private headers with rnbd protocol structs and helpers These are common private headers with rnbd protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Link: https://lore.kernel.org/r/20200511135131.27580-16-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-17 18:57:15 -03:00
Xu Wang	c65165651d	block/swim3: use set_current_state macro Use set_current_state macro instead of current->state = TASK_RUNNING. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-16 14:28:37 -06:00
Damien Le Moal	e0489ed5da	null_blk: Support REQ_OP_ZONE_APPEND Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned mode enabled. Use the internally tracked zone write pointer position as the actual write position and return it using the command request __sector field in the case of an mq device and using the command BIO sector in the case of a BIO device. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-12 20:36:28 -06:00
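A zone append can be modeled as follows: the device, not the submitter, picks the write position from the zone's write pointer and reports it back. The sketch below models a single zone in userspace. The struct, the function name and the `-ENOSPC` error for a full zone are assumptions made for illustration and are not null_blk's actual code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one sequential zone, in 512-byte sectors. */
struct sk_zone {
	uint64_t start; /* first sector of the zone */
	uint64_t len;   /* zone length */
	uint64_t wp;    /* write pointer: next sector to write */
};

/* Append nr_sectors at the write pointer. On success, store the sector
 * actually written in *written (the value null_blk reports back through
 * the request or BIO sector) and advance the write pointer. */
static int zone_append(struct sk_zone *z, uint64_t nr_sectors,
		       uint64_t *written)
{
	if (z->wp + nr_sectors > z->start + z->len)
		return -ENOSPC; /* would cross the end of the zone */

	*written = z->wp;
	z->wp += nr_sectors;
	return 0;
}
```

Because the position is returned rather than chosen by the caller, several appends to the same zone can be issued concurrently, and each one learns where its data landed.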
Damien Le Moal	e732671aa5	block: Modify revalidate zones Modify the interface of blk_revalidate_disk_zones() to add an optional driver callback function that a driver can use to extend processing done during zone revalidation. The callback, if defined, is executed with the device request queue frozen, after all zones have been inspected. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-12 20:36:28 -06:00
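The shape of this interface change is an optional callback that runs once, after every zone has been inspected. The sketch below shows that shape with hypothetical userspace names. In the kernel, the callback additionally runs with the request queue frozen, which this sketch does not model.

```c
#include <stddef.h>

struct sk_disk {
	unsigned int nr_zones;
	unsigned int zones_checked;
	int driver_updated;
};

/* Optional driver hook, called after all zones have been inspected. */
typedef void (*sk_update_driver_data_fn)(struct sk_disk *disk);

static int revalidate_zones(struct sk_disk *disk,
			    sk_update_driver_data_fn update_driver_data)
{
	for (unsigned int i = 0; i < disk->nr_zones; i++)
		disk->zones_checked++; /* stand-in for per-zone checks */

	if (update_driver_data) /* NULL means no extra processing */
		update_driver_data(disk);
	return 0;
}

/* Example driver hook: runs after the loop above, so it sees every zone. */
static void mark_updated(struct sk_disk *disk)
{
	disk->driver_updated = (disk->zones_checked == disk->nr_zones);
}
```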
Denis Efremov	0836275df4	floppy: suppress UBSAN warning in setup_rw_floppy() UBSAN: array-index-out-of-bounds in drivers/block/floppy.c:1521:45 index 16 is out of range for type 'unsigned char [16]' Call Trace: ... setup_rw_floppy+0x5c3/0x7f0 floppy_ready+0x2be/0x13b0 process_one_work+0x2c1/0x5d0 worker_thread+0x56/0x5e0 kthread+0x122/0x170 ret_from_fork+0x35/0x40 From include/uapi/linux/fd.h: struct floppy_raw_cmd { ... unsigned char cmd_count; unsigned char cmd[16]; unsigned char reply_count; unsigned char reply[16]; ... } This out-of-bounds access is intentional. The command in struct floppy_raw_cmd may take up the space initially intended for the reply and the reply count. It is needed for long 82078 commands such as RESTORE, which takes 17 command bytes. The initial cmd size is not enough, and since struct floppy_raw_cmd is part of the uapi, we check that cmd_count is in [0:16+1+16] in raw_cmd_copyin(). The patch adds a union of the original cmd, reply_count and reply fields with a fullcmd field of equivalent size. The cmd accesses are turned into fullcmd accesses where appropriate to suppress the UBSAN warning. Link: https://lore.kernel.org/r/20200501134416.72248-5-efremov@linux.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:57 +03:00
Denis Efremov	bd10a5f3e2	floppy: add defines for sizes of cmd & reply buffers of floppy_raw_cmd Use the FD_RAW_CMD_SIZE and FD_RAW_REPLY_SIZE defines instead of magic numbers for the cmd & reply buffers of struct floppy_raw_cmd. Remove the MAX_REPLIES define local to floppy.c, as it is now FD_RAW_REPLY_SIZE. FD_RAW_CMD_FULLSIZE is added because the command is allowed to also fill the reply_count and reply fields. Link: https://lore.kernel.org/r/20200501134416.72248-4-efremov@linux.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
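The layout described in the two floppy_raw_cmd commits above can be sketched as follows. The sizes come from the commit messages: 16 command bytes, 1 reply-count byte and 16 reply bytes, for 33 bytes in total. The struct is a trimmed, hypothetical stand-in for `struct floppy_raw_cmd` that keeps only the fields involved.

Writing past `cmd[15]` through `fullcmd` lands first in `reply_count` and then in `reply`. That is the intentional overlap used for 17-byte commands such as RESTORE.

```c
#include <stddef.h>

#define SK_FD_RAW_CMD_SIZE     16
#define SK_FD_RAW_REPLY_SIZE   16
#define SK_FD_RAW_CMD_FULLSIZE (SK_FD_RAW_CMD_SIZE + 1 + SK_FD_RAW_REPLY_SIZE)

/* Trimmed stand-in for struct floppy_raw_cmd (C11 anonymous members). */
struct sk_raw_cmd {
	unsigned char cmd_count;
	union {
		struct {
			unsigned char cmd[SK_FD_RAW_CMD_SIZE];
			unsigned char reply_count;
			unsigned char reply[SK_FD_RAW_REPLY_SIZE];
		};
		/* Same bytes as one array: long commands index this one,
		 * so UBSAN sees an in-bounds access. */
		unsigned char fullcmd[SK_FD_RAW_CMD_FULLSIZE];
	};
};

/* Copy a command of up to SK_FD_RAW_CMD_FULLSIZE bytes. */
static int set_command(struct sk_raw_cmd *rc, const unsigned char *bytes,
		       size_t n)
{
	if (n > SK_FD_RAW_CMD_FULLSIZE)
		return -1;
	for (size_t i = 0; i < n; i++)
		rc->fullcmd[i] = bytes[i];
	rc->cmd_count = (unsigned char)n;
	return 0;
}
```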
Denis Efremov	9c4c5a24c8	floppy: add FD_AUTODETECT_SIZE define for struct floppy_drive_params Use FD_AUTODETECT_SIZE for autodetect buffer size in struct floppy_drive_params instead of a magic number. Link: https://lore.kernel.org/r/20200501134416.72248-3-efremov@linux.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
Denis Efremov	29ac67633c	floppy: use print_hex_dump() in setup_DMA() Remove pr_cont() and use print_hex_dump() in setup_DMA() to print the contents of the cmd buffer. Link: https://lore.kernel.org/r/20200501134416.72248-2-efremov@linux.com Suggested-by: Joe Perches <joe@perches.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
Willy Tarreau	ca1b409a3b	floppy: cleanup: make set_fdc() always set current_drive and current_fdc When called with a negative drive value, set_fdc() would stick to the current fdc (which was assumed to reflect the current_drive's FDC). We do not need this anymore, as the last call site with a negative value was just addressed. Let's make this function always set both current_fdc and current_drive so that there's no more ambiguity. Comments stating this were added in a few non-obvious places. Link: https://lore.kernel.org/r/20200410101904.14652-3-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
Willy Tarreau	99ba6ccc7f	floppy: cleanup: get rid of current_reqD in favor of current_drive This macro equals -1 and is used as an alternative for current_drive when calling reschedule_timeout(), which in turn needs to remap it. This only adds obfuscation, let's simply use current_drive. Link: https://lore.kernel.org/r/20200410101904.14652-2-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
Willy Tarreau	6111a4f9bb	floppy: make sure to reset all FDCs upon resume() In floppy_resume() we don't properly reinitialize all FDCs, instead we reinitialize the current FDC once per available FDC because value -1 is passed to user_reset_fdc(). Let's simply save the current drive and properly reinitialize each FDC. Link: https://lore.kernel.org/r/20200410101904.14652-1-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
Willy Tarreau	05f5e319a1	floppy: cleanup: do not iterate on current_fdc in do_floppy_init() There's no need to iterate on current_fdc in do_floppy_init() anymore. In the first case it's only used as an array index to access fdc_state[], so let's get rid of this confusing assignment. The second case is a bit trickier because user_reset_fdc() needs to already know current_fdc when called with drive==-1 due to this call chain: user_reset_fdc() lock_fdc() set_fdc() drive<0 ==> new_fdc = current_fdc Note that current_drive is not used in this code part and may not even match a unit belonging to current_fdc. Instead of passing -1, we can simply pass the first drive of the FDC being initialized, which is even cleaner as it will allow the function chain above to consistently assign both variables. Link: https://lore.kernel.org/r/20200410093023.14499-1-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:56 +03:00
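The drive/FDC bookkeeping that the set_fdc(), floppy_resume() and do_floppy_init() commits converge on can be modeled in userspace. The model assumes two controllers with four units each, so drive `d` belongs to controller `d / 4` and that controller's first drive is `fdc * 4`. All names are hypothetical, and the real driver uses its own macros.

Once set_fdc() always assigns both globals, a loop that passes each controller's first drive resets every FDC. The old -1 argument reset the current one repeatedly instead.

```c
#define SK_N_FDC          2
#define SK_UNITS_PER_FDC  4

static int current_fdc = -1;
static int current_drive = -1;
static int resets[SK_N_FDC];

/* After the cleanup: a drive number always selects both globals. */
static void set_fdc(int drive)
{
	current_drive = drive;
	current_fdc = drive / SK_UNITS_PER_FDC;
}

/* Stand-in for user_reset_fdc(): reset the controller owning `drive`. */
static void reset_fdc_for(int drive)
{
	set_fdc(drive);
	resets[current_fdc]++;
}

/* Resume/init path: pass each controller's first drive rather than -1,
 * then restore the previously selected drive (an assumption here). */
static void reset_all_fdcs(void)
{
	int saved_drive = current_drive;

	for (int fdc = 0; fdc < SK_N_FDC; fdc++)
		reset_fdc_for(fdc * SK_UNITS_PER_FDC);
	if (saved_drive >= 0)
		set_fdc(saved_drive);
}
```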
Willy Tarreau	12aebfac27	floppy: cleanup: add a few comments about expectations in certain functions The locking in the driver is far from being obvious, with unlocking automatically happening at end of operations scheduled by interrupt, especially for the error paths where one does not necessarily expect that such an interrupt may be triggered. Let's add a few comments about what to expect at certain places to avoid misdetecting bugs which are not. Link: https://lore.kernel.org/r/20200331094054.24441-24-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	82a6301058	floppy: cleanup: do not iterate on current_fdc in DMA grab/release functions Both floppy_grab_irq_and_dma() and floppy_release_irq_and_dma() used to iterate on the global variable while setting up or freeing resources. Now that they exclusively rely on functions which take the fdc as an argument, let's not touch the global one anymore. Link: https://lore.kernel.org/r/20200331094054.24441-23-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	e5a9c95f9b	floppy: cleanup: make get_fdc_version() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-22-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	43d81bb647	floppy: cleanup: make next_valid_format() not rely on current_drive anymore Now the drive is passed as an argument so that the function does not use current_drive anymore. Link: https://lore.kernel.org/r/20200331094054.24441-21-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	c7af70b0fb	floppy: cleanup: make check_wp() not rely on current_{fdc,drive} anymore Now the fdc and drive are passed as arguments so that the function uses neither current_fdc nor current_drive anymore. Link: https://lore.kernel.org/r/20200331094054.24441-20-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	3631a674a2	floppy: cleanup: make fdc_specify() not rely on current_{fdc,drive} anymore Now the fdc and drive are passed as arguments so that the function uses neither current_fdc nor current_drive anymore. Link: https://lore.kernel.org/r/20200331094054.24441-19-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	d5da6fa2b8	floppy: cleanup: make fdc_configure() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-18-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:55 +03:00
Willy Tarreau	197c7ffdb8	floppy: cleanup: make perpendicular_mode() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. It's worth noting that there's still a single raw_cmd pointer specific to the current fdc. It may make sense to have one per fdc in the future. In addition, cont->done() still relies on the current drive and current raw_cmd. Link: https://lore.kernel.org/r/20200331094054.24441-17-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	3ab12a1820	floppy: cleanup: make need_more_output() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-16-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	96dad77a65	floppy: cleanup: make result() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. It's worth noting that there's still a single reply_buffer[] which will store the result for the current fdc. It may or may not make sense to implement one buffer per fdc in the future. Link: https://lore.kernel.org/r/20200331094054.24441-15-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	f8a8e0f7a8	floppy: cleanup: make output_byte() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-14-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	5ea00bfc52	floppy: cleanup: make wait_til_ready() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-13-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	6d494ed037	floppy: cleanup: make show_floppy() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-12-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	f3e0dc1d8b	floppy: cleanup: make reset_fdc_info() not rely on current_fdc anymore Now the fdc is passed as an argument so that the function does not use current_fdc anymore. Link: https://lore.kernel.org/r/20200331094054.24441-11-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:54 +03:00
Willy Tarreau	c1f710b5fe	floppy: cleanup: make twaddle() not rely on current_{fdc,drive} anymore Now the fdc and drive are passed as arguments so that the function uses neither current_fdc nor current_drive anymore. Link: https://lore.kernel.org/r/20200331094054.24441-10-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:53 +03:00
Willy Tarreau	e72e8bf1c9	floppy: split the base port from the register in I/O accesses Currently we have architecture-specific fd_inb() and fd_outb() functions or macros, taking just a port which is in fact made of a base address and a register. The base address is FDC-specific and derived from the local or global "fdc" variable through the FD_IOPORT macro used in the base address calculation. This change splits this by explicitly passing the FDC's base address and the register separately to fd_outb() and fd_inb(). It affects the following archs: - x86, alpha, mips, powerpc, parisc, arm, m68k: simple remap of port -> base+reg - sparc32: use of reg only, since the base address was already masked out and the FDC controller is known from a static struct. - sparc64: like x86 for PCI, like sparc32 for 82077 Some archs use inline functions and others macros. This was not unified in order to minimize the number of changes to review. For the same reason checkpatch still spews a few warnings about things that were already there before. The parisc still uses hard-coded register values and could be cleaned up by taking the register definitions. The sparc per-controller inb/outb functions could further be refined to explicitly take an FDC register instead of a port in argument but it was not needed yet and may be cleaned later. Link: https://lore.kernel.org/r/20200331094054.24441-2-w@1wt.eu Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Ian Molton <spyro@f2s.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com>	2020-05-12 19:34:52 +03:00
Stephen Rothwell	ae979182eb	bdi: fix up for "remove the name field in struct backing_dev_info" Fixes: `1cd925d583` ("bdi: remove the name field in struct backing_dev_info") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-11 09:08:26 -06:00
Jens Axboe	873f1c8df7	Merge branch 'block-5.7' into for-5.8/block Pull in block-5.7 fixes for 5.8. This is mostly to resolve a conflict with the blk-iocost changes, but we also need the base of the bdi use-after-free fix, since we build on top of it. * block-5.7: nvme: fix possible hang when ns scanning fails during error recovery nvme-pci: fix "slimmer CQ head update" bdi: add a ->dev_name field to struct backing_dev_info bdi: use bdi_dev_name() to get device name bdi: move bdi_dev_name out of line vboxsf: don't use the source name in the bdi name iocost: protect iocg->abs_vdebt with iocg->waitq.lock block: remove the bd_openers checks in blk_drop_partitions nvme: prevent double free in nvme_alloc_ns() error handling null_blk: Cleanup zoned device initialization null_blk: Fix zoned command handling block: remove unused header blk-iocost: Fix error on iocost_ioc_vrate_adj bdev: Reduce time holding bd_mutex in sync in blkdev_close() buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:13:58 -06:00
Christoph Hellwig	a711d91cd9	block: add a cdrom_device_info pointer to struct gendisk Add a pointer to the CDROM information structure to struct gendisk. This will allow various removable media file systems to call directly into the CDROM layer instead of abusing ioctls with kernel pointers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-04 10:13:42 -06:00
Ira Weiny	efbe3c2493	fs: Remove unneeded IS_DAX() check in io_is_direct() Remove the check because DAX now has its own read/write methods, and file systems that support DAX check IS_DAX() prior to IOCB_DIRECT on their own. Therefore, it does not matter if the file state is DAX when the iocb flags are created. Also remove io_is_direct() as it is just a simple flag check. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2020-05-04 08:49:39 -07:00
Stefan Hajnoczi	90b5feb8c4	virtio-blk: handle block_device_operations callbacks after hot unplug A userspace process holding a file descriptor to a virtio_blk device can still invoke block_device_operations after hot unplug. This leads to a use-after-free accessing vblk->vdev in virtblk_getgeo() when ioctl(HDIO_GETGEO) is invoked: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] PGD 800000003a92f067 PUD 3a930067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000 RIP: 0010:[<ffffffffc00e5450>] [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246 RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40 RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680 R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000 FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk] [<ffffffff8d3f200d>] ? 
handle_mm_fault+0x39d/0x9b0 [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20 [<ffffffff8d488771>] block_ioctl+0x41/0x50 [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0 [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0 A related problem is that virtblk_remove() leaks the vd_index_ida index when something still holds a reference to vblk->disk during hot unplug. This causes virtio-blk device names to be lost (vda, vdb, etc). Fix these issues by protecting vblk->vdev with a mutex and reference counting vblk so the vd_index_ida index can be removed in all cases. Fixes: `48e4043d45` ("virtio: add virtio disk geometry feature") Reported-by: Lance Digby <ldigby@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2020-05-02 10:28:13 -04:00
Linus Torvalds	3d29cb17ba	block-5.7-2020-04-24 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6jKKUQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpoYCEADA5naNC7RC7XAB90tyZrqUAGd33pUGyu86 ZDc3xyd9V51xj21IoIUWLF7yqR+NFVnhEKcZVAHgZcTnHRAzT2opTV0NkkfseiUA p0ozevwJR6K++X/fefHZNYjCPcmFiC3FFTlNALBBBtTcIVKQKAYaX7fNEp/hrJOE njrkaujqqtq4QA4d7iPC3pXTn0mFC64+9lsBS67YG+qSKq/nM1Grjsw+eANTwKqZ +uBPJzDAEkqlqVQ3H16tLFb631agNEfgE0+KyLDufMNlahZ9n4+lBJWBKoeKXLCW 2OGjhq3MeIVZbvVtpnoVJBlxmECGr+d5PfuZc9Nn+v3XPWW48RLZg15BlFlV60JQ uRTMWfokpTFUEYIO6Rb7J/1Jz2XWgGZzxX3SPVKwLRtk6um/vgtjloD0KFKY9j3P YhzMVDyORqV8URk7TYkCYRDYkiOJ7bsJ0RiSirU9i6Mt8hAtW8cMTYcFWRCA/sbA 6N92E87YyiFLajclR5YVeZeBDjRYeZ6/6rK0MtXcqMQLTU6GfPSTb/D5tJ5BPCyi 2XI23vPeGtq8cN6dyB39y0l1NcP7/x6wnJesja+zDbOqfkkk07BBbzQey2hD2zBl LbM+7G6EQLASbI9lgzCRD/2EbZXi2OkqI3CqBAvw8aYh/t2brDw9+e6ShlnEa5JU eQfw1WGhkg== =06Zn -----END PGP SIGNATURE----- Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes/changes that should go into this release: - null_blk zoned fixes (Damien) - blkdev_close() sync improvement (Douglas) - Fix regression in blk-iocost that impacted (at least) systemtap (Waiman) - Comment fix, header removal (Zhiqiang, Jianpeng)" * tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block: null_blk: Cleanup zoned device initialization null_blk: Fix zoned command handling block: remove unused header blk-iocost: Fix error on iocost_ioc_vrate_adj bdev: Reduce time holding bd_mutex in sync in blkdev_close() buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.	2020-04-24 12:44:19 -07:00
Damien Le Moal	d205bde78f	null_blk: Cleanup zoned device initialization Move all zoned mode related code from null_blk_main.c to null_blk_zoned.c, avoiding an ugly #ifdef in the process. Rename null_zone_init() to null_init_zoned_dev() and null_zone_exit() to null_free_zoned_dev(), and add the new function null_register_zoned_dev() to finalize the zoned dev setup before add_disk(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-23 09:35:09 -06:00
Damien Le Moal	9dd44c7e99	null_blk: Fix zoned command handling For write operations issued to a null_blk device with zoned mode enabled, the state and write pointer position of the zone targeted by the command should be checked before badblocks and memory backing are handled, as the write may fail first due to, for instance, a sector position not aligned with the zone write pointer. This order of checking for errors more accurately reflects the behavior of physical zoned devices. Furthermore, the write pointer position of the target zone should be incremented if and only if no errors are reported by badblocks and memory backing handling. To fix this, introduce the small helper function null_process_cmd(), which executes null_handle_badblocks() and null_handle_memory_backed(), and use this function in null_zone_write() to correctly handle write requests to zoned null devices depending on the type and state of the write target zone. Also call this function in null_handle_zoned() to process read requests to zoned null devices. null_process_cmd() is called directly from null_handle_cmd() for regular null devices, resulting in no functional change for this type of device. To have symmetric names, the function null_handle_zoned() is renamed to null_process_zoned_cmd(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-23 09:35:09 -06:00
Linus Torvalds	189522da8b	virtio: fixes, cleanups Some bug fixes. Cleanup a couple of issues that surfaced meanwhile. Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6d6ZgPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpH3oH/0bJ6o+FiAi8xXgYqm9XXmswrZoZLahjyPay dA7Sz5nNKVtdSGH9o0wRdcekt0SOI3ilZSkv9nwt9ep/5YzC3brf2hry+nPvMTsA MhI3IAa7sK1vCXkftwOlx+SIeDfIwsqr+h4SCfMRxlIT0yAmOC8fl2ByT2dIbqnj dlzwczecHI9LPUEmRWiKH/4Tj5MPZN5IeFSIAE+nA/9cl5h4qVSfYtWD3Y4VQ82g Rv3mvVE+chaVbPxewaBZ8Y0Avti4tMyzsE0MY+dz5xfh+75hqMfygg//1osbEAbz SiL5dDcANe8Q+QOc/BxHdj4dqpqUp1ldV+3Lge9k4lWAGnsEMEk= =GZb2 -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes and cleanups from Michael Tsirkin: - Some bug fixes - Cleanup a couple of issues that surfaced meanwhile - Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits) vhost: disable for OABI virtio: drop vringh.h dependency virtio_blk: add a missing include virtio-balloon: Avoid using the word 'report' when referring to free page hinting virtio-balloon: make virtballoon_free_page_report() static vdpa: fix comment of vdpa_register_device() vdpa: make vhost, virtio depend on menu vdpa: allow a 32 bit vq alignment drm/virtio: fix up for include file changes remoteproc: pull in slab.h rpmsg: pull in slab.h virtio_input: pull in slab.h remoteproc: pull in slab.h virtio-rng: pull in slab.h virtgpu: pull in uaccess.h tools/virtio: make asm/barrier.h self contained tools/virtio: define aligned attribute virtio/test: fix up after IOTLB changes vhost: Create accessors for virtqueues private_data vdpasim: Return status in vdpasim_get_status ...	2020-04-21 12:27:18 -07:00
Michael S. Tsirkin	55a2415bef	virtio_blk: add a missing include virtio_blk uses VIRTIO_RING_F_INDIRECT_DESC, pull in the header defining that value. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-04-17 06:05:30 -04:00
Ilya Dryomov	8ae0299a4b	rbd: don't mess with a page vector in rbd_notify_op_lock() rbd_notify_op_lock() isn't interested in a notify reply. Instead of accepting that page vector just to free it, have watch-notify code take care of it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	b877605152	rbd: don't test rbd_dev->opts in rbd_dev_image_release() rbd_dev->opts is used to distinguish between the image that is being mapped and a parent. However, because we no longer establish watch for read-only mappings, this test is imprecise and results in unnecessary rbd_unregister_watch() calls. Make it consistent with need_watch in rbd_dev_image_probe(). Fixes: `b9ef2b8858` ("rbd: don't establish watch for read-only mappings") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	952c48b0ed	rbd: call rbd_dev_unprobe() after unwatching and flushing notifies rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(), including rbd_dev_header_info(), which means that rbd_dev_header_info() isn't supposed to be called after rbd_dev_unprobe(). However, rbd_dev_image_release() calls rbd_dev_unprobe() before rbd_unregister_watch(). This is racy because a header update notify can sneak in: the "rbd unmap" thread calls rbd_dev_image_release() -> rbd_dev_unprobe(), which frees and zeroes out the header, and then the ceph-watch-notify worker calls rbd_watch_cb() -> rbd_dev_refresh() -> rbd_dev_header_info(), which reads the header back in. The same goes for "rbd map" because rbd_dev_image_probe() calls rbd_dev_unprobe() on errors. In both cases this results in a memory leak. Fixes: `fd22aef8b4` ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	0e4e1de5b6	rbd: avoid a deadlock on header_rwsem when flushing notifies rbd_unregister_watch() flushes notifies and therefore cannot be called under header_rwsem because a header update notify takes header_rwsem to synchronize with "rbd map". If mapping an image fails after the watch is established and a header update notify sneaks in, we deadlock when erroring out from rbd_dev_image_probe(). Move watch registration and unregistration out of the critical section. The only reason they were put there was to make header_rwsem management slightly more obvious. Fixes: `811c668877` ("rbd: fix rbd map vs notify races") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Linus Torvalds	e6383b185a	xen: branch for v5.7-rc1b -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXpAQNgAKCRCAXGG7T9hj voLNAP9VWlSX7Whn4o9fndit2HyqDpOo7fQKiuU4XtDd++FG6QD/Zcu201B8ZP8M rkbeFthX+W9PAyZ0itf1vCL4fQoR7gw= =pRJH -----END PGP SIGNATURE----- Merge tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - two cleanups - fix a boot regression introduced in this merge window - fix wrong use of memory allocation flags * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix booting 32-bit pv guest x86/xen: make xen_pvmmu_arch_setup() static xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() xen: Use evtchn_type_t as a type for event channels	2020-04-10 17:20:06 -07:00
Linus Torvalds	8df2a0a6da	block-5.7-2020-04-10 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6QhDIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsE/EADOQ0xDMOa8EmzRvjuCkiaB9yK2zXiBSAj5 ZBi7ReownfXhCR7nVc8Bv1s2f00PD6CFNURXZmdgyDDrXEd2ojueDoAZNBk59t0e i2CAF2wLAQ5EfuVaxSHVEOrVEmtu+ue+Ix83JNlnGPd7pf9s7uKc/W4iKGpgpxIo 1CpXmWwm5RwjX4z/Qsiaka2lB7QojjImp1n3C+XI5+pp/bJXiftep1lxH5Y3nSWU iR4jO81uxDMxhTEZ9z2cb1HarhctKvnihcb39gQYQ/kYYu7hSZnBPZo5zp5Dyb/t 4tGuDsfXCQCbF0smkusUrcyeT19vh9tOsGkiMzJ/ihm7TMyN4fT23h6DUb/7pAON jnlcB7r5Ezs8jLz9i+mAoq06djd5u54kiuKFog8170sTrtYsncZbyc01wLNAla/V /6KX1sMbPlbXZ+a3l3i7i/gcCBJ7ci6pV3x2elvM9dKHxyqJmwEGMlFVwt4s26ev wS+7+dktLAC73889Zyn8LutA4bWy5FmisSPA4PydSUSOZA+7JjlbILcz15jjwlP2 HzYk+TXsd3yJUQRYX5P0FcDaBUTISr/xeUUB+KT1rLv4Lhtso+S/9cvSc8x5mOa9 989gmqNfFAWoj1nKEIKeRwLjk0b6YA9qMv4jOwwiuobsT55aBxpbP80huNoRVj5L xFIWgBSwzg== =3woC -----END PGP SIGNATURE----- Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this merge window. 
This contains: - NVMe pull request from Christoph with various fixes - Better discard support for loop (Evan) - Only call ->commit_rqs() if we have queued IO (Keith) - blkcg offlining fixes (Tejun) - fix (and fix the fix) for busy partitions" * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block: block: fix busy device checking in blk_drop_partitions again block: fix busy device checking in blk_drop_partitions nvmet-rdma: fix double free of rdma queue blk-mq: don't commit_rqs() if none were queued nvme-fc: Revert "add module to ops template to allow module references" nvme: fix deadlock caused by ANA update wrong locking nvmet-rdma: fix bonding failover possible NULL deref loop: Better discard support for block devices loop: Report EOPNOTSUPP properly nvmet: fix NULL dereference when removing a referral nvme: inherit stable pages constraint in the mpath stack device blkcg: don't offline parent blkcg first blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it nvme-tcp: fix possible crash in recv error flow nvme-tcp: don't poll a non-live queue nvme-tcp: fix possible crash in write_zeroes processing nvmet-fc: fix typo in comment nvme-rdma: Replace comma with a semicolon nvme-fcloop: fix deallocation of working context nvme: fix compat address handling in several ioctls	2020-04-10 10:06:54 -07:00
Linus Torvalds	fcc95f0640	The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques) -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl6OEO4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi0XNB/wItYipkjlL5fIUBqiRWzYai72DWdPp CnOZo8LB+O0MQDomPT6DdpU1OMlWZi5HF7zklrZ35LTm21UkRNC9zvccjs9l66PJ qo9cKJbxxju+hgzIvgvK9PjlDlaiFAc/pkF8lZ/NaOnSsM1vvsFL9IuY2LXS38MY A/uUTZNUnFy5udam8TPuN+gWwZcUIH48lRWQLWe2I/hNJSweX1l8OHvecOBg+cYH G+8vb7mLU2V9ky0YT5JJmVxUV3CWA5wH6ZrWWy1ofVDdeSFLPrhgWX6IMjaNq+Gd xPfxmly47uBviSqON9dMkiThgy0Qj7yi0Pvx+1sAZbD7aj/6A4qg3LX5 =GIX0 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. 
- multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques)" * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits) ceph: fix snapshot directory timestamps ceph: wait for async creating inode before requesting new max size ceph: don't skip updating wanted caps when cap is stale ceph: request new max size only when there is auth cap ceph: cleanup return error of try_get_cap_refs() ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: check all mds' caps after page writeback ceph: update i_requested_max_size only when sending cap msg to auth mds ceph: simplify calling of ceph_get_fmode() ceph: remove delay check logic from ceph_check_caps() ceph: consider inode's last read/write when calculating wanted caps ceph: always renew caps if mds_wanted is insufficient ceph: update dentry lease for async create ceph: attempt to do async create when possible ceph: cache layout in parent dir on first sync create ceph: add new MDS req field to hold delegated inode number ceph: decode interval_sets for delegated inos ceph: make ceph_fill_inode non-static ceph: perform asynchronous unlink if we have sufficient caps ceph: don't take refs to want mask unless we have all bits ...	2020-04-08 21:44:05 -07:00
Juergen Gross	3a169c0be7	xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() Commit `1d5c76e664` ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") didn't fix the issue it was meant to fix, as the flags for allocating the memory are GFP_NOIO, which makes the memory allocation fall back to kmalloc(). So instead of GFP_NOIO, use GFP_KERNEL and do all the memory allocation in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section. Fixes: `1d5c76e664` ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2020-04-07 12:12:58 +02:00
Evan Green	c52abf5630	loop: Better discard support for block devices If the backing device for a loop device is itself a block device, then mirror the "write zeroes" capabilities of the underlying block device into the loop device. Copy this capability into both max_write_zeroes_sectors and max_discard_sectors of the loop device. The reason for this is that REQ_OP_DISCARD on a loop device translates into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This presents a consistent interface for loop devices (discarded data is zeroed), regardless of the backing device type of the loop device. There should be no behavior change for loop devices backed by regular files. This change fixes blktests block/003, and removes an extraneous error print in block/013 when testing on a loop device backed by a block device that does not support discard. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [used updated version of Evan's comment in loop_config_discard()] [moved backingq to local scope, removed redundant braces] Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 13:44:22 -06:00
Evan Green	8cd55087dc	loop: Report EOPNOTSUPP properly Properly plumb out EOPNOTSUPP from loop driver operations, which may be returned when, for instance, a discard operation is attempted but not supported by the underlying block device. Before this change, everything was reported in the log as an I/O error, which is scary and not helpful in debugging. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 13:44:20 -06:00
Linus Torvalds	1592614838	for-5.7/drivers-2020-03-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJDYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplhMD/95jd4nlVetHAo54z+Zk2ExE13+yDamRKyh vc7t2tz1reqFOimtVr5aVuTXCTgOx4CpiIox5qcn6qAExN4JtCChOBRGize/0u8S ckxnhHbN2C0rfnGldvrYYeNRonFI+7QKimnurWUSYYGN0xqbo21BxJ7dFaohMseo q4K8sIW0ctE6AOlw28Jerkg614s2NDGZ7q1laheXnYHn5c9f1m0NaKN/jyTGgr0X TLBiLbX2yRrAuvpctBj6Fna6YN7Vdd9jsf2Bt6ipUI1XgHQoVUGMxQNhWPyjsbSv GzRQUNAfVcasLzCP/Mj/47144OkUtDDpn2mjeXDaFljLDGFULD+jp/SsOmLCxkPC gI7G2yfBvF96/SOyT0JXrLyMcBd1R2vRoASbc5tPu82mZhx7YJZH5WYtOB9h2gra RTYo3xcm0EoN6yeMaH+xOuXxTWWInIrgKPONW4H8s7hxEiMt5oFNVBI7vqPr4LVp tpfxiKZDavKOofKXogNV4W7mSMP/Ir5Q9Ha4g5SXHBGp0z/PHmnQ0xDGNq0KDnU4 eNO0UYCFNCNa+0AOhpNxaVuVm9LjrgvyXRjePgOZQ4akhohwHO6DLrHK1f8Hb1vD 8Ih6uR+F5zZlKsouWro8HLGYm5w40Wq9tbCI8QbPYH6nkGoDmzpPv9jbAeWgJU5c KqP/5TBSLA== =Bs4E -----END PGP SIGNATURE----- Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: 
Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...	2020-03-30 11:43:51 -07:00
Linus Torvalds	10f36b1e80	for-5.7/block-2020-03-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJCoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvziEACqQC+QRKiqR6X5yaPWJ9LqjKE7lfI1PUb7 0a1z1mKuf8d6z0qNleUwdSOEaS5zJiswou2K8GLvEtTQH41QYsQkxc9GLjAyTveK szAyzZaa3BNUy9hkczm9i2arv3fI8XoTE3JvRM0e9wL8fBJDYCtKtHFJvF4hisOQ ydaJlU6tcwzd9bdV7K5dLwBxu3AeAJjzS3Tyfw25u9N9O/btUxJ91RTqBb2+Xeoz AVasfRlAqf/CzdjxCCmDgWE2QM4852pAeQ7UJJBGISNWNoiwkezMg+6HD0jEOLee bQ8uDyQdihIWTY+/zQasotX8/71uLV8QgtjWLXR9zrjrubIBWHGzoWSQ4kPg5DfQ bJmKO0VvWN2sshZEpWvzzAFGYxZViNphbK2Pb4hKOcv7jtMcC8mmEogh/7EqbD/n KB3IM9qVoXM8INm5o0dTy5uDRJxiHiHYkqsZaKz55BB/R4Geym5TINT3nXgxhQrn JoSwp4zdm3/NJOySruDi2eETqWJC2bsz3FsQSyCQTPOuP0nLtFKBb1UKHpmYTCXG H4LCyCKFJ6s006qBcdaNPZBw1mrSNwoxEulHnpYA4BFfPeXi72yrnMZQkdwWONpW LIVuD0hBm8X/pulbvEEdjzXBqZVkqK3xFX+uX5+bnwwaUKddXAC/h9SQKpBP2Mbb AeZToMklKw== =6Glq -----END PGP SIGNATURE----- Merge tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Online capacity resizing (Balbir) - Number of hardware queue change fixes (Bart) - null_blk fault injection addition (Bart) - Cleanup of queue allocation, unifying the node/no-node API (Christoph) - Cleanup of genhd, moving code to where it makes sense (Christoph) - Cleanup of the partition handling code (Christoph) - disk stat fixes/improvements (Konstantin) - BFQ improvements (Paolo) - Various fixes and improvements * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits) block: return NULL in blk_alloc_queue() on error block: move bio_map_* to blk-map.c Revert "blkdev: check for valid request queue before issuing flush" block: simplify queue allocation bcache: pass the make_request methods to blk_queue_make_request null_blk: use blk_mq_init_queue_data block: add a blk_mq_init_queue_data helper block: move the ->devnode callback to struct block_device_operations block: move the part_stat* helpers from genhd.h to a new header block: move 
block layer internals out of include/linux/genhd.h block: move guard_bio_eod to bio.c block: unexport get_gendisk block: unexport disk_map_sector_rcu block: unexport disk_get_part block: mark part_in_flight and part_in_flight_rw static block: mark block_depr static block: factor out requeue handling from dispatch code block/diskstats: replace time_in_queue with sum of request times block/diskstats: accumulate all per-cpu counters in one pass block/diskstats: more accurate approximation of io_ticks for slow disks ...	2020-03-30 11:20:13 -07:00
Hannes Reinecke	f9b6b98d24	rbd: enable multiple blk-mq queues Allocate one queue per CPU and get a performance boost from higher parallelism. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	59e542c869	rbd: embed image request in blk-mq pdu Avoid making allocations for !IMG_REQ_CHILD image requests. Only IMG_REQ_CHILD image requests need to be freed now. Move the initial request checks to rbd_queue_rq(). Unfortunately we can't fill the image request and kick the state machine directly from rbd_queue_rq() because ->queue_rq() isn't allowed to block. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	a52cc68575	rbd: acquire header_rwsem just once in rbd_queue_workfn() Currently header_rwsem is acquired twice: once in rbd_dev_parent_get() when the image request is being created and then in rbd_queue_workfn() to capture mapping_size and snapc. Introduce rbd_img_capture_header() and move image request allocation so that header_rwsem can be acquired just once. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	78b42a871a	rbd: get rid of img_request_layered_clear() No need to clear IMG_REQ_LAYERED before destroying the request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Hannes Reinecke	679a97d286	rbd: kill img_request kref The reference counter is never increased, so we can as well call rbd_img_request_destroy() directly and drop the kref. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	94f4857f4b	rbd: remove barriers from img_request_layered_{set,clear,test}() IMG_REQ_LAYERED is set in rbd_img_request_create(), and tested and cleared in rbd_img_request_destroy() when the image request is about to be destroyed. The barriers are unnecessary. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Chaitanya Kulkarni	766c3297d7	null_blk: add trace in null_blk_zoned.c With the help of previously added tracepoints we can now trace report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 13:39:10 -06:00
Chaitanya Kulkarni	c51d041998	null_blk: add tracepoint helpers for zoned mode This patch adds two new tracepoints for null_blk_zoned.c that allow us to trace report-zones, zone-mgmt-op and zone-write operations, which have a direct effect on the zone condition state machine. Also, we update drivers/block/Makefile so that new null_blk related tracefiles can be compiled. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 13:39:10 -06:00
Christoph Hellwig	3d745ea5b0	block: simplify queue allocation Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	8d96a1117c	null_blk: use blk_mq_init_queue_data Use the new blk_mq_init_queue_data instead of open coding the queue allocation and initialization. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	348e114bbd	block: move the ->devnode callback to struct block_device_operations There really isn't any good reason to stash a method directly into struct gendisk. Move it together with the other block device operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 09:50:05 -06:00
Christoph Hellwig	c6a564ffad	block: move the part_stat* helpers from genhd.h to a new header These macros are just used by a few files. Move them out of genhd.h, which is included everywhere, into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-25 09:50:09 -06:00
Gustavo A. R. Silva	431d6e3eec	rsxx: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-19 15:26:45 -06:00
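Here is a short userspace sketch of the flexible-array pattern. The trailing storage is sized explicitly at allocation time, because sizeof(struct foo) only covers the fixed members. struct foo and struct boo follow the example in the message. The field `value` and the helper foo_alloc() are hypothetical. In kernel code, the struct_size() helper from <linux/overflow.h> computes the same size with overflow checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct boo {
    int value;
};

/* The flexible array member must be the last member; the compiler
 * rejects a declaration that puts it anywhere else. */
struct foo {
    int stuff;
    struct boo array[];
};

/* Allocate a foo with room for n trailing elements. sizeof(*f) covers the
 * fixed part only, so the array storage is added explicitly. */
static struct foo *foo_alloc(size_t n)
{
    struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

    if (!f)
        return NULL;
    f->stuff = (int)n;
    for (size_t i = 0; i < n; i++)
        f->array[i].value = (int)(i * 10);
    return f;
}
```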
Balbir Singh	3cbc28bb90	xen-blkfront.c: Convert to use set_capacity_revalidate_and_notify block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-18 15:13:21 -06:00
Balbir Singh	662155e289	virtio_blk.c: Convert to use set_capacity_revalidate_and_notify block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-18 15:13:21 -06:00
Willy Tarreau	e83995c9f8	floppy: rename the global "fdc" variable to "current_fdc" This is done in order to remove the confusion that arises at some places in the code where local variables or arguments shadow the global variable. It is already visible that some places are a bit awkward and iterate over the global variable, for the sole reason that they used to rely on it being named "fdc" in order to get the correct address when using FD_DOR. These ones are easy to spot by searching for "for (current_fdc...". Some more cleanup is definitely possible. For example "fdc_state[current_fdc].somefield" is used all over the code and would probably read better as "fdc_state->somefield", with fdc_state being set when current_fdc is assigned. This would require passing the pointer to the current state, instead of current_fdc, to the I/O functions. Link: https://lore.kernel.org/r/20200301195555.11154-7-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
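The pointer-based cleanup suggested in this message could look roughly like the sketch below. The current-state pointer is updated in the one function that changes the current controller, and helpers take that pointer rather than an index. All names (fdcs, cur_state, select_fdc, dor_port), the reset field, and the example base addresses are hypothetical. This is not the floppy driver's code.

```c
#include <assert.h>

#define N_FDC_SKETCH 2

/* Hypothetical, heavily reduced per-controller state. */
struct fdc_state_sketch {
    unsigned int address;
    int reset;
};

static struct fdc_state_sketch fdcs[N_FDC_SKETCH] = {
    { .address = 0x3f0 },
    { .address = 0x370 },
};

static int current_fdc;
/* Cached pointer kept in step with current_fdc, so code can write
 * cur_state->field instead of fdcs[current_fdc].field. */
static struct fdc_state_sketch *cur_state = &fdcs[0];

/* The single place where the current controller changes. */
static void select_fdc(int fdc)
{
    current_fdc = fdc;
    cur_state = &fdcs[fdc];
}

/* An I/O helper that receives the state pointer instead of an index. */
static unsigned int dor_port(const struct fdc_state_sketch *st)
{
    return st->address + 2;   /* DOR offset in the PC controller layout */
}
```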
Willy Tarreau	e2032464fe	floppy: separate the FDC's base address from its registers FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be defined relative to FD_IOPORT, which is the FDC's base address, itself a macro depending on the "fdc" local or global variable. This patch changes this so that the register macros above now only reference the address offset, and the FDC's address is explicitly passed in each call to fd_inb() and fd_outb(), thus removing the macro. With this change there is no more implicit usage of the local/global "fdc" variable. One place in the ARM code used to check if the port was equal to FD_DOR; this was changed to testing the register by applying a mask to the port, as was already done in the sparc code. There are still occurrences of fd_inb() and fd_outb() in the PARISC code and these ones remain unaffected since they already used to work with a base address and a register offset. The sparc, m68k and parisc code could now be slightly cleaned up to benefit from the macro definitions above instead of the equivalent hard-coded values. Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu Cc: Ian Molton <spyro@f2s.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	ac7018614d	floppy: introduce new functions fdc_inb() and fdc_outb() These two functions replace fd_inb() and fd_outb() in that they take the FDC in argument. This will ease the separation of the base address and the port everywhere the code is used. Link: https://lore.kernel.org/r/20200301195555.11154-5-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
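The split between base address and register offset, plus wrappers that take the controller explicitly, can be sketched in userspace as follows. The offsets follow the standard PC floppy controller layout: DOR at base+2, status at base+4, data at base+5, DIR/DCR at base+7. Several parts are assumptions rather than kernel code: the port space is a simulated array, the two base addresses are example values, and the helper names (port_inb, port_outb, ctl_inb, ctl_outb, port_is_dor) are hypothetical. The `& 7` mask is only an illustration of how a DOR check could be written, not the exact ARM or sparc code.

```c
#include <assert.h>

/* Register offsets relative to the controller base (standard PC layout). */
#define FD_DOR    2
#define FD_STATUS 4
#define FD_DATA   5
#define FD_DIR    7
#define FD_DCR    7

/* Simulated I/O port space standing in for real inb()/outb(). */
static unsigned char io_space[0x400];

/* Base address and register offset are separate arguments, so nothing
 * depends on an implicit "fdc" variable being in scope. */
static unsigned char port_inb(unsigned int base, unsigned int reg)
{
    return io_space[base + reg];
}

static void port_outb(unsigned char value, unsigned int base, unsigned int reg)
{
    io_space[base + reg] = value;
}

/* Example base addresses for two controllers. */
static const unsigned int fdc_base[] = { 0x3f0, 0x370 };

/* Index-based wrappers: the controller is an explicit argument. */
static unsigned char ctl_inb(int fdc, unsigned int reg)
{
    return port_inb(fdc_base[fdc], reg);
}

static void ctl_outb(unsigned char value, int fdc, unsigned int reg)
{
    port_outb(value, fdc_base[fdc], reg);
}

/* Given only a full port number, recover the register by masking off the
 * base (assumes bases are 8-byte aligned, as 0x3f0 and 0x370 are). */
static int port_is_dor(unsigned int port)
{
    return (port & 7) == FD_DOR;
}
```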
Willy Tarreau	8fb3845023	floppy: cleanup: expand the reply_buffer macros Several macros were used to access reply_buffer[] at discrete positions without making it obvious they were relying on this. These ones have been replaced by their offset in the reply buffer to make these accesses more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-11-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	76dabe7960	floppy: cleanup: expand the R/W / format command macros Various macros were used to access raw_cmd for R/W or format commands without making it obvious that raw_cmd->cmd[] was used. Let's expand the macros to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-10-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	2a34875279	floppy: cleanup: expand macro DRWE This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-9-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	3bd7f87c68	floppy: cleanup: expand macro DRS This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-8-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	031faabd80	floppy: cleanup: expand macro DP This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-7-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	121e297955	floppy: cleanup: expand macro UDRWE This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-6-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	8d9d34e25a	floppy: cleanup: expand macro UDRS This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-5-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	1ce9ae9654	floppy: cleanup: expand macro UDP This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-4-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	f9d322bdb1	floppy: cleanup: expand macro UFDCS This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-3-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	de6048b843	floppy: cleanup: expand macro FDCS Macro FDCS silently uses identifier "fdc" which may be either the global one or a local one. Let's expand the macro to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-2-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
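The hazard behind these expansions is that a macro body binds to whatever identifier of that name is in scope at the use site. The sketch below reproduces it with a macro modeled on FDCS, assumed here to expand to `&fdc_state[fdc]`. The state struct, its rate field, and the values are hypothetical.

```c
#include <assert.h>

struct fdc_state_sketch {
    int rate;
};

static struct fdc_state_sketch fdc_state[2] = { { .rate = 500 }, { .rate = 250 } };
static int fdc; /* global "current controller", 0 at startup */

/* Modeled on FDCS: it names "fdc" without saying which one. */
#define FDCS (&fdc_state[fdc])

static int rate_global(void)
{
    return FDCS->rate;   /* expands to fdc_state[<global fdc>] */
}

static int rate_shadowed(int fdc)
{
    return FDCS->rate;   /* silently expands to fdc_state[<parameter fdc>] */
}

/* Expanded form: the index used is visible at the call site. */
static int rate_expanded(int which)
{
    return fdc_state[which].rate;
}
```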
Dongli Zhang	290df92a94	null_blk: describe the usage of fault injection param As null_blk is a very good starting point for testing the block layer, this patch adds descriptions and comments to 'timeout', 'requeue' and 'init_hctx' to explain how to use fault injection with null_blk. nvme has a similar comment for nvme_core.fail_request. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 18:54:28 -06:00
Alexey Dobriyan	ff77042296	null_blk: fix spurious IO errors after failed past-wp access Steps to reproduce: BLKRESETZONE zone 0 // force EIO pwrite(fd, buf, 4096, 4096); [issue more IO including zone ioctls] It will start failing randomly including IO to unrelated zones because of ->error "reuse". Trigger can be partition detection as well if test is not run immediately which is even more entertaining. The fix is of course to clear ->error where necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 09:10:03 -06:00
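The ->error reuse bug follows a general pattern: per-request state that gets recycled must be reset on every path, not only assigned on failure. Here is a minimal sketch with a hypothetical command struct. It is not null_blk's nullb_cmd.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-request command that gets recycled between requests. */
struct cmd_sketch {
    int error;   /* 0 on success, negative errno on failure */
};

/* Buggy: ->error is only written on failure, so a recycled command
 * still carries the previous request's error on success. */
static void complete_buggy(struct cmd_sketch *cmd, int failed)
{
    if (failed)
        cmd->error = -EIO;
}

/* Fixed: every path assigns ->error, so nothing leaks between requests. */
static void complete_fixed(struct cmd_sketch *cmd, int failed)
{
    cmd->error = failed ? -EIO : 0;
}
```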
Hou Pu	2c272542ba	nbd: requeue command if the socket is changed In commit `2da22da573` (nbd: fix zero cmd timeout handling v2), the timer is allowed to be reset when it fires if tag_set.timeout is set to zero. If the server is shut down and a new socket is reconfigured, the request should be requeued to be processed by the new server instead of waiting for a response from the old one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 08:01:24 -06:00
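One way to express the decision described here is a timeout handler that compares the socket a command was sent on with the socket currently configured. The sketch below makes that comparison with a hypothetical generation counter and collapses the non-zero-timeout path into a single outcome. All names are hypothetical, and this is not the nbd implementation.

```c
#include <assert.h>

enum timeout_action { RESET_TIMER, REQUEUE, HANDLE_TIMEOUT };

/* Hypothetical socket whose generation is bumped on every reconnect. */
struct sock_sketch {
    unsigned int generation;
};

/* Hypothetical command remembering which socket generation it went out on. */
struct cmd_sketch {
    unsigned int sent_generation;
};

static enum timeout_action on_timeout(unsigned int timeout,
                                      const struct sock_sketch *sock,
                                      const struct cmd_sketch *cmd)
{
    if (timeout == 0) {
        /* The old server will never answer on a replaced socket, so hand
         * the request back for resubmission to the new one. */
        if (cmd->sent_generation != sock->generation)
            return REQUEUE;
        /* Same socket: a zero timeout means keep waiting. */
        return RESET_TIMER;
    }
    /* Non-zero timeouts go through the normal timeout handling. */
    return HANDLE_TIMEOUT;
}
```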
Hou Pu	d970958b2d	nbd: enable replace socket if only one connection is configured Nbd server with multiple connections could be upgraded since `560bc4b` (nbd: handle dead connections). But if only one connection is configured, after we take down the nbd server, all inflight IO would eventually time out and return an error. We could requeue them as we do with multiple connections and wait for a new socket in the submit path. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:58:39 -06:00
Jackie Liu	91dfa2dd81	block/drbd: delete invalid function drbd_md_mark_dirty_ We deleted last_md_mark_dirty long ago, so this function no longer needs to exist; delete it, otherwise a compilation error will occur when DEBUG is enabled. Fixes: `ac0acb9e39` ("drbd: use drbd_device_post_work() in more place") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:49:09 -06:00
Takashi Iwai	0348510490	block: aoe: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:39:04 -06:00
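The difference matters when return values are accumulated into an offset. The sketch below re-creates scnprintf()'s behavior in userspace on top of vsnprintf(); the name scnprintf_sketch is hypothetical. It returns the number of characters actually stored, so `len` never exceeds the buffer. With plain snprintf(), the second call in the test would push len to 10 in an 8-byte buffer, and `sizeof(buf) - len` would then wrap around as a size_t.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace re-creation of scnprintf() semantics: return the number of
 * characters actually written to buf (excluding the trailing NUL), never
 * the would-be length that snprintf() reports on truncation. */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int i;

    va_start(args, fmt);
    i = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (i < 0)
        return 0;                      /* encoding error: nothing counted */
    if ((size_t)i < size)
        return i;                      /* output fit completely */
    return size ? (int)(size - 1) : 0; /* truncated: size - 1 chars stored */
}
```

Usage follows the usual accumulate-into-offset idiom: `len += scnprintf_sketch(buf + len, sizeof(buf) - len, ...)`.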
Martijn Coenen	0fbcf57982	loop: Only freeze block queue when needed. __loop_update_dio() can be called as a part of loop_set_fd(), when the block queue is not yet up and running; avoid freezing the block queue in that case, since that is an expensive operation. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 14:10:43 -06:00
Martijn Coenen	7e81f99afd	loop: Only change blocksize when needed. Return early in loop_set_block_size() if the requested block size is identical to the one we already have; this avoids expensive calls to freeze the block queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 14:10:41 -06:00
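Both loop changes check whether there is anything to do before paying for a queue freeze. This one returns early on an unchanged block size; the related commit skips the freeze in __loop_update_dio() while the queue is not yet live. The sketch below is hypothetical. A counter stands in for the expensive freeze, and the real block-size validation is omitted.

```c
#include <assert.h>

/* Hypothetical loop-device state; freeze_count records how often the
 * expensive queue freeze would have run. */
struct loop_sketch {
    unsigned int block_size;
    unsigned int freeze_count;
};

static void freeze_queue(struct loop_sketch *lo)   { lo->freeze_count++; }
static void unfreeze_queue(struct loop_sketch *lo) { (void)lo; }

static int set_block_size(struct loop_sketch *lo, unsigned int bs)
{
    /* Nothing to change: skip the freeze/unfreeze cycle entirely. */
    if (lo->block_size == bs)
        return 0;

    freeze_queue(lo);
    lo->block_size = bs;
    unfreeze_queue(lo);
    return 0;
}
```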
Bart Van Assche	596444e757	null_blk: Add support for init_hctx() fault injection This makes it possible to test the error path in blk_mq_realloc_hw_ctxs() and also several error paths in null_blk. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
Bart Van Assche	9b03b71308	null_blk: Handle null_add_dev() failures properly If null_add_dev() fails then null_del_dev() is called with a NULL argument. Make null_del_dev() handle this scenario correctly. This patch fixes the following KASAN complaint: null-ptr-deref in null_del_dev+0x28/0x280 [null_blk] Read of size 8 at addr 0000000000000000 by task find/1062 Call Trace: dump_stack+0xa5/0xe6 __kasan_report.cold+0x65/0x99 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 null_del_dev+0x28/0x280 [null_blk] nullb_group_drop_item+0x7e/0xa0 [null_blk] client_drop_item+0x53/0x80 [configfs] configfs_rmdir+0x395/0x4e0 [configfs] vfs_rmdir+0xb6/0x220 do_rmdir+0x238/0x2c0 __x64_sys_unlinkat+0x75/0x90 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
Bart Van Assche	2004bfdef9	null_blk: Fix the null_add_dev() error path If null_add_dev() fails, clear dev->nullb. This patch fixes the following KASAN complaint: BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk] Read of size 8 at addr ffff88803280fc30 by task check/8409 Call Trace: dump_stack+0xa5/0xe6 print_address_description.constprop.0+0x26/0x260 __kasan_report.cold+0x7b/0x99 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 nullb_device_submit_queues_store+0xcf/0x160 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff370926317 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317 RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001 RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001 R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002 R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0 Allocated by task 8409: save_stack+0x23/0x90 __kasan_kmalloc.constprop.0+0xcf/0xe0 kasan_kmalloc+0xd/0x10 kmem_cache_alloc_node_trace+0x129/0x4c0 null_add_dev+0x24a/0xe90 [null_blk] nullb_device_power_store+0x1b6/0x270 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 8409: save_stack+0x23/0x90 __kasan_slab_free+0x112/0x160 kasan_slab_free+0x12/0x20 kfree+0xdf/0x250 null_add_dev+0xaf3/0xe90 [null_blk] nullb_device_power_store+0x1b6/0x270 [null_blk] 
configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: `2984c8684f` ("nullb: factor disk parameters") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
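Together with the NULL check that null_del_dev() gains in the neighboring commit, this fix follows a common error-path rule. When a partially initialized object is freed, every pointer that still refers to it is cleared, and teardown tolerates the object being absent. Here is a hypothetical userspace sketch; these are not the null_blk structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the configfs device and the live nullb. */
struct nullb_sketch {
    int queues;
};

struct nullb_device_sketch {
    struct nullb_sketch *nullb;   /* back-pointer, NULL when not powered on */
};

static int add_dev(struct nullb_device_sketch *dev, int fail_init)
{
    struct nullb_sketch *nullb = calloc(1, sizeof(*nullb));

    if (!nullb)
        return -1;
    dev->nullb = nullb;

    if (fail_init) {
        /* Error path: free and clear the back-pointer so later configfs
         * handlers cannot dereference freed memory. */
        free(nullb);
        dev->nullb = NULL;
        return -1;
    }
    nullb->queues = 1;
    return 0;
}

/* Teardown tolerates a device that never finished coming up. */
static void del_dev(struct nullb_device_sketch *dev)
{
    if (!dev->nullb)
        return;
    free(dev->nullb);
    dev->nullb = NULL;
}
```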
Bart Van Assche	78b10be23d	null_blk: Fix changing the number of hardware queues Instead of initializing null_blk hardware queues explicitly after the request queue has been created, provide .init_hctx() and .exit_hctx() callback functions. The latter functions are not only called during request queue allocation but also when the number of hardware queues changes. Allocate nr_cpu_ids queues during initialization to support increasing the number of hardware queues above the initial hardware queue count. This change fixes increasing the number of hardware queues above the initial number of hardware queues and also keeps nullb->nr_queues in sync with the number of hardware queues. Fixes: `45919fbfe1` ("null_blk: Enable modifying 'submit_queues' after an instance has been configured") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
Bart Van Assche	b9853b4d6f	null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed' Although it is not clear to me why UBSAN complains when 'memory_backed' is set, this patch suppresses the UBSAN complaint that is triggered when setting that configfs attribute. UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1 load of value 16 is not a valid value for type '_Bool' CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0xa5/0xe6 ubsan_epilogue+0x9/0x26 __ubsan_handle_load_invalid_value+0x6d/0x76 nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
Linus Torvalds	7de41b120b	virtio: fixes Some bug fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl5kvHgPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpMHYH/i3YXD+xcmA6t4hVQp7w+w2Lp0HK/zGCY+nh CZEcH0DThaNfUSZeCANb3BndHp2e7rcKydNdGDQN3q1lC6jmRq+O98ZoR7TDlTLt jIKlGgR+YyCGBkl5HpEEaqUI4YbtgdtZtYOilwPcYQCbTz0SkRI8avcIQbHplttW NsxuvohrVyfCCb+VWVdnXy94A4YHI5tq4Ups/I/NkloxXnKcJ99GrlHWWWKa6oJG HEi67oqVZO4MImPBkA1zekf4mbThbI+FL5gETUvkr6v4cSYa69mqyIt27Ft/e87M 5EJp7GnH0HasZCHVAeGs8Qs09zX+AqPO2aMnoPhKm/mUhWu6gNo= =34RW -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Some bug fixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: Adjust label in virtballoon_probe virtio-blk: improve virtqueue error to BLK_STS virtio-blk: fix hw_queue stopped on arbitrary error virtio_ring: Fix mem leak with vring_new_virtqueue()	2020-03-09 16:02:32 -07:00
Halil Pasic	3d973b2e9a	virtio-blk: improve virtqueue error to BLK_STS Let's change the mapping from virtqueue_add errors to BLK_STS statuses, so that -ENOSPC, which indicates a full virtqueue, is still mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM, which indicates a non-device-specific resource outage, is mapped to BLK_STS_RESOURCE. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-03-08 05:35:24 -04:00
Halil Pasic	f5f6b95c72	virtio-blk: fix hw_queue stopped on arbitrary error Since nobody else is going to restart our hw_queue for us, the blk_mq_start_stopped_hw_queues() call in virtblk_done() is not necessarily sufficient to ensure that the queue will get started again. In case of a global resource outage (-ENOMEM because of a mapping failure, e.g. because swiotlb is full) our virtqueue may be empty and we can get stuck with a stopped hw_queue. Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which indicates a full virtqueue, where the hw_queue is guaranteed to get started by virtblk_done() before it makes sense to carry on submitting requests. Let us also remove a stale comment. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Fixes: `f7728002c1` ("virtio_ring: fix return code on DMA mapping fails") Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-03-08 05:35:24 -04:00
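The two virtio-blk changes combine into a small mapping from virtqueue_add() errors to block-layer statuses, plus a decision on whether to stop the hardware queue. The sketch below uses local stand-ins for the BLK_STS_* values. Mapping every other error to an I/O error is this sketch's own assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the blk_status_t values named in the commits. */
enum blk_sts_sketch {
    STS_OK,
    STS_DEV_RESOURCE, /* device-specific: virtqueue full */
    STS_RESOURCE,     /* global resource shortage, e.g. -ENOMEM */
    STS_IOERR,
};

/* Map a virtqueue_add()-style error to a status and report whether the
 * hardware queue should be stopped. Only -ENOSPC stops the queue, because
 * only then is a completion guaranteed to restart it. */
static enum blk_sts_sketch map_vq_error(int err, bool *stop_queue)
{
    *stop_queue = false;
    switch (err) {
    case 0:
        return STS_OK;
    case -ENOSPC:
        *stop_queue = true;
        return STS_DEV_RESOURCE;
    case -ENOMEM:
        return STS_RESOURCE;
    default:
        return STS_IOERR;
    }
}
```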
Linus Torvalds	cbee7c8b44	xen: branch for v5.6-rc5 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXmNp4AAKCRCAXGG7T9hj vmPeAP42nekgUNbUzEuei1/v4bJoepxIg22UXTVnjWwx9JVQKgEA+fgswmyy4NN2 Ab7ty2zw1s3Vwhoq909lWNIJdz/+1wI= =C3CJ -----END PGP SIGNATURE----- Merge tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Four fixes and a small cleanup patch: - two fixes by Dongli Zhang fixing races in the xenbus driver - two fixes by me fixing issues introduced in 5.6 - a small cleanup by Gustavo Silva replacing a zero-length array with a flexible-array" * tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/blkfront: fix ring info addressing xen/xenbus: fix locking xenbus: req->err should be updated before req->state xenbus: req->body should be updated before req->state xen: Replace zero-length array with flexible-array member	2020-03-07 08:04:54 -06:00
Juergen Gross	4ab50af63d	xen/blkfront: fix ring info addressing Commit `0265d6e8dd` ("xen/blkfront: limit allocated memory size to actual use case") made struct blkfront_ring_info size dynamic. This is fine when running with only one queue, but with multiple queues the addressing of the single queues has to be adapted as the structs are allocated in an array. Fixes: `0265d6e8dd` ("xen/blkfront: limit allocated memory size to actual use case") Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200305155129.28326-1-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2020-03-05 09:55:01 -06:00
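Once a struct ends in a runtime-sized array, an array of such structs can no longer be indexed with `[]`: the compiler steps by the fixed sizeof only, ignoring the trailing storage. The sketch below stores the per-element size next to the array and computes each element's address by hand. The struct and function names are hypothetical, not blkfront's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring-info struct whose size depends on a runtime count.
 * All members are unsigned long, so every element size computed below
 * stays a multiple of the struct's alignment. */
struct ring_info_sketch {
    unsigned long id;
    unsigned long shadow[];       /* length chosen at allocation time */
};

struct info_sketch {
    unsigned int nr_rings;
    size_t rinfo_size;            /* bytes per ring-info element */
    struct ring_info_sketch *rinfo;
};

/* &info->rinfo[i] would step by sizeof(struct ring_info_sketch), which
 * ignores the trailing array; step by the real element size instead. */
static struct ring_info_sketch *get_ring(const struct info_sketch *info,
                                         unsigned int i)
{
    assert(i < info->nr_rings);
    return (struct ring_info_sketch *)((char *)info->rinfo +
                                       i * info->rinfo_size);
}

static int alloc_rings(struct info_sketch *info, unsigned int nr_rings,
                       size_t shadow_len)
{
    info->nr_rings = nr_rings;
    info->rinfo_size = sizeof(struct ring_info_sketch) +
                       shadow_len * sizeof(unsigned long);
    info->rinfo = calloc(nr_rings, info->rinfo_size);
    return info->rinfo ? 0 : -1;
}
```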
Linus Torvalds	7557c1b3f7	SCSI fixes on 20200229 Four small fixes. Three are in drivers for fairly obvious bugs. The fourth is a set of regressions introduced by the compat_ioctl changes because some of the compat updates wrongly replaced .ioctl instead of .compat_ioctl. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXlpxDCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSXsAPwOGPkU ObFbUs75Tdmk1M7jqtxgBsNhuNta0S8d7dJ3aAEA/YBtGGQWoeEGivUKwzwA4cwL 1w1GbhPEblpMNO8keVA= =I7qk -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes. Three are in drivers for fairly obvious bugs. The fourth is a set of regressions introduced by the compat_ioctl changes because some of the compat updates wrongly replaced .ioctl instead of .compat_ioctl" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places scsi: zfcp: fix wrong data and display format of SFP+ temperature scsi: sd_sbc: Fix sd_zbc_report_zones() scsi: libfc: free response frame from GPN_ID	2020-02-29 09:58:47 -06:00
Linus Torvalds	2edc78b9a4	block-5.6-2020-02-28 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl5ZXl0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmltEACSA4yxdvWsVMYRCijjm/FzBEq7C8PSsWNK H8KPmjQiNpbSiZSi1uMVsHMlhBmBM8ZQ6Zc+gbZSs6xMqa4yP/iRtmzxnGonC7TB f5Ne2QuC0+TKMFJJTG8cCTzrgEOrWYkFKkmabzDml7HtloJtuzgArrmPzRj2sUfY J+d0osdp1b4U4sqhhAnxSm/zYJkGrQb+9UgNdVjhZCUzaX6oCcuK8xUwu2reLGlM qPkSKOywnl3WHCSCJXsCrNLKX0QWtIfMzlWDr40GYgHauPBbWfa8+1yHR1/lWP4R zyxGk63I9f6/+iQSUC72wP77bAVWKW674c53jgd7r1pNL9TiuK+a3E4lgf7eU+rl ymA/rM6Iy3SjTgiLT57PPOecsILJns3cwZ6mhvSRs0+zpao7LOQZXWdu9V0+Fyqo jur+7Ll/Qfdv/CLlM94DeBJtwhaTWiHTfDoaDHlG9p1/vvcWWXTUTIVPwAD+YGbj geio/bIWECnQxDtZL5Jikf5zsC76aQ46vvxK4F6RJlXj6jaugIbN3mWLsg17sUVf Y4h+IEVtQr0zA0LkPrfVdAS9IqVlTrMRDCkrrlhsDt7FI0orCOag7JOcmN2/nPn/ 2H22nl6i02b0gdGrScU5pyBswSPaImddH5tqE9uL2rK4hrFe6oKxL5EicTFDZmTh tHnukoc+Yg== =1bzv -----END PGP SIGNATURE----- Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Passthrough insertion fix (Ming) - Kill off some unused arguments (John) - blktrace RCU fix (Jan) - Dead fields removal for null_blk (Dongli) - NVMe polled IO fix (Bijan) * tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block: nvme-pci: Hold cq_poll_lock while completing CQEs blk-mq: Remove some unused function arguments null_blk: remove unused fields in 'nullb_cmd' blktrace: Protect q->blk_trace with RCU blk-mq: insert passthrough request into hctx->dispatch directly	2020-02-28 11:43:30 -08:00
Dongli Zhang	93d7c31858	null_blk: remove unused fields in 'nullb_cmd' 'list', 'll_list' and 'csd' are no longer used. The 'list' is not used since it was introduced by commit `f2298c0403` ("null_blk: multi queue aware block test driver"). The 'll_list' is no longer used since commit `3c395a969a` ("null_blk: set a separate timer for each command"). The 'csd' is no longer used since commit `ce2c350b2c` ("null_blk: use blk_complete_request and blk_mq_complete_request"). Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-02-25 09:43:29 -07:00
Adam Williamson	03264ddde2	scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places Arnd Bergmann inadvertently typoed these in `d320a9551e` and 64cbfa96551a; they seem to be the cause of https://bugzilla.redhat.com/show_bug.cgi?id=1801353 , invalid SCSI commands when udev tries to query a DVD drive. [arnd] Found another instance of the same bug, also introduced in my compat_ioctl series. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1801353 Link: https://lore.kernel.org/r/20200219165139.3467320-1-arnd@arndb.de Fixes: `c103d6ee69` ("compat_ioctl: ide: floppy: add handler") Fixes: `64cbfa9655` ("compat_ioctl: move cdrom commands into cdrom.c") Fixes: `d320a9551e` ("compat_ioctl: scsi: move ioctl handling into drivers") Bisected-by: Chris Murphy <bugzilla@colorremedies.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-02-24 15:06:07 -05:00
Linus Torvalds	2e90ca68b0	floppy: check FDC index for errors before assigning it Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in wait_til_ready(). Which on the face of it can't happen, since as Willy Tarreau points out, the function does no particular memory access. Except through the FDCS macro, which just indexes a static allocation through the current fdc, which is always checked against N_FDC. Except the checking happens after we've already assigned the value. The floppy driver is a disgrace (a lot of it going back to my original horrid "design"), and has no real maintainer. Nobody has the hardware, and nobody really cares. But it still gets used in virtual environments because it's one of those things that everybody supports. The whole thing should be re-written, or at least parts of it should be seriously cleaned up. The 'current fdc' index, which is used by the FDCS macro, and which is often shadowed by a local 'fdc' variable, is a prime example of how not to write code. But because nobody has the hardware or the motivation, let's just fix up the immediate problem with a nasty band-aid: test the fdc index before actually assigning it to the static 'fdc' variable. Reported-by: Jordy Zomer <jordy@simplyhacker.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-24 11:25:33 -08:00
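The band-aid is an ordering rule: validate the index first, then assign it. The sketch below shows that rule in user space. It is not the driver's code. `fdc_table`, `current_fdc`, `set_current_fdc()` and `cur_fdc()` are hypothetical names, and the value of N_FDC is assumed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's controller table and index;
 * N_FDC is the bound named in the commit, its value here is assumed. */
#define N_FDC 2

struct fdc_state {
	int dor;	/* placeholder per-controller register */
};

static struct fdc_state fdc_table[N_FDC];
static int current_fdc;	/* the static "current fdc" index */

/* Validate first, assign second: an out-of-range index never reaches
 * current_fdc, so later lookups through it stay in bounds. */
static int set_current_fdc(int new_fdc)
{
	if (new_fdc < 0 || new_fdc >= N_FDC) {
		fprintf(stderr, "bad fdc value %d\n", new_fdc);
		return -1;	/* current_fdc keeps its last valid value */
	}
	current_fdc = new_fdc;
	return 0;
}

/* FDCS-style lookup through the current index. */
static struct fdc_state *cur_fdc(void)
{
	return &fdc_table[current_fdc];
}
```

Because the check runs before the store, a rejected index never reaches the shared variable. Code such as wait_til_ready() that indexes through it can therefore never read out of bounds.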
Linus Torvalds	c9d35ee049	Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... 
into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...	2020-02-08 13:26:41 -08:00
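The core idea of the series can be sketched in plain C: a syntax type is a function, not a member of a system-wide enum. Everything below is hypothetical. The real kernel `fs_param_is_*()` helpers take different arguments. `param_is_timeout()` only illustrates the "15s or 2h" request mentioned above, as something a single filesystem could now add privately.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct parse_result {
	unsigned long val;
};

/* A syntax type is just a function: parse val into res, return 0 or -errno. */
typedef int (*param_type_fn)(const char *val, struct parse_result *res);

struct param_spec {
	const char *name;
	param_type_fn type;
};

/* Parse a leading decimal number; rejects NULL, signs and blanks. */
static int parse_decimal(const char *val, unsigned long *out, char **end)
{
	if (!val || !isdigit((unsigned char)val[0]))
		return -EINVAL;
	errno = 0;
	*out = strtoul(val, end, 10);
	return errno ? -EINVAL : 0;
}

/* A shared "library" type: an unsigned 32-bit integer. */
static int param_is_u32(const char *val, struct parse_result *res)
{
	char *end;
	unsigned long v;

	if (parse_decimal(val, &v, &end) || *end || v > 0xffffffffUL)
		return -EINVAL;
	res->val = v;
	return 0;
}

/* A private type one filesystem could define on its own:
 * a timeout such as "15s", "10m" or "2h", stored in seconds. */
static int param_is_timeout(const char *val, struct parse_result *res)
{
	char *end;
	unsigned long v, mult;

	if (parse_decimal(val, &v, &end))
		return -EINVAL;
	if (!strcmp(end, "s"))
		mult = 1;
	else if (!strcmp(end, "m"))
		mult = 60;
	else if (!strcmp(end, "h"))
		mult = 3600;
	else
		return -EINVAL;
	if (v > ULONG_MAX / mult)
		return -EINVAL;
	res->val = v * mult;
	return 0;
}

/* The core lookup never switches on a type enum; it just calls spec->type. */
static int parse_param(const struct param_spec *specs, const char *key,
		       const char *val, struct parse_result *res)
{
	for (; specs->name; specs++)
		if (!strcmp(specs->name, key))
			return specs->type(val, res);
	return -ENOENT;	/* the kernel's fs_parse() uses -ENOPARAM here */
}
```

Adding a new type means writing one more function and pointing a spec entry at it. The core `parse_param()` stays untouched, which is the property the pull request describes.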
Linus Torvalds	e0f121c5cc	virtio: fixes, cleanups Some bug fixes/cleanups. Deprecated scsi passthrough for blk removed. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl49E/4PHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpyecH/AlBzCOlv9kBHKvx30h2QTgbvZlZM++SRQ18 XAuvU/gRVTPLeSsXnJGz0hMD8hxBti6esqvxHzSzs2a6DqkqLrRdnMXsjs6QlAdX 6NwP4VesL7RNKTAjjrtmXQMr8iADtTy8FKCw/sZM+6sqhPeKAzFbBrjfH6amINru orEF+eGwNXLkegK4+QVQx8f1rlIm7+/Z4lAP75FsaisYWLxklvn3VjZ7YjsCNexi 4zMxv64W8AHCRJK8k7/+vluedwwTghY9ayubw4zeRWmcfRw568bxlCZUQBmNWspC lj4/ZWzGmd60UQx9fUvEd7T6QCRzaaUYVzXC0Mh8I3pLV+V5tKY= =uqtg -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Some bug fixes/cleanups. The deprecated scsi passthrough for virtio_blk is removed" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: Fix memory leaks on errors in virtballoon_probe() virtio-balloon: Fix memory leak when unloading while hinting is in progress virtio_balloon: prevent pfn array overflow virtio-blk: remove VIRTIO_BLK_F_SCSI support virtio-pci: check name when counting MSI-X vectors virtio-balloon: initialize all vq callbacks virtio-mmio: convert to devm_platform_ioremap_resource	2020-02-07 12:26:34 -08:00
Al Viro	d7167b1499	fs_parse: fold fs_parameter_desc/fs_parameter_spec The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:37 -05:00
Eric Sandeen	96cafb9ccb	fs_parser: remove fs_parameter_description name field Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:36 -05:00
Al Viro	7f5d38141e	new primitive: __fs_parse() fs_parse() analogue taking p_log instead of fs_context. fs_parse() turned into a wrapper, callers in ceph_common and rbd switched to __fs_parse(). As the result, fs_parse() never gets NULL fs_context and neither do fs_context-based logging primitives Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:34 -05:00
Al Viro	2c3f3dc315	switch rbd and libceph to p_log-based primitives Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:33 -05:00
Al Viro	3fbb8d5554	struct p_log, variants of warnf() et.al. taking that one instead primitives for prefixed logging Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:32 -05:00
Al Viro	0f89589a8c	Pass consistent param->type to fs_parse() As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable; both get fs_value_is_string for ->type and NULL for ->string. To make it even more unpleasant, that combination is impossible to produce with fsconfig(). Much saner rules would be: "foo" => fs_value_is_flag, NULL; "foo=" => fs_value_is_string, ""; "foo=bar" => fs_value_is_string, "bar". All cases are distinguishable, all results are expressible by fsconfig(), ->has_value checks are much simpler that way (to the point of the field being useless) and quite a few regressions go away (gfs2 has no business accepting -o nodebug=, for example). Partially based upon patches from Miklos. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 00:10:29 -05:00
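A small option splitter illustrates the three proposed rules. `split_option()`, `struct param` and the `VALUE_IS_*` names are hypothetical stand-ins for this sketch. They are not the kernel's `vfs_parse_fs_string()` or its `fs_value_is_*` types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the two value types discussed in the commit. */
enum value_type { VALUE_IS_FLAG, VALUE_IS_STRING };

struct param {
	char key[64];
	enum value_type type;
	const char *string;	/* NULL for "foo", "" for "foo=", "bar" for "foo=bar" */
};

/* Split one "key[=value]" option; returns 0 on success, -1 on a bad key. */
static int split_option(const char *opt, struct param *p)
{
	const char *eq = strchr(opt, '=');
	size_t klen = eq ? (size_t)(eq - opt) : strlen(opt);

	if (klen == 0 || klen >= sizeof(p->key))
		return -1;
	memcpy(p->key, opt, klen);
	p->key[klen] = '\0';

	if (!eq) {
		p->type = VALUE_IS_FLAG;	/* "foo": no value at all */
		p->string = NULL;
	} else {
		p->type = VALUE_IS_STRING;	/* "foo=" or "foo=bar" */
		p->string = eq + 1;		/* points into opt; may be "" */
	}
	return 0;
}
```

Each of the three inputs now maps to a distinct (type, string) pair. That distinctness is what lets a parser reject `nodebug=` while still accepting `nodebug`.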
Linus Torvalds	4c46bef2e9	We have: - a set of patches that fixes various corner cases in mount and umount code (Xiubo Li). This has to do with choosing an MDS, distinguishing between laggy and down MDSes and parsing the server path. - inode initialization fixes (Jeff Layton). The one included here mostly concerns things like open_by_handle() and there is another one that will come through Al. - copy_file_range() now uses the new copy-from2 op (Luis Henriques). The existing copy-from op turned out to be infeasible for generic filesystem use; we disable the copy offload if OSDs don't support copy-from2. - a patch to link "rbd" and "block" devices together in sysfs (Hannes Reinecke) And a smattering of cleanups from Xiubo, Jeff and Chengguang. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl47PUcTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi6LoCACmVli5N6bgnBE4sTixi/jz6aCCbk32 ZPlKiSesHnOGkY6KXHJT58JYy0paITBRik5ypdz06J8aCOtWyPLbn3uCemF9CYn2 g6dId2Lf5vGFrgSm4YSiqp9a86IZmYSDG41LbJD/IJWFDWdMWqNPMDqji6yaIO5O NJI5N0tk+VFXdV+JyjV9X/FnP1r1D2ReZzz21ZiqTJXSmE8YIkioLjkq36QTMMG7 Gm5qdlc1x2r4qfzA1g+OiWgRQCUMgkuYerFzus4mVbW4hrphsavH2DArbOwFmsXF 46hOq+1uGVVyZILLJfKNiktf1GExBF0icbSREJtmjUHbQvNR8BH0C+fV =vvIc -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: - a set of patches that fixes various corner cases in mount and umount code (Xiubo Li). This has to do with choosing an MDS, distinguishing between laggy and down MDSes and parsing the server path. - inode initialization fixes (Jeff Layton). The one included here mostly concerns things like open_by_handle() and there is another one that will come through Al. - copy_file_range() now uses the new copy-from2 op (Luis Henriques). The existing copy-from op turned out to be infeasible for generic filesystem use; we disable the copy offload if OSDs don't support copy-from2. 
- a patch to link "rbd" and "block" devices together in sysfs (Hannes Reinecke) ... and a smattering of cleanups from Xiubo, Jeff and Chengguang. * tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits) rbd: set the 'device' link in sysfs ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c ceph: print name of xattr in __ceph_{get,set}xattr() douts ceph: print r_direct_hash in hex in __choose_mds() dout ceph: use copy-from2 op in copy_file_range ceph: close holes in structs ceph_mds_session and ceph_mds_request rbd: work around -Wuninitialized warning ceph: allocate the correct amount of extra bytes for the session features ceph: rename get_session and switch to use ceph_get_mds_session ceph: remove the extra slashes in the server path ceph: add possible_max_rank and make the code more readable ceph: print dentry offset in hex and fix xattr_version type ceph: only touch the caps which have the subset mask requested ceph: don't clear I_NEW until inode metadata is fully populated ceph: retry the same mds later after the new session is opened ceph: check availability of mds cluster on mount after wait timeout ceph: keep the session state until it is released ceph: add __send_request helper ceph: ensure we have a new cap before continuing in fill_inode ceph: drop unused ttl_from parameter from fill_inode ...	2020-02-06 12:21:01 +00:00
Christoph Hellwig	782e067dba	virtio-blk: remove VIRTIO_BLK_F_SCSI support Since the need for a special flag to support SCSI passthrough on a block device was added in May 2017, the SCSI passthrough support in virtio-blk has been disabled. It has always been a bad idea (just ask the original author...) and we have virtio-scsi for proper passthrough. The feature also never made it into the virtio 1.0 or later specifications. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-06 03:40:26 -05:00
Linus Torvalds	ed535f2c9e	block-5.6-2020-02-05 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl47ML4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvm2EACGaxAxP7pLniNV30cRotF8lPpQ5nUrpiem H1r5WqeI5osCGkRKHaJQ4O0Sw8IV2pWzHTWz+9bv56zLM40yIMaEHLRU00AM047n KFdA2x4xH+HhbR9lF+flYz1oInlIEXxPiERKm/p1pvQEbquzi4X5cQqv6q2pdzJ9 sf8OBJhKs4rp/ooqzWwjVOeP/n1sT2r+XDg9C9WC5aXaVZbbLw50r1WRYFt1zf7N oa+91fq2lasxK1c79OtbbGJlBXWTurAtUaKBM0KKPguiH2h9j47pAs0HsV02kZ2M 1ZltwKTyfDNMzBEgvkdB3R0G9nU422nIF+w319i6on8P8xfz8Px13d1KCQGAmfD6 K1YuaCgOjWuVhOKpMwBq9ql6QVP+1LIMKIl2OGJkrBgl9ZzfE8KMZa2QZTGrGO/U xE/hirYdj5T1O8umUQ4cmZHTROASOJZ8/eU9XHA1vf/eJYXiS31/4ewgRzP3oGX2 5Jvz3o144nBeBTOiFlzs3Fe+wX63QABNG22bijzEGoNTxjXJFroBDYzeiOELjECZ /xGRZG1bLOGMj8Gg4ZADSILQDkqISsQHofl1I9mWTbwB1j7g69ZjV8Ie2dyMaX6b 5z5Smqzd9gcok9hr8NGWkV3c3NypPxIWxrOcyzYbGLUPDGqa+QjGtlLrGgeinhLM SitalHw0KA== =05d8 -----END PGP SIGNATURE----- Merge tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block Pull more block updates from Jens Axboe: "Some later arrivals, but all fixes at this point: - bcache fix series (Coly) - Series of BFQ fixes (Paolo) - NVMe pull request from Keith with a few minor NVMe fixes - Various little tweaks" * tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block: (23 commits) nvmet: update AEN list and array at one place nvmet: Fix controller use after free nvmet: Fix error print message at nvmet_install_queue function brd: check and limit max_part par nvme-pci: remove nvmeq->tags nvmet: fix dsm failure when payload does not match sgl descriptor nvmet: Pass lockdep expression to RCU lists block, bfq: clarify the goal of bfq_split_bfqq() block, bfq: get a ref to a group when adding it to a service tree block, bfq: remove ifdefs from around gets/puts of bfq groups block, bfq: extend incomplete name of field on_st block, bfq: get extra ref to prevent a queue from being freed during a group move block, bfq: do not insert oom queue into position tree block, bfq: do not plug I/O for 
bfq_queues with no proc refs bcache: check return value of prio_read() bcache: fix incorrect data type usage in btree_flush_write() bcache: add readahead cache policy options via sysfs interface bcache: explicity type cast in bset_bkey_last() bcache: fix memory corruption in bch_cache_accounting_clear() xen/blkfront: limit allocated memory size to actual use case ...	2020-02-06 06:15:23 +00:00
Linus Torvalds	d271ab2923	xen: branch for v5.6-rc1 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXjrKegAKCRCAXGG7T9hj vkAzAQDtV8yxItCMTC/0vxMZnBUk7t+KFuSg7UIoWkwHPvd2CQEAjlhWeX0u3z9D uxwmxdjri1nlrTJBulbvCkJuTfZDYwo= =8Q8T -----END PGP SIGNATURE----- Merge tag 'for-linus-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix a bug introduced in 5.5 in the Xen gntdev driver - fix the Xen balloon driver when running on ancient Xen versions - allow Xen stubdoms to control interrupt enable flags of passed-through PCI cards - release resources in Xen backends under memory pressure * tag 'for-linus-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/blkback: Consistently insert one empty line between functions xen/blkback: Remove unnecessary static variable name prefixes xen/blkback: Squeeze page pools if a memory pressure is detected xenbus/backend: Protect xenbus callback with lock xenbus/backend: Add memory pressure handler callback xen/gntdev: Do not use mm notifiers with autotranslating guests xen/balloon: Support xend-based toolstack take two xen-pciback: optionally allow interrupt enable flag writes	2020-02-05 17:44:14 +00:00
Zhiqiang Liu	c8ab422553	brd: check and limit max_part par In brd_init(), rd_nr brd_device structures are first allocated and added to brd_devices; then brd_devices is traversed and each brd_device is added by calling add_disk(). When a brd_device is allocated, disk->first_minor is set to i * max_part. If rd_nr * max_part is larger than MINORMASK, two different brd_devices may get the same devt, and only one of them can be added successfully. Running rmmod brd.ko then causes an oops in brd_exit(). Steps to reproduce: # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576 # rmmod brd Then the oops appears. Oops log: [ 726.613722] Call trace: [ 726.614175] kernfs_find_ns+0x24/0x130 [ 726.614852] kernfs_find_and_get_ns+0x44/0x68 [ 726.615749] sysfs_remove_group+0x38/0xb0 [ 726.616520] blk_trace_remove_sysfs+0x1c/0x28 [ 726.617320] blk_unregister_queue+0x98/0x100 [ 726.618105] del_gendisk+0x144/0x2b8 [ 726.618759] brd_exit+0x68/0x560 [brd] [ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0 [ 726.620384] el0_svc_common+0x78/0x130 [ 726.621057] el0_svc_handler+0x38/0x78 [ 726.621738] el0_svc+0x8/0xc [ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260) Here, we add brd_check_and_reset_par() to check and limit the max_part parameter. -- V5->V6: - remove useless code V4->V5:(suggested by Ming Lei) - make sure max_part is not larger than DISK_MAX_PARTS V3->V4:(suggested by Ming Lei) - remove useless change - add one limit of max_part V2->V3: (suggested by Ming Lei) - clear .minors when running out of consecutive minor space in brd_alloc - remove limit of rd_nr V1->V2: - add more checks in brd_check_par_valid as suggested by Ming Lei. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-02-04 07:19:33 -07:00
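The arithmetic behind this bug is simple. The reproducer's max_part=1048576 is exactly 1 << 20, so disk 1's first minor already lies outside MINORMASK. The sketch below pairs that range check with the clamp described in the changelog (max_part not larger than DISK_MAX_PARTS). The constants are assumed to mirror the kernel's MINORBITS and DISK_MAX_PARTS. Both helpers are hypothetical; neither is the actual brd_check_and_reset_par().

```c
#include <assert.h>

/* Assumed to mirror the kernel's MINORBITS and DISK_MAX_PARTS values. */
#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define DISK_MAX_PARTS	256

/* Sanitize the max_part module parameter before any disk is allocated. */
static unsigned int check_and_reset_max_part(unsigned int max_part)
{
	if (max_part == 0)
		max_part = 1;			/* at least one minor per disk */
	if (max_part > DISK_MAX_PARTS)
		max_part = DISK_MAX_PARTS;	/* the limit added in V4->V5 */
	return max_part;
}

/* Disk i uses minors [i * max_part, (i + 1) * max_part); its last minor
 * must still fit in MINORMASK, otherwise devts can collide. */
static int minor_range_fits(unsigned int i, unsigned int max_part)
{
	unsigned long long last = ((unsigned long long)i + 1) * max_part - 1;

	return last <= MINORMASK;
}
```

With the reproducer's parameters, disk 1 does not fit. After clamping, three disks need only minors 0 to 767.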
Andrew Morton	046755a28f	drivers/block/null_blk_main.c: fix uninitialized var warnings With gcc-7.2, many instances of drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’: drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized] dev->NAME = new_value; \ ^ drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here TYPE new_value; \ ^ Presumably notabug, so use uninitialized_var() to suppress them. Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-04 03:05:24 +00:00
Andrew Morton	ca0a95a6ac	drivers/block/null_blk_main.c: fix layout Each line here overflows 80 cols by exactly one character. Delete one tab per line to fix. Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-02-04 03:05:24 +00:00
Colin Ian King	3b82a051c1	drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store Currently, when the for-loop in writeback_store sets an error code of -EIO or -ENOSPC, the error code is overwritten by a ret = len assignment at the end of the function and is lost. Fix this by assigning ret = len at the start of the function and removing the assignment from the end, so that ret is preserved when an error code is assigned to it. Addresses Coverity ("Unused value") Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com Fixes: `a939888ec3` ("zram: support idle/huge page writeback") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-31 10:30:39 -08:00
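The fix uses a common C idiom: set the return value to the success result up front, and let the loop overwrite it with an error. This is a hypothetical sketch, not writeback_store() itself. The `page_err` array simulates per-page outcomes. The policy of stopping on `-ENOSPC` and continuing on other errors is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for writeback_store(): page_err[i] is the simulated
 * outcome for page i (0 on success, a negative errno on failure). */
static ssize_t writeback_sketch(const int *page_err, size_t nr_pages, size_t len)
{
	ssize_t ret = (ssize_t)len;	/* success value, assigned once up front */
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (page_err[i] == -ENOSPC) {
			ret = -ENOSPC;	/* no backing space left: stop */
			break;
		}
		if (page_err[i]) {
			ret = page_err[i];	/* remember the error, try the next page */
			continue;
		}
		/* ... write page i to the backing device ... */
	}

	/* No trailing "ret = len;" here, so an error recorded above survives. */
	return ret;
}
```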
Taejoon Song	90f82cbfe5	zram: try to avoid worst-case scenario on same element pages The worst-case scenario when finding same element pages is that almost all elements look the same at first glance and only the last few elements differ. Since the same element tends to be grouped from the beginning of the page, if we compare the first element with the last element before looping through all elements, we may be able to detect non-same element pages quickly. Test method: 1. Test on an LG webOS TV (64-bit arch). 2. Dump the swap-out pages (~819200 pages). 3. Analyze the pages off-line with a simple test script that counts the iterations and measures the speed. On a 64-bit arch, the worst-case iteration count is PAGE_SIZE / 8 bytes = 512. The speed is based only on the time spent in page_same_filled(). The average results are: Looping-Forward (Orig): 38 iterations, 99265 MB/s; Looping-Backward: 36 iterations, 102725 MB/s; Last-element-check (This Patch): 33 iterations, 125072 MB/s. The result shows that the average iteration count decreases by 13% and the speed increases by 25% with this patch. This patch does not increase the overall time complexity, though. I also ran a simpler version that uses a backward loop. Just looping backward also gives some improvement, but less than this patch. [taejoon.song@lge.com: fix off-by-one] Link: http://lkml.kernel.org/r/1578642001-11765-1-git-send-email-taejoon.song@lge.com Link: http://lkml.kernel.org/r/1575424418-16119-1-git-send-email-taejoon.song@lge.com Signed-off-by: Taejoon Song <taejoon.song@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-31 10:30:39 -08:00
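Here is a user-space sketch of the last-element check, assuming a 4 KiB page of `unsigned long` words. It follows the description above but is not guaranteed to match the merged `page_same_filled()` line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096	/* page size assumed for this sketch */

/* Return true (and the repeated word) if every word equals the first one. */
static bool page_same_filled(const void *ptr, unsigned long *element)
{
	const unsigned long *page = ptr;
	size_t last_pos = PAGE_SIZE / sizeof(*page) - 1;
	unsigned long val = page[0];
	size_t pos;

	/* Cheap early exit: pages that start with a run of equal words
	 * often differ only near the end, so test the last word first. */
	if (val != page[last_pos])
		return false;

	/* Only then scan the words in between. */
	for (pos = 1; pos < last_pos; pos++)
		if (val != page[pos])
			return false;

	*element = val;
	return true;
}
```

The early check costs one comparison on same-filled pages. It saves a near-full scan on pages that differ only in their final words.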
Juergen Gross	0265d6e8dd	xen/blkfront: limit allocated memory size to actual use case Today the Xen blkfront driver allocates memory for one struct blkfront_ring_info for each communication ring. This structure is statically sized for the maximum supported configuration resulting in a size of more than 90 kB. As the main size contributor is one array inside the struct, the memory allocation can easily be limited by moving this array to be the last structure element and to allocate only the memory for the actually needed array size. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-01-29 21:13:18 -07:00
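The technique here is a C99 flexible array member: the large array goes last in the struct, and the allocation is sized from the actual ring size. In this hypothetical sketch, `struct shadow_entry`, `struct ring_info` and `ring_info_alloc()` are illustrative names, not the blkfront definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-request entry; the real element type differs. */
struct shadow_entry {
	uint64_t id;
	void *req;
};

/* The big array moves to the end as a flexible array member ... */
struct ring_info {
	unsigned int ring_size;
	struct shadow_entry shadow[];
};

/* ... so each ring allocates only as many entries as it actually uses. */
static struct ring_info *ring_info_alloc(unsigned int ring_size)
{
	struct ring_info *rinfo;

	/* Refuse sizes whose byte count would overflow size_t. */
	if (ring_size > (SIZE_MAX - sizeof(*rinfo)) / sizeof(rinfo->shadow[0]))
		return NULL;

	rinfo = calloc(1, sizeof(*rinfo) + (size_t)ring_size * sizeof(rinfo->shadow[0]));
	if (rinfo)
		rinfo->ring_size = ring_size;
	return rinfo;
}
```

A ring that needs 32 entries now costs 32 entries, instead of the statically sized maximum.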
Sun Ke	5c0dd228b5	nbd: add a flush_workqueue in nbd_start_device When kzalloc fails, we may end up trying to destroy the workqueue from inside the workqueue. Suppose num_connections is m (2 < m), kzalloc calls No. 1 to No. n (1 < n < m) succeed, and call No. (n + 1) fails. Then nbd_start_device returns ENOMEM to nbd_start_device_ioctl, which returns immediately without running flush_workqueue, yet n recv threads are still running. If nbd_release runs first, a recv thread may have to drop the last config_refs and try to destroy the workqueue from inside the workqueue. To fix this, add a flush_workqueue in nbd_start_device. Fixes: `e9e006f5fc` ("nbd: fix max number of supported devs") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-01-29 21:05:53 -07:00
Stephen Kitt	6a365874a4	drbd: fifo_alloc() should use struct_size Switching to struct_size for the allocation in fifo_alloc avoids hard-coding the type of fifo_buffer.values in fifo_alloc. It also provides overflow protection; to avoid pessimistic code being generated by the compiler as a result, this patch also switches fifo_size to unsigned, propagating the change as appropriate. Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Stephen Kitt <steve@sk2.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-01-29 21:03:33 -07:00
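The kernel's `struct_size()` returns the size of a struct plus n trailing array elements. On overflow it saturates to SIZE_MAX, so the allocation fails instead of being undersized. Below is a user-space stand-in. `fifo_struct_size()` is hypothetical, and the struct layout is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout for illustration: a header followed by a trailing array. */
struct fifo_buffer {
	unsigned int head_index;
	unsigned int size;
	int total;
	int values[];
};

/* User-space stand-in for struct_size(): header plus n elements of values[],
 * saturating to SIZE_MAX on overflow. The element size comes from the
 * member itself, so the type of values[] is not hard-coded. */
static size_t fifo_struct_size(size_t n)
{
	size_t elem = sizeof(((struct fifo_buffer *)0)->values[0]);

	if (n > (SIZE_MAX - sizeof(struct fifo_buffer)) / elem)
		return SIZE_MAX;
	return sizeof(struct fifo_buffer) + n * elem;
}

static struct fifo_buffer *fifo_alloc(unsigned int fifo_size)
{
	struct fifo_buffer *fb = calloc(1, fifo_struct_size(fifo_size));

	if (!fb)
		return NULL;
	fb->size = fifo_size;
	return fb;
}
```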
Linus Torvalds	33c84e89ab	SCSI misc on 20200129 This series is slightly unusual because it includes Arnd's compat ioctl tree here: `1c46a2cf2d` Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXjHQJyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZZ8AQC02N+v iUnTl1YxGPjIWBbnHuUxN2Qbb9D3C6gAT1LkigEArlk163K3A1XEQHF/VNCdAz/f 01XYTd3p1VHuegIBHlk= =Cn52 -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: `1c46a2cf2d` Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. 
The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...	2020-01-29 18:16:16 -08:00
SeongJae Park	8557bbe515	xen/blkback: Consistently insert one empty line between functions The number of empty lines between functions in xenbus.c is inconsistent. This trivial style cleanup commit fixes the file to consistently place only one empty line. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2020-01-29 07:35:49 -06:00
SeongJae Park	823f209146	xen/blkback: Remove unnecessary static variable name prefixes A few static variables in blkback have the 'xen_blkif_' prefix, although a prefix is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2020-01-29 07:35:49 -06:00
SeongJae Park	cb9369bdbb	xen/blkback: Squeeze page pools if a memory pressure is detected Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and is increased on demand while processing the I/O requests. When handling of the current I/O requests finishes, or 100 milliseconds have passed since the last I/O requests were handled, it checks and shrinks the pool so that it does not exceed the size limit, `max_buffer_pages`. Therefore, host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O. Such problematic situations can be avoided by limiting the maximum number of devices that can be attached, but finding the optimal limit is not easy. An improper limit can result in memory pressure or resource underutilization. This commit avoids such problematic situations by squeezing the pools (returning every free page in the pool to the system) for a while (users can set this duration via a module parameter) if memory pressure is detected. Discussions =========== The `blkback`'s original shrinking mechanism returns to the system only pages in the pool that are not currently being used by `blkback`; in other words, pages that are not mapped to granted pages. Because this commit changes only the shrink limit and still uses the same freeing mechanism, it does not touch pages that are currently mapping grants. Once memory pressure is detected, this commit keeps the squeezing limit for a user-specified duration. The duration should be neither too long nor too short. If it is too long, the overhead incurred by squeezing can reduce I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the default to `10 milliseconds` because it is a short time in terms of I/O while it is a long time in terms of memory operations. 
Also, as the original shrinking mechanism runs at least every 100 milliseconds, this could be a reasonable choice. I also tested other durations (see the section below for details) and confirmed that 10 milliseconds works best in the test. That said, the proper duration depends on actual configurations and workloads, which is why this commit lets users set the duration via a module parameter. Memory Pressure Test ==================== To show how well this commit fixes the memory pressure situation, I configured a test environment on a xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O on them. Meanwhile, I measure the number of pages swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once with `blkback` before this commit and once after it. As shown below, this commit has dramatically reduced the memory pressure: before: pswpin 76,672, pswpout 185,799; after: pswpin 867, pswpout 3,967. Optimal Aggressive Shrinking Duration ------------------------------------- To find the best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are: 1ms: pswpin 707, pswpout 5,095; 10ms: pswpin 867, pswpout 3,967; 100ms: pswpin 362, pswpout 3,348. As expected, the memory pressure decreases as the duration increases, but the reduction slows down beyond `10ms`. Based on these results, I chose 10ms as the default duration. Performance Overhead Test ========================= This commit could degrade I/O performance under severe memory pressure because squeezing requires more page allocations per I/O. To show the overhead, I artificially created a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest. 
For the artificial squeezing, I set `blkback.max_buffer_pages` through the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I used the values `1024` (the default) and `0`. Setting the value to `0` is equivalent to squeezing all the time (the worst case). If the underlying block device is slow enough, the squeezing overhead could be hidden. For this reason, I use a fast block device, namely the RAM disk (brd)[1]: # xl block-attach guest phy:/dev/ram0 xvdb w For the I/O performance measurement, I run a simple `dd` command against the device 5 times, as below, and collect the 'MB/s' results. $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \ bs=4k count=$((256*512)); sync; done The results are below, where 'max_pgs' is the value of the `blkback.max_buffer_pages` parameter: max_pgs=0: Min 417, Max 423, Median 420, Avg 419.4, Stddev 2.5099801; max_pgs=1024: Min 414, Max 425, Median 416, Avg 417.8, Stddev 4.4384682. No difference proven at 95.0% confidence. In short, even worst-case squeezing on a fast ramdisk-based block device causes no visible performance degradation. Please note that this is just a very simple and minimal test. On systems using super-fast block devices and a special I/O workload, the results might be different. If in doubt, test on your own machine with your own workload to find the optimal squeezing duration. [1] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2020-01-29 07:35:49 -06:00
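The squeeze logic comes down to choosing a shrink limit based on whether a squeeze window is active. This is a minimal sketch that uses millisecond timestamps instead of jiffies. `struct page_pool`, `squeeze_duration_ms`, `on_memory_pressure()` and `shrink_pool()` are hypothetical names. Only `max_buffer_pages` comes from the text above. The counter models only free (unmapped) pages, which are the only pages the original mechanism ever returns.

```c
#include <assert.h>

/* Hypothetical pool state; times are plain milliseconds, not jiffies. */
struct page_pool {
	unsigned int free_pages;	/* free, unmapped pages only */
	unsigned long squeeze_end_ms;	/* squeezing is active before this time */
};

static unsigned int max_buffer_pages = 1024;	/* module parameter, default 1024 */
static unsigned int squeeze_duration_ms = 10;	/* user-settable, default 10ms */

/* Memory-pressure callback: start (or extend) a squeeze window. */
static void on_memory_pressure(struct page_pool *pool, unsigned long now_ms)
{
	pool->squeeze_end_ms = now_ms + squeeze_duration_ms;
}

/* Periodic shrink: same freeing path, only the limit changes. While the
 * squeeze window is open the limit is 0, so every free page goes back. */
static void shrink_pool(struct page_pool *pool, unsigned long now_ms)
{
	unsigned int limit = now_ms < pool->squeeze_end_ms ? 0 : max_buffer_pages;

	if (pool->free_pages > limit)
		pool->free_pages = limit;	/* the excess is returned to the system */
}
```

Outside a squeeze window, the pool shrinks only down to `max_buffer_pages`, matching the original behaviour. For the configured duration after memory pressure is detected, the limit drops to zero.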
Linus Torvalds	6a1000bd27	ioremap changes for 5.6 - remove ioremap_nocache given that it is equivalent to ioremap everywhere -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr +3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9 wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp 25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+ b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei /rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/ Kdg= =TUCJ -----END PGP SIGNATURE----- Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap	2020-01-27 13:03:00 -08:00

6784 Commits