linux-sg2042/drivers/md
Song Liu 1e6d690b93 md/r5cache: caching phase of r5cache
As described in the previous patch, the write-back cache operates in
two phases: caching and writing-out. The caching phase works as follows:
1. write data to journal
   (r5c_handle_stripe_dirtying, r5c_cache_data)
2. call bio_endio
   (r5c_handle_data_cached, r5c_return_dev_pending_writes).

The writing-out phase then proceeds as follows:
1. Mark the stripe as write-out (r5c_make_stripe_write_out)
2. Calculate parity (reconstruct or RMW)
3. Write parity (and maybe some other data) to journal device
4. Write data and parity to RAID disks
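
The two phases listed above can be pictured as a small state machine.
The following is a minimal sketch, not kernel code: every toy_* name is
hypothetical and the struct only records which steps have happened. It
illustrates that the bio completes at the end of the caching phase,
before anything reaches the RAID disks.

```c
#include <stdbool.h>

/* Hypothetical model of the two phases; none of these names exist in md. */
enum toy_phase { TOY_CACHING, TOY_WRITING_OUT, TOY_CLEAN };

struct toy_stripe {
	enum toy_phase phase;
	bool data_in_journal;    /* caching step 1 done */
	bool parity_in_journal;  /* writing-out step 3 done */
	bool on_raid_disks;      /* writing-out step 4 done */
	int bios_completed;      /* stands in for bio_endio calls */
};

/* Caching phase: data goes to the journal, then the bio is completed. */
static void toy_cache_write(struct toy_stripe *sh)
{
	sh->data_in_journal = true; /* 1. write data to journal */
	sh->bios_completed++;       /* 2. call bio_endio */
}

/* Writing-out phase: the four steps in order, with parity math omitted. */
static void toy_write_out(struct toy_stripe *sh)
{
	sh->phase = TOY_WRITING_OUT;  /* 1. mark the stripe as write-out */
	/* 2. calculate parity (reconstruct or RMW), omitted in this sketch */
	sh->parity_in_journal = true; /* 3. write parity to journal device */
	sh->on_raid_disks = true;     /* 4. write data and parity to RAID disks */
	sh->phase = TOY_CLEAN;
}
```

After toy_cache_write() the caller already sees a completed write while
on_raid_disks is still false; that gap is where the latency win comes
from.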

This patch implements the caching phase. The cache is integrated with
the stripe cache of raid456. It leverages the r5l_log code to write
data to the journal device.

The writing-out phase of the cache is implemented in the next patch.

With r5cache, a write operation does not wait for parity calculation
and write-out, so write latency is lower (one write to the journal
device vs. a read and then a write to the RAID disks). r5cache also
reduces RAID overhead (multiple IOs due to read-modify-write of
parity) and provides more opportunities for full stripe writes.

This patch adds 2 flags to stripe_head.state:
 - STRIPE_R5C_PARTIAL_STRIPE,
 - STRIPE_R5C_FULL_STRIPE,

Instead of inactive_list, stripes with cached data are tracked in
r5conf->r5c_full_stripe_list and r5conf->r5c_partial_stripe_list.
STRIPE_R5C_FULL_STRIPE and STRIPE_R5C_PARTIAL_STRIPE are flags for
stripes in these lists. Note: stripes in r5c_full/partial_stripe_list
are not considered "active".
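
The list selection described above can be sketched as a small helper.
This is an illustration only: the toy_* names are hypothetical, and the
rule that a stripe counts as "full" when every data block has cached
data is an assumption of this sketch, not something stated in this
commit message.

```c
/* Hypothetical stand-ins for inactive_list and the two r5c lists. */
enum toy_list { TOY_INACTIVE_LIST, TOY_PARTIAL_LIST, TOY_FULL_LIST };

/*
 * Pick the list a released stripe would go on.
 * cached_blocks: data blocks of the stripe that hold cached data
 * data_blocks:   total data blocks in the stripe (assumed > 0)
 */
static enum toy_list toy_pick_list(int cached_blocks, int data_blocks)
{
	if (cached_blocks == 0)
		return TOY_INACTIVE_LIST; /* nothing cached: usual path */
	if (cached_blocks == data_blocks)
		return TOY_FULL_LIST;     /* assumed meaning of a full stripe */
	return TOY_PARTIAL_LIST;
}
```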

For RMW, the code allocates an extra page for each data block
being updated.  This is stored in r5dev->orig_page and the old data
is read into it.  Then the prexor calculation subtracts ->orig_page
from the parity block, and the reconstruct calculation adds the
->page data back into the parity block.
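
For plain XOR parity (RAID5), "subtract" and "add" are both XOR, so the
RMW update above can be sketched in a few lines. This is a standalone
illustration, not the kernel's async_tx path; the toy_* names are
hypothetical and buffers are simple byte arrays rather than pages.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR src into dst, byte by byte. */
static void toy_xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/*
 * Read-modify-write parity update for one data block.
 * orig_page: old contents of the block (read from disk)
 * page:      new contents of the block (cached data)
 */
static void toy_rmw_parity(uint8_t *parity, const uint8_t *orig_page,
			   const uint8_t *page, size_t len)
{
	toy_xor_into(parity, orig_page, len); /* prexor: remove old data */
	toy_xor_into(parity, page, len);      /* reconstruct: add new data */
}
```

The result equals the XOR of all current data blocks, without reading
the blocks that were not modified. The old data has to be kept in a
separate buffer because the new data already occupies the block's
regular page.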

r5cache naturally excludes SkipCopy. When the array has write back
cache, async_copy_data() will not skip copy.

There are some known limitations of the cache implementation:

1. Write cache only covers full page writes (R5_OVERWRITE). Writes
   of smaller granularity are write through.
2. Only one log io (sh->log_io) is allowed for each stripe at any
   time. Later writes to the same stripe have to wait. This can be
   improved by moving log_io to r5dev.
3. With the write-back cache, the read path must enter the state
   machine, which is a significant bottleneck for some workloads.
4. There is no per-stripe checkpoint (with r5l_payload_flush) in
   the log, so recovery code has to replay more data than necessary
   (sometimes the whole log from last_checkpoint). This reduces
   availability of the array.
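
Limitation 1 above amounts to a coverage test on each write. A minimal
sketch of such a test follows. The toy_* names are hypothetical, and the
page size of 8 sectors (4096 bytes with 512-byte sectors) is an
assumption of this sketch; the real code relies on the R5_OVERWRITE
flag rather than a helper like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: one page is 8 sectors of 512 bytes. */
#define TOY_SECTORS_PER_PAGE 8u

/*
 * Return true when the write [wr_start, wr_start + wr_len) covers the
 * whole page starting at page_start. All values are in sectors.
 * Only such writes would be cached; others would be written through.
 */
static bool toy_covers_full_page(uint64_t page_start, uint64_t wr_start,
				 uint64_t wr_len)
{
	return wr_start <= page_start &&
	       wr_start + wr_len >= page_start + TOY_SECTORS_PER_PAGE;
}
```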

This patch includes a fix proposed by ZhengYuan Liu
<liuzhengyuan@kylinos.cn>

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18 13:26:30 -08:00
..
bcache block: export bio_free_pages to other modules 2016-09-22 07:48:03 -06:00
persistent-data dm array: introduce cursor api 2016-09-22 11:15:04 -04:00
Kconfig dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO 2016-03-10 17:12:11 -05:00
Makefile dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
bitmap.c md/bitmap: add blktrace event for writes to the bitmap 2016-11-18 09:34:45 -08:00
bitmap.h md-cluster: sync bitmap when node received RESYNCING msg 2016-05-04 12:39:35 -07:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c . various fixes and cleanups for request-based DM core 2016-10-09 17:16:18 -07:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-cache-block-types.h dm cache: revert "remove remainder of distinct discard block size" 2014-11-10 15:25:30 -05:00
dm-cache-metadata.c dm cache metadata: switch to using the new cursor api for loading metadata 2016-09-22 11:15:05 -04:00
dm-cache-metadata.h dm cache: make sure every metadata function checks fail_io 2016-03-10 17:12:12 -05:00
dm-cache-policy-cleaner.c dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-internal.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-smq.c dm cache policy smq: distribute entries to random levels when switching to smq 2016-09-22 11:15:03 -04:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-target.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-core.h dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-crypt.c . various fixes and cleanups for request-based DM core 2016-10-09 17:16:18 -07:00
dm-delay.c dm: rename target's per_bio_data_size to per_io_data_size 2016-02-22 22:34:37 -05:00
dm-era-target.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-flakey.c dm flakey: fix reads to be issued if drop_writes configured 2016-08-24 21:55:05 -04:00
dm-io.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-ioctl.c dm: allow bio-based table to be upgraded to bio-based with DAX support 2016-07-20 23:49:52 -04:00
dm-kcopyd.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-linear.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
dm-log.c dm log: fix unitialized bio operation flags 2016-08-24 21:55:05 -04:00
dm-mpath.c dm mpath: always return reservation conflict without failing over 2016-09-29 10:57:07 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-queue-length.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-raid.c dm raid: fix activation of existing raid4/10 devices 2016-10-17 16:41:31 -04:00
dm-raid1.c dm mirror: use all available legs on multiple failures 2016-10-14 11:55:17 -04:00
dm-region-hash.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-round-robin.c dm round robin: do not use this_cpu_ptr() without having preemption disabled 2016-08-15 09:23:14 -04:00
dm-rq.c - A couple DM raid and DM mirror fixes 2016-10-28 09:27:58 -07:00
dm-rq.h dm rq: introduce dm_mq_kick_requeue_list() 2016-09-15 11:16:05 -04:00
dm-service-time.c dm path selector: remove 'repeat_count' return from .select_path hook 2016-02-22 22:34:42 -05:00
dm-snap-persistent.c dm: use bio op accessors 2016-06-07 13:41:38 -06:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-snap.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-stats.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: move request-based code out to dm-rq.[hc] 2016-06-10 15:15:44 -04:00
dm-table.c dm table: fix missing dm_put_target_type() in dm_table_add_target() 2016-10-24 11:17:46 -04:00
dm-target.c libnvdimm for 4.8 2016-07-28 17:38:16 -07:00
dm-thin-metadata.c dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin-metadata.h dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c dm verity fec: fix block calculation 2016-07-01 23:29:08 -04:00
dm-verity-fec.h dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
dm-verity-target.c dm: rename target's per_bio_data_size to per_io_data_size 2016-02-22 22:34:37 -05:00
dm-verity.h dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-zero.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm.c - A couple DM raid and DM mirror fixes 2016-10-28 09:27:58 -07:00
dm.h dm: add infrastructure for DAX support 2016-07-20 23:49:49 -04:00
faulty.c MD: rename some functions 2016-01-20 13:52:20 -08:00
linear.c md: add block tracing for bio_remapping 2016-11-18 09:32:50 -08:00
linear.h
md-cluster.c md-cluster: make resync lock also could be interruptted 2016-09-21 09:09:44 -07:00
md-cluster.h md-cluster: gather resync infos and enable recv_thread after bitmap is ready 2016-05-09 09:24:03 -07:00
md.c md: add blktrace event for writes to superblock 2016-11-18 09:47:57 -08:00
md.h md: define mddev flags, recovery flags and r1bio state bits using enums 2016-11-09 12:53:52 -08:00
multipath.c md/multipath: replace printk() with pr_*() 2016-11-07 15:08:22 -08:00
multipath.h
raid0.c md: add block tracing for bio_remapping 2016-11-18 09:32:50 -08:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md/raid1, raid10: add blktrace records when IO is delayed 2016-11-18 09:35:37 -08:00
raid1.h md: define mddev flags, recovery flags and r1bio state bits using enums 2016-11-09 12:53:52 -08:00
raid5-cache.c md/r5cache: caching phase of r5cache 2016-11-18 13:26:30 -08:00
raid5.c md/r5cache: caching phase of r5cache 2016-11-18 13:26:30 -08:00
raid5.h md/r5cache: caching phase of r5cache 2016-11-18 13:26:30 -08:00
raid10.c md/raid1, raid10: add blktrace records when IO is delayed 2016-11-18 09:35:37 -08:00
raid10.h raid10: improve random reads performance 2016-07-19 15:20:28 -07:00