linux-sg2042/drivers
Minchan Kim 3c9959e025 zram: fix lockdep warning of free block handling
Patch series "zram idle page writeback", v3.

Inherently, swap device has many idle pages which are rare touched since
it was allocated.  It is never problem if we use storage device as swap.
However, it's just waste for zram-swap.

This patchset supports zram idle page writeback feature.

* Admin can define what is idle page "no access since X time ago"
* Admin can define when zram should writeback them
* Admin can define when zram should stop writeback to prevent wearout

Details are in each patch's description.

This patch (of 7):

  ================================
  WARNING: inconsistent lock state
  4.19.0+ #390 Not tainted
  --------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
  00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
  {SOFTIRQ-ON-W} state was registered at:
    _raw_spin_lock+0x2c/0x40
    zram_make_request+0x755/0xdc9
    generic_make_request+0x373/0x6a0
    submit_bio+0x6c/0x140
    __swap_writepage+0x3a8/0x480
    shrink_page_list+0x1102/0x1a60
    shrink_inactive_list+0x21b/0x3f0
    shrink_node_memcg.constprop.99+0x4f8/0x7e0
    shrink_node+0x7d/0x2f0
    do_try_to_free_pages+0xe0/0x300
    try_to_free_pages+0x116/0x2b0
    __alloc_pages_slowpath+0x3f4/0xf80
    __alloc_pages_nodemask+0x2a2/0x2f0
    __handle_mm_fault+0x42e/0xb50
    handle_mm_fault+0x55/0xb0
    __do_page_fault+0x235/0x4b0
    page_fault+0x1e/0x30
  irq event stamp: 228412
  hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
  hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
  softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
  softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&zram->bitmap_lock)->rlock);
    <Interrupt>
      lock(&(&zram->bitmap_lock)->rlock);

   *** DEADLOCK ***

  no locks held by zram_verify/2095.

  stack backtrace:
  CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack+0x67/0x9b
   print_usage_bug+0x1bd/0x1d3
   mark_lock+0x4aa/0x540
   __lock_acquire+0x51d/0x1300
   lock_acquire+0x90/0x180
   _raw_spin_lock+0x2c/0x40
   put_entry_bdev+0x1e/0x50
   zram_free_page+0xf6/0x110
   zram_slot_free_notify+0x42/0xa0
   end_swap_bio_read+0x5b/0x170
   blk_update_request+0x8f/0x340
   scsi_end_request+0x2c/0x1e0
   scsi_io_completion+0x98/0x650
   blk_done_softirq+0x9e/0xd0
   __do_softirq+0xcc/0x427
   irq_exit+0xd1/0xe0
   do_IRQ+0x93/0x120
   common_interrupt+0xf/0xf
   </IRQ>

With writeback feature, zram_slot_free_notify could be called in softirq
context by end_swap_bio_read.  However, bitmap_lock is not aware of that
so lockdep yell out:

  get_entry_bdev
  spin_lock(bitmap->lock);
  irq
  softirq
  end_swap_bio_read
  zram_slot_free_notify
  zram_slot_lock <-- deadlock prone
  zram_free_page
  put_entry_bdev
  spin_lock(bitmap->lock); <-- deadlock prone

With akpm's suggestion (i.e.  bitmap operation is already atomic), we
could remove bitmap lock.  It might fail to find a empty slot if serious
contention happens.  However, it's not severe problem because huge page
writeback has already possiblity to fail if there is severe memory
pressure.  Worst case is just keeping the incompressible in memory, not
storage.

The other problem is zram_slot_lock in zram_slot_slot_free_notify.  To
make it safe is this patch introduces zram_slot_trylock where
zram_slot_free_notify uses it.  Although it's rare to be contented, this
patch adds new debug stat "miss_free" to keep monitoring how often it
happens.

Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:49 -08:00
..
accessibility
acpi pstore improvements and refactorings 2018-12-27 11:15:21 -08:00
amba
android binder: fix race that allows malicious free of live buffer 2018-11-26 20:01:47 +01:00
ata libata: whitelist all SAMSUNG MZ7KM* solid-state disks 2018-12-03 12:54:39 -07:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-28 22:10:54 -08:00
auxdisplay auxdisplay: charlcd: fix x/y command parsing 2018-12-21 21:27:21 +01:00
base drivers/base/memory.c: remove an unnecessary check on NR_MEM_SECTIONS 2018-12-28 12:11:48 -08:00
bcma
block zram: fix lockdep warning of free block handling 2018-12-28 12:11:49 -08:00
bluetooth Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading 2018-12-19 13:43:42 +01:00
bus ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
cdrom gdrom: fix mistake in assignment of error 2018-10-25 11:17:40 -06:00
char mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
clk Merge branch 'clk-imx7ulp' into clk-next 2018-12-14 14:03:38 -08:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-12-25 15:44:08 -08:00
connector
cpufreq powerpc updates for 4.21 2018-12-27 10:43:24 -08:00
cpuidle powerpc updates for 4.21 2018-12-27 10:43:24 -08:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
dax mm, devm_memremap_pages: fix shutdown handling 2018-12-28 12:11:47 -08:00
dca
devfreq PM / devfreq: add devfreq_suspend/resume() functions 2018-12-11 11:40:13 +09:00
dio
dma dmaengine: dw: Fix FIFO size for Intel Merrifield 2018-12-06 22:53:05 +05:30
dma-buf dma-buf: add dma_fence_get_stub 2018-12-03 17:40:18 +01:00
edac EDAC, fsl_ddr: Add LS1021A to the list of supported hardware 2018-12-19 11:57:45 +01:00
eisa
extcon
firewire
firmware pstore improvements and refactorings 2018-12-27 11:15:21 -08:00
fmc
fpga fpga: add devm_fpga_region_create 2018-10-16 11:13:50 +02:00
fsi fsi: fsi-scom.c: Remove duplicate header 2018-11-26 10:13:04 +11:00
gnss gnss: sirf: fix activation retry handling 2018-12-06 17:22:23 +01:00
gpio regmap: Updates for v4.21 2018-12-25 14:48:06 -08:00
gpu mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2018-12-10 11:04:41 -08:00
hsi
hv mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hwmon Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-12-26 16:12:50 -08:00
hwspinlock
hwtracing stm class: Use memcat_p() 2018-10-11 12:12:55 +02:00
i2c platform-drivers-x86 for v4.21-1 2018-12-25 11:04:17 -08:00
i3c i3c: master: cdns: fix I2C transfers in Cadence I3C master driver 2018-12-12 17:08:32 +01:00
ide powerpc updates for 4.21 2018-12-27 10:43:24 -08:00
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
iio platform-drivers-x86 for v4.21-1 2018-12-25 11:04:17 -08:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-20 11:53:36 -08:00
input Merge branch 'parisc-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2018-12-26 11:14:52 -08:00
iommu iommu/vt-d: Use memunmap to free memremap 2018-11-22 17:02:21 +01:00
ipack
irqchip irqchip updates for 4.21 2018-12-18 18:37:27 +01:00
isdn isdn/hisax: remove set but not used variable 'total' 2018-11-16 23:02:50 -08:00
leds LEDs for 4.21-rc1 2018-12-25 14:52:50 -08:00
lightnvm lightnvm: pblk: guarantee that backpointer is respected on writer stall 2018-10-09 08:25:08 -06:00
macintosh macintosh: Use of_node_name_{eq, prefix} for node name comparisons 2018-12-22 21:29:56 +11:00
mailbox - Convert print users to use the %pOFn format specifier 2018-10-29 10:30:44 -07:00
mcb
md mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
media mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
memory
memstick
message
mfd mfd: axp20x: use explicit bit defines 2018-12-13 16:40:03 +00:00
misc mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
mmc mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl 2018-12-17 08:59:42 +01:00
mtd spi: Updates for v4.21 2018-12-25 14:43:54 -08:00
mux This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
net Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-10-23 13:28:53 -05:00
ntb ntb: idt: Alter the driver info comments 2018-11-01 10:33:12 -04:00
nubus
nvdimm mm, devm_memremap_pages: fix shutdown handling 2018-12-28 12:11:47 -08:00
nvme nvmet-rdma: fix response use after free 2018-12-07 07:11:11 -08:00
nvmem nvmem: core: fix regression in of_nvmem_cell_get() 2018-11-11 09:15:29 -08:00
of Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
opp Merge branch 'opp/genpd/propagation' into opp/linux-next 2018-12-14 16:28:52 +05:30
oprofile
parisc mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
parport
pci mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2018-12-28 12:11:48 -08:00
pcmcia powerpc updates for 4.20 2018-10-26 14:36:21 -07:00
perf drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver 2018-12-06 13:03:17 +00:00
phy phy: qcom-qusb2: Fix HSTX_TRIM tuning with fused value for SDM845 2018-11-21 13:13:58 +05:30
pinctrl pinctrl: sunxi: a83t: Fix IRQ offset typo for PH11 2018-12-07 13:32:19 +01:00
platform Here's the main MIPS pull for Linux 4.21. Core architecture changes 2018-12-26 10:45:33 -08:00
pnp
power PM / AVS: SmartReflex: Switch to SPDX Licence ID 2018-12-12 13:54:28 +01:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
pps
ps3
ptp ptp: Fix pass zero to ERR_PTR() in ptp_clock_register 2018-11-23 18:04:02 -08:00
pwm pwm: imx: Add ipg clock operation 2018-12-24 12:06:56 +01:00
rapidio
ras
regulator Merge remote-tracking branch 'regulator/topic/coupled' into regulator-next 2018-12-21 13:43:35 +00:00
remoteproc virtio_ring: disable packed ring on unsupported transports 2018-11-26 22:17:40 -08:00
reset ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
rpmsg
rtc Staging and IIO driver fixes for 4.20-rc5 2018-11-30 12:23:44 -08:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 21:43:31 -08:00
sbus Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next 2018-12-26 10:32:18 -08:00
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
sfi mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sh
siox
slimbus slimbus: ngd: remove unnecessary check 2018-11-07 14:59:28 +01:00
sn
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
soundwire
spi spi: Updates for v4.21 2018-12-25 14:43:54 -08:00
spmi
ssb ssb: chipcommon: fix fall-through annotation 2018-10-05 11:37:20 +03:00
staging mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
target Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
tc TC: Set DMA masks for devices 2018-10-11 09:16:44 -07:00
tee
thermal thermal: stm32: Fix stm_thermal_read_factory_settings 2018-12-10 20:15:28 -08:00
thunderbolt thunderbolt: Prevent root port runtime suspend during NVM upgrade 2018-11-26 20:38:49 +01:00
tty audit/stable-4.21 PR 20181224 2018-12-27 11:58:50 -08:00
uio uio_hv_generic: set callbacks on open 2018-12-11 14:23:17 +01:00
usb Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
uwb
vfio vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver 2018-12-21 16:20:47 +11:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
video drm pull request for 4.21-rc1 2018-12-25 11:48:26 -08:00
virt
virtio virtio_ring: advertize packed ring layout 2018-11-26 22:17:40 -08:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2018-10-15 20:50:32 +02:00
watchdog watchdog: ts4800: release syscon device node in ts4800_wdt_probe() 2018-10-22 10:16:28 +02:00
xen mm/memory_hotplug: drop "online" parameter from add_memory_resource() 2018-12-28 12:11:48 -08:00
zorro
Kconfig i3c: Add core I3C infrastructure 2018-11-12 10:33:49 +01:00
Makefile i3c: Add core I3C infrastructure 2018-11-12 10:33:49 +01:00