Commit Graph

1074339 Commits

Author SHA1 Message Date
Keith Busch f3813f4b28 crypto: add rocksoft 64b crc guard tag framework
Hardware specific features may be able to calculate a crc64, so provide
a framework for drivers to register their implementation. If nothing is
registered, fallback to the generic table lookup implementation. The
implementation is modeled after the crct10dif equivalent.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-7-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Keith Busch cbc0a40e17 lib: add rocksoft model crc64
The NVM Express specification extended data integrity fields to 64 bits
using the Rocksoft parameters. Add the poly to the crc64 table
generation, and provide a generic library routine implementing the
algorithm.

The Rocksoft 64-bit CRC model parameters are as follows:
    Poly: 0xAD93D23594C93659
    Initial value: 0xFFFFFFFFFFFFFFFF
    Reflected Input: True
    Reflected Output: True
    Xor Final: 0xFFFFFFFFFFFFFFFF

Since this model used reflected bits, the implementation generates the
reflected table so the result is ordered consistently.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220303201312.3255347-6-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Keith Busch 7ee8809df9 linux/kernel: introduce lower_48_bits function
Recent data integrity field enhancements allow reference tags to be up
to 48 bits. Introduce an inline helper function since this will be a
repeated operation.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-5-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Keith Busch c2ea5fcf53 asm-generic: introduce be48 unaligned accessors
The NVMe protocol extended the data integrity fields with unaligned
48-bit reference tags. Provide some helper accessors in
preparation for these.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-4-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Keith Busch 84b735429f nvme: allow integrity on extended metadata formats
The block integrity subsystem knows how to construct protection
information buffers with metadata beyond the protection information
fields. Remove the driver restriction.

Note, this can only work if the PI field appears first in the metadata,
as the integrity subsystem doesn't calculate guard tags on preceding
metadata.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-3-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Keith Busch c340b990d5 block: support pi with extended metadata
The nvme spec allows protection information formats with metadata
extending beyond the pi field. Use the actual size of the metadata field
for incrementing the buffer.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-2-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:48:35 -07:00
Jens Axboe d57c1cf43e Merge branch 'for-5.18/write-streams' into for-5.18/64bit-pi
* for-5.18/write-streams:
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-07 12:48:02 -07:00
Jens Axboe e41ffa9cf0 Merge branch 'for-5.18/alloc-cleanups' into for-5.18/64bit-pi
* for-5.18/alloc-cleanups:
  nilfs2: pass the operation to bio_alloc
  ext4: pass the operation to bio_alloc
  mpage: pass the operation to bio_alloc
2022-03-07 12:46:18 -07:00
Jens Axboe b83ac18fce Merge branch 'for-5.18/drivers' into for-5.18/64bit-pi
* for-5.18/drivers: (51 commits)
  bcache: fixup multiple threads crash
  bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
  floppy: use memcpy_{to,from}_bvec
  drbd: use bvec_kmap_local in recv_dless_read
  drbd: use bvec_kmap_local in drbd_csum_bio
  bcache: use bvec_kmap_local in bio_csum
  nvdimm-btt: use bvec_kmap_local in btt_rw_integrity
  nvdimm-blk: use bvec_kmap_local in nd_blk_rw_integrity
  zram: use memcpy_from_bvec in zram_bvec_write
  zram: use memcpy_to_bvec in zram_bvec_read
  aoe: use bvec_kmap_local in bvcpy
  iss-simdisk: use bvec_kmap_local in simdisk_submit_bio
  nvme: check that EUI/GUID/UUID are globally unique
  nvme: check for duplicate identifiers earlier
  nvme: fix the check for duplicate unique identifiers
  nvme: cleanup __nvme_check_ids
  nvme: remove nssa from struct nvme_ctrl
  nvme: explicitly set non-error for directives
  nvme: expose cntrltype and dctype through sysfs
  nvme: send uevent on connection up
  ...
2022-03-07 12:46:16 -07:00
Jens Axboe bc8419944f Merge branch 'for-5.18/block' into for-5.18/64bit-pi
* for-5.18/block: (96 commits)
  block: remove bio_devname
  ext4: stop using bio_devname
  raid5-ppl: stop using bio_devname
  raid1: stop using bio_devname
  md-multipath: stop using bio_devname
  dm-integrity: stop using bio_devname
  dm-crypt: stop using bio_devname
  pktcdvd: remove a pointless debug check in pkt_submit_bio
  block: remove handle_bad_sector
  block: fix and cleanup bio_check_ro
  bfq: fix use-after-free in bfq_dispatch_request
  blk-crypto: show crypto capabilities in sysfs
  block: don't delete queue kobject before its children
  block: simplify calling convention of elv_unregister_queue()
  block: remove redundant semicolon
  block: default BLOCK_LEGACY_AUTOLOAD to y
  block: update io_ticks when io hang
  block, bfq: don't move oom_bfqq
  block, bfq: avoid moving bfqq to it's parent bfqg
  block, bfq: cleanup bfq_bfqq_to_bfqg()
  ...
2022-03-07 12:46:13 -07:00
Christoph Hellwig c75e707fe1 block: remove the per-bio/request write hint
With the NVMe support for this gone, there are no consumers of these hints
left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:45:57 -07:00
Christoph Hellwig 85e6c77576 nvme: remove support or stream based temperature hint
This support was added for RocksDB, but RocksDB ended up not using it.
At the same time drives on the open marked (vs those build for OEMs
for non-Linux support) that actually support streams are extremly
rare.  Don't bloat the nvme driver for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220304175556.407719-1-hch@lst.de
[axboe: fold in ctrl->nr_streams removal from Keith]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:45:56 -07:00
Jens Axboe 8291100963 Merge branch 'for-5.18/alloc-cleanups' into for-5.18/write-streams
* for-5.18/alloc-cleanups:
  nilfs2: pass the operation to bio_alloc
  ext4: pass the operation to bio_alloc
  mpage: pass the operation to bio_alloc
2022-03-07 12:45:53 -07:00
Jens Axboe b46bebaf2a Merge branch 'for-5.18/drivers' into for-5.18/write-streams
* for-5.18/drivers: (51 commits)
  bcache: fixup multiple threads crash
  bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
  floppy: use memcpy_{to,from}_bvec
  drbd: use bvec_kmap_local in recv_dless_read
  drbd: use bvec_kmap_local in drbd_csum_bio
  bcache: use bvec_kmap_local in bio_csum
  nvdimm-btt: use bvec_kmap_local in btt_rw_integrity
  nvdimm-blk: use bvec_kmap_local in nd_blk_rw_integrity
  zram: use memcpy_from_bvec in zram_bvec_write
  zram: use memcpy_to_bvec in zram_bvec_read
  aoe: use bvec_kmap_local in bvcpy
  iss-simdisk: use bvec_kmap_local in simdisk_submit_bio
  nvme: check that EUI/GUID/UUID are globally unique
  nvme: check for duplicate identifiers earlier
  nvme: fix the check for duplicate unique identifiers
  nvme: cleanup __nvme_check_ids
  nvme: remove nssa from struct nvme_ctrl
  nvme: explicitly set non-error for directives
  nvme: expose cntrltype and dctype through sysfs
  nvme: send uevent on connection up
  ...
2022-03-07 12:44:39 -07:00
Jens Axboe 13400b1454 Merge branch 'for-5.18/block' into for-5.18/write-streams
* for-5.18/block: (96 commits)
  block: remove bio_devname
  ext4: stop using bio_devname
  raid5-ppl: stop using bio_devname
  raid1: stop using bio_devname
  md-multipath: stop using bio_devname
  dm-integrity: stop using bio_devname
  dm-crypt: stop using bio_devname
  pktcdvd: remove a pointless debug check in pkt_submit_bio
  block: remove handle_bad_sector
  block: fix and cleanup bio_check_ro
  bfq: fix use-after-free in bfq_dispatch_request
  blk-crypto: show crypto capabilities in sysfs
  block: don't delete queue kobject before its children
  block: simplify calling convention of elv_unregister_queue()
  block: remove redundant semicolon
  block: default BLOCK_LEGACY_AUTOLOAD to y
  block: update io_ticks when io hang
  block, bfq: don't move oom_bfqq
  block, bfq: avoid moving bfqq to it's parent bfqg
  block, bfq: cleanup bfq_bfqq_to_bfqg()
  ...
2022-03-07 12:44:37 -07:00
Christoph Hellwig 97939610b8 block: remove bio_devname
All callers are gone, so remove this wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig 734294e47a ext4: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig c7dec4623c raid5-ppl: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig ac483eb375 raid1: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig ee1925bd83 md-multipath: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig 0a806cfde8 dm-integrity: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig 6667171965 dm-crypt: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig 47c426d524 pktcdvd: remove a pointless debug check in pkt_submit_bio
->queuedata is set up in pkt_init_queue, so it can't be NULL here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig ad740780bb block: remove handle_bad_sector
Use the %pg format specifier instead of the stack hungry bdevname
function, and remove handle_bad_sector given that it is not pointless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig 57e95e4670 block: fix and cleanup bio_check_ro
Don't use a WARN_ON when printing a potentially user triggered
condition.  Also don't print the partno when the block device name
already includes it, and use the %pg specifier to simplify printing
the block device name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Linus Torvalds ffb217a13a Linux 5.17-rc7 2022-03-06 14:28:31 -08:00
Linus Torvalds 3ee65c0f07 for-5.17-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmIk1isACgkQxWXV+ddt
 WDsAVQ//TvkKObQLL/BJ4TFSxr1ZLs83z4vTcss2W/MrMjGWUut1fhUTGlhkqgC6
 RE03VBuUV983k09/Tn3Q0AHSXcMAmxEv/t1QweJNKiVv7YKT3Nj7VF3kHioFz9g/
 gZ5q9FVbTXkrl4tgcwiQXbLJ1BLWBfXTAMatKgsIQBYsYg0ec3GGem/tx3OlvdNt
 9My6EJhNo5X7vrTMjRUygDgHDhcAgp/gYMa2VmnPhK5qcPzmIYbt4CJGLQDwiiiB
 KSsXnsHCXKm/BRPgtgnMBH6O8YruaxUn0nEQMjntGx8RHbZrkdXk90PaK7pmWz1W
 KkbHTM98zclAOWUx6JmGw8mb9aZQo6aGpou2Pa98aBtHhvbhiKYS2W2OOnHbAshK
 2bj6W2o89eYHKgX+5fICWHt7efUoWUh1KPC+TeaV8DKl8q0a9DC3KfIL/v7PZacA
 pIyyy4uyXh3finzI+Q+fW7QVKQhpcQKLuq5EVGCMEotlfsn+SJBselAdwUl9ChUp
 ALAiYn1T8W1Mrt8P2mxB29btGrdckHtpoWTgr++OAZaX4PABF3GAvIxXwmFg2aMK
 zfXKwTxjwKM42H3AWaLHttk4OA7FJhY9sgOproON/3Tn9cBSK2jiO0HSk1dBn/dL
 WQbOKh4Z+VDXi5niF8hmTANTNO0wS0JdiKZX86tYyhcCl0ZBr/w=
 =Bd5z
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes for various problems that have user visible effects
  or seem to be urgent:

   - fix corruption when combining DIO and non-blocking io_uring over
     multiple extents (seen on MariaDB)

   - fix relocation crash due to premature return from commit

   - fix quota deadlock between rescan and qgroup removal

   - fix item data bounds checks in tree-checker (found on a fuzzed
     image)

   - fix fsync of prealloc extents after EOF

   - add missing run of delayed items after unlink during log replay

   - don't start relocation until snapshot drop is finished

   - fix reversed condition for subpage writers locking

   - fix warning on page error"

* tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fallback to blocking mode when doing async dio over multiple extents
  btrfs: add missing run of delayed items after unlink during log replay
  btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
  btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
  btrfs: do not start relocation until in progress drops are done
  btrfs: tree-checker: use u64 for item data end to avoid overflow
  btrfs: do not WARN_ON() if we have PageError set
  btrfs: fix lost prealloc extents beyond eof after full fsync
  btrfs: subpage: fix a wrong check on subpage->writers
2022-03-06 12:19:36 -08:00
Linus Torvalds f81664f760 x86 guest:
* Tweaks to the paravirtualization code, to avoid using them
 when they're pointless or harmful
 
 x86 host:
 
 * Fix for SRCU lockdep splat
 
 * Brown paper bag fix for the propagation of errno
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIkkdsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP15Qf7B8BXNMlNkret5WN/4pGf06gNdIY6
 ZqC8t/Lx1+fCkzGk+VtAw0bxRscOF4z1XzvfywO5ZI5bxQB/b2xTyBkVY90SqhsB
 shug5QpikejpmvVZJXxwD3+loCUah2T6FUT6QJa0sKVhW+XiqOva8fAmYLG5agaa
 VGvqFXTXiVmbiw/O9ZI/CfUC0WNrn+I1iDO+oGWyhv/22tePxGCizVczRFJn6DAD
 Vh5P6AfOqXjmzdpUeOiU544FQZPHAZehb7/xYc0T9GSW4fPnTmHwRzwhUqgJnx7d
 3E+eWGwny+Q/OrpKf7SbxtB65yn7lHRmdN/YtCHygl4sjs6CdjSPY8/9jQ==
 =PPz1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86 guest:

   - Tweaks to the paravirtualization code, to avoid using them when
     they're pointless or harmful

  x86 host:

   - Fix for SRCU lockdep splat

   - Brown paper bag fix for the propagation of errno"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
  KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
  KVM: x86: Yield to IPI target vCPU only if it is busy
  x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
  x86/kvm: Don't waste memory if kvmclock is disabled
  x86/kvm: Don't use PV TLB/yield when mwait is advertised
2022-03-06 12:08:42 -08:00
Linus Torvalds 9bdeaca18b powerpc fixes for 5.17 #5
Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.
 
 Thanks to: Murilo Opsfelder Araujo, Erhard F.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmIkY/gTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIQfD/997ouPSpuJCyG7nFY35R8IIqJESqqO
 RhMrI1b/HjiTHI3+Ha3htnGWa258Klllwr6zerTFYIp9kRzoO8rskgqeTYM2aOXF
 rLGMUz2b6BjsboxOGowd2Z9JB5U0sItpt1MQZrXVnaVVx3PWQUV4PjksdxmqwC4W
 +DtmYisO38FVQey9kC3V12J+KMkm0J0PWqhh+m7w1zkhNvNlcZp+g0gODWRfo3ic
 QBqTyN3mUXnVKqVNXJZqWCkMp2ek8ZxL1plhwdQtbh9Uwttooc/QNYURepjTVglT
 sHusO8CwLKd1hQlMDD+eqZ0pMSYHE1sWxoaiBLZbaC6Qdu/+arTayHOLJi8QGwtt
 g2jDOklXP8rsXA7Tp/qafWDV61YSJP+O8KJsEpnuluUP/SePSk3jdgDoztCe72M+
 f8Xu5AZ5+2x3NaVmNoOOvvvsxlS3ywl2nDTO205Tz6W55ZCWafSf1vG11lRKU3G8
 We0hzDlJNNajNjnBpiiXgyHu4vi2cfh8gWWDxKJhjZV9pomJ1zFU2+IOpN4CA/6D
 qolgraeLLNVtmNMxcwMdpcBXnG7rwzTuJXSXMPM/tPhLFl1bJmQCiVYfEVOLFMtJ
 2+uUyfbbjaf1IDAfBLrIgN1YzIWc3fEPG+bdmulhhHeFN1XY6tfj++JF/DiRxGhn
 wWc9TB8BCY+uPw==
 =U4bW
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.

  Thanks to Murilo Opsfelder Araujo, and Erhard F"

* tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
2022-03-06 11:57:42 -08:00
Linus Torvalds f40a33f5ea Two tracing fixes:
- Fix sorting on old "cpu" value in histograms
 
  - Fix return value of __setup() boot parameter handlers.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYiQOghQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quMXAP0TVq+FvVroN42ZS/UpiynnJ0uW1ibV
 93i3M12QQL2zSQEA6a+aWHywTl1tU2F/I4frH5RkIwTulfP/RwBVJG0MFQc=
 =ccPg
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix sorting on old "cpu" value in histograms

 - Fix return value of __setup() boot parameter handlers

* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix return value of __setup handlers
  tracing/histogram: Fix sorting on old "cpu" value
2022-03-06 11:47:59 -08:00
Jens Axboe a76370690c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache into for-5.18/drivers
Pull bcache updates from Coly:

"We have 2 patches for Linux v5.18, both of them are from Mingzhe Zou.
 The first patch improves bcache initialization speed by avoid
 unnecessary cost of cache consistency, the second one fixes a potential
 NULL pointer deference in bcache initialization time."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/colyli/linux-bcache:
  bcache: fixup multiple threads crash
  bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
2022-03-06 08:13:09 -07:00
Mingzhe Zou 887554ab96 bcache: fixup multiple threads crash
When multiple threads to check btree nodes in parallel, the main
thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:

wait_event_interruptible(check_state->wait,
                         atomic_read(&check_state->started) == 0 ||
                         test_bit(CACHE_SET_IO_DISABLE, &c->flags));

However, the bch_btree_node_read and bch_btree_node_read_done
maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
will be set. If the flag already set, the main thread return
error. At the same time, maybe some threads still running and
read NULL pointer, the kernel will crash.

This patch change the event wait condition, the main thread must
wait for all threads to stop.

Fixes: 8e7102273f ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:45 +08:00
Mingzhe Zou 7b1002f7cf bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
When attaching a cached device (a.k.a backing device) to a cache
device, bch_sectors_dirty_init() is called to count dirty sectors
and stripes (see what bcache_dev_sectors_dirty_add() does) on the
cache device.

When bcache_dev_sectors_dirty_add() is called, set_bit(stripe,
d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes)
operation will always be performed. In full_dirty_stripes, each 1bit
represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512),
and each CPU cache line=64B=512bit=2048MB. When 20 threads process
a cached disk with 100G dirty data, a single thread processes about
23M at a time, and 20 threads total 460M. These full_dirty_stripes
bits corresponding to the 460M data is likely to fall in the same CPU
cache line. When one of these threads performs a set_bit or clear_bit
operation, the same CPU cache line of other threads will become invalid
and must read the full_dirty_stripes from the main memory again. Compared
with single thread, the time of a bcache_dev_sectors_dirty_add()
call is increased by about 50 times in our test (100G dirty data,
20 threads, bcache_dev_sectors_dirty_add() is called more than
20 million times).

This patch tries to test_bit before set_bit or clear_bit operation.
Therefore, a lot of force set and clear operations will be avoided,
and most of bcache_dev_sectors_dirty_add() calls will only read CPU
cache line.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:37 +08:00
Linus Torvalds dcde98da99 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - a fixup for Goodix touchscreen driver allowing it to work on certain
   Cherry Trail devices

 - a fix for imbalanced enable/disable regulator in Elam touchpad driver
   that became apparent when used with Asus TF103C 2-in-1 dock

 - a couple new input keycodes used on newer keyboards

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  HID: add mapping for KEY_ALL_APPLICATIONS
  HID: add mapping for KEY_DICTATE
  Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
  Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
  Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
  Input: goodix - use the new soc_intel_is_byt() helper
  Input: samsung-keypad - properly state IOMEM dependency
2022-03-05 15:49:45 -08:00
Linus Torvalds 0014404f9c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "8 patches.

  Subsystems affected by this patch series: mm (hugetlb, pagemap, and
  userfaultfd), memfd, selftests, and kconfig"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  configs/debug: set CONFIG_DEBUG_INFO=y properly
  proc: fix documentation and description of pagemap
  kselftest/vm: fix tests build with old libc
  memfd: fix F_SEAL_WRITE after shmem huge page allocated
  mm: fix use-after-free when anon vma name is used after vma is freed
  mm: prevent vm_area_struct::anon_name refcount saturation
  mm: refactor vm_area_struct::anon_vma_name usage code
  selftests/vm: cleanup hugetlb file after mremap test
2022-03-05 12:03:14 -08:00
Linus Torvalds f9026e19a4 s390 updates for 5.17-rc7
- Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
   switching between ftrace_caller/ftrace_regs_caller and supplying pt_regs
   only when ftrace_regs_caller is activated.
 
 - Fix exception table sorting.
 
 - Fix breakage of kdump tooling by preserving metadata it cannot function
   without.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmIjRzsACgkQjYWKoQLX
 FBj2sQf7BOz0tu+mn/c/Fy/jQt0D+18yOt43SG4+yusQRo7Qa/Q+sub910rRl9N+
 NwB5Z6Lxyv1O9QVD3sYBBCzLVbDNEVt/qeQIMCyNFuphgflLPFAAELs2Qi0hdVtc
 ZM73VaqX7KS3Ts52aJZ2/tpschngLP9aGrxw2Aa56Ylv1Q04eOB+vWAJZVVREVts
 r7nlcghkyBJQIoHVxUTOO8MrUA4FhvPRHQ/OehJt1EkVOJ4l54IOf74aoPsKE3Ma
 AykSs/CeTIhIhtfTK+iE/JqkD9P/TjNULWdnk8lJYFFuVC9KrxytItoVMtu7IjGM
 HDAp4FDyI7j3gx/vA7M4o3SMgLu13A==
 =vjGY
 -----END PGP SIGNATURE-----

Merge tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
   switching between ftrace_caller/ftrace_regs_caller and supplying
   pt_regs only when ftrace_regs_caller is activated.

 - Fix exception table sorting.

 - Fix breakage of kdump tooling by preserving metadata it cannot
   function without.

* tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/extable: fix exception table sorting
  s390/ftrace: fix arch_ftrace_get_regs implementation
  s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
  s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
2022-03-05 11:25:26 -08:00
Qian Cai d1eff16d72 configs/debug: set CONFIG_DEBUG_INFO=y properly
CONFIG_DEBUG_INFO can't be set by user directly, so set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead.

Otherwise, we end up with no debuginfo in vmlinux which is a big no-no
for kernel debugging.

Link: https://lkml.kernel.org/r/20220301202920.18488-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:33 -08:00
Yun Zhou dd21bfa425 proc: fix documentation and description of pagemap
Since bit 57 was exported for uffd-wp write-protected (commit
fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.

Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
Fixes: fb8e37f35a ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Colin Cross <ccross@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:33 -08:00
Chengming Zhou b773827e36 kselftest/vm: fix tests build with old libc
The error message when I build vm tests on debian10 (GLIBC 2.28):

    userfaultfd.c: In function `userfaultfd_pagemap_test':
    userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
    in this function); did you mean `MADV_RANDOM'?
      if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
                                         ^~~~~~~~~~~~
                                         MADV_RANDOM

This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.

Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Hugh Dickins f2b277c4d1 memfd: fix F_SEAL_WRITE after shmem huge page allocated
Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:

  echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled

the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:

  fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.

That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.

Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict.  Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.

Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.

Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reported-by: wangyong <wang.yong12@zte.com.cn>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Suren Baghdasaryan 942341dcc5 mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed.  In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF.  For that the original vma would have to hold
the anon_vma_name with the last reference.  The following vma would need
to contain a different anon_vma_name object with the same string.  Such
scenario is shown below:

madvise_vma_behavior(vma)
  madvise_update_vma(vma, ..., anon_name == vma->anon_name)
    vma_merge(vma)
      __vma_adjust(vma) <-- merges vma with adjacent one
        vm_area_free(vma) <-- frees the original vma
    replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name

Fix this by raising the name refcount and stabilizing it.

Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com
Fixes: 9a10064f56 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Suren Baghdasaryan 96403e1128 mm: prevent vm_area_struct::anon_name refcount saturation
A deep process chain with many vmas could grow really high.  With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).

Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).

kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter.  This would lead to the
refcounted object never being freed.  A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.

To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED.  This should provide enough headroom for raising the
refcounts temporarily.

Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Suren Baghdasaryan 5c26f6ac94 mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Mike Kravetz ff712a627f selftests/vm: cleanup hugetlb file after mremap test
The hugepage-mremap test will create a file in a hugetlb filesystem.  In
a default 'run_vmtests' run, the file will contain all the hugetlb
pages.  After the test, the file remains and there are no free hugetlb
pages for subsequent tests.  This causes those hugetlb tests to fail.

Change hugepage-mremap to take the name of the hugetlb file as an
argument.  Unlink the file within the test, and just to be sure remove
the file in the run_vmtests script.

Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05 11:08:32 -08:00
Zhang Wensheng ab552fcb17 bfq: fix use-after-free in bfq_dispatch_request
KASAN reports a use-after-free report when doing normal scsi-mq test

[69832.239032] ==================================================================
[69832.241810] BUG: KASAN: use-after-free in bfq_dispatch_request+0x1045/0x44b0
[69832.243267] Read of size 8 at addr ffff88802622ba88 by task kworker/3:1H/155
[69832.244656]
[69832.245007] CPU: 3 PID: 155 Comm: kworker/3:1H Not tainted 5.10.0-10295-g576c6382529e #8
[69832.246626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[69832.249069] Workqueue: kblockd blk_mq_run_work_fn
[69832.250022] Call Trace:
[69832.250541]  dump_stack+0x9b/0xce
[69832.251232]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.252243]  print_address_description.constprop.6+0x3e/0x60
[69832.253381]  ? __cpuidle_text_end+0x5/0x5
[69832.254211]  ? vprintk_func+0x6b/0x120
[69832.254994]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.255952]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.256914]  kasan_report.cold.9+0x22/0x3a
[69832.257753]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.258755]  check_memory_region+0x1c1/0x1e0
[69832.260248]  bfq_dispatch_request+0x1045/0x44b0
[69832.261181]  ? bfq_bfqq_expire+0x2440/0x2440
[69832.262032]  ? blk_mq_delay_run_hw_queues+0xf9/0x170
[69832.263022]  __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.264011]  ? blk_mq_sched_request_inserted+0x100/0x100
[69832.265101]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.266206]  ? blk_mq_do_dispatch_ctx+0x570/0x570
[69832.267147]  ? __switch_to+0x5f4/0xee0
[69832.267898]  blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.268946]  __blk_mq_run_hw_queue+0xc0/0x270
[69832.269840]  blk_mq_run_work_fn+0x51/0x60
[69832.278170]  process_one_work+0x6d4/0xfe0
[69832.278984]  worker_thread+0x91/0xc80
[69832.279726]  ? __kthread_parkme+0xb0/0x110
[69832.280554]  ? process_one_work+0xfe0/0xfe0
[69832.281414]  kthread+0x32d/0x3f0
[69832.282082]  ? kthread_park+0x170/0x170
[69832.282849]  ret_from_fork+0x1f/0x30
[69832.283573]
[69832.283886] Allocated by task 7725:
[69832.284599]  kasan_save_stack+0x19/0x40
[69832.285385]  __kasan_kmalloc.constprop.2+0xc1/0xd0
[69832.286350]  kmem_cache_alloc_node+0x13f/0x460
[69832.287237]  bfq_get_queue+0x3d4/0x1140
[69832.287993]  bfq_get_bfqq_handle_split+0x103/0x510
[69832.289015]  bfq_init_rq+0x337/0x2d50
[69832.289749]  bfq_insert_requests+0x304/0x4e10
[69832.290634]  blk_mq_sched_insert_requests+0x13e/0x390
[69832.291629]  blk_mq_flush_plug_list+0x4b4/0x760
[69832.292538]  blk_flush_plug_list+0x2c5/0x480
[69832.293392]  io_schedule_prepare+0xb2/0xd0
[69832.294209]  io_schedule_timeout+0x13/0x80
[69832.295014]  wait_for_common_io.constprop.1+0x13c/0x270
[69832.296137]  submit_bio_wait+0x103/0x1a0
[69832.296932]  blkdev_issue_discard+0xe6/0x160
[69832.297794]  blk_ioctl_discard+0x219/0x290
[69832.298614]  blkdev_common_ioctl+0x50a/0x1750
[69832.304715]  blkdev_ioctl+0x470/0x600
[69832.305474]  block_ioctl+0xde/0x120
[69832.306232]  vfs_ioctl+0x6c/0xc0
[69832.306877]  __se_sys_ioctl+0x90/0xa0
[69832.307629]  do_syscall_64+0x2d/0x40
[69832.308362]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[69832.309382]
[69832.309701] Freed by task 155:
[69832.310328]  kasan_save_stack+0x19/0x40
[69832.311121]  kasan_set_track+0x1c/0x30
[69832.311868]  kasan_set_free_info+0x1b/0x30
[69832.312699]  __kasan_slab_free+0x111/0x160
[69832.313524]  kmem_cache_free+0x94/0x460
[69832.314367]  bfq_put_queue+0x582/0x940
[69832.315112]  __bfq_bfqd_reset_in_service+0x166/0x1d0
[69832.317275]  bfq_bfqq_expire+0xb27/0x2440
[69832.318084]  bfq_dispatch_request+0x697/0x44b0
[69832.318991]  __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.319984]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.321087]  blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.322225]  __blk_mq_run_hw_queue+0xc0/0x270
[69832.323114]  blk_mq_run_work_fn+0x51/0x60
[69832.323942]  process_one_work+0x6d4/0xfe0
[69832.324772]  worker_thread+0x91/0xc80
[69832.325518]  kthread+0x32d/0x3f0
[69832.326205]  ret_from_fork+0x1f/0x30
[69832.326932]
[69832.338297] The buggy address belongs to the object at ffff88802622b968
[69832.338297]  which belongs to the cache bfq_queue of size 512
[69832.340766] The buggy address is located 288 bytes inside of
[69832.340766]  512-byte region [ffff88802622b968, ffff88802622bb68)
[69832.343091] The buggy address belongs to the page:
[69832.344097] page:ffffea0000988a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802622a528 pfn:0x26228
[69832.346214] head:ffffea0000988a00 order:2 compound_mapcount:0 compound_pincount:0
[69832.347719] flags: 0x1fffff80010200(slab|head)
[69832.348625] raw: 001fffff80010200 ffffea0000dbac08 ffff888017a57650 ffff8880179fe840
[69832.354972] raw: ffff88802622a528 0000000000120008 00000001ffffffff 0000000000000000
[69832.356547] page dumped because: kasan: bad access detected
[69832.357652]
[69832.357970] Memory state around the buggy address:
[69832.358926]  ffff88802622b980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.360358]  ffff88802622ba00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.361810] >ffff88802622ba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.363273]                       ^
[69832.363975]  ffff88802622bb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[69832.375960]  ffff88802622bb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69832.377405] ==================================================================

In bfq_dispatch_requestfunction, it may have function call:

bfq_dispatch_request
	__bfq_dispatch_request
		bfq_select_queue
			bfq_bfqq_expire
				__bfq_bfqd_reset_in_service
					bfq_put_queue
						kmem_cache_free
In this function call, in_serv_queue has beed expired and meet the
conditions to free. In the function bfq_dispatch_request, the address
of in_serv_queue pointing to has been released. For getting the value
of idle_timer_disabled, it will get flags value from the address which
in_serv_queue pointing to, then the problem of use-after-free happens;

Fix the problem by check in_serv_queue == bfqd->in_service_queue, to
get the value of idle_timer_disabled if in_serve_queue is equel to
bfqd->in_service_queue. If the space of in_serv_queue pointing has
been released, this judge will aviod use-after-free problem.
And if in_serv_queue may be expired or finished, the idle_timer_disabled
will be false which would not give effects to bfq_update_dispatch_stats.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Link: https://lore.kernel.org/r/20220303070334.3020168-1-zhangwensheng5@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-05 06:51:06 -07:00
Murilo Opsfelder Araujo 58dbe9b373 powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not
set:

    arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’:
    arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’?
      811 |                 if (mmu_linear_psize == MMU_PAGE_4K)
          |                     ^~~~~~~~~~~~~~~~
          |                     mmu_virtual_psize
    arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in

Move the declaration of mmu_linear_psize outside of
CONFIG_PPC_64S_HASH_MMU ifdef.

After the above is fixed, it fails later with the following error:

    ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe':
    file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range'

Fix that, too, by conditioning add_htab_mem_range() symbol to
CONFIG_PPC_64S_HASH_MMU.

Fixes: 387e220a2e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567
Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
2022-03-05 20:42:21 +11:00
Linus Torvalds ac84e82f78 block-5.17-2022-03-04
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIihP0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvWwD/4/Rwu4a7plr7HHYKfS5MaTS62edwWIf2Li
 zMZaGS0kuS4DSV3Lk5Y4AlGyz7FrWjbV+hWlotQNZuvmGntlLeBmscuYlSdN55NL
 afRjwhFRmLOfhOJXCsAE2dSqDvReuRdSn9XkDTL/ViByb35UZUaxGR+nTrGQ8B6J
 DyoA2JVpTVs9B7jtnWoCXKz6TgjFIqT7v29Zd2xE5BrJ/vKpvq0z/4BdJlMBSSKT
 FJ5IQjuE1dyudxJAVYc7X4+t7HRw0afRItZIxrn294COoMmdazhBelnES65CMLfN
 u309J2/HGL0hIRI7tb1Gljp2U8oxYgKeg66VPx1LYFoQ0sUqC9rW+sqU8zZky7SG
 oTzG6ZppSrhTSFhgMYIobChIOKmBRW+tj2BvO6ipKwNJVZbMMFmZogf9K75MJ5U7
 L52RdFxf8D5t7lYzl22puBRgzq5G4m2yi6gbV2EMUfWb2SkbbngdVzuG/uJRQv+D
 7zE8XqqevOgLsUgS71+1oAgc1h07j4b2ihe1UIY2Zo0rZ27y9MV66cbllG8s0R3y
 la5xSSi+HuMNcUpmCeERWLf8uXB3Jzwrwo5l7UvpJuPEGSes4jmE+dHsN3r79bV4
 I5Td7wjBASFu7LKEJlP1OinKdQWJvbJhahNN+pqQtNMxyK6IvNlQRgqh0EwGJhH+
 dqwVNNgkIQ==
 =drk+
 -----END PGP SIGNATURE-----

Merge tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
 "Just a small UAF fix for blktrace"

* tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block:
  blktrace: fix use after free for struct blk_trace
2022-03-04 16:03:46 -08:00
Linus Torvalds 07ebd38a0d RISC-V Fixes for 5.17-rc7
* Fixes for a handful of KASAN-related crashes.
 * A fix to avoid a crash during boot for SPARSEMEM && !SPARSEMEM_VMEMMAP
   configurations.
 * A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.
 * A fix for the K210's device tree to properly populate the interrupt
   map, so hart1 will get interrupts again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmIiNtYTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQf8cD/92NMaclwHMVjQ07svZloQcgDp+JSA5
 JP2EYHuDy3UCZsJSdJY8zJZ+Ct81MxNSNDDpLCLQCZe8fD8hA+FOOVlt8a21SqNH
 Pc96ycqIhD/QrfBlcYw5+8N3n5zNTpPSMjazrBphKj56qNWcAXdvQwQTh56pXGj+
 3J5vf3L8xlnx8mlTUMYqHivHKl4cJhYOY/ICwXjpZnRYx0NRF32cquo5A4Uh65ls
 qQjeKL2WXZd44avWK9IkDcBLpjyxr+pJmCsbIntvwK23bz37/SXmk4G2f5/8sBtH
 RK6RDLU1LIH8YNCq5KvAv9/qZZPkuvOKig//lWfcsOLYv43+bp2cGVlO4Z4gvUw3
 qRsrQxXxS+FQFxH5Fxre7UWqLlM9EUHUdbx/aXyGSF5e1DXuD8GcDSt0pOwQboiu
 xKqRxuMozr6ZiHlug3mUcEwzeDAHOwPWrIDSXNELMj+5r/8QogkcPaFUFFqmvigj
 gIwGMiPKe0nQ9XfAUAsjVTL3ozlGXa6nabbVNnA4N05a/scToy3hnFkYo2iEpjyH
 0sxyQ96AaKnN4ydWBsy+y/HA13CbWRP+dgfgaG1BaWQCQ4kh/FN3A3FYpvubBjIm
 5rslXvsmWEkCt/U/K0BY3t6Pvw9GNryAXWDsyPACaVFjMErZPqwwkRtp1PqoKpd6
 XiYQ1nJxgZPCRQ==
 =Ff5J
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - Fixes for a handful of KASAN-related crashes.

 - A fix to avoid a crash during boot for SPARSEMEM &&
   !SPARSEMEM_VMEMMAP configurations.

 - A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.

 - A fix for the K210's device tree to properly populate the interrupt
   map, so hart1 will get interrupts again.

* tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: dts: k210: fix broken IRQs on hart1
  riscv: Fix kasan pud population
  riscv: Move high_memory initialization to setup_bootmem
  riscv: Fix config KASAN && DEBUG_VIRTUAL
  riscv: Fix DEBUG_VIRTUAL false warnings
  riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
  riscv: Fix is_linear_mapping with recent move of KASAN region
2022-03-04 11:54:06 -08:00
Linus Torvalds 3f509f5971 IOMMU Fixes for Linux v5.17-rc6
Including:
 
 	- Fix a double list_add() in Intel VT-d code
 
 	- Add missing put_device() in Tegra SMMU driver
 
 	- Two AMD IOMMU fixes:
 	  - Memory leak in IO page-table freeing code
 	  - Add missing recovery from event-log overflow
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmIiMSIACgkQK/BELZcB
 GuNoHxAAqgsx7bL8KvyZGhh3XwKl638wdNctCmTKTGj04GI0IsiZOyj0jcR+o8Z7
 LJXkZORMbJmrtGkL/jhURkpHYJxfy5MMVSBvcP0G5/24JfM9lkQ88kxL08diqGkd
 /IktUD+TrmgknvsSJ807EWoCrVvHU3YAqHub70uEnxcpcPK33S9b0EUz1MYge6MD
 NY56cAQxCJ9JV5bTZ4X5RNbTvEFC7bOLU218khEgFq5dJ+35/8xGubUfoyX+repU
 RaNHWhNOiEYNQXkqsuZQaAivBj1uPuY/1wL7pB/g2OsRZ7BMZMS7zhncYdWAA0ER
 1npkmOcZWwp2ymUmFzgS4y9bo6gVP6SFzYBc9I5ZUwRaYBMS34qJEevMVZm0cCKr
 i4gpWIeUEXT5v94F4zyzGE+cO5lYmvYFxpTm5l+NcBMWb/el7ht0Kj2CEmx2JKj2
 mG+3/++JDuTNSigIaF5Dk2d5g2L/2gW2sT9/kvinRcbsga+SFOuQXstY9wAGCLFg
 L/YBxnn/cnEphqx23tog/tt+sje6HgNXNhcWTM8ojYFECX3hY9x7sEIR1MAuN6ym
 jRPJHz9zNpNtVrMkYd8e/irOs4ouWMDmd9H6bYbCapCgg2UrCBEjbvqwbz3iaVB6
 hTFAd0RGRDQOldm0L4Q2xTeq2J7EsYCLfOpw72tWV9tp2dfEvRc=
 =j2I8
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fix a double list_add() in Intel VT-d code

 - Add missing put_device() in Tegra SMMU driver

 - Two AMD IOMMU fixes:
     - Memory leak in IO page-table freeing code
     - Add missing recovery from event-log overflow

* tag 'iommu-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
  iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
  iommu/amd: Fix I/O page table memory leak
  iommu/amd: Recover from event log overflow
2022-03-04 11:30:57 -08:00
Christoph Hellwig 13d4ef0f66 floppy: use memcpy_{to,from}_bvec
Use the helpers instead of open coding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00