Commit Graph

2418 Commits

Author SHA1 Message Date
Christoph Hellwig 05bdb99653 block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:05 -06:00
Christoph Hellwig 7d9d7d59d4 nvme: replace the fmode_t argument to the nvme ioctl handlers with a simple bool
Instead of passing a fmode_t and only checking it fo0r FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig ae220766d8 block: remove the unused mode argument to ->release
The mode argument to the ->release block_device_operation is never used,
so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>			[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig d32e2bf837 block: pass a gendisk to ->open
->open is only called on the whole device.  Make that explicit by
passing a gendisk instead of the block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Linus Torvalds 03e5cb7b50 for-6.4/io_uring-2023-05-07
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRXkp8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsH3EADZ4ex4vvc2/giYjRYTdyda00GIdYGiSZaZ
 OBW59OBYJfDjsS99Z2U4638shyqM3rA6BwzvWD6hPUTShD7PUddIKDaanpCYBs+x
 GhcDhzeJxARjsp/5qZPQ4r3Hdp+kWRnOFu1BSniCEdWt43J6AOm0l5duuNxl1db5
 Jz+BkSgF1pICAVB5YfSPDShyFiDfc0qcv3rHZo/rVRjwmX3x4x+OVr97zbZInnH4
 fm2vcv2e3Dc8uQEB0XFEyBWNeC8hdhTNe60t1y3MlILaPjHrQlzl5lkQFZ9dtLG6
 /FDK+bD6Ex8t6SQnnQGrLa9q4vQYA6zlT5u/wdvzAkakmrSxt6phf23U1TbyZSQ8
 clSg6MSuyJAW7YhiT3OqAsNMvBZ5g/nKHjLxfkAe86N9YIrMebZW9GDF+fWf0q0K
 em7O4XPehjkCXKclCInMv2bkv+/pncrd84D7Jt3OmCPljBL6oXfWF3vJJQ6FaY+t
 G3tNxl1hMHqwX8uaaSqyeuMYAEYizZa+ebMmHJwm/151d0EKylAOh3i6O1e4bpk3
 Kn5irF6pfTV1g1TcmY3fOwNgMffcHc9Y4E8bXxMJAN9WiHDuopgn2C9+MrJ+KRrx
 pLN7VD9UK/5yFzHSMBhyM8lm0/oTIhfTVgb+WyuHmJLcRhAnyNy9RrIBhSD0AoH+
 UMyOexZR8w==
 =l7Mo
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
 "Nothing major in here, just two different parts:

   - A small series from Breno that enables passing the full SQE down
     for ->uring_cmd().

     This is a prerequisite for enabling full network socket operations.
     Queued up a bit late because of some stylistic concerns that got
     resolved, would be nice to have this in 6.4-rc1 so the dependent
     work will be easier to handle for 6.5.

   - Fix for the huge page coalescing, which was a regression introduced
     in the 6.3 kernel release (Tobias)"

* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
  io_uring: Remove unnecessary BUILD_BUG_ON
  io_uring: Pass whole sqe to commands
  io_uring: Create a helper to return the SQE size
  io_uring/rsrc: check for nonconsecutive pages
2023-05-07 10:00:09 -07:00
Breno Leitao fd9b8547bc io_uring: Pass whole sqe to commands
Currently uring CMD operation relies on having large SQEs, but future
operations might want to use normal SQE.

The io_uring_cmd currently only saves the payload (cmd) part of the SQE,
but, for commands that use normal SQE size, it might be necessary to
access the initial SQE fields outside of the payload/cmd block.  So,
saves the whole SQE other than just the pdu.

This changes slightly how the io_uring_cmd works, since the cmd
structures and callbacks are not opaque to io_uring anymore. I.e, the
callbacks can look at the SQE entries, not only, in the cmd structure.

The main advantage is that we don't need to create custom structures for
simple commands.

Creates io_uring_sqe_cmd() that returns the cmd private data as a null
pointer and avoids casting in the callee side.
Also, make most of ublk_drv's sqe->cmd priv structure into const, and use
io_uring_sqe_cmd() to get the private structure, removing the unwanted
cast. (There is one case where the cast is still needed since the
header->{len,addr} is updated in the private structure)

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20230504121856.904491-3-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-04 08:19:05 -06:00
Linus Torvalds 556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds 9dd6956b38 for-6.4/block-2023-04-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvcIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpk+JEACj01t7Xen2+Razagu3aTx9tmRGFnTNR3MY
 raFG6B1TADk1TgCWWa2C4Dj67SOispPLm8hbIcOxqB1UscDWCCwjmnr/debADFzW
 Ap6shv/IRwVGmDp+F7ocYas0ynwooOJg4WJTwkSKz2o4m4p3vzlwAKi4fLiSjbXp
 gJTrA7WEvDOVjzajlTFUtjr8rc6PdunbGm25cPIufAxUEhvttYex2VbVqjDmfNsE
 8tyyk9RWbe4AY/ZYaGXVn4yQ/CgL/sXFkVc5noRXNfAQ/K3CVLQrFLJ3JlwUHpiA
 xXBor21TUWCZEo33Y2G5NConAYqE7etoPTkaTDO3/aZ+dAMFyhC/WAYLz1KZGMh1
 +g1fDX1QKEd40H2lfDXvqF1ob7Ut8EzUx+gvBXcc3/AiRpJ5rjfOcj6LPUMUqQJk
 nucLLFTiMKecnDMBERbvixqbaTyrjvkFEj2wYJvgj1LKXAd+x/bj8SGajs9r88Nb
 9YT9ai/+Yl7Ppfb67rCgXJU7oNZQSAQ2H+X/l2jbiqImOgq1u/45AmINnbanS7HH
 Y1I8pbH45AcnCgkJRoQwrNX3BnTOTBJ+D/4Fl4b8jsihq0D3UtwCwPCObHP4LW9S
 MUNPhP3tUuYsAgXqX80+Sao6SYvXDwnbWOM+LOaaZXgjb1ndwDUZXpto8Ra8WB1u
 8kM6s6ZR7g==
 =W1Zb
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - drbd patches, bringing us closer to unifying the out-of-tree version
   and the in tree one (Andreas, Christoph)

 - support for auto-quiesce for the s390 dasd driver (Stefan)

 - MD pull request via Song:
      - md/bitmap: Optimal last page size (Jon Derrick)
      - Various raid10 fixes (Yu Kuai, Li Nan)
      - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)

 - NVMe pull request via Christoph:
      - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
      - Validate nvmet module parameters (Chaitanya Kulkarni)
      - Fence TCP socket on receive error (Chris Leech)
      - Fix async event trace event (Keith Busch)
      - Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
      - Fix and cleanup nvmet Identify handling (Damien Le Moal,
        Christoph Hellwig)
      - Fix double blk_mq_complete_request race in the timeout handler
        (Lei Yin)
      - Fix irq locking in nvme-fcloop (Ming Lei)
      - Remove queue mapping helper for rdma devices (Sagi Grimberg)

 - use structured request attribute checks for nbd (Jakub)

 - fix blk-crypto race conditions between keyslot management (Eric)

 - add sed-opal support for reading read locking range attributes
   (Ondrej)

 - make fault injection configurable for null_blk (Akinobu)

 - clean up the request insertion API (Christoph)

 - clean up the queue running API (Christoph)

 - blkg config helper cleanups (Tejun)

 - lazy init support for blk-iolatency (Tejun)

 - various fixes and tweaks to ublk (Ming)

 - remove hybrid polling. It hasn't really been useful since we got
   async polled IO support, and these days we don't support sync polled
   IO at all (Keith)

 - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
   Chaitanya, me)

* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
  nbd: fix incomplete validation of ioctl arg
  ublk: don't return 0 in case of any failure
  sed-opal: geometry feature reporting command
  null_blk: Always check queue mode setting from configfs
  block: ublk: switch to ioctl command encoding
  blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
  block, bfq: Fix division by zero error on zero wsum
  fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
  block: store bdev->bd_disk->fops->submit_bio state in bdev
  block: re-arrange the struct block_device fields for better layout
  md/raid5: remove unused working_disks variable
  md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
  md/raid10: fix memleak of md thread
  md/raid10: fix memleak for 'conf->bio_split'
  md/raid10: fix leak of 'r10bio->remaining' for recovery
  md/raid10: don't BUG_ON() in raise_barrier()
  md: fix soft lockup in status_resync
  md: add error_handlers for raid0 and linear
  md: Use optimal I/O size for last bitmap page
  md: Fix types in sb writer
  ...
2023-04-26 12:52:58 -07:00
Duy Truong 74391b3e69 nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting
duplicate NGUIDs.

Signed-off-by: Duy Truong <dory@dory.moe>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-14 07:13:48 +02:00
Sagi Grimberg edde9e70bb blk-mq-rdma: remove queue mapping helper for rdma devices
No rdma device exposes its irq vectors affinity today. So the only
mapping that we have left, is the default blk_mq_map_queues, which
we fallback to anyways. Also fixup the only consumer of this helper
(nvme-rdma).

Remove this now dead code.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:05 +02:00
zhenwei pi 015ad2b1e4 nvme-rdma: minor cleanup in nvme_rdma_create_cq()
Before cleanup:
enum ib_poll_context poll_ctx;

if (nvme_rdma_poll_queue(queue)) {
        poll_ctx = IB_POLL_DIRECT;
        queue->ib_cq = ib_alloc_cq(ibdev, queue, queue->cq_size,
                                   comp_vector, poll_ctx);
} else {
        poll_ctx = IB_POLL_SOFTIRQ;
        queue->ib_cq = ib_cq_pool_get(ibdev, queue->cq_size,
                                      comp_vector, poll_ctx);
}

After cleanup:
if (nvme_rdma_poll_queue(queue))
        queue->ib_cq = ib_alloc_cq(ibdev, queue, queue->cq_size,
                                   comp_vector, IB_POLL_DIRECT);
else
        queue->ib_cq = ib_cq_pool_get(ibdev, queue->cq_size,
                                      comp_vector, IB_POLL_SOFTIRQ);

IB_POLL_SOFTIRQ/IB_POLL_SOFTIRQ gets used directly in function, this
seems more accessible.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:05 +02:00
Lei Yin d4f1d5f7a4 nvme: fix double blk_mq_complete_request for timeout request with low probability
When nvme_cancel_tagset traverses all tagsets and executes
nvme_cancel_request, this request may be executing blk_mq_free_request
that is called by nvme_rdma_complete_timed_out/nvme_tcp_complete_timed_out.
When blk_mq_free_request executes to WRITE_ONCE(rq->state, MQ_RQ_IDLE) and
__blk_mq_free_request(rq), it will cause double blk_mq_complete_request for
this request, and it will cause a null pointer error in the second
execution of this function because rq->mq_hctx has set to NULL in first
execution.

Signed-off-by: Lei Yin <yinlei2@lenovo.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:04 +02:00
Keith Busch 6622b76fe9 nvme: fix async event trace event
Mixing AER Event Type and Event Info has masking clashes. Just print the
event type, but also include the event info of the AER result in the
trace.

Fixes: 09bd1ff4b1 ("nvme-core: add async event trace helper")
Reported-by: Nate Thornton <nate.thornton@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:59:04 +02:00
Chaitanya Kulkarni cf806e3ab1 nvme-apple: return directly instead of else
There is no need for the else when direct return is used at the end of
the function.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:55:06 +02:00
Chaitanya Kulkarni 2ce525d40a nvme-apple: return directly instead of else
There is no need for the else when direct return is used at the end of
the function.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:55:06 +02:00
Chris Leech aeacfcefa2 nvme-tcp: fence TCP socket on receive error
Ensure that no further socket reads occur after a receive processing
error, either from io_work being re-scheduled or nvme_tcp_poll.

Failing to do so can result in unrecognised PDU payloads or TCP stream
garbage being processed as a C2H data PDU, and potentially start copying
the payload to an invalid destination after looking up a request using a
bogus command id.

Signed-off-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:55:05 +02:00
Bjorn Helgaas 1ad11eafc6 nvme-pci: drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages.  Since f26e58bf6f ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.  Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this only controls ERR_* Messages from the device.  An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:55:03 +02:00
Keith Busch d3205ab75e nvme: fix discard support without oncs
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.

Move the max_discard_size calculation above the check for a '0' discard
size.

Fixes: 1a86924e4f ("nvme: fix interpretation of DMRSL")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-05 17:13:17 +02:00
Greg Kroah-Hartman cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Sagi Grimberg 88eaba8032 nvme-tcp: fix a possible UAF when failing to allocate an io queue
When we allocate a nvme-tcp queue, we set the data_ready callback before
we actually need to use it. This creates the potential that if a stray
controller sends us data on the socket before we connect, we can trigger
the io_work and start consuming the socket.

In this case reported: we failed to allocate one of the io queues, and
as we start releasing the queues that we already allocated, we get
a UAF [1] from the io_work which is running before it should really.

Fix this by setting the socket ops callbacks only before we start the
queue, so that we can't accidentally schedule the io_work in the
initialization phase before the queue started. While we are at it,
rename nvme_tcp_restore_sock_calls to pair with nvme_tcp_setup_sock_ops.

[1]:
[16802.107284] nvme nvme4: starting error recovery
[16802.109166] nvme nvme4: Reconnecting in 10 seconds...
[16812.173535] nvme nvme4: failed to connect socket: -111
[16812.173745] nvme nvme4: Failed reconnect attempt 1
[16812.173747] nvme nvme4: Reconnecting in 10 seconds...
[16822.413555] nvme nvme4: failed to connect socket: -111
[16822.413762] nvme nvme4: Failed reconnect attempt 2
[16822.413765] nvme nvme4: Reconnecting in 10 seconds...
[16832.661274] nvme nvme4: creating 32 I/O queues.
[16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088
[16833.920068] nvme nvme4: Failed reconnect attempt 3
[16833.920094] #PF: supervisor write access in kernel mode
[16833.920261] nvme nvme4: Reconnecting in 10 seconds...
[16833.920368] #PF: error_code(0x0002) - not-present page
[16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
...
[16833.923138] Call Trace:
[16833.923271]  <TASK>
[16833.923402]  lock_sock_nested+0x1e/0x50
[16833.923545]  nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp]
[16833.923685]  nvme_tcp_io_work+0x68/0xa0 [nvme_tcp]
[16833.923824]  process_one_work+0x1e8/0x390
[16833.923969]  worker_thread+0x53/0x3d0
[16833.924104]  ? process_one_work+0x390/0x390
[16833.924240]  kthread+0x124/0x150
[16833.924376]  ? set_kthread_struct+0x50/0x50
[16833.924518]  ret_from_fork+0x1f/0x30
[16833.924655]  </TASK>

Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Yanjun Zhang <zhangyanjun@cestc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-30 11:24:33 +09:00
Juraj Pecigos 1231363aec nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
A system with more than one of these SSDs will only have one usable.
The kernel fails to detect more than one nvme device due to duplicate
cntlids.

before:
[    9.395229] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    9.395262] nvme nvme0: pci function 0000:01:00.0
[    9.395282] nvme 0000:03:00.0: platform quirk: setting simple suspend
[    9.395305] nvme nvme1: pci function 0000:03:00.0
[    9.409873] nvme nvme0: Duplicate cntlid 1 with nvme1, subsys nqn.2022-07.com.siliconmotion:nvm-subsystem-sn-                    , rejecting
[    9.409982] nvme nvme0: Removing after probe failure status: -22
[    9.427487] nvme nvme1: allocated 64 MiB host memory buffer.
[    9.445088] nvme nvme1: 16/0/0 default/read/poll queues
[    9.449898] nvme nvme1: Ignoring bogus Namespace Identifiers

after:
[    1.161890] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    1.162660] nvme nvme0: pci function 0000:01:00.0
[    1.162684] nvme 0000:03:00.0: platform quirk: setting simple suspend
[    1.162707] nvme nvme1: pci function 0000:03:00.0
[    1.191354] nvme nvme0: allocated 64 MiB host memory buffer.
[    1.193378] nvme nvme1: allocated 64 MiB host memory buffer.
[    1.211044] nvme nvme1: 16/0/0 default/read/poll queues
[    1.211080] nvme nvme0: 16/0/0 default/read/poll queues
[    1.216145] nvme nvme0: Ignoring bogus Namespace Identifiers
[    1.216261] nvme nvme1: Ignoring bogus Namespace Identifiers

Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue.

Signed-off-by: Juraj Pecigos <kernel@juraj.dev>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-28 10:09:11 +09:00
Martin George def84ab600 nvme: send Identify with CNS 06h only to I/O controllers
Identify CNS 06h (I/O Command Set Specific Identify Controller data
structure) is supported only on i/o controllers.

But nvme_init_non_mdts_limits() currently invokes this on all
controllers.  Correct this by ensuring this is sent to I/O
controllers only.

Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-22 09:17:52 +01:00
Jens Axboe 9d2789ac9d block/io_uring: pass in issue_flags for uring_cmd task_work handling
io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.

Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-20 20:01:25 -06:00
Greg Kroah-Hartman 1aaba11da9 driver core: class: remove module * from class_create()
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something.  So just remove it and fix up all callers of the function in
the kernel tree at the same time.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:33 +01:00
Greg Kroah-Hartman 10a03c36b7 drivers: remove struct module * setting from struct class
There is no need to manually set the owner of a struct class, as the
registering function does it automatically, so remove all of the
explicit settings from various drivers that did so as it is unneeded.

This allows us to remove this pointer entirely from this structure going
forward.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:16:27 +01:00
Jens Axboe 890a2fb06e nvme fixes for Linux 6.3
- avoid potential UAF in nvmet_req_complete (Damien Le Moal)
  - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
  - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
  - repair the MAINTAINERS entry (Lukas Bulwahn)
  - fix handling single range discard request (Ming Lei)
  - show more opcode names in trace events (Minwoo Im)
  - fix nvme-tcp timeout reporting (Sagi Grimberg)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmQSy9ILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMCqQ/+PcaiHojkyA6B9HBNtVcLemupc2GSc8lozTIqALVT
 EuTJ2AK0ox8F8MYU0ELlpkCGs9Wil135xqLURLDwLi7sKFj7S1IkaSvG+1okDDHK
 hJy4U+bpu1gkyBJkB2v/PrX49lmqplFGD/2eLRFedWbZ4rF82hQRrPHYcDpRL2VF
 0wmVd7lVaUQ9mOngLy1zrBFjt+rOv8d96zwfqF9m67G9rqKJLQhM45M/TiKaeUuy
 3WbsVmZ715rbGv1y7YBObHfdKvs8LfNjdOdJeLNGaNOxuc2y1yOzSbaww2wWW+Wf
 9mB6unjn76xW1UUzPtQs2OALZaa9r0w55IQ5A4V+ugJoDEIdxsyHSUMGDnAc1dTW
 MQGNGhQyGBDD9O9XSvexOXRI+LRxyM3dXfA1dpPUW1BWZm4ACumY26cMasJEJnCr
 YJtIM3SW/Sp3cRnzcN/fxKCfsDUeYTj0mFu5KjamLd1ux5w4pxJyoS9sWxh63qJz
 HBnr6VFczYhf63cUoVgX0qZOKv4jwYaVolVcmuNVWWdFhzr8OC7cb+AztfVdo8M2
 UbIkWxFH7f/obVd2L3N5c8+tKLaoi5R/MzHTlrov2oTP2e46sFuF6cyhtPM4zO//
 lwjkIJ1JSAjvFHxEg44wBHjYIB5y0JCjkIaBLuOiVILQYkxYrV8XYzRaqjjNdk0D
 YSo=
 =GT+2
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.3

 - avoid potential UAF in nvmet_req_complete (Damien Le Moal)
 - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
 - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
 - repair the MAINTAINERS entry (Lukas Bulwahn)
 - fix handling single range discard request (Ming Lei)
 - show more opcode names in trace events (Minwoo Im)
 - fix nvme-tcp timeout reporting (Sagi Grimberg)"

* tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme:
  nvmet: avoid potential UAF in nvmet_req_complete()
  nvme-trace: show more opcode names
  nvme-tcp: add nvme-tcp pdu size build protection
  nvme-tcp: fix opcode reporting in the timeout handler
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
  nvme-pci: fixing memory leak in probe teardown path
  nvme: fix handling single range discard request
  MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
2023-03-16 07:01:48 -06:00
Yu Kuai 5f27571382 block: count 'ios' and 'sectors' when io is done for bio-based device
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.

Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.

Fixes: 394ffa503b ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15 09:25:04 -06:00
Sagi Grimberg 7e87965d38 nvme-tcp: add nvme-tcp pdu size build protection
Make sure that we don't somehow mess up the wire structures in the spec.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:52 +01:00
Sagi Grimberg a3406352c5 nvme-tcp: fix opcode reporting in the timeout handler
For non in-capsule writes we reuse the request pdu space for a h2cdata
pdu in order to avoid over allocating space (either preallocate or
dynamically upon receving an r2t pdu). However if the request times out
the core expects to find the opcode in the start of the request, which
we override.

In order to prevent that, without sacrificing additional 24 bytes per
request, we just use the tail of the command pdu space instead (last
24 bytes from the 72 bytes command pdu). That should make the command
opcode always available, and we get away from allocating more space.

If in the future we would need the last 24 bytes of the nvme command
available we would need to allocate a dedicated space for it in the
request, but until then we can avoid doing so.

Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:52 +01:00
Philipp Geulen b65d44fa0f nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
Added a quirk to fix Lexar NM620 1TB SSD reporting duplicate NGUIDs.

Signed-off-by: Philipp Geulen <p.geulen@js-elektronik.de>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:52 +01:00
Elmer Miroslav Mosher Golovin 9630d80655 nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
Added a quirk to fix the Netac NV3000 SSD reporting duplicate NGUIDs.

Cc: <stable@vger.kernel.org>
Signed-off-by: Elmer Miroslav Mosher Golovin <miroslav@mishamosher.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:51 +01:00
Irvin Cote a61d265533 nvme-pci: fixing memory leak in probe teardown path
In case the nvme_probe teardown path is triggered the ctrl ref count does
not reach 0 thus creating a memory leak upon failure of nvme_probe.

Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:51 +01:00
Ming Lei 37f0dc2ec7 nvme: fix handling single range discard request
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.

Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.

Fix the issue by building range from request sector/nr_sectors directly.

Fixes: b35ba01ea6 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-15 14:58:51 +01:00
Linus Torvalds 9d0281b56b block-6.3-2023-03-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQB57MQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgputpEADVrc1OFzHOivJq+LJ3HS3ufhLBthtgu1Lp
 sEHvDNp9tBGXMLkomuCYpAju5TBAEKC+AJTZyj9iS1j++ItoezdoP55YRIH7t2Or
 UTy8ex3rLPGkQk6k3o8roWCyajTW/ZS+4fmk+NkVYMLsQBp9I+kFbxgJa5bbREdU
 Z8b/9hcBGz58R8Kq+TEMp/bO7oCV4c8xWumrKER+MktDDx0kc5d+afWXoy7bEKFg
 jLB3gleTM9HUpa9a2GPc4fxqdb0KanQdMtiyn/oplg0JcZLMiHfRbiRnsgQkjN0O
 RVtUcdxXmOkQeFra4GXPiHmQBcIfE85wP4wxb8p/F2StYRhb1epzzeCXOhuNZvv4
 dd6OSARgtzWt3OlHka4aC63H4kzs9SxJp0F2uwuPLV0fM91TP1oOTWV+53FrQr9Z
 OQYyB8d9Il4K72NFLwU4ukJ1fPoCRHjpgAXIIkasEjaBftpJlMNnfblncTZTBumy
 XumFVdKfvqc3OFt8LLKWqLDV0j3TknVeCMPKhsbRwQ0NG4vlNOSWaLkGJCDLJ7ga
 ebf8AD5eaLCT9qyYquBuW5VBKZH5Z4rf5yHta9Dx+Omu0JTQYtTkiiM3UTdpDbtq
 SObZ31UvLoYK2dOZcVgjhE2RgM/AV5jJcx7aHhT3UptavAehHbePgiNhuEEntlKv
 L87kXJkSSQ==
 =ezrg
 -----END PGP SIGNATURE-----

Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Christoph:
      - Don't access released socket during error recovery (Akinobu
        Mita)
      - Bring back auto-removal of deleted namespaces during sequential
        scan (Christoph Hellwig)
      - Fix an error code in nvme_auth_process_dhchap_challenge (Dan
        Carpenter)
      - Show well known discovery name (Daniel Wagner)
      - Add a missing endianess conversion in effects masking (Keith
        Busch)

 - Fix for a regression introduced in blk-rq-qos during init in this
   merge window (Breno)

 - Reorder a few fields in struct blk_mq_tag_set, eliminating a few
   holes and shrinking it (Christophe)

 - Remove redundant bdev_get_queue() NULL checks (Juhyung)

 - Add sed-opal single user mode support flag (Luca)

 - Remove SQE128 check in ublk as it isn't needed, saving some memory
   (Ming)

 - Op specific segment checking for cloned requests (Uday)

 - Exclusive open partition scan fixes (Yu)

 - Loop offset/size checking before assigning them in the device (Zhong)

 - Bio polling fixes (me)

* tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux:
  blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
  nvme-fabrics: show well known discovery name
  nvme-tcp: don't access released socket during error recovery
  nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
  nvme: bring back auto-removal of deleted namespaces during sequential scan
  blk-iocost: Pass gendisk to ioc_refresh_params
  nvme: fix sparse warning on effects masking
  block: be a bit more careful in checking for NULL bdev while polling
  block: clear bio->bi_bdev when putting a bio back in the cache
  loop: loop_set_status_from_info() check before assignment
  ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
  block: remove more NULL checks after bdev_get_queue()
  blk-mq: Reorder fields in 'struct blk_mq_tag_set'
  block: fix scan partition for exclusively open device again
  block: Revert "block: Do not reread partition table on exclusively open device"
  sed-opal: add support flag for SUM in status ioctl
2023-03-03 10:21:39 -08:00
Daniel Wagner 26a57cb355 nvme-fabrics: show well known discovery name
The kernel always logs the unique subsystem name for a discovery
controller, even in the case user space asked for the well known.

This has lead to confusion as the logs of nvme-cli and the kernel
logs didn't match.

First, nvme-cli connects to the well known discovery controller to
figure out if it supports TP8013. If so then nvme-cli disconnects and
connects to the unique discovery controller. Currently, the kernel show
that user space connected twice to the unique one.

To avoid further confusion, show the well known discovery controller if
user space asked for it:

  $ nvme connect-all -v -t tcp -a 192.168.0.1
  nvme0: nqn.2014-08.org.nvmexpress.discovery connected
  nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected
  nvme0: nqn.discovery connected

  kernel log:
  nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009
  nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
  nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009

Fixes: e5ea42faa7 ("nvme: display correct subsystem NQN")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28 06:14:44 -07:00
Akinobu Mita 76d54bf20c nvme-tcp: don't access released socket during error recovery
While the error recovery work is temporarily failing reconnect attempts,
running the 'nvme list' command causes a kernel NULL pointer dereference
by calling getsockname() with a released socket.

During error recovery work, the nvme tcp socket is released and a new one
created, so it is not safe to access the socket without proper check.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Fixes: 02c57a82c0 ("nvme-tcp: print actual source IP address through sysfs "address" attr")
Reviewed-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28 06:14:44 -07:00
Dan Carpenter 51d24f701f nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
This function was transitioned from returning NVMe status codes to
returning traditional kernel error codes.  However, this particular
return now accidentally returns positive error codes like ENOMEM instead
of negative -ENOMEM.

Fixes: b0ef1b11d3 ("nvme-auth: don't use NVMe status codes")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28 06:14:44 -07:00
Christoph Hellwig 0dd6fff2aa nvme: bring back auto-removal of deleted namespaces during sequential scan
Bring back the check of the Identify Namespace return value for the
legacy NVMe 1.0-style sequential scanning.  While NVMe 1.0 does not
support namespace management, there are "modern" cloud solutions like
Google Cloud Platform that claim the obsolete 1.0 compliance for no
good reason while supporting proprietary sideband namespace management.

Fixes: 1a893c2bfe ("nvme: refactor namespace probing")
Reported-by: Nils Hanke <nh@edgeless.systems>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Nils Hanke <nh@edgeless.systems>
2023-02-28 06:14:29 -07:00
Keith Busch c0c33b94cf nvme: fix sparse warning on effects masking
The log entries are stored in le32, so use appropriate byte swapping
macros.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302242222.PevBhzvC-lkp@intel.com/
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-27 06:56:04 -07:00
Linus Torvalds 5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Linus Torvalds 5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Linus Torvalds 0e9fd589e6 block-6.2-2023-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPv7+wQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgkvD/95SElddH7D5GhwNlAjl+HiNjBg1Oaa+BiC
 6QMXz7x0/r+JG5xhzEC/w8Cxaghi841YyPDKHe+oE3Sy8yeA2bfFlJpHl7kwIcJZ
 WhHa5IutOftXdeY+DMiJGwKZOGuBDla4ca9fpK4SOF7NdTrLFQL46mvawNq/WGTE
 qQlXH3e9d7htfUq0AR5ZnnZ/nl2mBE1jRCajhTk29PT3Nje2Bo9lL6Kga2CysX4g
 Zflc5aXX3KsyfxAx0kEkrjkZJ8ukqKyR7yh+bp+fu2pgpbwI9STFVvopv7r4MH1f
 Y+2W1bCiwK/KneKNObbpzLGVRMfG5qv8z+pR6fpW4FRu0E5ZYDBE8JCG1dqg00K3
 dfGydR27FFtt9YWMmiHaIWD6rZnCwHDoZnV0LLiAJ/i+kvJkGvT8fL7435+kiEVX
 7u4lfce7GPuG2JErB7BnE+sG7J2dr+CuG/o7y7hTh+eKjWk+yTP6vn7QHka1Uj7X
 LesTUEh2Y2tic4hXh//jKM4KCJk5xMJ1TDJadg1RNmD3sGvNnu5IItzyQ3rV1ZyJ
 jqdppWxVUb+kRSRQEu/icN/rq4/f4jvPSV4DWb9NrzwCex3uzrMKW1j0oP9cMvQ0
 KYhq0i6iWjzbCSSkJckHFiPt6hgicEuQlIl0zxYxoYpVWJmjmqG8QrPDbLQtz8P8
 GViQGoiJ8w==
 =/7xW
 -----END PGP SIGNATURE-----

Merge tag 'block-6.2-2023-02-17' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
 "I guess this is what can happen when you prep things early for going
  away, something else comes in last minute. This one fixes another
  regression in 6.2 for NVMe, from this release, and hence we should
  probably get it submitted for 6.2.

  Still waiting for the original reporter (see bugzilla linked in the
  commit) to test this, but Keith managed to setup and recreate the
  issue and tested the patch that way"

* tag 'block-6.2-2023-02-17' of git://git.kernel.dk/linux:
  nvme-pci: refresh visible attrs for cmb attributes
2023-02-18 09:56:58 -08:00
David S. Miller 675f176b4d Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Some of the devlink bits were tricky, but I think I got it right.

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17 11:06:39 +00:00
Keith Busch e917a849c3 nvme-pci: refresh visible attrs for cmb attributes
The sysfs group containing the cmb attributes is registered before the
driver knows if they need to be visible or not. Update the group when
cmb attributes are known to exist so the visibility setting is correct.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-17 08:31:05 +01:00
Linus Torvalds d3d6f0eb08 block-6.2-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPugusQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm4sEACrMqKGTddM7rwxW0mUzZO3ywO2oS2rBfd1
 +f1xD395wB3w02kAg8Iz6DKP3cO1l7C/YVek0SwzG5U2nvAURaoGI+Mxle1FT7Ho
 OKf+QO2tBsiVXapOnCkAzCh7UQ4PcPHrbf/Kl41nQDkN3UDS2w3rFLNK7RtLD9F9
 hW/4mxNsyUkOQpCDRuzwFi2KhCvyDYQQOTb+9SMgm9rZtrCkEOZxAEE7ByHfb3SG
 aH0g39OcP7C1sD5Zs66tdYNUA/LldwF8BEJqmrwAkrsi2Vs+pForcinpyqS7ubX/
 oHLkRnZ2TSyvmTKJwtxdXrMVn9zMa6TJ2XbIC6JUlXV3B2geonHKL8JItaYF7bbq
 BUWL/oIY3gP7pO25oS6ZpZSmRUSBPx3AHy6v62dhpHu7T6Ayf6kt+X/YS/ab5EsP
 0ot49lgP9c1A6OQp0/JuSU/dKtbLPpz5MVShTo8oc1qtJGjgjD4kM6wu98DSUWVw
 QoreZSHyJxQWXkd/hrH6WD+zLRJCtUDssjANaQvpk2uAqE/0JDG+f3wQZXNcah9D
 DjFgXuwjKjdZuUWfMjWfdpvQ0b+8FgE1r6J1KGC7A7ZZCQWcTDjFM91LaKRRwPOI
 I+WdYe1PuXMUylIVyE4EoHVXqCcai1c7x+0Cui6M/OGytxdu4B4tgI34RQ1ZN0kp
 KV2Y8kNV9Q==
 =QFJG
 -----END PGP SIGNATURE-----

Merge tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
 "Just a few NVMe fixes that should go into the 6.2 release, adding a
  quirk and fixing two issues introduced in this release:

   - NVMe fixes via Christoph:
       - Always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
       - Add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
       - Set the DMA mask earlier (Christoph Hellwig)"

* tag 'block-6.2-2023-02-16' of git://git.kernel.dk/linux:
  nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
  nvme-pci: set the DMA mask earlier
  nvme-pci: add bogus ID quirk for ADATA SX6000PNP
2023-02-16 12:05:33 -08:00
Keith Busch b6c0c237be nvme-pci: remove iod use_sgls
It's not used anywhere anymore, so remove it.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14 06:40:57 +01:00
Keith Busch 8f0edf45bb nvme-pci: fix freeing single sgl
There may only be a single DMA mapped entry from multiple physical
segments, which means we don't allocate a separte SGL list. Check the
number of allocations prior to know if we need to free something.

Freeing a single list allocation is the same for both PRP and SGL
usages, so we don't need to check the use_sgl flag anymore.

Fixes: 01df742d8c ("nvme-pci: remove SGL segment descriptors")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
2023-02-14 06:40:52 +01:00
Irvin Cote dc785d69d7 nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
Don't mix NULL and ERR_PTR returns.

Fixes: 2e87570be9 ("nvme-pci: factor out a nvme_pci_alloc_dev helper")
Signed-off-by: Irvin Cote <irvin.cote@insa-lyon.fr>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14 06:39:02 +01:00
Christoph Hellwig 924bd96ea2 nvme-pci: set the DMA mask earlier
Set the DMA mask before calling dma_addressing_limited, which depends on it.

Note that this stop checking the return value of dma_set_mask_and_coherent
as this function can only fail for masks < 32-bit.

Fixes: 3f30a79c2e ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Michael Kelley <mikelley@microsoft.com>
2023-02-14 06:36:40 +01:00
Daniel Wagner 5f69f009b7 nvme-pci: add bogus ID quirk for ADATA SX6000PNP
Yet another device which needs a quirk:

 nvme nvme1: globally duplicate IDs for nsid 1
 nvme nvme1: VID:DID 10ec:5763 model:ADATA SX6000PNP firmware:V9002s94

Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1207827
Reported-by: Gustavo Freitas <freitasmgustavo@gmail.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-13 07:04:53 +01:00