Six late arriving patches for the merge window. Five are minor
assorted fixes and updates. The IPR driver change removes SATA
support, which will now allow a major cleanup in the ATA subsystem
because it was the only driver still using the old attachment
mechanism. The driver is only used on power systems and SATA was used
to support a DVD device, which has long been moved to a different hba.
IBM chose this route instead of porting ipr to the newer SATA
interfaces.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZFZGfiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbgIAP9V4dKN
e5xLwf4q9+bKKCSCmlMrPeAnfJXyWdW6VWvWhQD9H4bgemsWhcUOztvpB2pOUpYx
JaJCgj/MAA0FSXhy3Bk=
=FwUd
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Six late arriving patches for the merge window. Five are minor
assorted fixes and updates.
The IPR driver change removes SATA support, which will now allow a
major cleanup in the ATA subsystem because it was the only driver
still using the old attachment mechanism. The driver is only used on
power systems and SATA was used to support a DVD device, which has
long been moved to a different hba. IBM chose this route instead of
porting ipr to the newer SATA interfaces"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Fix use after free bug in qedi_remove()
scsi: ufs: core: mcq: Fix &hwq->cq_lock deadlock issue
scsi: ipr: Remove several unused variables
scsi: pm80xx: Log device registration
scsi: ipr: Remove SATA support
scsi: scsi_debug: Abort commands from scsi_debug_device_reset()
This pull request goes with only a few sysctl moves from the
kernel/sysctl.c file, the rest of the work has been put towards
deprecating two API calls which incur recursion and prevent us
from simplifying the registration process / saving memory per
move. Most of the changes have been soaking on linux-next since
v6.3-rc3.
I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's
feedback that we should see if we could *save* memory with these
moves instead of incurring more memory. We currently incur more
memory since when we move a syctl from kernel/sysclt.c out to its
own file we end up having to add a new empty sysctl used to register
it. To achieve saving memory we want to allow syctls to be passed
without requiring the end element being empty, and just have our
registration process rely on ARRAY_SIZE(). Without this, supporting
both styles of sysctls would make the sysctl registration pretty
brittle, hard to read and maintain as can be seen from Meng Tang's
efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE()
for all sysctl registrations also implies doing the work to deprecate
two API calls which use recursion in order to support sysctl
declarations with subdirectories.
And so during this development cycle quite a bit of effort went into
this deprecation effort. I've annotated the following two APIs are
deprecated and in few kernel releases we should be good to remove them:
* register_sysctl_table()
* register_sysctl_paths()
During this merge window we should be able to deprecate and unexport
register_sysctl_paths(), we can probably do that towards the end
of this merge window.
Deprecating register_sysctl_table() will take a bit more time but
this pull request goes with a few example of how to do this.
As it turns out each of the conversions to move away from either of
these two API calls *also* saves memory. And so long term, all these
changes *will* prove to have saved a bit of memory on boot.
The way I see it then is if remove a user of one deprecated call, it
gives us enough savings to move one kernel/sysctl.c out from the
generic arrays as we end up with about the same amount of bytes.
Since deprecating register_sysctl_table() and register_sysctl_paths()
does not require maintainer coordination except the final unexport
you'll see quite a bit of these changes from other pull requests, I've
just kept the stragglers after rc3.
Most of these changes have been soaking on linux-next since around rc3.
[0] https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRHAjQSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinTzgQAI/uKHKi0VlUR1l2Psl0XbseUVueuyj3
ZDxSJpbVUmsoDf2MlLjzB8mYE3ricnNTDbLr7qOyA6pXdM1N0mY5LQmRVRu8/ffd
2T1hQ5pl7YnJdWP5dPhcF9Y+jnu1tjX1MW5DS4fzllwK7FnD86HuIruGq52RAPS/
/FH+BD9eodLWWXk6A/o2GFqoWxPKQI0GLxEYWa7Hg7yt8E/3PQL9QsRzn8i6U+HW
BrN/+G3YD1VCCzXu0UAeXnm+i1Z7CdvqNdZuSkvE3DObiZ5WpOS+/i7FrDB7zdiu
zAbHaifHnDPtcK3w2ZodbLAAwEWD/mG4iwIjE2kgIMVYxBv7TFDBRREXAWYAevIT
UUuZnWDQsGaWdjywrebaUycEfd6dytKyan0fTXgMFkcoWRjejhitfdM2iZDdQROg
q453p4HqOw4vTrhy4ov4zOX7J3EFiBzpZdl+SmLqcXk+jbLVb/Q9snUWz1AFtHBl
gHoP5bS82uVktGG3MsObjgTzYYMQjO9YGIrVuW1VP9uWs8WaoWx6M9FQJIIhtwE+
h6wG2s7CjuFWnS0/IxWmDOn91QyUn1w7ohiz9TuvYj/5GLSBpBDGCJHsNB5T2WS1
qbQRaZ2Kg3j9TeyWfXxdlxBx7bt3ni+J/IXDY0zom2sTpGHKl8D2g5AzmEXJDTpl
kd7Z3gsmwhDh
=0U0W
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"This only does a few sysctl moves from the kernel/sysctl.c file, the
rest of the work has been put towards deprecating two API calls which
incur recursion and prevent us from simplifying the registration
process / saving memory per move. Most of the changes have been
soaking on linux-next since v6.3-rc3.
I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's
feedback that we should see if we could *save* memory with these moves
instead of incurring more memory. We currently incur more memory since
when we move a syctl from kernel/sysclt.c out to its own file we end
up having to add a new empty sysctl used to register it. To achieve
saving memory we want to allow syctls to be passed without requiring
the end element being empty, and just have our registration process
rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls
would make the sysctl registration pretty brittle, hard to read and
maintain as can be seen from Meng Tang's efforts to do just this [0].
Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations
also implies doing the work to deprecate two API calls which use
recursion in order to support sysctl declarations with subdirectories.
And so during this development cycle quite a bit of effort went into
this deprecation effort. I've annotated the following two APIs are
deprecated and in few kernel releases we should be good to remove
them:
- register_sysctl_table()
- register_sysctl_paths()
During this merge window we should be able to deprecate and unexport
register_sysctl_paths(), we can probably do that towards the end of
this merge window.
Deprecating register_sysctl_table() will take a bit more time but this
pull request goes with a few example of how to do this.
As it turns out each of the conversions to move away from either of
these two API calls *also* saves memory. And so long term, all these
changes *will* prove to have saved a bit of memory on boot.
The way I see it then is if remove a user of one deprecated call, it
gives us enough savings to move one kernel/sysctl.c out from the
generic arrays as we end up with about the same amount of bytes.
Since deprecating register_sysctl_table() and register_sysctl_paths()
does not require maintainer coordination except the final unexport
you'll see quite a bit of these changes from other pull requests, I've
just kept the stragglers after rc3"
Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0]
* tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits)
fs: fix sysctls.c built
mm: compaction: remove incorrect #ifdef checks
mm: compaction: move compaction sysctl to its own file
mm: memory-failure: Move memory failure sysctls to its own file
arm: simplify two-level sysctl registration for ctl_isa_vars
ia64: simplify one-level sysctl registration for kdump_ctl_table
utsname: simplify one-level sysctl registration for uts_kern_table
ntfs: simplfy one-level sysctl registration for ntfs_sysctls
coda: simplify one-level sysctl registration for coda_table
fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls
xfs: simplify two-level sysctl registration for xfs_table
nfs: simplify two-level sysctl registration for nfs_cb_sysctls
nfs: simplify two-level sysctl registration for nfs4_cb_sysctls
lockd: simplify two-level sysctl registration for nlm_sysctls
proc_sysctl: enhance documentation
xen: simplify sysctl registration for balloon
md: simplify sysctl registration
hv: simplify sysctl registration
scsi: simplify sysctl registration with register_sysctl()
csky: simplify alignment sysctl registration
...
The summary of the changes for this pull requests is:
* Song Liu's new struct module_memory replacement
* Nick Alcock's MODULE_LICENSE() removal for non-modules
* My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded
prior to allocating the final module memory with vmalloc and the
respective debug code it introduces to help clarify the issue. Although
the functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to have
been picked up. Folks on larger CPU systems with modules will want to
just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details
on this pull request.
The functional change change in this pull request is the very first
patch from Song Liu which replaces the struct module_layout with a new
struct module memory. The old data structure tried to put together all
types of supported module memory types in one data structure, the new
one abstracts the differences in memory types in a module to allow each
one to provide their own set of details. This paves the way in the
future so we can deal with them in a cleaner way. If you look at changes
they also provide a nice cleanup of how we handle these different memory
areas in a module. This change has been in linux-next since before the
merge window opened for v6.3 so to provide more than a full kernel cycle
of testing. It's a good thing as quite a bit of fixes have been found
for it.
Jason Baron then made dynamic debug a first class citizen module user by
using module notifier callbacks to allocate / remove module specific
dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation if a piece a symbol code could ever be
part of a module or not. But quite recently it is has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area
is active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want create a mapping of possible
modules can only rely on the module license tag after the commit
8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin
or tristate.conf"). Nick has been working on this *for years* and
AFAICT I was the only one to suggest two alternatives to this approach
for tooling. The complexity in one of my suggested approaches lies in
that we'd need a possible-obj-m and a could-be-module which would check
if the object being built is part of any kconfig build which could ever
lead to it being part of a module, and if so define a new define
-DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've
suggested would be to have a tristate in kconfig imply the same new
-DPOSSIBLE_MODULE as well but that means getting kconfig symbol names
mapping to modules always, and I don't think that's the case today. I am
not aware of Nick or anyone exploring either of these options. Quite
recently Josh Poimboeuf has pointed out that live patching, kprobes and
BPF would benefit from resolving some part of the disambiguation as
well but for other reasons. The function granularity KASLR (fgkaslr)
patches were mentioned but Joe Lawrence has clarified this effort has
been dropped with no clear solution in sight [1].
In the meantime removing module license tags from code which could never
be modules is welcomed for both objectives mentioned above. Some
developers have also welcomed these changes as it has helped clarify
when a module was never possible and they forgot to clean this up,
and so you'll see quite a bit of Nick's patches in other pull
requests for this merge window. I just picked up the stragglers after
rc3. LWN has good coverage on the motivation behind this work [2] and
the typical cross-tree issues he ran into along the way. The only
concrete blocker issue he ran into was that we should not remove the
MODULE_LICENSE() tags from files which have no SPDX tags yet, even if
they can never be modules. Nick ended up giving up on his efforts due
to having to do this vetting and backlash he ran into from folks who
really did *not understand* the core of the issue nor were providing
any alternative / guidance. I've gone through his changes and dropped
the patches which dropped the module license tags where an SPDX
license tag was missing, it only consisted of 11 drivers. To see
if a pull request deals with a file which lacks SPDX tags you
can just use:
./scripts/spdxcheck.py -f \
$(git diff --name-only commid-id | xargs echo)
You'll see a core module file in this pull request for the above,
but that's not related to his changes. WE just need to add the SPDX
license tag for the kernel/module/kmod.c file in the future but
it demonstrates the effectiveness of the script.
Most of Nick's changes were spread out through different trees,
and I just picked up the slack after rc3 for the last kernel was out.
Those changes have been in linux-next for over two weeks.
The cleanups, debug code I added and final fix I added for modules
were motivated by David Hildenbrand's report of boot failing on
a systems with over 400 CPUs when KASAN was enabled due to running
out of virtual memory space. Although the functional change only
consists of 3 lines in the patch "module: avoid allocation if module is
already present and ready", proving that this was the best we can
do on the modules side took quite a bit of effort and new debug code.
The initial cleanups I did on the modules side of things has been
in linux-next since around rc3 of the last kernel, the actual final
fix for and debug code however have only been in linux-next for about a
week or so but I think it is worth getting that code in for this merge
window as it does help fix / prove / evaluate the issues reported
with larger number of CPUs. Userspace is not yet fixed as it is taking
a bit of time for folks to understand the crux of the issue and find a
proper resolution. Worst come to worst, I have a kludge-of-concept [3]
of how to make kernel_read*() calls for modules unique / converge them,
but I'm currently inclined to just see if userspace can fix this
instead.
[0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/
[1] https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com
[2] https://lwn.net/Articles/927569/
[3] https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRG4m0SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinQ2oP/0xlvKwJg6Ey8fHZF0qv8VOskE80zoLF
hMazU3xfqLA+1TQvouW1YBxt3jwS3t1Ehs+NrV+nY9Yzcm0MzRX/n3fASJVe7nRr
oqWWQU+voYl5Pw1xsfdp6C8IXpBQorpYby3Vp0MAMoZyl2W2YrNo36NV488wM9KC
jD4HF5Z6xpnPSZTRR7AgW9mo7FdAtxPeKJ76Bch7lH8U6omT7n36WqTw+5B1eAYU
YTOvrjRs294oqmWE+LeebyiOOXhH/yEYx4JNQgCwPdxwnRiGJWKsk5va0hRApqF/
WW8dIqdEnjsa84lCuxnmWgbcPK8cgmlO0rT0DyneACCldNlldCW1LJ0HOwLk9pea
p3JFAsBL7TKue4Tos6I7/4rx1ufyBGGIigqw9/VX5g0Iif+3BhWnqKRfz+p9wiMa
Fl7cU6u7yC68CHu1HBSisK16cYMCPeOnTSd89upHj8JU/t74O6k/ARvjrQ9qmNUt
c5U+OY+WpNJ1nXQydhY/yIDhFdYg8SSpNuIO90r4L8/8jRQYXNG80FDd1UtvVDuy
eq0r2yZ8C0XHSlOT9QHaua/tWV/aaKtyC/c0hDRrigfUrq8UOlGujMXbUnrmrWJI
tLJLAc7ePWAAoZXGSHrt0U27l029GzLwRdKqJ6kkDANVnTeOdV+mmBg9zGh3/Mp6
agiwdHUMVN7X
=56WK
-----END PGP SIGNATURE-----
Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"The summary of the changes for this pull requests is:
- Song Liu's new struct module_memory replacement
- Nick Alcock's MODULE_LICENSE() removal for non-modules
- My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded prior
to allocating the final module memory with vmalloc and the respective
debug code it introduces to help clarify the issue. Although the
functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to
have been picked up. Folks on larger CPU systems with modules will
want to just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details:
The functional change change in this pull request is the very first
patch from Song Liu which replaces the 'struct module_layout' with a
new 'struct module_memory'. The old data structure tried to put
together all types of supported module memory types in one data
structure, the new one abstracts the differences in memory types in a
module to allow each one to provide their own set of details. This
paves the way in the future so we can deal with them in a cleaner way.
If you look at changes they also provide a nice cleanup of how we
handle these different memory areas in a module. This change has been
in linux-next since before the merge window opened for v6.3 so to
provide more than a full kernel cycle of testing. It's a good thing as
quite a bit of fixes have been found for it.
Jason Baron then made dynamic debug a first class citizen module user
by using module notifier callbacks to allocate / remove module
specific dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation if a piece a symbol code could ever be
part of a module or not. But quite recently it is has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area is
active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want create a mapping of possible
modules can only rely on the module license tag after the commit
8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf").
Nick has been working on this *for years* and AFAICT I was the only
one to suggest two alternatives to this approach for tooling. The
complexity in one of my suggested approaches lies in that we'd need a
possible-obj-m and a could-be-module which would check if the object
being built is part of any kconfig build which could ever lead to it
being part of a module, and if so define a new define
-DPOSSIBLE_MODULE [0].
A more obvious yet theoretical approach I've suggested would be to
have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
well but that means getting kconfig symbol names mapping to modules
always, and I don't think that's the case today. I am not aware of
Nick or anyone exploring either of these options. Quite recently Josh
Poimboeuf has pointed out that live patching, kprobes and BPF would
benefit from resolving some part of the disambiguation as well but for
other reasons. The function granularity KASLR (fgkaslr) patches were
mentioned but Joe Lawrence has clarified this effort has been dropped
with no clear solution in sight [1].
In the meantime removing module license tags from code which could
never be modules is welcomed for both objectives mentioned above. Some
developers have also welcomed these changes as it has helped clarify
when a module was never possible and they forgot to clean this up, and
so you'll see quite a bit of Nick's patches in other pull requests for
this merge window. I just picked up the stragglers after rc3. LWN has
good coverage on the motivation behind this work [2] and the typical
cross-tree issues he ran into along the way. The only concrete blocker
issue he ran into was that we should not remove the MODULE_LICENSE()
tags from files which have no SPDX tags yet, even if they can never be
modules. Nick ended up giving up on his efforts due to having to do
this vetting and backlash he ran into from folks who really did *not
understand* the core of the issue nor were providing any alternative /
guidance. I've gone through his changes and dropped the patches which
dropped the module license tags where an SPDX license tag was missing,
it only consisted of 11 drivers. To see if a pull request deals with a
file which lacks SPDX tags you can just use:
./scripts/spdxcheck.py -f \
$(git diff --name-only commid-id | xargs echo)
You'll see a core module file in this pull request for the above, but
that's not related to his changes. WE just need to add the SPDX
license tag for the kernel/module/kmod.c file in the future but it
demonstrates the effectiveness of the script.
Most of Nick's changes were spread out through different trees, and I
just picked up the slack after rc3 for the last kernel was out. Those
changes have been in linux-next for over two weeks.
The cleanups, debug code I added and final fix I added for modules
were motivated by David Hildenbrand's report of boot failing on a
systems with over 400 CPUs when KASAN was enabled due to running out
of virtual memory space. Although the functional change only consists
of 3 lines in the patch "module: avoid allocation if module is already
present and ready", proving that this was the best we can do on the
modules side took quite a bit of effort and new debug code.
The initial cleanups I did on the modules side of things has been in
linux-next since around rc3 of the last kernel, the actual final fix
for and debug code however have only been in linux-next for about a
week or so but I think it is worth getting that code in for this merge
window as it does help fix / prove / evaluate the issues reported with
larger number of CPUs. Userspace is not yet fixed as it is taking a
bit of time for folks to understand the crux of the issue and find a
proper resolution. Worst come to worst, I have a kludge-of-concept [3]
of how to make kernel_read*() calls for modules unique / converge
them, but I'm currently inclined to just see if userspace can fix this
instead"
Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]
* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
module: add debugging auto-load duplicate module support
module: stats: fix invalid_mod_bytes typo
module: remove use of uninitialized variable len
module: fix building stats for 32-bit targets
module: stats: include uapi/linux/module.h
module: avoid allocation if module is already present and ready
module: add debug stats to help identify memory pressure
module: extract patient module check into helper
modules/kmod: replace implementation with a semaphore
Change DEFINE_SEMAPHORE() to take a number argument
module: fix kmemleak annotations for non init ELF sections
module: Ignore L0 and rename is_arm_mapping_symbol()
module: Move is_arm_mapping_symbol() to module_symbol.h
module: Sync code of is_arm_mapping_symbol()
scripts/gdb: use mem instead of core_layout to get the module address
interconnect: remove module-related code
interconnect: remove MODULE_LICENSE in non-modules
zswap: remove MODULE_LICENSE in non-modules
zpool: remove MODULE_LICENSE in non-modules
x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
...
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
Core
----
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances.
- Reduce compound page head access for zero-copy data transfers.
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible.
- Threaded NAPI improvements, adding defer skb free support and unneeded
softirq avoidance.
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking.
- Add lockless accesses annotation to sk_err[_soft].
- Optimize again the skb struct layout.
- Extends the skb drop reasons to make it usable by multiple
subsystems.
- Better const qualifier awareness for socket casts.
BPF
---
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses.
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward.
- Add more precise memory usage reporting for all BPF map types.
- Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params.
- Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton.
- Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities.
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc.
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps.
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps.
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree.
- Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them.
- Add ARM32 USDT support to libbpf.
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations.
Protocols
---------
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address.
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition.
- Add the handshake upcall mechanism, allowing the user-space
to implement generic TLS handshake on kernel's behalf.
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures.
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers.
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction.
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore.
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter
---------
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged.
- Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header. This is needed for BIT TCP
support.
- The iptables 32bit compat interface isn't compiled in by default
anymore.
- Move ip(6)tables builtin icmp matches to the udptcp one.
This has the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used.
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device.
Driver API
----------
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time.
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them.
- Allow the page_pool to directly recycle the pages from safely
localized NAPI.
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization.
- Add YNL support for user headers and struct attrs.
- Add partial YNL specification for devlink.
- Add partial YNL specification for ethtool.
- Add tc-mqprio and tc-taprio support for preemptible traffic classes.
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device.
- Add basic LED support for switch/phy.
- Add NAPI documentation, stop relaying on external links.
- Convert dsa_master_ioctl() to netdev notifier. This is a preparatory
work to make the hardware timestamping layer selectable by user
space.
- Add transceiver support and improve the error messages for CAN-FD
controllers.
New hardware / drivers
----------------------
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers
-------
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors.
- add support for configuring max SDU for each Tx queue.
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only
on shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll.
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates.
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices
(e.g. MAC address from efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRI/mUSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgO0QAJGxpuN67YgYV0BIM+/atWKEEexJYG7B
9MMpU4jMO3EW/pUS5t7VRsBLUybLYVPmqCZoHodObDfnu59jiPOegb6SikJv/ZwJ
Zw62PVk5MvDnQjlu4e6kDcGwkplteN08TlgI+a49BUTedpdFitrxHAYGW8f2fRO6
cK2XSld+ZucMoym5vRwf8yWS1BwdxnslPMxDJ+/8ZbWBZv44qAnG2vMB/kIx7ObC
Vel/4m6MzTwVsLYBsRvcwMVbNNlZ9GuhztlTzEbfGA4ZhTadIAMgb5VTWXB84Ws7
Aic5wTdli+q+x6/2cxhbyeoVuB9HHObYmLBAciGg4GNljP5rnQBY3X3+KVZ/x9TI
HQB7CmhxmAZVrO9pLARFV+ECrMTH2/dy3NyrZ7uYQ3WPOXJi8hJZjOTO/eeEGL7C
eTjdz0dZBWIBK2gON/6s4nExXVQUTEF2ZsPi52jTTClKjfe5pz/ddeFQIWaY1DTm
pInEiWPAvd28JyiFmhFNHsuIBCjX/Zqe2JuMfMBeBibDAC09o/OGdKJYUI15AiRf
F46Pdb7use/puqfrYW44kSAfaPYoBiE+hj1RdeQfen35xD9HVE4vdnLNeuhRlFF9
aQfyIRHYQofkumRDr5f8JEY66cl9NiKQ4IVW1xxQfYDNdC6wQqREPG1md7rJVMrJ
vP7ugFnttneg
=ITVa
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
mpi3mr, hisi_sas, arcmsr). The major core change is the
constification of the host templates (which touches everything) along
with other minor fixups and clean ups.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZEmJACYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU4FAP0WYhFC
rkbY203/+ErUuwvOKum0VwJKUowCaUD0MBwScAD+Ok/NWobmjdXUBbPUbvVkr+hE
8B/xs9hodX+1fVJcVG0=
=fS/j
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target,
mpi3mr, hisi_sas, arcmsr).
The major core change is the constification of the host templates
(which touches everything) along with other minor fixups and clean
ups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command()
scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately
scsi: cxlflash: s/semahpore/semaphore/
scsi: lpfc: Silence an incorrect device output
scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation
scsi: scsi_debug: Fix missing error code in scsi_debug_init()
scsi: hisi_sas: Work around build failure in suspend function
scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()
scsi: mpt3sas: Fix an issue when driver is being removed
scsi: mpt3sas: Remove HBA BIOS version in the kernel log
scsi: target: core: Fix invalid memory access
scsi: scsi_debug: Drop sdebug_queue
scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
scsi: scsi_debug: Use scsi_block_requests() to block queues
scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
scsi: scsi_debug: Change shost list lock to a mutex
...
In qedi_probe() we call __qedi_probe() which initializes
&qedi->recovery_work with qedi_recovery_handler() and
&qedi->board_disable_work with qedi_board_disable_work().
When qedi_schedule_recovery_handler() is called, schedule_delayed_work()
will finally start the work.
In qedi_remove(), which is called to remove the driver, the following
sequence may be observed:
Fix this by finishing the work before cleanup in qedi_remove().
CPU0 CPU1
|qedi_recovery_handler
qedi_remove |
__qedi_remove |
iscsi_host_free |
scsi_host_put |
//free shost |
|iscsi_host_for_each_session
|//use qedi->shost
Cancel recovery_work and board_disable_work in __qedi_remove().
Fixes: 4b1068f5d7 ("scsi: qedi: Add MFW error recovery process")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Link: https://lore.kernel.org/r/20230413033422.28003-1-zyytlz.wz@163.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
gcc with W=1 reports
drivers/scsi/ipr.c: In function ‘ipr_init_res_entry’:
drivers/scsi/ipr.c:1104:22: error: variable ‘proto’
set but not used [-Werror=unused-but-set-variable]
1104 | unsigned int proto;
| ^~~~~
drivers/scsi/ipr.c: In function ‘ipr_update_res_entry’:
drivers/scsi/ipr.c:1261:22: error: variable ‘proto’
set but not used [-Werror=unused-but-set-variable]
1261 | unsigned int proto;
| ^~~~~
drivers/scsi/ipr.c: In function ‘ipr_change_queue_depth’:
drivers/scsi/ipr.c:4417:36: error: variable ‘res’
set but not used [-Werror=unused-but-set-variable]
4417 | struct ipr_resource_entry *res;
| ^~~
These variables are not used, so remove them. The lock around res is not
needed so remove that. This makes ioa_cfg and lock_flags unneeded so remove
them as well.
Fixes: 65a15d6560 ("scsi: ipr: Remove SATA support")
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230420125035.3888188-1-trix@redhat.com
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log combination of phy_id and device_id in device registration response.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230411230650.1760757-1-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linux SATA support in ipr has always been limited to SATA DVDs. The last
systems that had the option of including a SATA DVD was Power 8, which have
been withdrawn for some time now, so this support can be removed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20230412174015.114764-1-brking@linux.vnet.ibm.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently scsi_debug_device_reset() does not do much apart from setting the
SDEBUG_UA_POR ("Power on, reset, or bus device reset") flag, which is
eventually passed back to the SCSI midlayer later for a "unit attention"
command.
There is a report that blktest scsi/007 test fails due to commit
1107c7b24e ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd").
The problem there is that there are dangling scsi_debug queued commands
when we attempt to remove the driver.
scsi/007 test triggers SCSI EH and attempts to abort a timed-out command.
Function scsi_debug_device_reset() is called as part of the EH, but does
not deal with outstanding erroneous command. Prior to the named commit,
removing the driver caused all dangling queued commands to be stopped -
this should have not been necessary.
Fix by aborting outstanding commands on a scsi_device basis from
scsi_debug_device_reset().
Fixes: 1107c7b24e ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.com
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230416175654.159163-1-john.g.garry@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fundamentally semaphores are a counted primitive, but
DEFINE_SEMAPHORE() does not expose this and explicitly creates a
binary semaphore.
Change DEFINE_SEMAPHORE() to take a number argument and use that in the
few places that open-coded it using __SEMAPHORE_INITIALIZER().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[mcgrof: add some tribal knowledge about why some folks prefer
binary sempahores over mutexes]
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZDqyfiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZAiAQCtjwkq
7ZGxoyEI6LCHLx60UXhlwFRyDZ2gooSkjTt34gEA5timHhFNnF4/dgQbRn7EfYRs
lUlUK+4t6zJ23VtjYhg=
=Ze4S
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Handle enclosure with just a primary component gracefully
register_sysctl_table() is a deprecated compatibility wrapper.
register_sysctl() can do the directory creation for you so just use that.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
In lpfc_sli4_pci_mem_unset(), case LPFC_SLI_INTF_IF_TYPE_1 does not have a
break statement, resulting in an incorrect device output.
Fix this by adding a break statement before the default option.
Signed-off-by: Jun Chen <jun_c@hust.edu.cn>
Link: https://lore.kernel.org/r/20230410023724.3209455-1-jun_c@hust.edu.cn
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver uses spin lock without irqsave when it needs to acquire a chain
frame. This is done to protect chain frame allocation from multiple
submission threads. If there is any I/O queued from an interrupt context,
and if that requires a chain frame, and if the chain lock is held by the CPU
which got interrupted, then there will be a possible deadlock.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230406101819.10109-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch reports: drivers/scsi/scsi_debug.c:6996
scsi_debug_init() warn: missing error code 'ret'
Although it is unlikely that KMEM_CACHE might fail, but if it does then ret
might be zero. So to fix this explicitly mark ret as "-ENOMEM" and then
goto driver_unreg.
Fixes: 1107c7b24e ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The suspend/resume functions in this driver seem to have multiple problems,
the latest one just got introduced by a bugfix:
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
As far as I can tell, the 'usage_count' is not meant to be accessed by
device drivers at all, though I don't know what the driver is supposed to
do instead.
Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and
marking functions as __maybe_unused to avoid warnings about unused
functions. This should probably be changed to using
DEFINE_RUNTIME_DEV_PM_OPS().
Both changes require actually understanding what the driver needs to do,
and being able to test this, so instead here is the simplest patch to make
it pass the randconfig builds instead.
Fixes: e368d38cb9 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This reverts commit 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure
has no components") and introduces proper handling of case where there are
no detected secondary components, but primary component (enumerated in
num_enclosures) does exist. That fix was originally proposed by Ding Hui
<dinghui@sangfor.com.cn>.
Completely ignoring devices that have one primary enclosure and no
secondary one results in ses_intf_add() bailing completely
scsi 2:0:0:254: enclosure has no enumerated components
scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such
even on valid configurations with 1 primary and 0 secondary enclosures as
below:
# sg_ses /dev/sg0
3PARdata SES 3321
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Short Enclosure Status (SES) [ses] [0x8]
# sg_ses -p cf /dev/sg0
3PARdata SES 3321
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x0
enclosure descriptor list
Subenclosure identifier: 0 [primary]
relative ES process id: 0, number of ES processes: 1
number of type descriptor headers: 1
enclosure logical identifier (hex): 20000002ac02068d
enclosure vendor: 3PARdata product: VV rev: 3321
type descriptor header and text list
Element type: Unspecified, subenclosure id: 0
number of possible elements: 1
The changelog for the original fix follows
=====
We can get a crash when disconnecting the iSCSI session,
the call trace like this:
[ffff00002a00fb70] kfree at ffff00000830e224
[ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
[ffff00002a00fbd0] device_del at ffff0000086b6a98
[ffff00002a00fc50] device_unregister at ffff0000086b6d58
[ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
[ffff00002a00fca0] scsi_remove_device at ffff000008706134
[ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
[ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
[ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
[ffff00002a00fdb0] process_one_work at ffff00000810f35c
[ffff00002a00fe00] worker_thread at ffff00000810f648
[ffff00002a00fe70] kthread at ffff000008116e98
In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
but not saved in edev->component[i].scratch
In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot catch
the invalid pointer
We should not use edev->component[] array when the components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process
=====
Reported-by: Michal Kolar <mich.k@seznam.cz>
Originally-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: stable@vger.kernel.org
Fixes: 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure has no components")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm
Tested-by: Michal Kolar <mich.k@seznam.cz>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns false, drbl_regs_memmap_p is not remapped. This passes a NULL
pointer to iounmap(), which can trigger a WARN() on certain arches.
When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns true, drbl_regs_memmap_p may has been remapped and
ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a
NULL pointer to iounmap().
To fix these issues, we need to add null checks before iounmap(), and
change some goto labels.
Fixes: 1351e69fc6 ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Shuchang Li <lishuchang@hust.edu.cn>
Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Warnings may be logged during driver removal:
mpt3sas 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ..,
Fix this by deallocating DMA memory later.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Link: https://lore.kernel.org/r/20230403184736.6399-1-thenzl@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is done to avoid ambiguity between BIOS and UEFI versions. Management
tools can be used for getting accurate firmware version information.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230322092713.6961-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Four small fixes, all in drivers. They're all one or two lines except
for the ufs one, but that's a simple revert of a previous feature.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZDB8byYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZ4VAQCBB8jH
uxZVFoYfnhNOZxS8PQyW9JEy/NImLD7HVYIWNwEAwP8gNMsznlHAlwVQB6FVgf/d
vchWKg6mG0IK1ALN7Eg=
=j4gB
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes, all in drivers. They're all one or two lines except
for the ufs one, but that's a simple revert of a previous feature"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
The add_dev and remove_dev callbacks in struct class_interface currently
pass in a pointer back to the class_interface structure that is calling
them, but none of the callback implementations actually use this pointer
as it is pointless (the structure is known, the driver passed it in in
the first place if it is really needed again.)
So clean this up and just remove the pointer from the callbacks and fix
up all callback functions.
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Allen Hubbe <allenbh@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Weiyang <wangweiyang2@huawei.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: Jakob Koschel <jakobkoschel@gmail.com>
Cc: Cai Xinchen <caixinchen1@huawei.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://lore.kernel.org/r/2023040250-pushover-platter-509c@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Garry <john.g.garry@oracle.com> says:
It's easy to get scsi_debug to error on throughput testing when we have
multiple shosts:
$ lsscsi
[7:0:0:0] disk Linux scsi_debug 0191
[0:0:0:0] disk Linux scsi_debug 0191
$ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read
--bs=4k --iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg
--eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error
jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 40 processes
[ 27.521809] hrtimer: interrupt took 33067 ns
[ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00
on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
host it is possible to overload the submit queues and run out of tags.
Another separate issue that we may reduce the shost submit queue depth,
sdebug_max_queue, dynamically causing the shost to be overloaded. How many
IOs which the shost may be sent is fixed at can_queue at init time, which
is the same initial value for sdebug_max_queue. So reducing
sdebug_max_queue means that the shost may be sent more IOs than it is
configured to handle, causing overloading.
This series removes the scsi_debug submit queue concept and uses
pre-existing APIs to manage and examine tags, like scsi_block_requests()
and blk_mq_tagset_busy_iter(). Using standard APIs makes the driver more
maintainable and extensible in future.
A restriction is also added to allow sdebug_max_queue only be modified when
no shosts are present, i.e. we need to remove shosts, modify
sdebug_max_queue, and then re-add the shosts.
Link: https://lore.kernel.org/r/20230327074310.1862889-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's easy to get scsi_debug to error on throughput testing when we have
multiple shosts:
$ lsscsi
[7:0:0:0] disk Linux scsi_debug 0191
[0:0:0:0] disk Linux scsi_debug 0191
$ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read --bs=4k
--iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg
--eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error
jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 40 processes
[ 27.521809] hrtimer: interrupt took 33067 ns
[ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00
on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
shost it is possible to overload the submit queues and run out of tags.
The struct sdebug_queue is to manage tags and hold the associated
queued command entry pointer (for that tag).
Since the tagset iters are now used for functions like
sdebug_blk_mq_poll(), there is no need to manage these queues. Indeed,
blk-mq already provides what we need for managing tags and queues.
Drop sdebug_queue and all its usage in the driver.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-12-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The shost->can_queue value is initially used to set per-HW queue context
tag depth in the block layer. This ensures that the shost is not sent too
many commands which it can deal with. However lowering sdebug_max_queue
separately means that we can easily overload the shost, as in the following
example:
$ cat /sys/bus/pseudo/drivers/scsi_debug/max_queue
192
$ cat /sys/class/scsi_host/host0/can_queue
192
$ echo 100 > /sys/bus/pseudo/drivers/scsi_debug/max_queue
$ cat /sys/class/scsi_host/host0/can_queue
192
$ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k --iodepth=256
--runtime=1200 --numjobs=10 --time_based --group_reporting
--name=iops-test-job --eta-newline=1 --readonly --ioengine=io_uring
--hipri --exitall_on_error
iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 10 processes
[ 111.269885] scsi_io_completion_action: 400 callbacks suppressed
[ 111.269885] blk_print_req_error: 400 callbacks suppressed
[ 111.269889] I/O error, dev sda, sector 440 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2
[ 111.269892] sd 0:0:0:0: [sda] tag#132 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[ 111.269897] sd 0:0:0:0: [sda] tag#132 CDB: Read(10) 28 00 00 00 01 68 00 00 08 00
[ 111.277058] I/O error, dev sda, sector 360 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2
[...]
Ensure that this cannot happen by allowing sdebug_max_queue be modified
only when we have no shosts. As such, any shost->can_queue value will match
sdebug_max_queue, and sdebug_max_queue cannot be modified separately.
Since retired_max_queue is no longer set, remove support.
Continue to apply the restriction that sdebug_host_max_queue cannot be
modified when sdebug_host_max_queue is set. Adding support for that would
mean extra code, and no one has complained about this restriction
previously.
A command like the following may be used to remove a shost:
echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-11-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The functions to update ndelay and delay value first check whether we have
any in-flight IO for any host. It does this by checking if any tag is used
in the global submit queues.
We can achieve the same by setting the host as blocked and then ensuring
that we have no in-flight commands with scsi_host_busy().
Note that scsi_host_busy() checks SCMD_STATE_INFLIGHT flag, which is only
set per command after we ensure that the host is not blocked, i.e. we see
more commands active after the check for scsi_host_busy() returns 0.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-10-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of iterating all deferred commands in the submission queue
structures, use blk_mq_tagset_busy_iter(), which is a standard API for
this.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-9-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of iterating all deferred commands in the submission queue
structures, use blk_mq_tagset_busy_iter(), which is a standard API for
this.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-8-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Eventually we will drop the sdebug_queue struct as it is not really
required, so start with making the sdebug_queued_cmd dynamically allocated
for the lifetime of the scsi_cmnd in the driver.
As an interim measure, make sdebug_queued_cmd.sd_dp a pointer to struct
sdebug_defer. Also keep a value of the index allocated in
sdebug_queued_cmd.qc_arr in struct sdebug_queued_cmd.
To deal with an races in accessing the scsi cmnd allocated struct
sdebug_queued_cmd, add a spinlock for the scsi command in its priv area.
Races may be between scheduling a command for completion, aborting a
command, and the command actually completing and freeing the struct
sdebug_queued_cmd.
[mkp: typo fix]
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-7-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The feature to block queues is quite dubious, since it races with in-flight
IO. Indeed, it seems unnecessary for block queues for any times we do so.
Anyway, to keep the same behaviour, use standard SCSI API to stop IO being
sent - scsi_{un}block_requests().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-6-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no reason that calls to block_unblock_all_queues() from different
context can't race with one another, so protect with the
sdebug_host_list_mutex. There's no need for a more fine-grained per shost
locking here (and we don't have a per-host lock anyway).
Also simplify some touched code in sdebug_change_qdepth().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-5-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The shost list lock, sdebug_host_list_lock, is a spinlock. We would only
lock in non-atomic context in this driver, so use a mutex instead, which is
friendlier if we need to schedule when iterating.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-4-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In clear_luns_changed_on_target(), we iter all devices for all shosts to
conditionally clear the SDEBUG_UA_LUNS_CHANGED flag in the per-device
uas_bm.
One condition to see whether we clear the flag is to test whether the host
for the device under consideration is the same as the matching device's
(devip) host. This check will only ever pass for devices for the same
shost, so only iter the devices for the matching device shost.
We can now drop the spinlock'ing of the sdebug_host_list_lock in the same
function. This will allow us to use a mutex instead of the spinlock for the
global shost lock, as clear_luns_changed_on_target() could be called in
non-blocking context, in scsi_debug_queuecommand() -> make_ua() ->
clear_luns_changed_on_target() (which is why required a spinlock).
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-3-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a report that the blktests scsi/004 test for "TASK SET FULL" (TSF)
now fails.
The condition upon we should issue this TSF is when the sdev queue is
full. The check for a full queue has an off-by-1 error. Previously we would
increment the number of requests in the queue after testing if the queue
would be full, i.e. test if one less than full. Since we now use
scsi_device_busy() to count the number of requests in the queue, this would
already account for the current request, so fix the test for queue full
accordingly.
Fixes: 151f0ec9dd ("scsi: scsi_debug: Drop sdebug_dev_info.num_in_q")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303201334.18b30edc-oliver.sang@intel.com
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-2-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
chenxiang <chenxiang66@hisilicon.com> says:
This series contain some fixes including:
- Grab sas_dev lock when traversing sas_dev list to avoid NULL
pointer
- Handle NCQ error when IPTT is valid
- Ensure all enabled PHYs up during controller reset
- Exit suspend state when usage count of runtime PM is greater than 0
https://lore.kernel.org/r/1679283265-115066-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the current status of the host controller is suspended, enabling a
local PHY just after disabling all local PHYs in expander environment, a
hang as follows occurs:
[ 486.854655] INFO: task kworker/u256:1:899 blocked for more than 120 seconds.
[ 486.862207] Not tainted 6.1.0-rc4+ #1
[ 486.870545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 486.878893] task:kworker/u256:1 state:D stack:0 pid:899 ppid:2 flags:0x00000008
[ 486.887745] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [libsas]
[ 486.894704] Call trace:
[ 486.897400] __switch_to+0xf0/0x170
[ 486.901146] __schedule+0x3e4/0x1160
[ 486.904970] schedule+0x64/0x104
[ 486.908442] rpm_resume+0x158/0x6a0
[ 486.912163] __pm_runtime_resume+0x5c/0x84
[ 486.916489] smp_execute_task_sg+0x1f8/0x264 [libsas]
[ 486.921773] sas_discover_expander.part.0+0xbc/0x720 [libsas]
[ 486.927750] sas_discover_root_expander+0x90/0x154 [libsas]
[ 486.933552] sas_discover_domain+0x444/0x6d0 [libsas]
[ 486.938826] process_one_work+0x1e0/0x450
[ 486.943057] worker_thread+0x150/0x44c
[ 486.947015] kthread+0x114/0x120
[ 486.950447] ret_from_fork+0x10/0x20
[ 486.954292] INFO: task kworker/u256:2:1780 blocked for more than 120 seconds.
[ 486.961637] Not tainted 6.1.0-rc4+ #1
[ 486.966087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 486.974356] task:kworker/u256:2 state:D stack:0 pid:1780 ppid:2 flags:0x00000208
[ 486.983141] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
[ 486.990252] Call trace:
[ 486.992930] __switch_to+0xf0/0x170
[ 486.996645] __schedule+0x3e4/0x1160
[ 487.000439] schedule+0x64/0x104
[ 487.003886] schedule_timeout+0x17c/0x1c0
[ 487.008102] wait_for_completion+0x7c/0x160
[ 487.012488] __flush_workqueue+0x104/0x3e0
[ 487.016782] sas_porte_bytes_dmaed+0x414/0x454 [libsas]
[ 487.022203] sas_port_event_worker+0x38/0x60 [libsas]
[ 487.027449] process_one_work+0x1e0/0x450
[ 487.031645] worker_thread+0x150/0x44c
[ 487.035594] kthread+0x114/0x120
[ 487.039017] ret_from_fork+0x10/0x20
[ 487.042828] INFO: task bash:11488 blocked for more than 121 seconds.
[ 487.049366] Not tainted 6.1.0-rc4+ #1
[ 487.053746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 487.061953] task:bash state:D stack:0 pid:11488 ppid:10977 flags:0x00000204
[ 487.070698] Call trace:
[ 487.073355] __switch_to+0xf0/0x170
[ 487.077050] __schedule+0x3e4/0x1160
[ 487.080833] schedule+0x64/0x104
[ 487.084270] schedule_timeout+0x17c/0x1c0
[ 487.088474] wait_for_completion+0x7c/0x160
[ 487.092851] __flush_workqueue+0x104/0x3e0
[ 487.097137] drain_workqueue+0xb8/0x160
[ 487.101159] __sas_drain_work+0x50/0x90 [libsas]
[ 487.105963] sas_suspend_ha+0x64/0xd4 [libsas]
[ 487.110590] suspend_v3_hw+0x198/0x1e8 [hisi_sas_v3_hw]
[ 487.115989] pci_pm_runtime_suspend+0x5c/0x1d0
[ 487.120606] __rpm_callback+0x50/0x150
[ 487.124535] rpm_callback+0x74/0x80
[ 487.128204] rpm_suspend+0x110/0x640
[ 487.131955] rpm_idle+0x1f4/0x2d0
[ 487.135447] __pm_runtime_idle+0x58/0x94
[ 487.139538] queue_phy_enable+0xcc/0xf0 [libsas]
[ 487.144330] store_sas_phy_enable+0x74/0x100
[ 487.148770] dev_attr_store+0x20/0x34
[ 487.152606] sysfs_kf_write+0x4c/0x5c
[ 487.156437] kernfs_fop_write_iter+0x120/0x1b0
[ 487.161049] vfs_write+0x2d0/0x36c
[ 487.164625] ksys_write+0x70/0x100
[ 487.168194] __arm64_sys_write+0x24/0x30
[ 487.172280] invoke_syscall+0x50/0x120
[ 487.176186] el0_svc_common.constprop.0+0x168/0x190
[ 487.181214] do_el0_svc+0x34/0xc0
[ 487.184680] el0_svc+0x2c/0xb4
[ 487.187879] el0t_64_sync_handler+0xb8/0xbc
[ 487.192205] el0t_64_sync+0x19c/0x1a0
We find that when all local PHYs are disabled, all the devices will be
removed, the ->runtime_suspend() callback suspend_v3_hw() directly execute
since the controller usage count drop to 0. On the other side, the first
local PHY is enabled through the sysfs interface, and ensures that function
phy_up_v3_hw() is completed due to suspend_v3_hw()->
interrupt_disable_v3_hw(). In the expander scenario,
sas_discover_root_expander() is executed in event work
DISCE_DISCOVER_DOMAIN, which will increases the controller usage count and
carry out a resume and sends SMPIO, it cannot be completed because the
runtime PM status of the controller is RPM_SUSPENDING. At the same time,
the ->runtime_suspend() callback suspend_v3_hw() also cannot complete the
process because of drain libsas event queue in sas_suspend_ha(), so hung
occurs.
(thread 1) | (thread 2)
... |
rpm_idle() |
... |
__update_runtime_status(RPM_SUSPENDING)|
... | ...
suspend_v3_hw() | smp_execute_task_sg()
... | ...
interrupt_disable_v3_hw() | pm_runtime_get_sync()
| ...
... | rpm_resume() //RPM_SUSPENDING
|
__sas_drain_work() |
To fix this, check if the current runtime PM status of the controller
allows to be suspended continue after interrupt_disable_v3_hw(), return
immediately if not.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hislicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-5-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the controller reset operation, hisi_sas_phy_enable() is executed for
each enabled local PHY, and refresh the port id of each device based on the
latest hisi_sas_phy->port_id after 1 second sleep, hisi_sas_phy->port_id is
configured in the interrupt processing function phy_up_v3_hw(). However, in
directly attached scenario, for some SATA disks the amount of time for
phyup more than 1s sometimes. In this case, incorrect port id may be
configured in hisi_sas_refresh_port_id(). As a result, all the internal
IOs fail and disk lost, such as follows:
[10717.666565] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=10(sata)
[10718.826813] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=63
task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc8003f 0x0
0x0 Error info: 0x0 0x0 0x0 0x0
[10718.843428] sas: TMF task open reject failed 5000000000000501
[10718.849242] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=64
task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80040 0x0
0x0 Error info: 0x0 0x0 0x0 0x0
[10718.865856] sas: TMF task open reject failed 5000000000000501
[10718.871670] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=65
task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80041 0x0
0x0 Error info: 0x0 0x0 0x0 0x0
[10718.888284] sas: TMF task open reject failed 5000000000000501
[10718.894093] sas: executing TMF for 5000000000000501 failed after 3 attempts!
[10718.901114] hisi_sas_v3_hw 0000:74:02.0: ata disk 5000000000000501 reset failed
[10718.908410] hisi_sas_v3_hw 0000:74:02.0: controller reset complete
.....
[10773.298633] ata216.00: revalidation failed (errno=-19)
[10773.303753] ata216.00: disable device
So the time of waitting for PHYs up is 1s which may be not enough. To solve
the issue, running hisi_sas_phy_enable() in parallel through async
operations and use wait_for_completion_timeout() to wait for PHYs come up
instead of directly sleep for 1 second.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an NCQ error occurs when the IPTT is valid and slot->abort flag is set
in completion path, sas_task_abort() will be called to abort only one NCQ
command now, and the host would be set to SHOST_RECOVERY state. But this
may not kick-off EH Immediately until other outstanding QCs timeouts. As a
result, the host may remain in the SHOST_RECOVERY state for up to 30
seconds, such as follows:
[7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0
[7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32
[7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1
All NCQ commands that are in the queue should be aborted when an NCQ error
occurs in this scenario.
Fixes: 05d91b557a ("scsi: hisi_sas: Directly trigger SCSI error handling for completion errors")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
clang with W=1 reports:
drivers/scsi/qla4xxx/ql4_isr.c:475:11: error: variable
'count' set but not used [-Werror,-Wunused-but-set-variable]
uint32_t count = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230331175757.1860780-1-trix@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
clang with W=1 reports:
drivers/scsi/snic/snic_scsi.c:490:6: error: variable
'xfer_len' set but not used [-Werror,-Wunused-but-set-variable]
u64 xfer_len = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230328001647.1778448-1-trix@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
clang with W=1 reports:
drivers/scsi/qedf/qedf_main.c:2227:6: error: variable
'num_handled' set but not used [-Werror,-Wunused-but-set-variable]
int num_handled = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330203444.1842425-1-trix@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The validity of sock should be checked before assignment to avoid incorrect
values. Commit 57569c37f0 ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref
while calling getpeername()") introduced this change which may lead to
inconsistent values of tcp_sw_conn->sendpage and conn->datadgst_en.
Fix the issue by moving the position of the assignment.
Fixes: 57569c37f0 ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Link: https://lore.kernel.org/r/20230329071739.2175268-1-zhongjinghua@huaweicloud.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>