The calling relationship in blk_mq_destroy_queue() is as follows:
blk_mq_destroy_queue()
...
-> blk_queue_start_drain()
-> blk_freeze_queue_start() <- called
...
-> blk_freeze_queue()
-> blk_freeze_queue_start() <- called again
-> blk_mq_freeze_queue_wait()
...
So there is a redundant call to blk_freeze_queue_start().
Replace blk_freeze_queue() with blk_mq_freeze_queue_wait() to avoid the
redundant call.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221030083212.1251255-1-nickyc975@zju.edu.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The only caller that needs queue_is_mq check is del_gendisk, so move the
check into it.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221030094730.1275463-1-nickyc975@zju.edu.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert current looping-based implementation into bit operation,
which can bring improvement for:
1) bitops is more efficient for its arch-level optimization.
2) Given that blksize_bits() is inline, _if_ @size is compile-time
constant, it's possible that order_base_2() _may_ make output
compile-time evaluated, depending on code context and compiler behavior.
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/TYCP286MB23238842958D7C083D6B67CECA349@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Jeffery found one double ->queue_rq() issue, so far it can
be triggered in VM use case because of long vmexit latency or preempt
latency of vCPU pthread or long page fault in vCPU pthread, then block
IO req could be timed out before queuing the request to hardware but after
calling blk_mq_start_request() during ->queue_rq(), then timeout handler
may handle it by requeue, then double ->queue_rq() is caused, and kernel
panic.
So far, it is driver's responsibility to cover the race between timeout
and completion, so it seems supposed to be solved in driver in theory,
given driver has enough knowledge.
But it is really one common problem, lots of driver could have similar
issue, and could be hard to fix all affected drivers, even it isn't easy
for driver to handle the race. So David suggests this patch by draining
in-progress ->queue_rq() for solving this issue.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: virtualization@lists.linux-foundation.org
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221026051957.358818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Document which functions do not modify the queue limits.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221025191755.1711437-3-bvanassche@acm.org
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit c75e707fe1 ("block: remove the per-bio/request write hint")
removed all code that uses the struct request write_hint member. Hence
also remove 'write_hint' itself.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221025191755.1711437-2-bvanassche@acm.org
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that blk_mq_destroy_queue does not release the queue reference, there
is no need for a second admin queue reference to be held by the
apple_nvme structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that blk_mq_destroy_queue does not release the queue reference, there
is no need for a second admin queue reference to be held by the nvme_dev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that blk_mq_destroy_queue does not release the queue reference, there
is no need for a second queue reference to be held by the scsi_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The fact that blk_mq_destroy_queue also drops a queue reference leads
to various places having to grab an extra reference. Move the call to
blk_put_queue into the callers to allow removing the extra references.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-2-hch@lst.de
[axboe: fix fabrics_q vs admin_q conflict in nvme core.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current reference management logic of io scheduler modules contains
refcnt problems. For example, blk_mq_init_sched may fail before or after
the calling of e->ops.init_sched. If it fails before the calling, it does
nothing to the reference to the io scheduler module. But if it fails after
the calling, it releases the reference by calling kobject_put(&eq->kobj).
As the callers of blk_mq_init_sched can't know exactly where the failure
happens, they can't handle the reference to the io scheduler module
properly: releasing the reference on failure results in double-release if
blk_mq_init_sched has released it, and not releasing the reference results
in ghost reference if blk_mq_init_sched did not release it either.
The same problem also exists in io schedulers' init_sched implementations.
We can address the problem by adding releasing statements to the error
handling procedures of blk_mq_init_sched and init_sched implementations.
But that is counterintuitive and requires modifications to existing io
schedulers.
Instead, We make elevator_alloc get the io scheduler module references
that will be released by elevator_release. And then, we match each
elevator_get with an elevator_put. Therefore, each reference to an io
scheduler module explicitly has its own getter and releaser, and we no
longer need to worry about the refcnt problems.
The bugs and the patch can be validated with tools here:
https://github.com/nickyc975/linux_elv_refcnt_bug.git
[hch: split out a few bits into separate patches, use a non-try
module_get in elevator_alloc]
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020064819.1469928-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No need to find the actual elevator_type struct for this comparism,
the name is all that is needed.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020064819.1469928-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The stripped name should also be used for the none check. To do so
strip it in the caller and pass in the sanitized name. Drop the pointless
__ prefix in the function name while we're at it.
Based on a patch from Jinlong Chen <nickyc975@zju.edu.cn>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020064819.1469928-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make sure we have helpers for all relevant module refcount operations on
the elevator_type in elevator.h, and use them. Move the call to the get
helper in blk_mq_elv_switch_none a bit so that it is obvious with a less
verbose comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020064819.1469928-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit b5dc5d4d1f ("block,bfq: Disable writeback throttling") tries to
disable wbt for bfq, it's done by calling wbt_disable_default() in
bfq_init_queue(). However, wbt is still enabled if default elevator is
bfq:
device_add_disk
elevator_init_mq
bfq_init_queue
wbt_disable_default -> done nothing
blk_register_queue
wbt_enable_default -> wbt is enabled
Fix the problem by adding a new flag ELEVATOR_FLAG_DISBALE_WBT, bfq
will set the flag in bfq_init_queue, and following wbt_enable_default()
won't enable wbt while the flag is set.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-7-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are only one flag to indicate that elevator is registered currently,
prepare to add a flag to disable wbt if default elevator is bfq.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-6-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, if wbt is initialized and then disabled by
wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which
will confuse users that wbt is still enabled.
This patch shows wbt_lat_usec as zero if it's disabled.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reported-and-tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-5-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, if user disable wbt through sysfs, 'enable_state' will be
'WBT_STATE_ON_MANUAL', which will be confusing. Add a new state
'WBT_STATE_OFF_MANUAL' to cover that case.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
"elevator_queue *e" is already declared and initialized in the beginning
of elv_unregister_queue().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221019121518.3865235-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
'ioc->params' is updated in ioc_refresh_params(), which is proteced by
'ioc->lock', however, ioc_timer_fn() read params outside the lock.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221012094035.390056-5-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This won't cause any severe problem currently, however, this doesn't
seems appropriate:
1) 'ioc->params' is read from multiple places without holding
'ioc->lock', unexpected value might be read if writing it concurrently.
2) If configuration is changed while io is throttling, the functionality
might be affected. For example, if module params is updated and cost
becomes smaller, waiting for timer that is caculated under old
configuration is not appropriate.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221012094035.390056-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ioc_qos_write() and ioc_cost_model_write() are the same:
1) hold lock to read 'ioc->params' to local variable;
2) update params to local variable without lock;
3) hold lock to write local variable to 'ioc->params';
In theroy, if user updates params concurrenty, the params might be lost:
t1: update params a t2: update params b
spin_lock_irq(&ioc->lock);
memcpy(qos, ioc->params.qos, sizeof(qos))
spin_unlock_irq(&ioc->lock);
qos[a] = xxx;
spin_lock_irq(&ioc->lock);
memcpy(qos, ioc->params.qos, sizeof(qos))
spin_unlock_irq(&ioc->lock);
qos[b] = xxx;
spin_lock_irq(&ioc->lock);
memcpy(ioc->params.qos, qos, sizeof(qos));
ioc_refresh_params(ioc, true);
spin_unlock_irq(&ioc->lock);
spin_lock_irq(&ioc->lock);
// updates of a will be lost
memcpy(ioc->params.qos, qos, sizeof(qos));
ioc_refresh_params(ioc, true);
spin_unlock_irq(&ioc->lock);
Althrough this is not common case, the problem can by fixed easily by
holding the lock through the read, update, write process.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221012094035.390056-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit b5dc5d4d1f ("block,bfq: Disable writeback throttling") disable
wbt for bfq, because different write-throttling heuristics should not
work together.
For the same reason, wbt and iocost should not work together as well,
unless admin really want to do that, dispite that performance is
affected.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221012094035.390056-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings
for very large and very sparse device topology
- Work around a relocation handling error when compiling
the nVHE object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNT2hQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNeVwgAkGGk2F2SF5s+MQUQ9tDPxyuRbddN
NPo/YRTKszKc8rK6d1TCbQi56I3e8Oa7kNkMF7CiBlAekB7B1r1ySg5qc+3lQebx
moME30Ru4nmfqPcZ7971MA8Me7zZxGzvIviL5KIwm1ownGifdTsPZ9jCvu4EPdzv
3dd10guH3GeBIq8QeQGEqNP4fticziwhE+IA3HZstcWsq96800Le7WNAgklfzdC+
YTB81QU6whHv6N/7YvRcTbp+tER3VIKdFMmRD1FwC90flhXMbxTymESFXULfHCM2
x/arGz2E31/QGgJo0/Yy2VPenr5ZMU57dL4SYWR02mwSfJQnJWb1cRdWnw==
=rxQ7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings for very
large and very sparse device topology
- Work around a relocation handling error when compiling the nVHE
object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock for too long
by limiting the walk to the largest block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools: include: sync include/api/linux/kvm.h
KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
kvm: Add support for arch compat vm ioctls
RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc
RISC-V: Fix compilation without RISCV_ISA_ZICBOM
KVM: arm64: vgic: Fix exit condition in scan_its_table()
KVM: arm64: nvhe: Fix build with profile optimization
KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test
KVM: arm64: selftests: Fix multiple versions of GIC creation
KVM: arm64: Enable stack protection and branch profiling for VHE
KVM: arm64: Limit stage2_apply_range() batch size to largest block
KVM: arm64: Work out supported block level at compile time
This reverts commit 72a9585972.
It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances,
which use the syscon driver. Little-endian is not effected, which means
likely it's important to handle regmap_get_val_endian() in this function
after all.
Fixes: 72a9585972 ("mfd: syscon: Remove repetition of the regmap_get_val_endian()")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lee Jones <lee@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNVE9UACgkQEsHwGGHe
VUpodRAAsn3s+xX02qfxRG0mYX/1nnOmnFnDhmUEPAZpDXjB6g3PFXAh26F9tw92
dLm8fbfp8W3tK7TQrGyOGWMnb7oj4gEoXAcQBXvsq2KOJMVwRHHKwCeSSJ89DLAW
zZEl2nzgea2cGyZn2RZJvhmbm8YdFON3v++Vv34yovs1MMoCcD7FOPLLGWkD2gk5
6PK6lIlWEI/+vYKWhZdxgk6/PanBInXRUnbaBVj42US1XwfDLzPBEi9yyUUJQrht
CQhfTpHn4Z5MC/hJTlFat4Jlaajql4JBcQ1SS5LW59M+6gdlBK4tL6zFt10zvU2m
+kywOOIWiYLRRgFf6idGO45P5BuWOdmsXEaEg5KW7b6nJfGvgkd7WUYgCVOgtEY1
r4Mf4hAQUunDHGQ4e8eHk7XemFJsoBSweYCTQ2O0yr/QzO2M6QBi/BR9PzUajyH+
yShKEfrxXt4595BMH0nonSMpTKcE4Zxdj06LZSnGecEN8UUlx/n49uYwhFNdUkqM
s6Wz6kSR76YRlKUmYnNzP1gkY6nJZ1nR6z7SjmMkioxf3VxhT9SY8K587r6hRUlr
/NVA69iUhJy75VdttxZEmZzB03A7AjdudmZEisF0ImEmB1hxzYLHcDKJMTIj/r4/
f8OXCg5ACKhFlnx1SdBVRtA+6+5ab368Fs2rItJQ4dzdxRi6VVY=
=DXG7
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Fix raw data handling when perf events are used in bpf
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
* tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bpf: Fix sample_flags for bpf_perf_event_output
selftests/perf_events: Add a SIGTRAP stress test with disables
perf: Fix missing SIGTRAPs
Intel perf LBR
- A CFI fix to ftrace along with a simplification
- Adjust handling of zero capacity bit mask for resctrl cache allocation
on AMD
- A fix to the AMD microcode loader to attempt patch application on
every logical thread
- A couple of topology fixes to handle CPUID leaf 0x1f enumeration info
properly
- Drop a -mabi=ms compiler option check as both compilers support it now
anyway
- A couple of fixes to how the initial, statically allocated FPU buffer
state is setup and its interaction with dynamic states at runtime
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmNVCj4ACgkQEsHwGGHe
VUoalw//WV4aEXgJx6Zz4PoawE1zeAFa4LlrDZDpZ8HxdcT0yXr9MNMIn4qN3j4H
xsc/MWrdEWWS0rmt//IWzI3ee+VYTnrLjkBajQcNJ9qmqGJaS0ooKm9+j2eWG8J5
ZRLTx5+jJhoEkyYhDQvTXeSGLMjs+X/3Y6CXKzDKaIA5PVqMsc2lxa70MwsjKoVm
7XqFHmmv3Vw9JmmqsLBRp+VmYzNlgZ9eqaD+pujiYnz7h8fHYyN2I6LhNIYpoHHr
gUY0LwxjqJ9O+CaF2fsTDH6oDevVMt5qGntzsa1hx09gO6UzgZwTr+Sme74zE3ox
Lj3IDj/EuxCNIeCByHpmvVOdKnvEEsx6+BDAuiq+EPSDykPUHmp1fiDywR1TC9o0
PwQ06+/AbXtbKA0JOb1ArFlBB+aE2W7r+Rj8yH3UpsLjDh7sKM0kERm9RuFw7aKJ
j35y9mzvJtR3fstb5MD9rI0lBEL6p5JIQSZvulWqf5l5h0K2DS/7OYGJu2GPjU+c
PFA+tnGgs0vc98w2clBzQVk5Yo9gJCma5eGUy8bJ/ESwTfBzERBTscD53kT7jyWn
TW4nMSkr1H4aR7dXjb5suhSo1IqDMtFAWPECh0f6btgTzf5JOeS2dCihGHRbmh8z
5ByhZHdg8tJiX0xBv1s31KGZECQQxCy8AJr3MXKxOjqjXoI4fFg=
=2hxy
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.0_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"As usually the case, right after a major release, the tip urgent
branches accumulate a couple more fixes than normal. And here is the
x86, a bit bigger, urgent pile.
- Use the correct CPU capability clearing function on the error path
in Intel perf LBR
- A CFI fix to ftrace along with a simplification
- Adjust handling of zero capacity bit mask for resctrl cache
allocation on AMD
- A fix to the AMD microcode loader to attempt patch application on
every logical thread
- A couple of topology fixes to handle CPUID leaf 0x1f enumeration
info properly
- Drop a -mabi=ms compiler option check as both compilers support it
now anyway
- A couple of fixes to how the initial, statically allocated FPU
buffer state is setup and its interaction with dynamic states at
runtime"
* tag 'x86_urgent_for_v6.0_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()
ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()
x86/ftrace: Remove ftrace_epilogue()
x86/resctrl: Fix min_cbm_bits for AMD
x86/microcode/AMD: Apply the patch early on every logical thread
x86/topology: Fix duplicated core ID within a package
x86/topology: Fix multiple packages shown on a single-package system
hwmon/coretemp: Handle large core ID value
x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB
x86/fpu: Exclude dynamic states from init_fpstate
x86/fpu: Fix the init_fpstate size check with the actual size
x86/fpu: Configure init_fpstate attributes orderly
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmNUFz4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqrSEAC+jhEaIB4srOr5DMta/CxBoKwiZIcMmsaK
pzRMFSTKWWtsx3COGjT0vwzm4VuZsZztE+A6buYs5riEDsI0l5TJiZ0fOqi9N0nB
orehq+7T2Cn5E848bMzLo7tSmlibAionrOQde5PbtmDcltuKddu9TiXNzD6XufLB
dLWbTRLdfxuFW0c8DYwng+KNBocXad64gu3ADuxKVGkWDs9tfOvaFWE/NgoXsEoq
a7uaBCF+DBWYhHWk0WPOA2+BLyNMN+g7owX1GWqW/Sr48CQDJSw5YnpKCi8+jZdb
uHreUIH96w/2A7CFfNCOfx5MhYCrX/j9ik6mDt2B8Gbh6vg3LlMADSo6xXSPag0r
7Lu7AVr7Sko6NKU2x/pCzv/U85TFuvuqSfH+YFK3rdEouZk26o5PfJEugAtD3gKv
smAk+ATmgx/iye5a2Uq7ClVVdOcdQULrC15/8XdcG7eI+l2q3AbgTa53PdBw3oF7
S+ANKMP5kPkPe1wDxFR0g3v7vsZmmfahRuss3xWC+PnHZPFZPQFRIohjWSsu1Exl
Ztri7Xy/ypC7bZ5F1pch1AjiLfLCGzpmKjT4QAy/mSFAJVboRDb0PTwN1w7uVCBQ
qK8TIw2iVKjEeIps/CedO+nQQrxhOKcizxLIPTyfaT6ZOJrbHalcQheEJqpWMnrF
w7dYkDnjYw==
=tTNe
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux
Pull io_uring follow-up from Jens Axboe:
"Currently the zero-copy has automatic fallback to normal transmit, and
it was decided that it'd be cleaner to return an error instead if the
socket type doesn't support it.
Zero-copy does work with UDP and TCP, it's more of a future proofing
kind of thing (eg for samba)"
* tag 'io_uring-6.1-2022-10-22' of git://git.kernel.dk/linux:
io_uring/net: fail zc sendmsg when unsupported by socket
io_uring/net: fail zc send when unsupported by socket
net: flag sockets supporting msghdr originated zerocopy
- corsair-psu: Fix typo in USB id description, and add USB ID for new PSU
- pwm-fan: Fix fan power handling when disabling fan control
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmNT97AACgkQyx8mb86f
mYEOxhAAlVhKbeTPvqefdXxuJSy5OKy7EeQktzRV0Mt4uWgOxZOW01IVXoTDHEiw
d8X8mW4HSa9/e/9uqMUtKLNRqkcZ0YEHkp06K/3FH1rwonpAlRt4fYw7UEujMVnR
PLgAAlkeROvYpNphW7YkUle0RDu41ugOuuM3DyjPSGE4CCdDQXMQBJSR2TBr5LZ7
Y84d/9vVPtEUZ957KlAxSVJ5RQHWomDDKwXocDG6sLLDCOdwsPNNuKj0aj7Z5Mmh
C3eh8zbLW1txOrKh/AiOioql4UcnV8RlWGNhDUYpF1VDkq3CFFdk3h1BX8eXcKag
sgeh6QkW5QGnmy99BxzPpGYIRu9zWM5Dq3LlJmvBX+zU+hv+Sb4IZFrfPUgQmDR0
acX66qfAvpxszST3XQKR4pVwqI6fVq5A7RSWdyrEtT6z1EwB2Ju5PYM5ncOQHrp/
+erMQwWXMRSGP7gGQNZHxZw7vckeYPZUsCRi2JCOoHxzmu6pcKkj0P+/VjyjMf8I
nqTbDhqOkBqxvJkNocKsvtapIqK4kpHHVL91VZqUJ4ePlpt8W7ax58oxIGJv/c+L
u2rdVyZpeg0x20a6VrmJc92y7QBlB904Fk+nPwdNh07ltSw2jmkgdXuwIAIIuRZu
I48c6PLGYOaBgSEpv8ew4cVOk2edBT45OKpA96OHHlIPK6um72g=
=ajp4
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- corsair-psu: Fix typo in USB id description, and add USB ID for new
PSU
- pwm-fan: Fix fan power handling when disabling fan control
* tag 'hwmon-for-v6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (corsair-psu) Add USB id of the new HX1500i psu
hwmon: (pwm-fan) Explicitly switch off fan power when setting pwm1_enable to 0
hwmon: (corsair-psu) fix typo in USB id description
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmNTCwIUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vy0ABAAltWCeaI31solG+0Nng1vi8MpNG+G
wNd73u2J3rCUq6l+vyukAoHOV+gVIMpN18J+Qj/dqU7Uwaj4PSkXL0lXFPoQurIQ
n5f5Qi9viMPpQnIx1Alf7sKMxTmfGs2QfJ2i7owFg66C5OowZZZoNxO0OLx2IWP3
Cl9U5SDfVXT32eNdLe0kqNIw0UJT5C/WoHD3koFjp2Ic3ad/zQOYOSjXvTt5CRKq
2ae1EbZ9diPsiLxBDWaskYYVJvX9haAYVUdHAQ/aI3yyabiWUKsN1BCoVbAS/fBc
tV0hEYQYwNhMtpBOn2eJd06gPfiZ9eWeVYri1DOxC/qAK/iIqxPiX088/ms0R3qj
SM2ANn0XJRsEd4vW65JHxz4xtkBKnJ1mQUSqhp9OB9edquyWp6DDvCxWJ0Ee5ta2
Uoop4HnUbwiRLZRBMPNQ6unFQlcQ5H2eOUMiGjqPRCZTzkdzms++PLIk8Xy9U46i
RRaear6KjIfYK2CsW2HJGWHcM/rjVFjAuZd4e3PjR7xu/qagPqfKbKeVBdWRXvN9
C+kPL90Xw4YT97RcBL1beuZaoX6dBrez2OaS3emeIoJ1urHbXH7tMp1eHHBIzTBE
iJriaj/qUWdUcs9APBdMp9QXojgAeK8kQa+Qqu+1Xqdiczg9V3emFBmqDj0gpJ82
O9jyhr25FuiOnvs=
=osgb
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Revert a simplification that broke pci-tegra due to a masking error
- Update MAINTAINERS for Kishon's email address change and TI
DRA7XX/J721E maintainer change
* tag 'pci-v6.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Update Kishon's email address in PCI endpoint subsystem
MAINTAINERS: Add Vignesh Raghavendra as maintainer of TI DRA7XX/J721E PCI driver
Revert "PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro"
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmNTCJ8ACgkQCF8+vY7k
4RV/Ag//Ws8bIdedAQsbakBOq9JPOMiqHZnBde5DWn3XqU8aAY9bby70Rf2OTbl7
8mpqzeJY6wFfvesGCJ8L2hprpVqNW1KCrNAxSyaq/8Wau1D77wyEofqPNarNJzqn
oKbH0JWd8hfStJpgmwyxUXjLanDXNx2s4lRm6R1WMWPH6dLeHydx4CtFMbmOn1L8
+jTtLK6631plWw/Kkp1A9z8N1D/9b4iMOgpoQZZLuzL1DouoYWlltz+Kw9HU7rsQ
1/wGmMwTwiV6Zt2UPwB4qudq3UpUMB3tm0KWprkmSx3Xv14Rr1o3zdwALTXib0Ez
wZuzWzWaf9Fjp7CHOfEpm4x3+kU9181iw4ACk34cq7SglMYCdQ2hiwW5b9hhTN2m
tYxv78fXJD2lHyxZQAHNN7XRmiWfMWMA0Z7GwCLVFXJ24Vjzv5AfuD3rJEE6Fv3X
UOjPTNdNt4tpxX8A2Yd7WlfIBBGm2h63MVIYh50R54JCdLLLB8vhtob7pP2Y94pg
FqXxfwc216cArKVsIjmUUkJs153IlQPYzBv9xXBBbD2DXhguWhLQnf9L/KdCnFkF
6NTULAHNezkss6dbLPIL08lCEIvTqeQabPBlCEtXNqqxBWfJwdwLbeS8mg2dTxao
wwR5D37JbNuDSj0/4N/DlvVJozcCLJ2ZZ9R3c2j8/4Z0HERIhqA=
=gJf4
-----END PGP SIGNATURE-----
Merge tag 'media/v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull missed media updates from Mauro Carvalho Chehab:
"It seems I screwed-up my previous pull request: it ends up that only
half of the media patches that were in linux-next got merged in -rc1.
The script which creates the signed tags silently failed due to
5.19->6.0 so it ended generating a tag with incomplete stuff.
So here are the missing parts:
- a DVB core security fix
- lots of fixes and cleanups for atomisp staging driver
- old drivers that are VB1 are being moved to staging to be
deprecated
- several driver updates - mostly for embedded systems, but there are
also some things addressing issues with some PC webcams, in the UVC
video driver"
* tag 'media/v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (163 commits)
media: sun6i-csi: Move csi buffer definition to main header file
media: sun6i-csi: Introduce and use video helper functions
media: sun6i-csi: Add media ops with link notify callback
media: sun6i-csi: Remove controls handler from the driver
media: sun6i-csi: Register the media device after creation
media: sun6i-csi: Pass and store csi device directly in video code
media: sun6i-csi: Tidy up video code
media: sun6i-csi: Tidy up v4l2 code
media: sun6i-csi: Tidy up Kconfig
media: sun6i-csi: Use runtime pm for clocks and reset
media: sun6i-csi: Define and use variant to get module clock rate
media: sun6i-csi: Always set exclusive module clock rate
media: sun6i-csi: Tidy up platform code
media: sun6i-csi: Refactor main driver data structures
media: sun6i-csi: Define and use driver name and (reworked) description
media: cedrus: Add a Kconfig dependency on RESET_CONTROLLER
media: sun8i-rotate: Add a Kconfig dependency on RESET_CONTROLLER
media: sun8i-di: Add a Kconfig dependency on RESET_CONTROLLER
media: sun4i-csi: Add a Kconfig dependency on RESET_CONTROLLER
media: sun6i-csi: Add a Kconfig dependency on RESET_CONTROLLER
...
If a protocol doesn't support zerocopy it will silently fall back to
copying. This type of behaviour has always been a source of troubles
so it's better to fail such requests instead.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need an efficient way in io_uring to check whether a socket supports
zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
socket flags fields.
Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Provide a definition of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Fixes: 17601bfed9 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in
struct which means it has a different struct size depending on whether
it gets called from 32bit or 64bit code.
This patch introduces compat code that converts from the 32bit struct to
its 64bit counterpart which then gets used going forward internally.
With this applied, 32bit QEMU can successfully set MSR bitmaps when
running on 64bit kernels.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Fixes: 1a155254ff ("KVM: x86: Introduce MSR filtering")
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-4-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the next patch we want to introduce a second caller to
set_msr_filter() which constructs its own filter list on the stack.
Refactor the original function so it takes it as argument instead of
reading it through copy_from_user().
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-3-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We will introduce the first architecture specific compat vm ioctl in the
next patch. Add all necessary boilerplate to allow architectures to
override compat vm ioctls when necessary.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-2-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Fix a bug preventing restoring an ITS containing mappings
for very large and very sparse device topology
- Work around a relocation handling error when compiling
the nVHE object with profile optimisation
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmNRGjoPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDVcEP/2zn+J/pAv1FoNKxSiSk/H2hx5PH685mqEwv
I7YPYPOuGJDv/uJFXzMMDpjjs8RHG9yWwPwtFZYGKqtmlkXgM3a0jk5jtsgfSVcp
vsvqgR0kL67x5D/Mc+I/bkVMNEecrucvkewf0Sp7W2gdXnC+rm1EFN5McK5U60tW
+s6fKVkIB0/fxNJCXPdETGiNpoYGEBeVVzoTrXGFm9QBcSDKvqdaPF8koZzOecpS
3UcVYKBaxNf1zowAseNYiS4kndwKuopaYsk4BY/KDgTbPyoFQBwMZYm6DAkb9ugW
sFK6g8FmOJJ5meLWIg4vTdruILL+YP6gbJ0HrzFoxJJEdd2iRzEUgQlBp7G+STSY
npXmeTJh5DJelm1Qd4OdiSBlqwYTIxKEyaV34/xeFh9P3w/LbN9uSiZAybbTw92h
sE1kQpT//s8lTXOnn23YHycZ1Dsy0JSExoJutCt6E5YnTb0wtPQ6ZLoWmyWMnTL/
6Gyj7LTy+trv4l0N2ND+LF7BfN6Cb4eSXMfLLhwo2qfQI2kD/gBrbrTYYdyiVvHD
YnZftvVViOcQVaocXP1SeQ5/yhdj6ASrCiFWie7nJW9Z3IJ9BJuXCxk3gSW5yCde
Rw37+FpVSciyhpEK9f4aVGfAGA8ifO1Gi5yp70ioF8Y3i1CvS238g0knx6v6p8Vm
dAi3i1ga
=C0jb
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.1, take #2
- Fix a bug preventing restoring an ITS containing mappings
for very large and very sparse device topology
- Work around a relocation handling error when compiling
the nVHE object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmNIEAUPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDIXcP/AyWlCEJlmc1Jcd9rlaW1Wenr82U+StLVeMy
qP5P02gMWdbGExWIWEi4zkt+pAm7K2WRgXid9z5Vjw7kZY/+WwswTzKHWcQhVuZv
cBHfeOqgtoHVGR8NcwX6xcp406y3WRqYIsyAmbc5qmo75L8Ew1o3m+3eDfFtAq7l
3XuTCv+lQGNSGMhXHN2SVewZ+pCAo3XJmuHfCBXTqRjwqH4Tzh+54IKzo+9mqBWW
7yeIm5qcbIKGuXLuLL7XCf99gWy/3kQ0xQ1yJeXLAyiHswHqEISZXGHnKeATvD+6
RdbmQ9oRmIYfZfoDKZRUJg8TyTvW1rIKokFbe0q2iyuDnI5D/fAJ48epZaLw+kEf
PUzdB3UgPk19SLwgZKQddqY4wOD420ZD5x1TUFUQuLL7sjVv1vUILDvuCLWpq7F7
GyfSB+LEMgexHGsZ1wjslN/ivTbG+dQgaSS9mlV8/WDOLPtD2uOf65vYR3P28hAX
zOHrwm3e2+UV83BsEFEY2FQiiIBD24JmSecMbmAIHY09MCSZ+vJ/WbF4J1PcPP8C
3vjueIYTcjhzLtQrfIkGZcS7+wC9ji/RRmpJjbg79EpwrjhEs9G8h1+HyL9+zBZ4
Xn6X+ZG/cv0/ZYdin0ZRzJMvM0RutbsR77blVCLY97PBuLtBlqJDcxr+lmmjyIZ2
Db8Qd6uW
=IOxM
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.1, take #1
- Fix for stage-2 invalidation holding the VM MMU lock
for too long by limiting the walk to the largest
block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes