This series of patches fix some critical bugs such as memory leak in compression
flows, kernel panic when handling errors, and swapon failure due to newly added
condition check.
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmCeTNcACgkQQBSofoJI
UNKc0Q/3X9Tngns5DpnzQw9kbWochucG/7Vf1HjdvV2gE7IxruDPofGtBdhPdSYV
uR+nP9dxhXLQQNfWS2KICzdj0yceKCKq8xpnnNdq9SGjVJmUjCD39ByGJV3GMOGM
fY6dizcywltH7iBQboMZ0Eh3ivPh6ugl5klDo20WzYsH6F/UF7CPSuSJ3K5ezGuY
T6R3NkqG8v1cS6+5u+teDpmdCCHOCBEeizBFQ6XskNDBavbw7KEA0liwKOv6eghB
PdoeqYemg1VfOHAKqP6F3o+eSlsT5Ljs0Zmc5x8h8qRS76JK/hb9REFngrcERaIw
GPDnMPCHniCHMd61z90oqGtMf4PFqLUVnBTdxhxLK5G4+u874dlZLciKWGDIGTLv
eNU2W+8c9s+KdJAZFJbYN5zVoyJUR5SW7RcYTuvZWt8wX38Ch+FZEGqOC8FxWyWU
i1WHXZiGeifIlIeqUOPJP5sbslL2hfK5OqMYJotAeIW/E2RyJWnc+Yo2UwvmvXVU
xPOKFOn9nAAQz2GGgSpvGaWAVfMNqhoLw7/gzwdkabP0EASIzHuW2PCJhp59c1NO
Fb9eUi7yhgd94vDbYftXRBgAhrUmCd0u+/gySyou4vujWtfWcO+AvOEreV7VRFfL
Su0bGkBJ03ThFEFAZnY14RenydGSwpk5Fd9wkRy4Qk7mBv9sRg==
=NeKH
-----END PGP SIGNATURE-----
Merge tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This fixes some critical bugs such as memory leak in compression
flows, kernel panic when handling errors, and swapon failure due to
newly added condition check"
* tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: return EINVAL for hole cases in swap file
f2fs: avoid swapon failure by giving a warning first
f2fs: compress: fix to assign cc.cluster_idx correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to free compress page correctly
f2fs: support iflag change given the mask
f2fs: avoid null pointer access when handling IPU error
two MAINTAINERS updates.
amdgpu:
- Fixes for flexible array conversions
- Fix sysfs attribute init
- Harvesting fixes
- VCN CG/PG fixes for Picasso
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
vc4:
- drop an used function
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJgneCaAAoJEAx081l5xIa+F+wP+wSg9t3IKkev8dcyHkHXHHyO
k7o4SCxXZ8KCNo9uOj2r6N8k45RQJYMYFZxxlMMex/cfGcHIBUkTVdL/6cz1Ks3i
Ou/CLlij9k2irBi0LGV+BcDPs3ud4DBQpP0i9RWzecvJDt0tiXnmmzIdhyDaG4HV
daYV4qVuF/GBmKpyAOtHT6OXgU4g8GjYjeTA+WrEJgMLJvY7qQiRgsoz4Fwog4hi
e2ALZkEPqxrszBQ3Xsl/LPhlMZ/5oVs/+JSQEsL0UM88w7TTf4dCOefFlPw05fQ8
N4T0iOIBoMWQ0SnMOEiIZxJqtt4v/eQy7GgeI7DOCxx6Z4Bc8ALSDJAaqaK+xdNQ
AXyJSMlg5K/r7sCkP1nQjWP6j9tWHf82kDNoHNceVspa3+eLJ3MDzWZ6PYXsJQ+q
CKT+ov0AkZZfQgafHWA+vUEIcWGeVVI1yla0TJbqf0ZvmSYo5Qr+NsS5kYLHGleE
38KYuj09mcfg3zWY9tHcZXDNEeRwLqIAOX2hy37XtwAri8QOW5ZA19KoH9Y3ZRUE
s7fvAI9raHmOo0O79wGKa89k6ggDbUkC7oT9lzqCQQ005VlmfKsGNtwwvGIMkabW
9NneHx0n6+uvb1NIkTU3qP9wsbIA2hzk2iDp1zeOY2U8cGhyjFgm7apc/JyCVzsG
XKHR8awHEEvC81txicI6
=iwKz
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Not much here, mostly amdgpu fixes, with a couple of radeon, and a
cosmetic vc4.
Two MAINTAINERS file updates also.
amdgpu:
- Fixes for flexible array conversions
- Fix sysfs attribute init
- Harvesting fixes
- VCN CG/PG fixes for Picasso
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
vc4:
- drop unused function"
* tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: update vcn1.0 Non-DPG suspend sequence
drm/amdgpu: set vcn mgcg flag for picasso
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amdgpu: update the method for harvest IP for specific SKU
drm/amdgpu: add judgement when add ip blocks (v2)
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/amd/pm: Fix out-of-bounds bug
drm/radeon/si_dpm: Fix SMU power state load
drm/radeon/ni_dpm: Fix booting bug
MAINTAINERS: Update address for Emma Anholt
MAINTAINERS: Update my e-mail
drm/vc4: remove unused function
drm/ttm: Do not add non-system domain BO into swap list
To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.
When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.
Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If a tag set is shared across request queues (e.g. SCSI LUNs) then the
block layer core keeps track of the number of active request queues in
tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that
atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make
sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is
cleared by blk_mq_del_queue_tag_set().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Fixes: 0d2602ca30 ("blk-mq: improve support for shared tags maps")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case of shared sbitmap, request won't be held in plug list any more
sine commit 32bc15afed ("blk-mq: Facilitate a shared sbitmap per
tagset"), this way makes request merge from flush plug list & batching
submission not possible, so cause performance regression.
Yanhui reports performance regression when running sequential IO
test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk
is emulated with image stored on xfs/megaraid_sas.
Fix the issue by recovering original behavior to allow to hold request
in plug list.
Cc: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: kashyap.desai@broadcom.com
Fixes: 32bc15afed ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with
-ENOMEM if the swiotlb has already been initialized.
Add an explicit check io_tlb_default_mem != NULL at the beginning of
xen_swiotlb_init. If the swiotlb is already initialized print a warning
and return -EEXIST.
On x86, the error propagates.
On ARM, we don't actually need a special swiotlb buffer (yet), any
buffer would do. So ignore the error and continue.
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init,
today dma_direct_map_page returns error if SWIOTLB_NO_FORCE.
For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can
do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it
is going to be required later (e.g. Xen requires it).
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: catalin.marinas@arm.com
CC: will@kernel.org
CC: linux-arm-kernel@lists.infradead.org
Fixes: 2726bf3ff2 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029)
the commit e4ab4658f1 ("clocksource/drivers/hyper-v: Handle vDSO
differences inline") broke vDSO on x86. The problem appears to be that
VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and
'#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not
a define).
Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead.
Fixes: e4ab4658f1 ("clocksource/drivers/hyper-v: Handle vDSO differences inline")
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
Since recent changes instead of storing a large array of struct
io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and
we should not to so worry about restricting max number of registererd
buffer slots, increase the limit 4 times.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cd5676db05 ("misc: eeprom: at24: support pm_runtime control") disables
regulator in runtime suspend. If runtime suspend is called before
regulator disable, it will results in regulator unbalanced disabling.
Fixes: cd5676db05 ("misc: eeprom: at24: support pm_runtime control")
Cc: stable <stable@vger.kernel.org>
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Link: https://lore.kernel.org/r/20210420133050.377209-1-hsinyi@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory allocated by 'vmbus_alloc_ring()' at the beginning of the probe
function is never freed in the error handling path.
Add the missing 'vmbus_free_ring()' call.
Note that it is already freed in the .remove function.
Fixes: cdfa835c6e ("uio_hv_generic: defer opening vmbus until first use")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0d86027b8eeed8e6360bc3d52bcdb328ff9bdca1.1620544055.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If 'vmbus_establish_gpadl()' fails, the (recv|send)_gpadl will not be
updated and 'hv_uio_cleanup()' in the error handling path will not be
able to free the corresponding buffer.
In such a case, we need to free the buffer explicitly.
Fixes: cdfa835c6e ("uio_hv_generic: defer opening vmbus until first use")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/4fdaff557deef6f0475d02ba7922ddbaa1ab08a6.1620544055.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit ef84928cff ("uio/uio_pci_generic: use device-managed function
equivalents") was able to simplify various error paths thanks to no
longer having to clean up on the way out. Some error paths were dropped,
others were simplified. In one of those simplifications, the return
value was accidentally changed from -ENODEV to -ENOMEM. Restore the old
return value.
Fixes: ef84928cff ("uio/uio_pci_generic: use device-managed function equivalents")
Cc: stable <stable@vger.kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Link: https://lore.kernel.org/r/20210422192240.1136373-1-martin.agren@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The snd_firewire_lib:amdtp_packet tracepoints event includes index of
packet processed in a context handling. However in IR context, it is not
calculated as expected.
Cc: <stable@vger.kernel.org>
Fixes: 753e717986 ("ALSA: firewire-lib: use packet descriptor for IR context")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210513125652.110249-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The quadlets for CIP header is handled as a part of IR context header,
thus it doesn't join in IR context payload. However current calculation
includes the quadlets in IR context payload.
Cc: <stable@vger.kernel.org>
Fixes: f11453c7cc ("ALSA: firewire-lib: use 16 bytes IR context header to separate CIP header")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210513125652.110249-5-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The check for size of isochronous packet payload just cares of the size of
IR context payload without the size of CIP header.
Cc: <stable@vger.kernel.org>
Fixes: f11453c7cc ("ALSA: firewire-lib: use 16 bytes IR context header to separate CIP header")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210513125652.110249-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mackie d.2 has an extension card for IEEE 1394 communication, which uses
BridgeCo DM1000 ASIC. On the other hand, Mackie d.4 Pro has built-in
function for IEEE 1394 communication by Oxford Semiconductor OXFW971,
according to schematic diagram available in Mackie website. Although I
misunderstood that Mackie d.2 Pro would be also a model with OXFW971,
it's wrong. Mackie d.2 Pro is a model which includes the extension card
as factory settings.
This commit fixes entries in Kconfig and comment in ALSA OXFW driver.
Cc: <stable@vger.kernel.org>
Fixes: fd6f4b0dc1 ("ALSA: bebob: Add skelton for BeBoB based devices")
Fixes: ec4dba5053 ("ALSA: oxfw: Add support for Behringer/Mackie devices")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210513125652.110249-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Alesis iO 26 FireWire has two pairs of digital optical interface. It
delivers PCM frames from the interfaces by second isochronous packet
streaming. Although both of the interfaces are available at 44.1/48.0
kHz, first one of them is only available at 88.2/96.0 kHz. It reduces
the number of PCM samples to 4 in Multi Bit Linear Audio data channel
of data blocks on the second isochronous packet streaming.
This commit fixes hardcoded stream formats.
Cc: <stable@vger.kernel.org>
Fixes: 28b208f600 ("ALSA: dice: add parameters of stream formats for models produced by Alesis")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20210513125652.110249-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Some interrupt handlers have an "extra" that saves 1 or 2
registers (r14, r15) in the paca save area and makes them available to
use by the handler.
The change to always save nvgprs in exception handlers lead to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.
Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.
Fixes: 4228b2c3d2 ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.
See the previous commit for more details.
Fix it by changing the order in which we patch the instructions.
Note the stf barrier fallback is only used on Power6 or earlier.
Fixes: bd573a8131 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.
With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.
This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.
On machines with updated firmware the entry flush is a series of special
nops, and it's safe to to execute in a semi-patched state.
However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so leads to us corrputing LR, leading
to an oops, for example:
# echo 0 > /sys/kernel/debug/powerpc/entry_flush
kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002971000
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40
REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce)
MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000
...
NIP 0xc000000002971000
LR 0xc000000002971000
Call Trace:
do_patch_instruction+0xc4/0x340 (unreliable)
do_entry_flush_fixups+0x100/0x3b0
entry_flush_set+0x50/0xe0
simple_attr_write+0x160/0x1a0
full_proxy_write+0x8c/0x110
vfs_write+0xf0/0x340
ksys_write+0x84/0x140
system_call_exception+0x164/0x2d0
system_call_common+0xec/0x278
The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.
Fixes: bd573a8131 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
The entry flush mitigation can be enabled/disabled at runtime via a
debugfs file (entry_flush), which causes the kernel to patch itself to
enable/disable the relevant mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20
Shows that we returned to userspace with a corrupted LR that points into
the kernel, due to executing the partially patched call to the fallback
entry flush (ie. we missed the LR restore).
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: f79643787e ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c
Shows that we returned to userspace without restoring the user r13
value, due to executing the partially patched STF exit code.
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: a048a07d7f ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au
When we move one inode from one directory to another and both the inode
and its previous parent directory were logged before, we are not supposed
to have the dentry for the old parent if we have a power failure after the
log is synced. Only the new dentry is supposed to exist.
Generally this works correctly, however there is a scenario where this is
not currently working, because the old parent of the file/directory that
was moved is not authoritative for a range that includes the dir index and
dir item keys of the old dentry. This case is better explained with the
following example and reproducer:
# The test requires a very specific layout of keys and items in the
# fs/subvolume btree to trigger the bug. So we want to make sure that
# on whatever platform we are, we have the same leaf/node size.
#
# Currently in btrfs the node/leaf size can not be smaller than the page
# size (but it can be greater than the page size). So use the largest
# supported node/leaf size (64K).
$ mkfs.btrfs -f -n 65536 /dev/sdc
$ mount /dev/sdc /mnt
# "testdir" is inode 257.
$ mkdir /mnt/testdir
$ chmod 755 /mnt/testdir
# Create several empty files to have the directory "testdir" with its
# items spread over several leaves (7 in this case).
$ for ((i = 1; i <= 1200; i++)); do
echo -n > /mnt/testdir/file$i
done
# Create our test directory "dira", inode number 1458, which gets all
# its items in leaf 7.
#
# The BTRFS_DIR_ITEM_KEY item for inode 257 ("testdir") that points to
# the entry named "dira" is in leaf 2, while the BTRFS_DIR_INDEX_KEY
# item that points to that entry is in leaf 3.
#
# For this particular filesystem node size (64K), file count and file
# names, we endup with the directory entry items from inode 257 in
# leaves 2 and 3, as previously mentioned - what matters for triggering
# the bug exercised by this test case is that those items are not placed
# in leaf 1, they must be placed in a leaf different from the one
# containing the inode item for inode 257.
#
# The corresponding BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY items for
# the parent inode (257) are the following:
#
# item 460 key (257 DIR_ITEM 3724298081) itemoff 48344 itemsize 34
# location key (1458 INODE_ITEM 0) type DIR
# transid 6 data_len 0 name_len 4
# name: dira
#
# and:
#
# item 771 key (257 DIR_INDEX 1202) itemoff 36673 itemsize 34
# location key (1458 INODE_ITEM 0) type DIR
# transid 6 data_len 0 name_len 4
# name: dira
$ mkdir /mnt/testdir/dira
# Make sure everything done so far is durably persisted.
$ sync
# Now do a change to inode 257 ("testdir") that does not result in
# COWing leaves 2 and 3 - the leaves that contain the directory items
# pointing to inode 1458 (directory "dira").
#
# Changing permissions, the owner/group, updating or adding a xattr,
# etc, will not change (COW) leaves 2 and 3. So for the sake of
# simplicity change the permissions of inode 257, which results in
# updating its inode item and therefore change (COW) only leaf 1.
$ chmod 700 /mnt/testdir
# Now fsync directory inode 257.
#
# Since only the first leaf was changed/COWed, we log the inode item of
# inode 257 and only the dentries found in the first leaf, all have a
# key type of BTRFS_DIR_ITEM_KEY, and no keys of type
# BTRFS_DIR_INDEX_KEY, because they sort after the former type and none
# exist in the first leaf.
#
# We also log 3 items that represent ranges for dir items and dir
# indexes for which the log is authoritative:
#
# 1) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is
# authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset
# in the range [0, 2285968570] (the offset here is the crc32c of the
# dentry's name). The value 2285968570 corresponds to the offset of
# the first key of leaf 2 (which is of type BTRFS_DIR_ITEM_KEY);
#
# 2) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is
# authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset
# in the range [4293818216, (u64)-1] (the offset here is the crc32c
# of the dentry's name). The value 4293818216 corresponds to the
# offset of the highest key of type BTRFS_DIR_ITEM_KEY plus 1
# (4293818215 + 1), which is located in leaf 2;
#
# 3) a key of type BTRFS_DIR_LOG_INDEX_KEY, with an offset of 1203,
# which indicates the log is authoritative for all keys of type
# BTRFS_DIR_INDEX_KEY that have an offset in the range
# [1203, (u64)-1]. The value 1203 corresponds to the offset of the
# last key of type BTRFS_DIR_INDEX_KEY plus 1 (1202 + 1), which is
# located in leaf 3;
#
# Also, because "testdir" is a directory and inode 1458 ("dira") is a
# child directory, we log inode 1458 too.
$ xfs_io -c "fsync" /mnt/testdir
# Now move "dira", inode 1458, to be a child of the root directory
# (inode 256).
#
# Because this inode was previously logged, when "testdir" was fsynced,
# the log is updated so that the old inode reference, referring to inode
# 257 as the parent, is deleted and the new inode reference, referring
# to inode 256 as the parent, is added to the log.
$ mv /mnt/testdir/dira /mnt
# Now change some file and fsync it. This guarantees the log changes
# made by the previous move/rename operation are persisted. We do not
# need to do any special modification to the file, just any change to
# any file and sync the log.
$ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/testdir/file1
# Simulate a power failure and then mount again the filesystem to
# replay the log tree. We want to verify that we are able to mount the
# filesystem, meaning log replay was successful, and that directory
# inode 1458 ("dira") only has inode 256 (the filesystem's root) as
# its parent (and no longer a child of inode 257).
#
# It used to happen that during log replay we would end up having
# inode 1458 (directory "dira") with 2 hard links, being a child of
# inode 257 ("testdir") and inode 256 (the filesystem's root). This
# resulted in the tree checker detecting the issue and causing the
# mount operation to fail (with -EIO).
#
# This happened because in the log we have the new name/parent for
# inode 1458, which results in adding the new dentry with inode 256
# as the parent, but the previous dentry, under inode 257 was never
# removed - this is because the ranges for dir items and dir indexes
# of inode 257 for which the log is authoritative do not include the
# old dir item and dir index for the dentry of inode 257 referring to
# inode 1458:
#
# - for dir items, the log is authoritative for the ranges
# [0, 2285968570] and [4293818216, (u64)-1]. The dir item at inode 257
# pointing to inode 1458 has a key of (257 DIR_ITEM 3724298081), as
# previously mentioned, so the dir item is not deleted when the log
# replay procedure processes the authoritative ranges, as 3724298081
# is outside both ranges;
#
# - for dir indexes, the log is authoritative for the range
# [1203, (u64)-1], and the dir index item of inode 257 pointing to
# inode 1458 has a key of (257 DIR_INDEX 1202), as previously
# mentioned, so the dir index item is not deleted when the log
# replay procedure processes the authoritative range.
<power failure>
$ mount /dev/sdc /mnt
mount: /mnt: can't read superblock on /dev/sdc.
$ dmesg
(...)
[87849.840509] BTRFS info (device sdc): start tree-log replay
[87849.875719] BTRFS critical (device sdc): corrupt leaf: root=5 block=30539776 slot=554 ino=1458, invalid nlink: has 2 expect no more than 1 for dir
[87849.878084] BTRFS info (device sdc): leaf 30539776 gen 7 total ptrs 557 free space 2092 owner 5
[87849.879516] BTRFS info (device sdc): refs 1 lock_owner 0 current 2099108
[87849.880613] item 0 key (1181 1 0) itemoff 65275 itemsize 160
[87849.881544] inode generation 6 size 0 mode 100644
[87849.882692] item 1 key (1181 12 257) itemoff 65258 itemsize 17
(...)
[87850.562549] item 556 key (1458 12 257) itemoff 16017 itemsize 14
[87850.563349] BTRFS error (device dm-0): block=30539776 write time tree block corruption detected
[87850.564386] ------------[ cut here ]------------
[87850.564920] WARNING: CPU: 3 PID: 2099108 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs]
[87850.566129] Modules linked in: btrfs dm_zero dm_snapshot (...)
[87850.573789] CPU: 3 PID: 2099108 Comm: mount Not tainted 5.12.0-rc8-btrfs-next-86 #1
(...)
[87850.587481] Call Trace:
[87850.587768] btree_csum_one_bio+0x244/0x2b0 [btrfs]
[87850.588354] ? btrfs_bio_fits_in_stripe+0xd8/0x110 [btrfs]
[87850.589003] btrfs_submit_metadata_bio+0xb7/0x100 [btrfs]
[87850.589654] submit_one_bio+0x61/0x70 [btrfs]
[87850.590248] submit_extent_page+0x91/0x2f0 [btrfs]
[87850.590842] write_one_eb+0x175/0x440 [btrfs]
[87850.591370] ? find_extent_buffer_nolock+0x1c0/0x1c0 [btrfs]
[87850.592036] btree_write_cache_pages+0x1e6/0x610 [btrfs]
[87850.592665] ? free_debug_processing+0x1d5/0x240
[87850.593209] do_writepages+0x43/0xf0
[87850.593798] ? __filemap_fdatawrite_range+0xa4/0x100
[87850.594391] __filemap_fdatawrite_range+0xc5/0x100
[87850.595196] btrfs_write_marked_extents+0x68/0x160 [btrfs]
[87850.596202] btrfs_write_and_wait_transaction.isra.0+0x4d/0xd0 [btrfs]
[87850.597377] btrfs_commit_transaction+0x794/0xca0 [btrfs]
[87850.598455] ? _raw_spin_unlock_irqrestore+0x32/0x60
[87850.599305] ? kmem_cache_free+0x15a/0x3d0
[87850.600029] btrfs_recover_log_trees+0x346/0x380 [btrfs]
[87850.601021] ? replay_one_extent+0x7d0/0x7d0 [btrfs]
[87850.601988] open_ctree+0x13c9/0x1698 [btrfs]
[87850.602846] btrfs_mount_root.cold+0x13/0xed [btrfs]
[87850.603771] ? kmem_cache_alloc_trace+0x7c9/0x930
[87850.604576] ? vfs_parse_fs_string+0x5d/0xb0
[87850.605293] ? kfree+0x276/0x3f0
[87850.605857] legacy_get_tree+0x30/0x50
[87850.606540] vfs_get_tree+0x28/0xc0
[87850.607163] fc_mount+0xe/0x40
[87850.607695] vfs_kern_mount.part.0+0x71/0x90
[87850.608440] btrfs_mount+0x13b/0x3e0 [btrfs]
(...)
[87850.629477] ---[ end trace 68802022b99a1ea0 ]---
[87850.630849] BTRFS: error (device sdc) in btrfs_commit_transaction:2381: errno=-5 IO failure (Error while writing out transaction)
[87850.632422] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[87850.633416] BTRFS: error (device sdc) in cleanup_transaction:1978: errno=-5 IO failure
[87850.634553] BTRFS: error (device sdc) in btrfs_replay_log:2431: errno=-5 IO failure (Failed to recover log tree)
[87850.637529] BTRFS error (device sdc): open_ctree failed
In this example the inode we moved was a directory, so it was easy to
detect the problem because directories can only have one hard link and
the tree checker immediately detects that. If the moved inode was a file,
then the log replay would succeed and we would end up having both the
new hard link (/mnt/foo) and the old hard link (/mnt/testdir/foo) present,
but only the new one should be present.
Fix this by forcing re-logging of the old parent directory when logging
the new name during a rename operation. This ensures we end up with a log
that is authoritative for a range covering the keys for the old dentry,
therefore causing the old dentry do be deleted when replaying the log.
A test case for fstests will follow up soon.
Fixes: 64d6b281ba ("btrfs: remove unnecessary check_parent_dirs_for_sync()")
CC: stable@vger.kernel.org # 5.12+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
`xfs_io -c 'fiemap <off> <len>' <file>`
can give surprising results on btrfs that differ from xfs.
btrfs prints out extents trimmed to fit the user input. If the user's
fiemap request has an offset, then rather than returning each whole
extent which intersects that range, we also trim the start extent to not
have start < off.
Documentation in filesystems/fiemap.txt and the xfs_io man page suggests
that returning the whole extent is expected.
Some cases which all yield the same fiemap in xfs, but not btrfs:
dd if=/dev/zero of=$f bs=4k count=1
sudo xfs_io -c 'fiemap 0 1024' $f
0: [0..7]: 26624..26631
sudo xfs_io -c 'fiemap 2048 1024' $f
0: [4..7]: 26628..26631
sudo xfs_io -c 'fiemap 2048 4096' $f
0: [4..7]: 26628..26631
sudo xfs_io -c 'fiemap 3584 512' $f
0: [7..7]: 26631..26631
sudo xfs_io -c 'fiemap 4091 5' $f
0: [7..6]: 26631..26630
I believe this is a consequence of the logic for merging contiguous
extents represented by separate extent items. That logic needs to track
the last offset as it loops through the extent items, which happens to
pick up the start offset on the first iteration, and trim off the
beginning of the full extent. To fix it, start `off` at 0 rather than
`start` so that we keep the iteration/merging intact without cutting off
the start of the extent.
after the fix, all the above commands give:
0: [0..7]: 26624..26631
The merging logic is exercised by fstest generic/483, and I have written
a new fstest for checking we don't have backwards or zero-length fiemaps
for cases like those above.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Generally a delayed iput is added when we might do the final iput, so
usually we'll end up sleeping while processing the delayed iputs
naturally. However there's no guarantee of this, especially for small
files. In production we noticed 5 instances of RCU stalls while testing
a kernel release overnight across 1000 machines, so this is relatively
common:
host count: 5
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: ....: (20998 ticks this GP) idle=59e/1/0x4000000000000002 softirq=12333372/12333372 fqs=3208
(t=21031 jiffies g=27810193 q=41075) NMI backtrace for cpu 1
CPU: 1 PID: 1713 Comm: btrfs-cleaner Kdump: loaded Not tainted 5.6.13-0_fbk12_rc1_5520_gec92bffc1ec9 #1
Call Trace:
<IRQ> dump_stack+0x50/0x70
nmi_cpu_backtrace.cold.6+0x30/0x65
? lapic_can_unplug_cpu.cold.30+0x40/0x40
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x99/0xc7
rcu_sched_clock_irq.cold.90+0x1b2/0x3a3
? trigger_load_balance+0x5c/0x200
? tick_sched_do_timer+0x60/0x60
? tick_sched_do_timer+0x60/0x60
update_process_times+0x24/0x50
tick_sched_timer+0x37/0x70
__hrtimer_run_queues+0xfe/0x270
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x5e/0x120
apic_timer_interrupt+0xf/0x20 </IRQ>
RIP: 0010:queued_spin_lock_slowpath+0x17d/0x1b0
RSP: 0018:ffffc9000da5fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff889fa81d0cd8 RCX: 0000000000000029
RDX: ffff889fff86c0c0 RSI: 0000000000080000 RDI: ffff88bfc2da7200
RBP: ffff888f2dcdd768 R08: 0000000001040000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff82a55560 R12: ffff88bfc2da7200
R13: 0000000000000000 R14: ffff88bff6c2a360 R15: ffffffff814bd870
? kzalloc.constprop.57+0x30/0x30
list_lru_add+0x5a/0x100
inode_lru_list_add+0x20/0x40
iput+0x1c1/0x1f0
run_delayed_iput_locked+0x46/0x90
btrfs_run_delayed_iputs+0x3f/0x60
cleaner_kthread+0xf2/0x120
kthread+0x10b/0x130
Fix this by adding a cond_resched_lock() to the loop processing delayed
iputs so we can avoid these sort of stalls.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 7000babdda ("btrfs: assign proper values to a bool variable in
dev_extent_hole_check_zoned") assigned false to the hole_start parameter
of dev_extent_hole_check_zoned().
The hole_start parameter is not boolean and returns the start location of
the found hole.
Fixes: 7000babdda ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
MAINTAINERS update.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYJ0rMAAKCRDj7w1vZxhR
xXy7AQCoRykDurhX833kRT3RuMZpxNdxVybC/P8buYVGtHwC3QD8D7yXL+DY8/mt
iJGaVnb+ZBp8XsO034ROlVux/pB1DA8=
=3uhR
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2021-05-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Remove an unused function and a MAINTAINERS update.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513133617.xq77wwrehpuh7yn2@hendrix
This reverts commit 4667a6fc17.
Takashi writes:
I have already started working on the bigger cleanup of this driver
code based on 5.13-rc1, so could you drop this revert?
I missed our previous discussion about this, my fault for applying it.
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Make intel_pstate work as expected on systems where the platform
firmware enables HWP even though the HWP EPP support is not
advertised (Rafael Wysocki).
- Fix possible runtime PM child count imbalance that may
occur if other runtime PM functions are called after invoking
pm_runtime_force_suspend() and before pm_runtime_force_resume()
is called (Tony Lindgren).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmCddyMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxMRkP/jcq9A6eWOQoGBtT4vrIANt1/S+YcjNi
PFNa8BIUN80lyA+em3v0t8KUcfFMQhvbXK+KDOMqE3PpzgxbVZvY5t7Lxr/dhtOI
ZQRAuhcspt133hSlBzafonm1k/y2BVI4mEVivB+ElDHyY14Ll9vEmbvBcsFNJha7
E6DvSOBRSqDsFSUKIFnhz9FuFa/HaUcrcILTu4wRGcCGEt4kF+XrPaVvAwWveNfq
8TJJj3MK6QdpoVG+hXebzj7/A2iWlVymFoT/2+B71tQgdYabVbeFbkyV467h5IKz
rLaiUtDnSPTletnbJ2WNphgY902LFHBPJ99fpnfk61BcHiDXmf7PP2z8uxIwU4S4
1+ggNbGwxOBjr3FRLmoZdg9lZG0zEf9GjcO78Ivq2u1pqz0UYWrSyGzl0Ye/DOmc
9bGZsZrf6vvL64Ye60VpW68bevH0oDhZ+84ihTIN+Clz6+8bJVLr3m4qGFdXd0gm
8kBb4n3UAT9VR4Wj/Joj/Ocv+lO82CeEi/Ti/xjA+27p7SFRm3manFLP9w4SllUQ
ky6sywUATVjNi76WkIHk4XnCjAQjR83N41xwPX8WEXhHxwXGbeCG9LIjTZ5r2pVD
Ps4FUeMB2q9E5NZmv1L2j+r79YdWWYtYY9D5kdoto6WXuJRR5OB1oKScBFNc9MS8
pFVLW8kPBNAj
=0GZZ
-----END PGP SIGNATURE-----
Merge tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These close a coverage gap in the intel_pstate driver and fix runtime
PM child count imbalance related to interactions with system-wide
suspend.
Specifics:
- Make intel_pstate work as expected on systems where the platform
firmware enables HWP even though the HWP EPP support is not
advertised (Rafael Wysocki).
- Fix possible runtime PM child count imbalance that may occur if
other runtime PM functions are called after invoking
pm_runtime_force_suspend() and before pm_runtime_force_resume()
is called (Tony Lindgren)"
* tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
- Revert a revert of a recent ACPI power management change that
does not need to be reverted after all (Rafael Wysocki).
- Add missing fan device ID to the list of device IDs for which
the devices should not be put into the ACPI PM domain (Sumeet
Pawnikar).
- Fix possible memory leak in an error path in the ACPI device
enumeration code (Christophe JAILLET).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmCddrASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxaz8P/iYD+HR24PG8tTDrPlHYJEAsR7ZPLTDM
XzxI5yt6rnN+lfhS/y5HZnQZJ89GonIm+MqAMiaL9TludHi5uzE0a66LkpWfrRGH
z7C507sXTF+jbId5xPhwx0+++EDrYBTevdPLvRHwWVh2MOvxpEw7ja5qyTIpwhth
z+q+F2KYFU/jpOSXkdTtaKxi6ncHhSpT3NDZ0UCpIPd7kTQ6KBjj1CfQoTMj2y3/
lsctuAd4gUCwiPWhSQYwRtABkdxhqRd/zRs8Bo61M6FH1aTo5NcjGR6lHZjeaC1Z
ObMkEEz3/Dq+zFB4oCY6ehaOOBnO6DsNmtoNxzENLL1adDO5qfDGNrAN8qXS+uhk
pMhwzw0yPR8YVFmxZ0wP1cHxKr2lnqm3qomyc+MMyifN+xXjo3xFqREoonkp92AD
QQw1apfd/HOlsSLVWrKViJzOCgTRtJjAnHr4DVZmjTP5I46rg206pthi8PSlXm2j
AyRASrrkQbjDM7M/5Enc+ztZ6GHQc0DHu9awQ9ezt0ONMRI2UZ/yY/4Tra+4mWOO
1/tCjxjup0Z+tz0/ZuTbXgcFYn6Gejjcq4sJzor236mWXUwAi+byni9j0W5GuVTw
fFs3WtSehVBiukweLI4IZJXpMxoGZ9p+weKeURJ5brfbL/oT+cfbnPW3Kqdv5wD3
qPdzAz7Bem1s
=N/vn
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an unnecessary revert of an ACPI power management commit,
add a missing device ID to one of the lists and fix a possible memory
leak in an error path.
Specifics:
- Revert a revert of a recent ACPI power management change that does
not need to be reverted after all (Rafael Wysocki).
- Add missing fan device ID to the list of device IDs for which the
devices should not be put into the ACPI PM domain (Sumeet
Pawnikar).
- Fix possible memory leak in an error path in the ACPI device
enumeration code (Christophe JAILLET)"
* tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add ACPI ID of Alder Lake Fan
ACPI: scan: Fix a memory leak in an error handling path
Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
Use the types __le* instead of __u* to fix sparse warnings.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Revert the commit 7a5b96b478 ("dm integrity:
use discard support when recalculating").
There's a bug that when we write some data beyond the current recalculate
boundary, the checksum will be rewritten with the discard filler later.
And the data will no longer have integrity protection. There's no easy
fix for this case.
Also, another problematic case is if dm-integrity is used to detect
bitrot (random device errors, bit flips, etc); dm-integrity should
detect that even for unused sectors. With commit 7a5b96b478 it can
happen that such change is undetected (because discard filler is not a
valid checksum).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Remove a vpr_info which I added in 2012, when I knew even less than now.
In 2020, a simpler pr_fmt stripped it of context, and any remaining value.
no functional change.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20210504222235.1033685-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wrap function in a static-inline one, which checks flags to avoid
calling the function unnecessarily.
And hoist its output-buffer initialization to the grand-caller, which
is already allocating the buffer on the stack, and can trivially
initialize it too.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20210504222235.1033685-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following commands will crash the kernel:
modprobe brd rd_size=1048576
dmsetup create o --table "0 `blockdev --getsize /dev/ram0` snapshot-origin /dev/ram0"
dmsetup create s --table "0 `blockdev --getsize /dev/ram0` snapshot /dev/ram0 /dev/ram1 N 0"
The reason is that when we test for zero chunk size, we jump to the label
bad_read_metadata without setting the "r" variable. The function
snapshot_ctr destroys all the structures and then exits with "r == 0". The
kernel then crashes because it falsely believes that snapshot_ctr
succeeded.
In order to fix the bug, we set the variable "r" to -EINVAL.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
All the other ioctl paths return EFAULT in case the
copy_from_user/copy_to_user call fails, make oneway spam detection
follow the same paradigm.
Fixes: a7dc1e6f99 ("binder: tell userspace to dump current backtrace when detected oneway spamming")
Acked-by: Todd Kjos <tkjos@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Link: https://lore.kernel.org/r/20210506193726.45118-1-luca.stefani.ge1@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a trace event uses the %*.s notation, the trace_check_vprintf() will
fail and will warn about a bad processing of strings, because it does not
take into account the length field when processing the star (*) part.
Have it handle this case as well.
Link: https://lore.kernel.org/linux-nfs/238C0E2D-C2A4-4578-ADD2-C565B3B99842@oracle.com/
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Fixes: 9a6944fee6 ("tracing: Add a verifier to check string pointers for trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Merge VT_RESIZEX fixes from Maciej Rozycki:
"I got to the bottom of the issue with VT_RESIZEX recently discussed
and came up with this small patch series, fixing an additional issue
that I originally thought might be broken VGA hardware emulation with
my laptop, which however turned out to be intertwined with the
original problem and also a regression introduced somewhat later.
The fix for that because the first patch, and then to make backporting
feasible I had to put a revert of the offending change from last
September next, followed by a proper fix for the framebuffer issue
that change had tried to address.
See individual change descriptions for details.
These have been verified with true VGA hardware (a Trident TVGA8900
ISA video adapter) using various combinations of `svgatextmode' and
`setfont' command invocations to change both the VT size and the font
size, and also switching between the text console and X11, both by
starting/stopping the X server and by switching between VTs.
All this to ensure bringing the behaviour of VGA text console back to
correct operation as it used to be with Linux 2.6.18"
* emailed patches from Maciej W. Rozycki <macro@orcam.me.uk>:
vt: Fix character height handling with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vgacon: Record video mode changes with VT_RESIZEX
Restore the original intent of the VT_RESIZEX ioctl's `v_clin' parameter
which is the number of pixel rows per character (cell) rather than the
height of the font used.
For framebuffer devices the two values are always the same, because the
former is inferred from the latter one. For VGA used as a true text
mode device these two parameters are independent from each other: the
number of pixel rows per character is set in the CRT controller, while
font height is in fact hardwired to 32 pixel rows and fonts of heights
below that value are handled by padding their data with blanks when
loaded to hardware for use by the character generator. One can change
the setting in the CRT controller and it will update the screen contents
accordingly regardless of the font loaded.
The `v_clin' parameter is used by the `vgacon' driver to set the height
of the character cell and then the cursor position within. Make the
parameter explicit then, by defining a new `vc_cell_height' struct
member of `vc_data', set it instead of `vc_font.height' from `v_clin' in
the VT_RESIZEX ioctl, and then use it throughout the `vgacon' driver
except where actual font data is accessed which as noted above is
independent from the CRTC setting.
This way the framebuffer console driver is free to ignore the `v_clin'
parameter as irrelevant, as it always should have, avoiding any issues
attempts to give the parameter a meaning there could have caused, such
as one that has led to commit 988d076336 ("vt_ioctl: make VT_RESIZEX
behave like VT_RESIZE"):
"syzbot is reporting UAF/OOB read at bit_putcs()/soft_cursor() [1][2],
for vt_resizex() from ioctl(VT_RESIZEX) allows setting font height
larger than actual font height calculated by con_font_set() from
ioctl(PIO_FONT). Since fbcon_set_font() from con_font_set() allocates
minimal amount of memory based on actual font height calculated by
con_font_set(), use of vt_resizex() can cause UAF/OOB read for font
data."
The problem first appeared around Linux 2.5.66 which predates our repo
history, but the origin could be identified with the old MIPS/Linux repo
also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>
as commit 9736a3546de7 ("Merge with Linux 2.5.66."), where VT_RESIZEX
code in `vt_ioctl' was updated as follows:
if (clin)
- video_font_height = clin;
+ vc->vc_font.height = clin;
making the parameter apply to framebuffer devices as well, perhaps due
to the use of "font" in the name of the original `video_font_height'
variable. Use "cell" in the new struct member then to avoid ambiguity.
References:
[1] https://syzkaller.appspot.com/bug?id=32577e96d88447ded2d3b76d71254fb855245837
[2] https://syzkaller.appspot.com/bug?id=6b8355d27b2b94fb5cedf4655e3a59162d9e48e3
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org # v2.6.12+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert the removal of code handling extra VT_RESIZEX ioctl's parameters
beyond those that VT_RESIZE supports, fixing a functional regression
causing `svgatextmode' not to resize the VT anymore.
As a consequence of the reverted change when the video adapter is
reprogrammed from the original say 80x25 text mode using a 9x16
character cell (720x400 pixel resolution) to say 80x37 text mode and the
same character cell (720x592 pixel resolution), the VT geometry does not
get updated and only upper two thirds of the screen are used for the VT,
and the lower part remains blank. The proportions change according to
text mode geometries chosen.
Revert the change verbatim then, bringing back previous VT resizing.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 988d076336 ("vt_ioctl: make VT_RESIZEX behave like VT_RESIZE")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>