Commit Graph

873723 Commits

Author SHA1 Message Date
Juergen Gross 7521d9b90c xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Commit 3777ea7bac upstream.

Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking success is
resulting in an unclear situation for any caller of xenbus_grant_ring()
as in the error case the memory pages of the ring page might be
partially mapped. Freeing them would risk unwanted foreign access to
them, while not freeing them would leak memory.

In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.

It should be noted that this way of handling removes leaking of
grant entries in the error case, too.

This is CVE-2022-23040 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:34 +08:00
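
The xenbus_grant_ring() change above relies on reserving every grant reference before the granting loop starts, so the loop itself has no failure path and nothing ever needs to be undone. A minimal userspace sketch of that idea follows; `GrantTable`, `alloc_references` and `grant_ring` are hypothetical names for illustration and are not the kernel's grant-table API.

```python
class GrantTable:
    """Toy model of a grant table (hypothetical, not the kernel structure)."""

    def __init__(self, free_refs):
        self.free = list(range(free_refs))
        self.granted = {}

    def alloc_references(self, count):
        # All-or-nothing reservation: either every reference is reserved
        # or none is, so a failure here leaves no grants behind.
        if count > len(self.free):
            return None
        batch, self.free = self.free[:count], self.free[count:]
        return batch

    def grant(self, ref, page):
        # Cannot fail in this model: the reference is already reserved.
        self.granted[ref] = page


def grant_ring(table, pages):
    refs = table.alloc_references(len(pages))
    if refs is None:
        return None  # nothing was granted yet, so there is nothing to tear down
    for ref, page in zip(refs, pages):
        table.grant(ref, page)
    return refs
```

In this model the only error exit happens before the first page is granted, which is the property the commit message describes.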
Juergen Gross 3fd2171f05 xen/gntalloc: don't use gnttab_query_foreign_access()
Commit d3b6372c58 upstream.

Using gnttab_query_foreign_access() is unsafe, as it is racy by design.

The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().

This is CVE-2022-23039 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:33 +08:00
Juergen Gross 30379972d3 xen/grant-table: add gnttab_try_end_foreign_access()
Commit 6b1775f26a upstream.

Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.

Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.

This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:33 +08:00
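
The "free it if idle, otherwise do something else" pattern described for gnttab_try_end_foreign_access() above can be sketched as follows. This is a toy model under stated assumptions: `Grant`, `try_end_foreign_access` and `release_or_keep` are hypothetical Python names, and the "keep it in a pool" branch only stands in for whatever special handling a caller chooses.

```python
class Grant:
    """Toy grant: in_use is True while the other side may still access the page."""

    def __init__(self, in_use):
        self.in_use = in_use
        self.freed = False


def try_end_foreign_access(grant):
    """Free the grant only if it is idle and report whether that happened."""
    if grant.in_use:
        return False
    grant.freed = True
    return True


def release_or_keep(grant, kept_grants):
    # The caller decides what "still in use" means for it: here the grant
    # is parked in a list instead of being freed behind the user's back.
    if try_end_foreign_access(grant):
        return "freed"
    kept_grants.append(grant)
    return "kept"
```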
Juergen Gross 500d9c8472 xen/console: harden hvc_xen against event channel storms
commit fe415186b4 upstream.

The Xen console driver is still vulnerable to an attack via an excessive
number of events sent by the backend. Fix that by using a lateeoi event
channel.

For the normal domU initial console this requires the introduction of
bind_evtchn_to_irq_lateeoi() as there is no xenbus device available
at the time the event channel is bound to the irq.

As deciding whether an interrupt was spurious requires testing whether
any bytes have been read from the backend, move sending the event into
the if statement: sending an event without having found any bytes to be
read makes no sense at all.

This is part of XSA-391

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:32 +08:00
Greg Kroah-Hartman 49dad487c9 usb: gadget: rndis: check size of RNDIS_MSG_SET command
commit 38ea1eac7d upstream.

Check the size of the RNDIS_MSG_SET command given to us before
attempting to respond to an invalid message size.

Reported-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Cc: stable@kernel.org
Tested-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:32 +08:00
Szymon Heidrich 3425449eb4 USB: gadget: validate interface OS descriptor requests
commit 75e5b4849b upstream.

Stall the control endpoint in case provided index exceeds array size of
MAX_CONFIG_INTERFACES or when the retrieved function pointer is null.

Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:32 +08:00
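
The validation described in the interface OS descriptor commit above amounts to two checks before a table entry is used: the index must be inside the table, and the entry must be populated. A minimal sketch, assuming a table size of 16 and using hypothetical names (`handle_os_desc_request`, `STALL`) that are not the gadget framework's API:

```python
MAX_CONFIG_INTERFACES = 16  # assumed table size for this sketch
STALL = "stall"


def handle_os_desc_request(interfaces, index):
    """Return the handler's result, or STALL for an invalid request."""
    # Reject indexes outside the table before touching it.
    if index < 0 or index >= MAX_CONFIG_INTERFACES:
        return STALL
    function = interfaces[index]
    # A slot inside the table may still be unpopulated.
    if function is None:
        return STALL
    return function()
```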
Hangyu Hua 1a4b25b4d5 usb: gadget: don't release an existing dev->buf
commit 89f3594d0d upstream.

dev->buf does not need to be released if it already exists before
executing dev_config.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20211231172138.7993-2-hbh25y@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:31 +08:00
Hans de Goede 3c6773d64f HID: asus: Add depends on USB_HID to HID_ASUS Kconfig option
commit c4f0126d48 upstream.

Since commit 4bc43a4212 ("HID: asus: Add
hid_is_using_ll_driver(usb_hid_driver) check") the hid-asus.c depends
on the usb_hid_driver symbol. Add a depends on USB_HID to Kconfig to
fix missing symbols errors in hid-asus when USB_HID is not enabled.

Fixes: 4bc43a4212 ("HID: asus: Add hid_is_using_ll_driver(usb_hid_driver) check")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Jason Self <jason@bluehome.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:31 +08:00
Benjamin Tissoires def3308a8e HID: holtek: fix mouse probing
commit 93a2207c25 upstream.

An oversight in the previous commit: we don't even parse or start the
device, meaning that the device is not presented to user space.

Fixes: 93020953d0 ("HID: check for valid USB device for many HID drivers")
Cc: stable@vger.kernel.org
Link: https://bugs.archlinux.org/task/73048
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215341
Link: https://lore.kernel.org/r/e4efbf13-bd8d-0370-629b-6c80c0044b15@leemhuis.info/
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:31 +08:00
Greg Kroah-Hartman 86c935bd12 HID: check for valid USB device for many HID drivers
commit 93020953d0 upstream.

Many HID drivers assume that the HID device assigned to them is a USB
device as that was the only way HID devices used to be able to be
created in Linux.  However, with the additional ways that HID devices
can be created for many different bus types, that is no longer true, so
properly check that we have a USB device associated with the HID device
before allowing a driver that makes this assumption to claim it.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Michael Zaidman <michael.zaidman@gmail.com>
Cc: Stefan Achatz <erazor_de@users.sourceforge.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: linux-input@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
[bentiss: amended for thrustmater.c hunk to apply]
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211201183503.2373082-3-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:30 +08:00
Greg Kroah-Hartman dae308fcab HID: wacom: fix problems when device is not a valid USB device
commit 720ac46720 upstream.

The wacom driver accepts devices of more than just USB types, but some
code paths can cause problems if the device being controlled is not a
USB device due to a lack of checking.  Add the needed checks to ensure
that the USB device accesses are only happening on a "real" USB device,
and not one on some other bus.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: linux-input@vger.kernel.org
Cc: stable@vger.kernel.org
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211201183503.2373082-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:30 +08:00
Benjamin Tissoires 8e28f1143c HID: bigbenff: prevent null pointer dereference
commit 918aa1ef10 upstream.

When emulating the device through uhid, there is a chance we don't have
output reports and so report_field is null.

Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20211202095334.14399-3-benjamin.tissoires@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:29 +08:00
Greg Kroah-Hartman dac4a35233 HID: add USB_HID dependancy on some USB HID drivers
commit f237d9028f upstream.

Some HID drivers are only for USB drivers, yet did not depend on
CONFIG_USB_HID.  This was hidden by the fact that the USB functions were
stubbed out in the past, but now that drivers are checking for USB
devices properly, build errors can occur with some random
configurations.

Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211202114819.2511954-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:29 +08:00
Greg Kroah-Hartman d16750e2d1 HID: add USB_HID dependancy to hid-chicony
commit d080811f27 upstream.

The chicony HID driver only controls USB devices, yet did not have a
dependency on USB_HID.  This causes build errors on some configurations
like sparc when building due to new changes to the chicony driver.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: stable@vger.kernel.org
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211203075927.2829218-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:29 +08:00
Greg Kroah-Hartman 1a0d2341de HID: add USB_HID dependancy to hid-prodikeys
commit 30cb3c2ad2 upstream.

The prodikeys HID driver only controls USB devices, yet did not have a
dependency on USB_HID.  This causes build errors on some configurations
like nios2 when building due to new changes to the prodikeys driver.

Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211203081231.2856936-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:28 +08:00
Greg Kroah-Hartman 8c8e51fb23 HID: add hid_is_usb() function to make it simpler for USB detection
commit f83baa0cb6 upstream.

A number of HID drivers already call hid_is_using_ll_driver() but only
for the detection of if this is a USB device or not.  Make this more
obvious by creating hid_is_usb() and calling the function that way.

Also converts the existing hid_is_using_ll_driver() functions to use the
new call.

Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: linux-input@vger.kernel.org
Cc: stable@vger.kernel.org
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211201183503.2373082-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:28 +08:00
Greg Kroah-Hartman 0d5a02c47a USB: gadget: detect too-big endpoint 0 requests
commit 153a2d7e33 upstream.

Sometimes USB hosts can ask for buffers that are too large from endpoint
0, which should not be allowed.  If this happens for OUT requests, stall
the endpoint, but for IN requests, trim the request size to the endpoint
buffer size.

Co-developed-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:28 +08:00
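
The endpoint 0 policy in the commit above treats the two transfer directions differently: an oversized OUT request is refused, while an oversized IN request is shortened. A small sketch of that decision, with a hypothetical function name and return convention chosen only for illustration:

```python
def check_ep0_request(direction, w_length, buf_size):
    """Return (action, length) for a control request of w_length bytes."""
    if w_length <= buf_size:
        return ("ok", w_length)
    if direction == "OUT":
        # The host would write more than the buffer can hold: stall.
        return ("stall", 0)
    # IN: the device simply sends less than was asked for.
    return ("ok", buf_size)
```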
Takashi Iwai 1f9b52756c ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
commit 92ee3c60ec upstream.

Currently we have neither proper check nor protection against the
concurrent calls of PCM hw_params and hw_free ioctls, which may result
in a UAF.  Since the existing PCM stream lock can't be used for
protecting the whole ioctl operations, we need a new mutex to protect
those racy calls.

This patch introduces a new mutex, runtime->buffer_mutex, and applies
it to both hw_params and hw_free ioctl code paths.  Along with it,
both functions are slightly modified (the mmap_count check is moved
into the state-check block) for code simplicity.

Reported-by: Hu Jiahui <kirin.say@gmail.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20220322170720.3529-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[OP: backport to 5.4: adjusted context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:27 +08:00
Christian Löhle 033f9059dc mmc: block: fix read single on recovery logic
commit 54309fde1a upstream.

On reads with MMC_READ_MULTIPLE_BLOCK that fail,
the recovery handler will use MMC_READ_SINGLE_BLOCK for
each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
The logic for this is fixed to never report unsuccessful reads
as success to the block layer.

On command error with retries remaining, blk_update_request was
called with whatever value error was last set to.
In case it was last set to BLK_STS_OK (default), the read will be
reported as success, even though there was no data read from the device.
This could happen on a CRC mismatch for the response, or on
a card rejecting the command (e.g. again due to a CRC mismatch).
In case it was last set to BLK_STS_IOERR, the error is reported correctly,
but no retries will be attempted.

Fixes: 81196976ed ("mmc: block: Add blk-mq support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/bc706a6ab08c4fe2834ba0c05a804672@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:27 +08:00
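
The invariant behind the read-single recovery fix above is that a block is reported as successful only if one of its reads actually succeeded, and that a failed attempt leads to a retry while attempts remain. A minimal sketch of such a loop; `read_single_blocks`, `read_block` and `max_attempts` are hypothetical names, and the status strings only stand in for the block-layer status codes:

```python
def read_single_blocks(read_block, n_blocks, max_attempts=2):
    """Read blocks one at a time and return one status per block.

    read_block(blk) is a caller-supplied callable returning True on success.
    """
    results = []
    for blk in range(n_blocks):
        # Start pessimistic so a failure can never be reported as success.
        status = "ioerr"
        for _ in range(max_attempts):
            if read_block(blk):
                status = "ok"
                break
        results.append(status)
    return results
```

Initialising the status to the error value, rather than to success, is the design choice that removes the "stale OK" case the commit message describes.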
Jens Wiklander e033a847a3 tee: handle lookup of shm with reference count 0
commit dfd0743f1d upstream.

Since the tee subsystem does not keep a strong reference to its idle
shared memory buffers, it races with other threads that try to destroy a
shared memory through a close of its dma-buf fd or by unmapping the
memory.

In tee_shm_get_from_id() when a lookup in teedev->idr has been
successful, it is possible that the tee_shm is in the dma-buf teardown
path, but that path is blocked by the teedev mutex. Since we don't have
an API to tell if the tee_shm is in the dma-buf teardown path or not we
must find another way of detecting this condition.

Fix this by doing the reference counting directly on the tee_shm using a
new refcount_t refcount field. dma-buf is replaced by using
anon_inode_getfd() instead, this separates the life-cycle of the
underlying file from the tee_shm. tee_shm_put() is updated to hold the
mutex when decreasing the refcount to 0 and then remove the tee_shm from
teedev->idr before releasing the mutex. This means that the tee_shm can
never be found unless it has a refcount larger than 0.

Fixes: 967c9cca2c ("tee: generic TEE subsystem")
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Lars Persson <larper@axis.com>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reported-by: Patrik Lantz <patrik.lantz@axis.com>
[JW: backport to 5.4-stable]
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:26 +08:00
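
The lifetime rule from the tee commit above (lookups and the final put are serialised by one mutex, and an object leaves the lookup table in the same critical section in which its count reaches zero) can be modelled like this. `ShmRegistry` and its methods are hypothetical names for a userspace sketch, not the tee subsystem's interface.

```python
import threading


class ShmRegistry:
    """Toy id-to-object table with reference counting under one mutex."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.by_id = {}

    def register(self, shm_id):
        with self.mutex:
            self.by_id[shm_id] = {"id": shm_id, "refcount": 1}

    def get_from_id(self, shm_id):
        with self.mutex:
            shm = self.by_id.get(shm_id)
            if shm is not None:
                # Anything still in the table has a count above zero.
                shm["refcount"] += 1
            return shm

    def put(self, shm):
        with self.mutex:
            shm["refcount"] -= 1
            if shm["refcount"] == 0:
                # Removed before the mutex is dropped, so no lookup can
                # ever observe an object whose count is zero.
                del self.by_id[shm["id"]]
```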
Greg Kroah-Hartman c045c85dd3 moxart: fix potential use-after-free on remove path
commit bd2db32e7c upstream.

It was reported that the mmc host structure could be accessed after it
was freed in moxart_remove(), so fix this by saving the base register of
the device and using it instead of the pointer dereference.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: Xin Xiong <xiongx18@fudan.edu.cn>
Cc: Xin Tan <tanxin.ctf@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: linux-mmc@vger.kernel.org
Cc: stable <stable@vger.kernel.org>
Reported-by: whitehat002 <hackyzh002@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220127071638.4057899-1-gregkh@linuxfoundation.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:26 +08:00
Mathias Krause 1b1293c2b9 drm/vmwgfx: Fix stale file descriptors on failed usercopy
commit a0f90c8815 upstream.

A failing usercopy of the fence_rep object will lead to a stale entry in
the file descriptor table as put_unused_fd() won't release it. This
enables userland to refer to a dangling 'file' object through that still
valid file descriptor, leading to all kinds of use-after-free
exploitation scenarios.

Fix this by deferring the call to fd_install() until after the usercopy
has succeeded.

Fixes: c906965dee ("drm/vmwgfx: Add export fence to file descriptor support")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:26 +08:00
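
The ordering fix in the vmwgfx commit above is to reserve the descriptor number first, attempt the copy to user space, and only then publish the file under that number. A sketch of that ordering with a toy descriptor table; `FdTable` and `export_fence` are hypothetical, and `copy_to_user` here is just a caller-supplied callable returning True on success:

```python
class FdTable:
    """Toy descriptor table: numbers are reserved first, installed later."""

    def __init__(self):
        self.reserved = set()
        self.installed = {}
        self._next = 3

    def get_unused_fd(self):
        fd = self._next
        self._next += 1
        self.reserved.add(fd)
        return fd

    def put_unused_fd(self, fd):
        # Only gives back a reservation; it does not remove an installed file.
        self.reserved.discard(fd)

    def fd_install(self, fd, file):
        self.reserved.discard(fd)
        self.installed[fd] = file


def export_fence(table, file, copy_to_user):
    fd = table.get_unused_fd()
    if not copy_to_user(fd):
        table.put_unused_fd(fd)  # safe: nothing was installed yet
        return None
    table.fd_install(fd, file)   # publish only after the copy succeeded
    return fd
```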
Tvrtko Ursulin de3f325935 drm/i915: Flush TLBs before releasing backing store
commit 7938d61591 upstream.

We need to flush TLBs before releasing backing store otherwise userspace
is able to encounter stale entries if a) it is not declaring access to
certain buffers and b) it races with the backing store release from
such an undeclared execution already executing on the GPU in parallel.

The approach taken is to mark any buffer objects which were ever bound
to the GPU and to trigger a serialized TLB flush when their backing
store is released.

Alternatively the flushing could be done on VMA unbind, at which point
we would be able to ascertain whether there is potentially a parallel GPU
execution (which could race), but essentially it boils down to paying
the cost of TLB flushes potentially needlessly at VMA unbind time (when
the backing store is not known to be going away so not needed for
safety), versus potentially needlessly at backing store release time
(since we at that point cannot tell whether there is anything executing
on the GPU which uses that object).

Therefore simplicity of implementation has been chosen for now with
scope to benchmark and refine later as required.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: samuelliao <samuelliao@tencent.com>
2024-06-11 20:41:25 +08:00
johnnyaiai b9837b4aca Revert 'sched: adaptive default skew_tick value'
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
2024-06-11 20:41:25 +08:00
johnnyaiai c30056cab7 config/ARM64/performance: Enable CONFIG_ASYNC_PAGE_LOCKING
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
2024-06-11 20:41:25 +08:00
johnnyaiai e9417015a3 ARM64/conf: Disable CONFIG_FS_ENCRYPTION
[tapd]
ID877978657

This configuration resulted in a 10% regression on
unixbench's execl testing.

Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
2024-06-11 20:41:24 +08:00
johnnyaiai df14100ebc ARM64/Kconfig: Add CONFIG_ASYNC_PAGE_LOCKING
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: richardni <richardni@tencent.com>
2024-06-11 20:41:24 +08:00
johnnyaiai 3686f44e66 ARM64/conf: Disable CONFIG_RODATA_FULL_DEFAULT_ENABLED
[tapd]
ID877978657

This configuration resulted in a 15% regression on
unixbench's execl testing.

This additional enhancement can be turned on with
rodata=full after this patch.

Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
2024-06-11 20:41:24 +08:00
johnnyaiai 29ebe78817 ARM64/conf: Adjust PAGE_SIZE to 4K from 64K
[tapd]
ID880199289

Change default page size from 64K to 4K for ARM64.

Following configs are involved:
CONFIG_ARM64_PAGE_SHIFT=12
CONFIG_ARM64_CONT_SHIFT=4
CONFIG_ARCH_MMAP_RND_BITS_MIN=18
CONFIG_ARCH_MMAP_RND_BITS_MAX=33
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=11
CONFIG_PGTABLE_LEVELS=4
CONFIG_ARM64_4K_PAGES=y
CONFIG_ARM64_64K_PAGES=n
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_FORCE_MAX_ZONEORDER=11

Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: flyingpeng <flyingpeng@tencent.com>
2024-06-11 20:41:23 +08:00
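
Several of the values in the page-size commit above follow arithmetically from CONFIG_ARM64_PAGE_SHIFT=12. The short calculation below is a sketch of how they relate, assuming 8-byte page-table entries and assuming the largest buddy-allocator order is one less than CONFIG_FORCE_MAX_ZONEORDER; the variable names are illustrative only.

```python
PAGE_SHIFT = 12           # CONFIG_ARM64_PAGE_SHIFT
CONT_SHIFT = 4            # CONFIG_ARM64_CONT_SHIFT
PGTABLE_LEVELS = 4        # CONFIG_PGTABLE_LEVELS
FORCE_MAX_ZONEORDER = 11  # CONFIG_FORCE_MAX_ZONEORDER

page_size = 1 << PAGE_SHIFT                        # bytes per page
entries_per_table = page_size // 8                 # assumes 8-byte entries
bits_per_level = entries_per_table.bit_length() - 1
va_bits = PAGE_SHIFT + bits_per_level * PGTABLE_LEVELS
contiguous_range = page_size << CONT_SHIFT         # bytes covered by one contiguous run
largest_block = page_size << (FORCE_MAX_ZONEORDER - 1)

print(page_size, entries_per_table, va_bits, contiguous_range, largest_block)
```

Under these assumptions the configuration gives 4 KiB pages, 512 entries per table, a 48-bit virtual address space, 64 KiB contiguous ranges and 4 MiB as the largest physically contiguous allocation.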
Jianping Liu 1c606f5c80 open CONFIG_JUMP_LABEL to optimize very unlikely/likely branches
This option enables a transparent branch optimization that
makes certain almost-always-true or almost-always-false branch
conditions even cheaper to execute within the kernel.

Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:23 +08:00
johnnyaiai 793146009d config/x86: Disable CONFIG_LATENCYTOP by default
Performance degradation due to multi-core contending for
global spinlock.

Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:22 +08:00
johnnyaiai d3adf65bb5 config/performance: Enable CONFIG_ASYNC_PAGE_LOCKING
Signed-off-by: johnnyaiai <johnnyaiai@tencent.com>
2024-06-11 20:41:22 +08:00
Ni Xun ac0afcc6c6 mm/filemap.c: break generic_file_buffered_read up into multiple functions
Patch series "generic_file_buffered_read() improvements", v2.

upstream commit id: 723ef24b9b

generic_file_buffered_read() has turned into a real monstrosity to work
with.  This series is also a major performance improvement, for both small
random and large sequential reads.  On my test box, 4k buffered random
reads go from ~150k to ~250k iops, and the improvements to big sequential
reads are even bigger.

This incorporates the fix for IOCB_WAITQ handling that Jens just posted as
well, also factors out lock_page_for_iocb() to improve handling of the
various iocb flags.

This patch (of 2):

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible.  More refactoring
could be done, but this patch is intended to be relatively minimal.

Link: https://lkml.kernel.org/r/20201025212949.602194-1-kent.overstreet@gmail.com
Link: https://lkml.kernel.org/r/20201025212949.602194-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:22 +08:00
Goldwyn Rodrigues e02258e0ab fs: export generic_file_buffered_read()
upstream commit id: d85dc2e116

Export generic_file_buffered_read() to be used to supplement incomplete
direct reads.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:21 +08:00
Souptick Joarder 081fe59b26 mm/filemap.c: remove unused argument from shrink_readahead_size_eio()
upstream commit id: 0f8e2db4ea

The first argument of shrink_readahead_size_eio() is not used.  Hence
remove it from the function definition and from all the callers.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1583868093-24342-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:21 +08:00
Ni Xun 82b5fd569d mm: allow a controlled amount of unfairness in the page lock
upstream commit id: 5ef64cc898

Commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to.  But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain locks.  And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:21 +08:00
Linus Torvalds 6d4706063e list: add "list_del_init_careful()" to go with "list_empty_careful()"
upstream commit id: c6fe44d96f

That gives us ordering guarantees around the pair.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:20 +08:00
Ni Xun ad03dcb52e mm: rewrite wait_on_page_bit_common() logic
upstream commit id: 2a9127fcf2

It turns out that wait_on_page_bit_common() had several problems,
ranging from just unfair behavior due to re-queueing at the end of the
wait queue when re-trying, to an outright bug that could result in
missed wakeups (but probably never happened in practice).

This rewrites the whole logic to avoid both issues, by simply moving the
logic to check (and possibly take) the bit lock into the wakeup path
instead.

That makes everything much more straightforward, and means that we never
need to re-queue the wait entry: if we get woken up, we'll be notified
through WQ_FLAG_WOKEN, and the wait queue entry will have been removed,
and everything will have been done for us.

Link: https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fGD73w@mail.gmail.com/
Link: https://lore.kernel.org/lkml/alpine.LSU.2.11.2007221359450.1017@eggly.anvils/
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:20 +08:00
Ni Xun 5a0b6562a2 fs: Add IOCB_NOIO flag for generic_file_read_iter
upstream commit id: 41da51bce3

Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows tentative reads out of the page cache, as some
filesystems do, so that the appropriate locks are taken and the reads
retried only if the requested pages are not cached.
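
A filesystem could use such a flag roughly as in the userspace model
below. Everything here is illustrative: file_model, model_read,
model_fs_read and the MODEL_IOCB_NOIO value are hypothetical, and the
use of -EAGAIN as the "not cached" result is an assumption of this
sketch rather than something stated in this commit message.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_IOCB_NOIO 0x1 /* hypothetical flag value for this sketch */

/* Hypothetical file state: one cached/uncached bit plus counters. */
struct file_model {
    bool cached;      /* is the requested data in the page cache? */
    int locks_taken;  /* how many times the expensive lock was taken */
    int io_count;     /* how many times real I/O was issued */
};

/* Models the generic read: with NOIO set it must not start any I/O. */
static int model_read(struct file_model *f, int flags)
{
    if (!f->cached) {
        if (flags & MODEL_IOCB_NOIO)
            return -EAGAIN; /* assumed result: nothing cached, no I/O done */
        f->io_count++;      /* stand-in for reading from disk */
        f->cached = true;
    }
    return 0;
}

/* Tentative read first; lock and retry only on a cache miss. */
static int model_fs_read(struct file_model *f)
{
    int ret = model_read(f, MODEL_IOCB_NOIO);

    if (ret != -EAGAIN)
        return ret;

    f->locks_taken++; /* stand-in for taking the filesystem's lock */
    return model_read(f, 0);
}
```

The cached case completes without the lock, which is the point of the
tentative read.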

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:19 +08:00
Jens Axboe 5fbbe273c1 mm: mark async iocb read as NOWAIT once some data has been copied
upstream commit id: 13bd691421

Once we've copied some data for an iocb that is marked with IOCB_WAITQ,
we should no longer attempt to async lock a new page. Instead make sure
we return the copied amount, and let the caller retry, instead of
returning -EIOCBQUEUED for a new page.

This should only be possible with read-ahead disabled on the underlying
device, and multiple threads racing on the same file. I haven't been able
to reproduce it on anything else.

Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853b9 ("mm: support async buffered reads in generic_file_buffered_read()")
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:19 +08:00
Hugh Dickins 9e38b8351e mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
upstream commit id: 073861ed77

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
 "page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
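
The get_page()/put_page() bracketing described above can be modelled in
userspace like this. It is a sketch, not the kernel code: page_model and
the model_* functions are hypothetical, the 'racer' callback stands in
for a concurrent truncate dropping the page cache reference, and
'freed_at_wakeup' records whether the page was already freed when the
wakeup step ran.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with a reference count and a writeback flag. */
struct page_model {
    atomic_int refcount;
    atomic_bool writeback;
    bool freed;            /* set when the last reference is dropped */
    bool freed_at_wakeup;  /* was the page already freed at wakeup time? */
};

static void model_get_page(struct page_model *p)
{
    atomic_fetch_add(&p->refcount, 1);
}

static void model_put_page(struct page_model *p)
{
    /* fetch_sub returns the old value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&p->refcount, 1) == 1)
        p->freed = true;
}

/* Stand-in for truncate removing the page from the page cache. */
static void model_truncate(struct page_model *p)
{
    model_put_page(p);
}

static void model_end_page_writeback(struct page_model *p,
                                     void (*racer)(struct page_model *))
{
    model_get_page(p);                  /* pin the page first */
    atomic_store(&p->writeback, false); /* models TestClearPageWriteback */
    if (racer != NULL)
        racer(p);                       /* the race window */
    p->freed_at_wakeup = p->freed;      /* models wake_up_page() */
    model_put_page(p);                  /* only now may the page go away */
}
```

Without the extra reference, the racer in this model would free the page
before the wakeup step, which is the reuse window described above.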

Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:19 +08:00
Jens Axboe 6dd74eaec7 mm: never attempt async page lock if we've transferred data already
upstream commit id: 0abed7c69b

We catch the case where we enter generic_file_buffered_read() with data
already transferred, but we also need to be careful not to allow an async
page lock if we're looping transferring data. If not, we could be
returning -EIOCBQUEUED instead of the transferred amount, and it could
result in double waitqueue additions as well.
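
The rule can be sketched with a small userspace model. The names are
hypothetical, and MODEL_EIOCBQUEUED is a local stand-in because the
kernel-internal error code is not exported to userspace headers. Once
any data has been copied, a page that would need an async lock ends the
loop with the copied amount instead of the queued status.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the kernel-internal -EIOCBQUEUED value. */
#define MODEL_EIOCBQUEUED 529

/*
 * ready[i] says whether page i can be copied without waiting.
 * Returns the number of bytes copied, or -MODEL_EIOCBQUEUED if the very
 * first page already needs an async wait.
 */
static long model_buffered_read(const bool *ready, size_t npages,
                                size_t page_size)
{
    long written = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!ready[i]) {
            /* Data already transferred: report it, let the caller retry. */
            if (written > 0)
                return written;
            /* Nothing transferred yet: an async page lock is allowed. */
            return -MODEL_EIOCBQUEUED;
        }
        written += (long)page_size;
    }
    return written;
}
```

Returning the partial count keeps the caller's byte accounting correct
and avoids queueing the same wait entry a second time.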

Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853b9 ("mm: support async buffered reads in generic_file_buffered_read()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:18 +08:00
Ni Xun d0f72f7c0d io_uring: fix async buffered reads when readahead is disabled
upstream commit id: c8d317aa18

The async buffered reads feature is not working when readahead is
turned off. There are two issues:

- when retrying in io_read, not only the IOCB_WAITQ flag but also
  the IOCB_NOWAIT flag is still set, which makes it go to the would_block
  path in generic_file_buffered_read() and then return -EAGAIN. After
  that, the io-wq thread work is queued, and the read is later done
  asynchronously the old way.

- even if we remove IOCB_NOWAIT when retrying, the feature still does
  not work properly, since generic_file_buffered_read() goes to
  lock_page_killable() after calling mapping->a_ops->readpage() to do
  IO, thus causing the process to sleep.

Fixes: 1a0a7853b9 ("mm: support async buffered reads in generic_file_buffered_read()")
Fixes: 3b2a4439e0 ("io_uring: get rid of kiocb_wait_page_queue_init()")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:18 +08:00
Jens Axboe 6a252ef099 mm: support async buffered reads in generic_file_buffered_read()
upstream commit id: 1a0a7853b9

Use the async page locking infrastructure, if IOCB_WAITQ is set in the
passed in iocb. The caller must expect an -EIOCBQUEUED return value,
which means that IO is started but not done yet. This is similar to how
O_DIRECT signals the same operation. Once the callback is received by
the caller for IO completion, the caller must retry the operation.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:18 +08:00
Ni Xun e3930bc8fe mm: add support for async page locking
upstream commit id: dd3e6d5039

Normally waiting for a page to become unlocked, or locking the page,
requires waiting for IO to complete. Add support for lock_page_async()
and wait_on_page_locked_async(), which are callback based instead. This
allows a caller to get notified when a page becomes unlocked, rather
than wait for it.

We add a new iocb field, ki_waitq, to pass in the necessary data for this
to happen. We can unionize this with ki_cookie, since that is only used
for polled IO. Polled IO can never co-exist with async callbacks, as it is
(by definition) polled completions. struct wait_page_key is made public,
and we define struct wait_page_async as the interface between the caller
and the core.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:17 +08:00
Jens Axboe 965811582e mm: abstract out wake_page_match() from wake_page_function()
upstream commit id: c7510ab2cf

No functional changes in this patch, just in preparation for allowing
more callers.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:17 +08:00
Ni Xun 60a94f4645 ext4: drop unnecessary journal handle in delalloc write
mainline inclusion
from mainline-5.15-rc4
commit cc883236b7
category: perf
bugzilla: NA
CVE: NA
---------------------------

After we factor out the inline data write procedure from
ext4_da_write_end(), we don't need to start a journal handle for either
buffer overwrite or append-write. If we need to update
i_disksize, mark_inode_dirty() starts a handle and updates the inode
buffer. So we can just remove all the journal handle code in the delalloc
write procedure.

After this patch, we get a significant performance improvement. Below
is the Unixbench comparison data from tests on my machine with an 'Intel
Xeon Gold 5120' CPU and an NVMe SSD backend.

Test cmd:

  ./Run -c 56 -i 3 fstime fsbuffer fsdisk

Before this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
  File Copy 256 bufsize 500 maxblocks         1655.0     105077.0    634.9
  File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      1186.6

After this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
  File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
  File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      2053.0
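
For readers who want to check the tables, the figures are consistent with
INDEX = RESULT / BASELINE * 10 and with the partial score being the
geometric mean of the three indices. That relationship is inferred from
the numbers above, not taken from the Unixbench documentation, and the
helper names below are made up for this sketch. The cube root uses a
plain Newton iteration so the example needs no math library.

```c
#include <stddef.h>

/* One row of the tables above. */
struct bench_row {
    double baseline;
    double result;
};

/* Per-test index as it appears to be computed in the tables. */
static double row_index(struct bench_row r)
{
    return r.result / r.baseline * 10.0;
}

/* Cube root of a positive number by Newton iteration. */
static double cube_root(double x)
{
    double y = x;

    for (int i = 0; i < 200; i++)
        y = (2.0 * y + x / (y * y)) / 3.0;
    return y;
}

/* Geometric mean of the three per-test indices. */
static double partial_score(const struct bench_row rows[3])
{
    return cube_root(row_index(rows[0]) * row_index(rows[1]) *
                     row_index(rows[2]));
}

/* RESULT columns copied from the tables above. */
static const struct bench_row before_rows[3] = {
    { 3960.0, 422965.0 }, { 1655.0, 105077.0 }, { 5800.0, 1429092.0 },
};
static const struct bench_row after_rows[3] = {
    { 3960.0, 732716.0 }, { 1655.0, 184940.0 }, { 5800.0, 2427152.0 },
};
```

Dividing the two reported scores, 2053.0 / 1186.6, gives an overall
improvement of roughly 1.73x for this workload.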

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-5-yi.zhang@huawei.com
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
2024-06-11 20:41:16 +08:00
Ni Xun 1a5b328498 ext4: factor out write end code of inline file
mainline inclusion
from mainline-5.15-rc4
commit 6984aef598
category: perf
bugzilla: NA
CVE: NA
---------------------------

The inline_data write end procedure is currently folded into the
common write end functions, which makes it unclear. Factor it out and do
some cleanup. This patch also drops ext4_da_write_inline_data_end()
and switches to ext4_write_inline_data_end() instead, because we also
need to do the same error processing if we fail to write data into the
inline entry.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-4-yi.zhang@huawei.com

Conflicts:
        fs/ext4/inline.c
        fs/ext4/inode.c
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
2024-06-11 20:41:16 +08:00
Ni Xun cecb02f6de ext4: correct the error path of ext4_write_inline_data_end()
mainline inclusion
from mainline-5.15-rc4
commit 55ce2f649b
category: perf
bugzilla: NA
CVE: NA
---------------------------

The current error path of ext4_write_inline_data_end() is not correct.

Firstly, it should pass out the error value if ext4_get_inode_loc()
fails, or else it could trigger an infinite loop if we inject an error
here. Secondly, it's better to add the inode to the orphan list if
ext4_journal_stop() fails, otherwise we could not restore the inline
xattr entry after a power failure. Finally, we need to reset the 'ret'
value if ext4_write_inline_data_end() returns success in ext4_write_end()
and ext4_journalled_write_end(), otherwise we could not get the error
return value of ext4_journal_stop().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-3-yi.zhang@huawei.com

Conflicts:
        fs/ext4/inode.c
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
2024-06-11 20:41:16 +08:00
Zhang Yi f61a0b1964 ext4: check and update i_disksize properly
upstream commit id: 4df031ff58

After commit 3da40c7b08 ("ext4: only call ext4_truncate when size <=
isize"), i_disksize can always be updated to i_size in ext4_setattr(),
and we can be sure that i_disksize <= i_size since we hold the inode lock,
and if i_disksize < i_size there are delalloc writes pending in the range
up to i_size. If the end of the current write is <= i_size, there's no
need to touch i_disksize since writeback will push i_disksize up to
i_size eventually. So we can switch to checking i_size instead of
i_disksize in ext4_da_write_end() when writing to the end of the file.
We can also remove ext4_mark_inode_dirty() because we defer
inode dirtying to generic_write_end() or ext4_da_write_inline_data_end().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-2-yi.zhang@huawei.com

Conflicts:
	fs/ext4/inode.c
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Alex Shi <alexsshi@tencent.com>
Reviewed-by: robinlai <robinlai@tencent.com>
2024-06-11 20:41:15 +08:00