Filesystem/VFS changes for 5.18, part two

- Remove ->readpages infrastructure
  - Remove AOP_FLAG_CONT_EXPAND
  - Move read_descriptor_t to networking code
  - Pass the iocb to generic_perform_write
  - Minor updates to iomap, btrfs, ext4, f2fs, ntfs
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmJHSY8ACgkQDpNsjXcp
 gj59lgf/UJsVQjF+emdQAHa9AkFtZAb7TNv5QKLHp935c/OXREvHaQ956FyVhrc1
 n3pH3VRLFjXFQ3QZpWtArMQbIPr77I9KNs75zX0i+mutP5ieYcQVJKsGPIamiJAf
 eNTBoVaTxCVcTL43xCvnflvAeumoKzwdxGj6Hkgln8wuQ9B9p8923nBZpy94ajqp
 6b6E1rtrJlpEioqar2vCNpdJhEeN/jN93BwIynQMt1snPrBWQRYqv5pL3puUh7Gx
 UgJkCC6XvsUsXXOCu7n22RUKnDGiUW7QN99fmrztwnmiQY4hYmK2AoVMG16riDb+
 WmxIXbhaTo5qJT0rlQipi5d61TSuTA==
 =gwgb
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache

Pull more filesystem folio updates from Matthew Wilcox:
 "A mixture of odd changes that didn't quite make it into the original
  pull and fixes for things that did. Also the readpages changes had to
  wait for the NFS tree to be pulled first.

   - Remove ->readpages infrastructure

   - Remove AOP_FLAG_CONT_EXPAND

   - Move read_descriptor_t to networking code

   - Pass the iocb to generic_perform_write

   - Minor updates to iomap, btrfs, ext4, f2fs, ntfs"

* tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache:
  btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()
  ntfs: Correct mark_ntfs_record_dirty() folio conversion
  f2fs: Get the superblock from the mapping instead of the page
  f2fs: Correct f2fs_dirty_data_folio() conversion
  ext4: Correct ext4_journalled_dirty_folio() conversion
  filemap: Remove AOP_FLAG_CONT_EXPAND
  fs: Pass an iocb to generic_perform_write()
  fs, net: Move read_descriptor_t to net.h
  fs: Remove read_actor_t
  iomap: Simplify is_partially_uptodate a little
  readahead: Update comments
  mm: remove the skip_page argument to read_pages
  mm: remove the pages argument to read_pages
  fs: Remove ->readpages address space operation
  readahead: Remove read_cache_pages()
Commit: cda4351252
Author: Linus Torvalds
Date:   2022-04-01 13:50:50 -07:00
28 changed files with 113 additions and 236 deletions
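
One of the changes above alters the calling convention of generic_perform_write(): it now takes the iocb and reads the starting position from ki_pos instead of taking an explicit file/pos pair. A minimal sketch of what a caller conversion looks like is below; example_buffered_write() is a hypothetical hook used only for illustration, the real conversions are in the ext4, f2fs, ceph, nfs and filemap hunks that follow.

        #include <linux/fs.h>
        #include <linux/backing-dev.h>
        #include <linux/uio.h>

        static ssize_t example_buffered_write(struct kiocb *iocb, struct iov_iter *from)
        {
                struct inode *inode = file_inode(iocb->ki_filp);
                ssize_t written;

                current->backing_dev_info = inode_to_bdi(inode);
                /* Previously: generic_perform_write(iocb->ki_filp, from, iocb->ki_pos) */
                written = generic_perform_write(iocb, from);
                current->backing_dev_info = NULL;

                /* Callers still advance ki_pos themselves, as before. */
                if (written > 0)
                        iocb->ki_pos += written;
                return written;
        }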


@ -549,7 +549,7 @@ Pagecache
~~~~~~~~~
For filesystems using Linux's pagecache, the ``->readpage()`` and
``->readpages()`` methods must be modified to verify pages before they
``->readahead()`` methods must be modified to verify pages before they
are marked Uptodate. Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.
@ -611,7 +611,7 @@ workqueue, and then the workqueue work does the decryption or
verification. Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.
Files on ext4 and f2fs may contain holes. Normally, ``->readpages()``
Files on ext4 and f2fs may contain holes. Normally, ``->readahead()``
simply zeroes holes and sets the corresponding pages Uptodate; no bios
are issued. To prevent this case from bypassing fs-verity, these
filesystems use fsverity_verify_page() to verify hole pages.
@ -778,7 +778,7 @@ weren't already directly answered in other parts of this document.
- To prevent bypassing verification, pages must not be marked
Uptodate until they've been verified. Currently, each
filesystem is responsible for marking pages Uptodate via
``->readpages()``. Therefore, currently it's not possible for
``->readahead()``. Therefore, currently it's not possible for
the VFS to do the verification on its own. Changing this would
require significant changes to the VFS and all filesystems.
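
The hole handling described above has roughly the following shape. This is only a sketch (example_fill_hole() is a hypothetical helper, not code from this series), but it shows why a page that is simply zero-filled still has to pass through fsverity_verify_page() before it may be marked Uptodate:

        #include <linux/fsverity.h>
        #include <linux/highmem.h>
        #include <linux/pagemap.h>

        static void example_fill_hole(struct inode *inode, struct page *page)
        {
                /* A hole reads as zeroes; no bio is ever issued for it. */
                zero_user_segment(page, 0, PAGE_SIZE);

                /* Verify the zeroed contents so fs-verity cannot be bypassed. */
                if (!fsverity_active(inode) || fsverity_verify_page(page))
                        SetPageUptodate(page);
                /* On verification failure the page is left !Uptodate. */
                unlock_page(page);
        }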


@ -241,8 +241,6 @@ prototypes::
int (*writepages)(struct address_space *, struct writeback_control *);
bool (*dirty_folio)(struct address_space *, struct folio *folio);
void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
int (*write_begin)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata);
@ -274,7 +272,6 @@ readpage: yes, unlocks shared
writepages:
dirty_folio maybe
readahead: yes, unlocks shared
readpages: no shared
write_begin: locks the page exclusive
write_end: yes, unlocks exclusive
bmap:
@ -300,9 +297,6 @@ completion.
->readahead() unlocks the pages that I/O is attempted on like ->readpage().
->readpages() populates the pagecache with the passed pages and starts
I/O against them. They come unlocked upon I/O completion.
->writepage() is used for two purposes: for "memory cleansing" and for
"sync". These are quite different operations and the behaviour may differ
depending upon the mode.


@ -726,8 +726,6 @@ cache in your filesystem. The following members are defined:
int (*writepages)(struct address_space *, struct writeback_control *);
bool (*dirty_folio)(struct address_space *, struct folio *);
void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
int (*write_begin)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata);
@ -817,15 +815,6 @@ cache in your filesystem. The following members are defined:
completes successfully. Setting PageError on any page will be
ignored; simply unlock the page if an I/O error occurs.
``readpages``
called by the VM to read pages associated with the address_space
object. This is essentially just a vector version of readpage.
Instead of just one page, several pages are requested.
readpages is only used for read-ahead, so read errors are
ignored. If anything goes wrong, feel free to give up.
This interface is deprecated and will be removed by the end of
2020; implement readahead instead.
``write_begin``
Called by the generic buffered write code to ask the filesystem
to prepare to write len bytes at the given offset in the file.
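
For reference, the general shape of a ->readahead() implementation that replaces a removed ->readpages() is sketched below, under the locking rules documented above; example_submit_read() is a hypothetical stand-in for the filesystem's actual I/O submission. Pages arrive locked and already in the page cache, and any page that is not read is simply unlocked:

        #include <linux/pagemap.h>

        static void example_readahead(struct readahead_control *rac)
        {
                struct page *page;

                while ((page = readahead_page(rac))) {
                        /* Page is locked; the I/O completion path unlocks it. */
                        if (example_submit_read(rac->file, page) < 0)
                                unlock_page(page);      /* give up on this page */
                        put_page(page);                 /* drop readahead_page()'s reference */
                }
        }

Read errors here are not fatal: readahead is opportunistic, and anything left !Uptodate will be read again via a normal ->readpage() call when it is actually needed.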


@ -8296,7 +8296,7 @@ static void btrfs_invalidate_folio(struct folio *folio, size_t offset,
* cover the full folio, like invalidating the last folio, we're
* still safe to wait for ordered extent to finish.
*/
if (!(offset == 0 && length == PAGE_SIZE)) {
if (!(offset == 0 && length == folio_size(folio))) {
btrfs_releasepage(&folio->page, GFP_NOFS);
return;
}


@ -645,7 +645,7 @@ static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len,
int ret;
/*
* Lock destination range to serialize with concurrent readpages() and
* Lock destination range to serialize with concurrent readahead() and
* source range to serialize with relocation.
*/
btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
@ -739,7 +739,7 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
}
/*
* Lock destination range to serialize with concurrent readpages() and
* Lock destination range to serialize with concurrent readahead() and
* source range to serialize with relocation.
*/
btrfs_double_extent_lock(src, off, inode, destoff, len);


@ -2352,8 +2352,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
if (err)
goto out;
err = pagecache_write_begin(NULL, mapping, size, 0,
AOP_FLAG_CONT_EXPAND, &page, &fsdata);
err = pagecache_write_begin(NULL, mapping, size, 0, 0, &page, &fsdata);
if (err)
goto out;


@ -1869,7 +1869,7 @@ retry_snap:
* are pending vmtruncate. So write and vmtruncate
* can not run at the same time
*/
written = generic_perform_write(file, from, pos);
written = generic_perform_write(iocb, from);
if (likely(written >= 0))
iocb->ki_pos = pos + written;
ceph_end_io_write(inode);


@ -597,7 +597,7 @@ CIFSSMBNegotiate(const unsigned int xid,
set_credits(server, server->maxReq);
/* probably no need to store and check maxvcs */
server->maxBuf = le32_to_cpu(pSMBr->MaxBufferSize);
/* set up max_read for readpages check */
/* set up max_read for readahead check */
server->max_read = server->maxBuf;
server->max_rw = le32_to_cpu(pSMBr->MaxRawSize);
cifs_dbg(NOISY, "Max buf = %d\n", ses->server->maxBuf);


@ -49,7 +49,7 @@ static void cifs_set_ops(struct inode *inode)
inode->i_fop = &cifs_file_ops;
}
/* check if server can support readpages */
/* check if server can support readahead */
if (cifs_sb_master_tcon(cifs_sb)->ses->server->max_read <
PAGE_SIZE + MAX_CIFS_HDR_SIZE)
inode->i_data.a_ops = &cifs_addr_ops_smallbuf;


@ -248,7 +248,7 @@ EXPORT_SYMBOL(fscrypt_encrypt_block_inplace);
* which must still be locked and not uptodate. Normally, blocksize ==
* PAGE_SIZE and the whole page is decrypted at once.
*
* This is for use by the filesystem's ->readpages() method.
* This is for use by the filesystem's ->readahead() method.
*
* Return: 0 on success; -errno on failure
*/


@ -267,7 +267,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
goto out;
current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos);
ret = generic_perform_write(iocb, from);
current->backing_dev_info = NULL;
out:


@ -3589,7 +3589,7 @@ const struct iomap_ops ext4_iomap_report_ops = {
static bool ext4_journalled_dirty_folio(struct address_space *mapping,
struct folio *folio)
{
WARN_ON_ONCE(!page_has_buffers(&folio->page));
WARN_ON_ONCE(!folio_buffers(folio));
folio_set_checked(folio);
return filemap_dirty_folio(mapping, folio);
}


@ -109,7 +109,7 @@ static void verity_work(struct work_struct *work)
struct bio *bio = ctx->bio;
/*
* fsverity_verify_bio() may call readpages() again, and although verity
* fsverity_verify_bio() may call readahead() again, and although verity
* will be disabled for that, decryption may still be needed, causing
* another bio_post_read_ctx to be allocated. So to guarantee that
* mempool_alloc() never deadlocks we must free the current ctx first.


@ -456,7 +456,7 @@ static bool f2fs_dirty_meta_folio(struct address_space *mapping,
folio_mark_uptodate(folio);
if (!folio_test_dirty(folio)) {
filemap_dirty_folio(mapping, folio);
inc_page_count(F2FS_P_SB(&folio->page), F2FS_DIRTY_META);
inc_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_META);
set_page_private_reference(&folio->page);
return true;
}


@ -164,7 +164,7 @@ static void f2fs_verify_bio(struct work_struct *work)
bool may_have_compressed_pages = (ctx->enabled_steps & STEP_DECOMPRESS);
/*
* fsverity_verify_bio() may call readpages() again, and while verity
* fsverity_verify_bio() may call readahead() again, and while verity
* will be disabled for this, decryption and/or decompression may still
* be needed, resulting in another bio_post_read_ctx being allocated.
* So to prevent deadlocks we need to release the current ctx to the
@ -2392,7 +2392,7 @@ static void f2fs_readahead(struct readahead_control *rac)
if (!f2fs_is_compress_backend_ready(inode))
return;
/* If the file has inline data, skip readpages */
/* If the file has inline data, skip readahead */
if (f2fs_has_inline_data(inode))
return;
@ -3571,7 +3571,7 @@ static bool f2fs_dirty_data_folio(struct address_space *mapping,
f2fs_update_dirty_folio(inode, folio);
return true;
}
return true;
return false;
}


@ -4448,7 +4448,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
return -EOPNOTSUPP;
current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(file, from, iocb->ki_pos);
ret = generic_perform_write(iocb, from);
current->backing_dev_info = NULL;
if (ret > 0) {


@ -2146,11 +2146,11 @@ static bool f2fs_dirty_node_folio(struct address_space *mapping,
folio_mark_uptodate(folio);
#ifdef CONFIG_F2FS_CHECK_FS
if (IS_INODE(&folio->page))
f2fs_inode_chksum_set(F2FS_P_SB(&folio->page), &folio->page);
f2fs_inode_chksum_set(F2FS_M_SB(mapping), &folio->page);
#endif
if (!folio_test_dirty(folio)) {
filemap_dirty_folio(mapping, folio);
inc_page_count(F2FS_P_SB(&folio->page), F2FS_DIRTY_NODES);
inc_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
set_page_private_reference(&folio->page);
return true;
}


@ -627,7 +627,7 @@ struct fuse_conn {
/** Connection successful. Only set in INIT */
unsigned conn_init:1;
/** Do readpages asynchronously? Only set in INIT */
/** Do readahead asynchronously? Only set in INIT */
unsigned async_read:1;
/** Return an unique read error after abort. Only set in INIT */


@ -435,18 +435,17 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
{
struct iomap_page *iop = to_iomap_page(folio);
struct inode *inode = folio->mapping->host;
size_t len;
unsigned first, last, i;
if (!iop)
return false;
/* Limit range to this folio */
len = min(folio_size(folio) - from, count);
/* Caller's range may extend past the end of this folio */
count = min(folio_size(folio) - from, count);
/* First and last blocks in range within page */
/* First and last blocks in range within folio */
first = from >> inode->i_blkbits;
last = (from + len - 1) >> inode->i_blkbits;
last = (from + count - 1) >> inode->i_blkbits;
for (i = first; i <= last; i++)
if (!test_bit(i, iop->uptodate))


@ -646,7 +646,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
result = generic_write_checks(iocb, from);
if (result > 0) {
current->backing_dev_info = inode_to_bdi(inode);
result = generic_perform_write(file, from, iocb->ki_pos);
result = generic_perform_write(iocb, from);
current->backing_dev_info = NULL;
}
nfs_end_io_write(inode);


@ -1746,7 +1746,7 @@ void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
set_buffer_dirty(bh);
} while ((bh = bh->b_this_page) != head);
spin_unlock(&mapping->private_lock);
block_dirty_folio(mapping, page_folio(page));
filemap_dirty_folio(mapping, page_folio(page));
if (unlikely(buffers_to_free)) {
do {
bh = buffers_to_free->b_this_page;


@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Data verification functions, i.e. hooks for ->readpages()
* Data verification functions, i.e. hooks for ->readahead()
*
* Copyright 2019 Google LLC
*/
@ -214,7 +214,7 @@ EXPORT_SYMBOL_GPL(fsverity_verify_page);
* that fail verification are set to the Error state. Verification is skipped
* for pages already in the Error state, e.g. due to fscrypt decryption failure.
*
* This is a helper function for use by the ->readpages() method of filesystems
* This is a helper function for use by the ->readahead() method of filesystems
* that issue bios to read data directly into the page cache. Filesystems that
* populate the page cache without issuing bios (e.g. non block-based
* filesystems) must instead call fsverity_verify_page() directly on each page.


@ -275,7 +275,6 @@ enum positive_aop_returns {
AOP_TRUNCATED_PAGE = 0x80001,
};
#define AOP_FLAG_CONT_EXPAND 0x0001 /* called from cont_expand */
#define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct
* helper code (eg buffer layer)
* to clear GFP_FS from alloc */
@ -338,28 +337,6 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb)
return kiocb->ki_complete == NULL;
}
/*
* "descriptor" for what we're up to with a read.
* This allows us to use the same read code yet
* have multiple different users of the data that
* we read from a file.
*
* The simplest case just copies the data to user
* mode.
*/
typedef struct {
size_t written;
size_t count;
union {
char __user *buf;
void *data;
} arg;
int error;
} read_descriptor_t;
typedef int (*read_actor_t)(read_descriptor_t *, struct page *,
unsigned long, unsigned long);
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
@ -370,12 +347,6 @@ struct address_space_operations {
/* Mark a folio dirty. Return true if this dirtied it */
bool (*dirty_folio)(struct address_space *, struct folio *);
/*
* Reads in the requested pages. Unlike ->readpage(), this is
* PURELY used for read-ahead!.
*/
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
void (*readahead)(struct readahead_control *);
int (*write_begin)(struct file *, struct address_space *mapping,
@ -3027,7 +2998,7 @@ extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
extern ssize_t generic_perform_write(struct file *, struct iov_iter *, loff_t);
ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);


@ -221,7 +221,7 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
*
* This checks whether ->i_verity_info has been set.
*
* Filesystems call this from ->readpages() to check whether the pages need to
* Filesystems call this from ->readahead() to check whether the pages need to
* be verified or not. Don't use IS_VERITY() for this purpose; it's subject to
* a race condition where the file is being read concurrently with
* FS_IOC_ENABLE_VERITY completing. (S_VERITY is set before ->i_verity_info.)


@ -125,6 +125,25 @@ struct socket {
struct socket_wq wq;
};
/*
* "descriptor" for what we're up to with a read.
* This allows us to use the same read code yet
* have multiple different users of the data that
* we read from a file.
*
* The simplest case just copies the data to user
* mode.
*/
typedef struct {
size_t written;
size_t count;
union {
char __user *buf;
void *data;
} arg;
int error;
} read_descriptor_t;
struct vm_area_struct;
struct page;
struct sockaddr;


@ -752,8 +752,6 @@ struct page *read_cache_page(struct address_space *, pgoff_t index,
filler_t *filler, void *data);
extern struct page * read_cache_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask);
extern int read_cache_pages(struct address_space *mapping,
struct list_head *pages, filler_t *filler, void *data);
static inline struct page *read_mapping_page(struct address_space *mapping,
pgoff_t index, struct file *file)


@ -2538,7 +2538,7 @@ static int filemap_create_folio(struct file *file,
* the page cache as the locked folio would then be enough to
* synchronize with hole punching. But there are code paths
* such as filemap_update_page() filling in partially uptodate
* pages or ->readpages() that need to hold invalidate_lock
* pages or ->readahead() that need to hold invalidate_lock
* while mapping blocks for IO so let's hold the lock here as
* well to keep locking rules simple.
*/
@ -3752,9 +3752,10 @@ out:
}
EXPORT_SYMBOL(generic_file_direct_write);
ssize_t generic_perform_write(struct file *file,
struct iov_iter *i, loff_t pos)
ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
{
struct file *file = iocb->ki_filp;
loff_t pos = iocb->ki_pos;
struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
long status = 0;
@ -3884,7 +3885,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
goto out;
status = generic_perform_write(file, from, pos = iocb->ki_pos);
pos = iocb->ki_pos;
status = generic_perform_write(iocb, from);
/*
* If generic_perform_write() returned a synchronous error
* then we want to return the number of bytes which were
@ -3916,7 +3918,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
*/
}
} else {
written = generic_perform_write(file, from, iocb->ki_pos);
written = generic_perform_write(iocb, from);
if (likely(written > 0))
iocb->ki_pos += written;
}


@ -13,29 +13,29 @@
*
* Readahead is used to read content into the page cache before it is
* explicitly requested by the application. Readahead only ever
* attempts to read pages that are not yet in the page cache. If a
* page is present but not up-to-date, readahead will not try to read
* attempts to read folios that are not yet in the page cache. If a
* folio is present but not up-to-date, readahead will not try to read
* it. In that case a simple ->readpage() will be requested.
*
* Readahead is triggered when an application read request (whether a
* systemcall or a page fault) finds that the requested page is not in
* system call or a page fault) finds that the requested folio is not in
* the page cache, or that it is in the page cache and has the
* %PG_readahead flag set. This flag indicates that the page was loaded
* as part of a previous read-ahead request and now that it has been
* accessed, it is time for the next read-ahead.
* readahead flag set. This flag indicates that the folio was read
* as part of a previous readahead request and now that it has been
* accessed, it is time for the next readahead.
*
* Each readahead request is partly synchronous read, and partly async
* read-ahead. This is reflected in the struct file_ra_state which
* contains ->size being to total number of pages, and ->async_size
* which is the number of pages in the async section. The first page in
* this async section will have %PG_readahead set as a trigger for a
* subsequent read ahead. Once a series of sequential reads has been
* readahead. This is reflected in the struct file_ra_state which
* contains ->size being the total number of pages, and ->async_size
* which is the number of pages in the async section. The readahead
* flag will be set on the first folio in this async section to trigger
* a subsequent readahead. Once a series of sequential reads has been
* established, there should be no need for a synchronous component and
* all read ahead request will be fully asynchronous.
* all readahead request will be fully asynchronous.
*
* When either of the triggers causes a readahead, three numbers need to
* be determined: the start of the region, the size of the region, and
* the size of the async tail.
* When either of the triggers causes a readahead, three numbers need
* to be determined: the start of the region to read, the size of the
* region, and the size of the async tail.
*
* The start of the region is simply the first page address at or after
* the accessed address, which is not currently populated in the page
@ -45,14 +45,14 @@
* was explicitly requested from the determined request size, unless
* this would be less than zero - then zero is used. NOTE THIS
* CALCULATION IS WRONG WHEN THE START OF THE REGION IS NOT THE ACCESSED
* PAGE.
* PAGE. ALSO THIS CALCULATION IS NOT USED CONSISTENTLY.
*
* The size of the region is normally determined from the size of the
* previous readahead which loaded the preceding pages. This may be
* discovered from the struct file_ra_state for simple sequential reads,
* or from examining the state of the page cache when multiple
* sequential reads are interleaved. Specifically: where the readahead
* was triggered by the %PG_readahead flag, the size of the previous
* was triggered by the readahead flag, the size of the previous
* readahead is assumed to be the number of pages from the triggering
* page to the start of the new readahead. In these cases, the size of
* the previous readahead is scaled, often doubled, for the new
@ -65,52 +65,52 @@
* larger than the current request, and it is not scaled up, unless it
* is at the start of file.
*
* In general read ahead is accelerated at the start of the file, as
* In general readahead is accelerated at the start of the file, as
* reads from there are often sequential. There are other minor
* adjustments to the read ahead size in various special cases and these
* adjustments to the readahead size in various special cases and these
* are best discovered by reading the code.
*
* The above calculation determines the readahead, to which any requested
* read size may be added.
* The above calculation, based on the previous readahead size,
* determines the size of the readahead, to which any requested read
* size may be added.
*
* Readahead requests are sent to the filesystem using the ->readahead()
* address space operation, for which mpage_readahead() is a canonical
* implementation. ->readahead() should normally initiate reads on all
* pages, but may fail to read any or all pages without causing an IO
* folios, but may fail to read any or all folios without causing an I/O
* error. The page cache reading code will issue a ->readpage() request
* for any page which ->readahead() does not provided, and only an error
* for any folio which ->readahead() did not read, and only an error
* from this will be final.
*
* ->readahead() will generally call readahead_page() repeatedly to get
* each page from those prepared for read ahead. It may fail to read a
* page by:
* ->readahead() will generally call readahead_folio() repeatedly to get
* each folio from those prepared for readahead. It may fail to read a
* folio by:
*
* * not calling readahead_page() sufficiently many times, effectively
* ignoring some pages, as might be appropriate if the path to
* * not calling readahead_folio() sufficiently many times, effectively
* ignoring some folios, as might be appropriate if the path to
* storage is congested.
*
* * failing to actually submit a read request for a given page,
* * failing to actually submit a read request for a given folio,
* possibly due to insufficient resources, or
*
* * getting an error during subsequent processing of a request.
*
* In the last two cases, the page should be unlocked to indicate that
* the read attempt has failed. In the first case the page will be
* unlocked by the caller.
* In the last two cases, the folio should be unlocked by the filesystem
* to indicate that the read attempt has failed. In the first case the
* folio will be unlocked by the VFS.
*
* Those pages not in the final ``async_size`` of the request should be
* Those folios not in the final ``async_size`` of the request should be
* considered to be important and ->readahead() should not fail them due
* to congestion or temporary resource unavailability, but should wait
* for necessary resources (e.g. memory or indexing information) to
* become available. Pages in the final ``async_size`` may be
* become available. Folios in the final ``async_size`` may be
* considered less urgent and failure to read them is more acceptable.
* In this case it is best to use delete_from_page_cache() to remove the
* pages from the page cache as is automatically done for pages that
* were not fetched with readahead_page(). This will allow a
* subsequent synchronous read ahead request to try them again. If they
* In this case it is best to use filemap_remove_folio() to remove the
* folios from the page cache as is automatically done for folios that
* were not fetched with readahead_folio(). This will allow a
* subsequent synchronous readahead request to try them again. If they
* are left in the page cache, then they will be read individually using
* ->readpage().
*
* ->readpage() which may be less efficient.
*/
#include <linux/kernel.h>
@ -142,91 +142,14 @@ file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
}
EXPORT_SYMBOL_GPL(file_ra_state_init);
/*
* see if a page needs releasing upon read_cache_pages() failure
* - the caller of read_cache_pages() may have set PG_private or PG_fscache
* before calling, such as the NFS fs marking pages that are cached locally
* on disk, thus we need to give the fs a chance to clean up in the event of
* an error
*/
static void read_cache_pages_invalidate_page(struct address_space *mapping,
struct page *page)
{
if (page_has_private(page)) {
if (!trylock_page(page))
BUG();
page->mapping = mapping;
folio_invalidate(page_folio(page), 0, PAGE_SIZE);
page->mapping = NULL;
unlock_page(page);
}
put_page(page);
}
/*
* release a list of pages, invalidating them first if need be
*/
static void read_cache_pages_invalidate_pages(struct address_space *mapping,
struct list_head *pages)
{
struct page *victim;
while (!list_empty(pages)) {
victim = lru_to_page(pages);
list_del(&victim->lru);
read_cache_pages_invalidate_page(mapping, victim);
}
}
/**
* read_cache_pages - populate an address space with some pages & start reads against them
* @mapping: the address_space
* @pages: The address of a list_head which contains the target pages. These
* pages have their ->index populated and are otherwise uninitialised.
* @filler: callback routine for filling a single page.
* @data: private data for the callback routine.
*
* Hides the details of the LRU cache etc from the filesystems.
*
* Returns: %0 on success, error return by @filler otherwise
*/
int read_cache_pages(struct address_space *mapping, struct list_head *pages,
int (*filler)(void *, struct page *), void *data)
{
struct page *page;
int ret = 0;
while (!list_empty(pages)) {
page = lru_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache_lru(page, mapping, page->index,
readahead_gfp_mask(mapping))) {
read_cache_pages_invalidate_page(mapping, page);
continue;
}
put_page(page);
ret = filler(data, page);
if (unlikely(ret)) {
read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_SIZE);
}
return ret;
}
EXPORT_SYMBOL(read_cache_pages);
static void read_pages(struct readahead_control *rac, struct list_head *pages,
bool skip_page)
static void read_pages(struct readahead_control *rac)
{
const struct address_space_operations *aops = rac->mapping->a_ops;
struct page *page;
struct blk_plug plug;
if (!readahead_count(rac))
goto out;
return;
blk_start_plug(&plug);
@ -234,7 +157,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
aops->readahead(rac);
/*
* Clean up the remaining pages. The sizes in ->ra
* maybe be used to size next read-ahead, so make sure
* may be used to size the next readahead, so make sure
* they accurately reflect what happened.
*/
while ((page = readahead_page(rac))) {
@ -246,13 +169,6 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
unlock_page(page);
put_page(page);
}
} else if (aops->readpages) {
aops->readpages(rac->file, rac->mapping, pages,
readahead_count(rac));
/* Clean up the remaining pages */
put_pages_list(pages);
rac->_index += rac->_nr_pages;
rac->_nr_pages = 0;
} else {
while ((page = readahead_page(rac))) {
aops->readpage(rac->file, page);
@ -262,12 +178,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
blk_finish_plug(&plug);
BUG_ON(pages && !list_empty(pages));
BUG_ON(readahead_count(rac));
out:
if (skip_page)
rac->_index++;
}
/**
@ -289,7 +200,6 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
{
struct address_space *mapping = ractl->mapping;
unsigned long index = readahead_index(ractl);
LIST_HEAD(page_pool);
gfp_t gfp_mask = readahead_gfp_mask(mapping);
unsigned long i;
@ -321,7 +231,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
* have a stable reference to this page, and it's
* not worth getting one just for that.
*/
read_pages(ractl, &page_pool, true);
read_pages(ractl);
ractl->_index++;
i = ractl->_index + ractl->_nr_pages - index - 1;
continue;
}
@ -329,13 +240,11 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
folio = filemap_alloc_folio(gfp_mask, 0);
if (!folio)
break;
if (mapping->a_ops->readpages) {
folio->index = index + i;
list_add(&folio->lru, &page_pool);
} else if (filemap_add_folio(mapping, folio, index + i,
if (filemap_add_folio(mapping, folio, index + i,
gfp_mask) < 0) {
folio_put(folio);
read_pages(ractl, &page_pool, true);
read_pages(ractl);
ractl->_index++;
i = ractl->_index + ractl->_nr_pages - index - 1;
continue;
}
@ -349,7 +258,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
* uptodate then the caller will launch readpage again, and
* will then handle the error.
*/
read_pages(ractl, &page_pool, false);
read_pages(ractl);
filemap_invalidate_unlock_shared(mapping);
memalloc_nofs_restore(nofs);
}
@ -394,8 +303,7 @@ void force_page_cache_ra(struct readahead_control *ractl,
struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
unsigned long max_pages, index;
if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
!mapping->a_ops->readahead))
if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readahead))
return;
/*
@ -512,7 +420,7 @@ static pgoff_t count_history_pages(struct address_space *mapping,
}
/*
* page cache context based read-ahead
* page cache context based readahead
*/
static int try_context_readahead(struct address_space *mapping,
struct file_ra_state *ra,
@ -624,7 +532,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
ra->async_size += index - limit - 1;
}
read_pages(ractl, NULL, false);
read_pages(ractl);
/*
* If there were already pages in the page cache, then we may have
@ -763,9 +671,9 @@ void page_cache_sync_ra(struct readahead_control *ractl,
bool do_forced_ra = ractl->file && (ractl->file->f_mode & FMODE_RANDOM);
/*
* Even if read-ahead is disabled, issue this request as read-ahead
* Even if readahead is disabled, issue this request as readahead
* as we'll need it to satisfy the requested range. The forced
* read-ahead will do the right thing and limit the read to just the
* readahead will do the right thing and limit the read to just the
* requested range, which we'll set to 1 page for this case.
*/
if (!ractl->ra->ra_pages || blk_cgroup_congested()) {
@ -781,7 +689,6 @@ void page_cache_sync_ra(struct readahead_control *ractl,
return;
}
/* do read-ahead */
ondemand_readahead(ractl, NULL, req_count);
}
EXPORT_SYMBOL_GPL(page_cache_sync_ra);
@ -789,7 +696,7 @@ EXPORT_SYMBOL_GPL(page_cache_sync_ra);
void page_cache_async_ra(struct readahead_control *ractl,
struct folio *folio, unsigned long req_count)
{
/* no read-ahead */
/* no readahead */
if (!ractl->ra->ra_pages)
return;
@ -804,7 +711,6 @@ void page_cache_async_ra(struct readahead_control *ractl,
if (blk_cgroup_congested())
return;
/* do read-ahead */
ondemand_readahead(ractl, folio, req_count);
}
EXPORT_SYMBOL_GPL(page_cache_async_ra);
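
The contract spelled out in the rewritten comment at the top of this file looks like this with folios; again only a sketch, with a hypothetical example_submit_folio_read(). Unlike readahead_page(), readahead_folio() drops its reference before returning, so the implementation only needs to unlock folios whose read attempt failed:

        static void example_folio_readahead(struct readahead_control *rac)
        {
                struct folio *folio;

                while ((folio = readahead_folio(rac))) {
                        /* Folio is locked; unlock it only if the read is not attempted. */
                        if (example_submit_folio_read(rac->file, folio))
                                folio_unlock(folio);
                }
        }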