Reduce and handle EAGAIN errors on AIO label reads

FreeBSD, at least, limits each process to 256 simultaneous AIO
requests. Attempting to issue more results in EAGAIN errors. Since
we issue 4 requests per disk/partition from 2xCPUs threads, it is
quite easy to reach that limit on large systems, which results in
random pool import failures.  It annoyed me for quite a while on
a system with 64 CPUs and 70+ partitioned disks.
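
The arithmetic behind that limit can be sketched as follows. This is an
illustration only: peak_aio_requests() and REQS_PER_THREAD are
hypothetical names, not part of the patch, and the sketch assumes every
thread in the pool has its requests in flight at the same time.

```c
#include <assert.h>

/* Requests each import thread issues at once (4 per disk/partition). */
#define	REQS_PER_THREAD	4

/*
 * Hypothetical helper: worst-case number of simultaneous AIO requests
 * when the thread pool is sized at 2 x CPUs, as before this patch.
 */
static long
peak_aio_requests(long ncpus)
{
	assert(ncpus > 0);
	return (2 * ncpus * REQS_PER_THREAD);
}
```

For the 64-CPU system mentioned above, peak_aio_requests(64) is 512,
twice the 256-request limit, so any system with more than 32 CPUs and
enough disks to keep the pool busy can hit EAGAIN.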

This patch limits the number of threads to avoid the error, and
should also fall back softly to sync reads if the error still
occurs.  It takes into account _SC_AIO_MAX as the system-wide
AIO limit and _SC_AIO_LISTIO_MAX as the closest available value to
the per-process limit.  The latter is not exactly right, but it is
the best I found.
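
The thread-limiting half of the change can be sketched as a standalone
function. This is a minimal sketch, not the patch itself:
clamp_threads() and import_thread_count() are hypothetical names, the
labels-per-disk count is passed in rather than taken from VDEV_LABELS,
and the final floor of one thread is an addition the patch does not
have. It assumes sysconf() supports _SC_NPROCESSORS_ONLN, which is
common but not required by POSIX.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: cap `threads` so that threads * per_thread does
 * not exceed `limit`.  A limit smaller than per_thread (including the
 * -1 that sysconf() returns for "no limit" or "unknown") leaves the
 * count unchanged, like the `am >= VDEV_LABELS` guard in the patch.
 */
static long
clamp_threads(long threads, long limit, long per_thread)
{
	assert(per_thread > 0);
	if (limit >= per_thread && limit / per_thread < threads)
		threads = limit / per_thread;
	return (threads);
}

/* Hypothetical wrapper: pick a pool size for label reads. */
static long
import_thread_count(long labels_per_disk)
{
	long threads = 2 * sysconf(_SC_NPROCESSORS_ONLN);

#ifdef _SC_AIO_LISTIO_MAX
	/* Closest available stand-in for the per-process limit. */
	threads = clamp_threads(threads, sysconf(_SC_AIO_LISTIO_MAX),
	    labels_per_disk);
#endif
#ifdef _SC_AIO_MAX
	/* System-wide limit. */
	threads = clamp_threads(threads, sysconf(_SC_AIO_MAX),
	    labels_per_disk);
#endif
	/* Assumption of this sketch only: never go below one thread. */
	if (threads < 1)
		threads = 1;
	return (threads);
}
```

With 4 requests per thread and a limit of 256, clamp_threads(128, 256, 4)
gives 64 threads, so at most 256 requests are outstanding at once.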

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16551
Alexander Motin 2024-09-21 13:36:25 -04:00 committed by GitHub
parent 2aafc2ea1f
commit 3014dcb762
1 changed files with 16 additions and 1 deletions

@@ -1071,6 +1071,7 @@ zpool_read_label(int fd, nvlist_t **config, int *num_labels)
 		 * Try the slow method.
 		 */
 		zfs_fallthrough;
+	case EAGAIN:
 	case EOPNOTSUPP:
 	case ENOSYS:
 		do_slow = B_TRUE;
@@ -1464,7 +1465,21 @@ zpool_find_import_impl(libpc_handle_t *hdl, importargs_t *iarg,
 	 * validating labels, a large number of threads can be used due to
 	 * minimal contention.
 	 */
-	t = tpool_create(1, 2 * sysconf(_SC_NPROCESSORS_ONLN), 0, NULL);
+	long threads = 2 * sysconf(_SC_NPROCESSORS_ONLN);
+#ifdef HAVE_AIO_H
+	long am;
+#ifdef _SC_AIO_LISTIO_MAX
+	am = sysconf(_SC_AIO_LISTIO_MAX);
+	if (am >= VDEV_LABELS)
+		threads = MIN(threads, am / VDEV_LABELS);
+#endif
+#ifdef _SC_AIO_MAX
+	am = sysconf(_SC_AIO_MAX);
+	if (am >= VDEV_LABELS)
+		threads = MIN(threads, am / VDEV_LABELS);
+#endif
+#endif
+	t = tpool_create(1, threads, 0, NULL);
 	for (slice = avl_first(cache); slice;
 	    (slice = avl_walk(cache, slice, AVL_AFTER)))
 		(void) tpool_dispatch(t, zpool_open_func, slice);