From 5a46079a96451cfb15e4f5f01f73f7ba24ef851a Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:06:57 -0700 Subject: [PATCH 01/63] PM: domains: Delete usage of driver_deferred_probe_check_state() Now that fw_devlink=on by default and fw_devlink supports "power-domains" property, the execution will never get to the point where driver_deferred_probe_check_state() is called before the supplier has probed successfully or before deferred probe timeout has expired. So, delete the call and replace it with -ENODEV. Tested-by: Geert Uytterhoeven Reviewed-by: Ulf Hansson Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/power/domain.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 739e52cd4aba..3e86772d5fac 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev, mutex_unlock(&gpd_list_lock); dev_dbg(dev, "%s() failed to find PM domain: %ld\n", __func__, PTR_ERR(pd)); - return driver_deferred_probe_check_state(base_dev); + return -ENODEV; } dev_dbg(dev, "adding to PM domain %s\n", pd->name); From 24a026f85241a01bbcfe1b263caeeaa9a79bab40 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:06:58 -0700 Subject: [PATCH 02/63] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state() Now that fw_devlink=on by default and fw_devlink supports "pinctrl-[0-8]" property, the execution will never get to the point where driver_deferred_probe_check_state() is called before the supplier has probed successfully or before deferred probe timeout has expired. So, delete the call and replace it with -ENODEV. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/devicetree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c index 3fb238714718..ef898ee8ca6b 100644 --- a/drivers/pinctrl/devicetree.c +++ b/drivers/pinctrl/devicetree.c @@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p, np_pctldev = of_get_next_parent(np_pctldev); if (!np_pctldev || of_node_is_root(np_pctldev)) { of_node_put(np_pctldev); - ret = driver_deferred_probe_check_state(p->dev); + ret = -ENODEV; /* keep deferring if modules are enabled */ if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret < 0) ret = -EPROBE_DEFER; From f8217275b57aa48d98cc42051c2aac34152718d6 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:06:59 -0700 Subject: [PATCH 03/63] net: mdio: Delete usage of driver_deferred_probe_check_state() Now that fw_devlink=on by default and fw_devlink supports interrupt properties, the execution will never get to the point where driver_deferred_probe_check_state() is called before the supplier has probed successfully or before deferred probe timeout has expired. So, delete the call and replace it with -ENODEV. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/net/mdio/fwnode_mdio.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c index 1c1584fca632..3e79c2c51929 100644 --- a/drivers/net/mdio/fwnode_mdio.c +++ b/drivers/net/mdio/fwnode_mdio.c @@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio, * just fall back to poll mode */ if (rc == -EPROBE_DEFER) - rc = driver_deferred_probe_check_state(&phy->mdio.dev); - if (rc == -EPROBE_DEFER) - return rc; + rc = -ENODEV; if (rc > 0) { phy->irq = rc; From 2f8c3ae8288e4a4018330ed5c4e758b878d9c555 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:00 -0700 Subject: [PATCH 04/63] driver core: Add wait_for_init_devices_probe helper function Some devices might need to be probed and bound successfully before the kernel boot sequence can finish and move on to init/userspace. For example, a network interface might need to be bound to be able to mount a NFS rootfs. With fw_devlink=on by default, some of these devices might be blocked from probing because they are waiting on a optional supplier that doesn't have a driver. While fw_devlink will eventually identify such devices and unblock the probing automatically, it might be too late by the time it unblocks the probing of devices. For example, the IP4 autoconfig might timeout before fw_devlink unblocks probing of the network interface. This function is available to temporarily try and probe all devices that have a driver even if some of their suppliers haven't been added or don't have drivers. The drivers can then decide which of the suppliers are optional vs mandatory and probe the device if possible. By the time this function returns, all such "best effort" probes are guaranteed to be completed. If a device successfully probes in this mode, we delete all fw_devlink discovered dependencies of that device where the supplier hasn't yet probed successfully because they have to be optional dependencies. This also means that some devices that aren't needed for init and could have waited for their optional supplier to probe (when the supplier's module is loaded later on) would end up probing prematurely with limited functionality. So call this function only when boot would fail without it. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-5-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/base.h | 1 + drivers/base/core.c | 100 ++++++++++++++++++++++++++++++++-- drivers/base/dd.c | 19 +++++-- include/linux/device/driver.h | 1 + 4 files changed, 110 insertions(+), 11 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index ab71403d102f..b3a43a164dcd 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev); extern void device_block_probing(void); extern void device_unblock_probing(void); extern void deferred_probe_extend_timeout(void); +extern void driver_deferred_probe_trigger(void); /* /sys/devices directory */ extern struct kset *devices_kset; diff --git a/drivers/base/core.c b/drivers/base/core.c index 7cd789c4985d..61fdfe99b348 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1; static DEFINE_MUTEX(fwnode_link_lock); static bool fw_devlink_is_permissive(void); static bool fw_devlink_drv_reg_done; +static bool fw_devlink_best_effort; /** * fwnode_link_add - Create a link between two fwnode_handles. @@ -965,6 +966,11 @@ static void device_links_missing_supplier(struct device *dev) } } +static bool dev_is_best_effort(struct device *dev) +{ + return fw_devlink_best_effort && dev->can_match; +} + /** * device_links_check_suppliers - Check presence of supplier drivers. * @dev: Consumer device. @@ -984,7 +990,7 @@ static void device_links_missing_supplier(struct device *dev) int device_links_check_suppliers(struct device *dev) { struct device_link *link; - int ret = 0; + int ret = 0, fwnode_ret = 0; struct fwnode_handle *sup_fw; /* @@ -997,12 +1003,17 @@ int device_links_check_suppliers(struct device *dev) sup_fw = list_first_entry(&dev->fwnode->suppliers, struct fwnode_link, c_hook)->supplier; - dev_err_probe(dev, -EPROBE_DEFER, "wait for supplier %pfwP\n", - sup_fw); - mutex_unlock(&fwnode_link_lock); - return -EPROBE_DEFER; + if (!dev_is_best_effort(dev)) { + fwnode_ret = -EPROBE_DEFER; + dev_err_probe(dev, -EPROBE_DEFER, + "wait for supplier %pfwP\n", sup_fw); + } else { + fwnode_ret = -EAGAIN; + } } mutex_unlock(&fwnode_link_lock); + if (fwnode_ret == -EPROBE_DEFER) + return fwnode_ret; device_links_write_lock(); @@ -1012,6 +1023,14 @@ int device_links_check_suppliers(struct device *dev) if (link->status != DL_STATE_AVAILABLE && !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) { + + if (dev_is_best_effort(dev) && + link->flags & DL_FLAG_INFERRED && + !link->supplier->can_match) { + ret = -EAGAIN; + continue; + } + device_links_missing_supplier(dev); dev_err_probe(dev, -EPROBE_DEFER, "supplier %s not ready\n", @@ -1024,7 +1043,8 @@ int device_links_check_suppliers(struct device *dev) dev->links.status = DL_DEV_PROBING; device_links_write_unlock(); - return ret; + + return ret ? ret : fwnode_ret; } /** @@ -1289,6 +1309,18 @@ void device_links_driver_bound(struct device *dev) * save to drop the managed link completely. */ device_link_drop_managed(link); + } else if (dev_is_best_effort(dev) && + link->flags & DL_FLAG_INFERRED && + link->status != DL_STATE_CONSUMER_PROBE && + !link->supplier->can_match) { + /* + * When dev_is_best_effort() is true, we ignore device + * links to suppliers that don't have a driver. If the + * consumer device still managed to probe, there's no + * point in maintaining a device link in a weird state + * (consumer probed before supplier). So delete it. + */ + device_link_drop_managed(link); } else { WARN_ON(link->status != DL_STATE_CONSUMER_PROBE); WRITE_ONCE(link->status, DL_STATE_ACTIVE); @@ -1655,6 +1687,62 @@ void fw_devlink_drivers_done(void) device_links_write_unlock(); } +/** + * wait_for_init_devices_probe - Try to probe any device needed for init + * + * Some devices might need to be probed and bound successfully before the kernel + * boot sequence can finish and move on to init/userspace. For example, a + * network interface might need to be bound to be able to mount a NFS rootfs. + * + * With fw_devlink=on by default, some of these devices might be blocked from + * probing because they are waiting on a optional supplier that doesn't have a + * driver. While fw_devlink will eventually identify such devices and unblock + * the probing automatically, it might be too late by the time it unblocks the + * probing of devices. For example, the IP4 autoconfig might timeout before + * fw_devlink unblocks probing of the network interface. + * + * This function is available to temporarily try and probe all devices that have + * a driver even if some of their suppliers haven't been added or don't have + * drivers. + * + * The drivers can then decide which of the suppliers are optional vs mandatory + * and probe the device if possible. By the time this function returns, all such + * "best effort" probes are guaranteed to be completed. If a device successfully + * probes in this mode, we delete all fw_devlink discovered dependencies of that + * device where the supplier hasn't yet probed successfully because they have to + * be optional dependencies. + * + * Any devices that didn't successfully probe go back to being treated as if + * this function was never called. + * + * This also means that some devices that aren't needed for init and could have + * waited for their optional supplier to probe (when the supplier's module is + * loaded later on) would end up probing prematurely with limited functionality. + * So call this function only when boot would fail without it. + */ +void __init wait_for_init_devices_probe(void) +{ + if (!fw_devlink_flags || fw_devlink_is_permissive()) + return; + + /* + * Wait for all ongoing probes to finish so that the "best effort" is + * only applied to devices that can't probe otherwise. + */ + wait_for_device_probe(); + + pr_info("Trying to probe devices needed for running init ...\n"); + fw_devlink_best_effort = true; + driver_deferred_probe_trigger(); + + /* + * Wait for all "best effort" probes to finish before going back to + * normal enforcement. + */ + wait_for_device_probe(); + fw_devlink_best_effort = false; +} + static void fw_devlink_unblock_consumers(struct device *dev) { struct device_link *link; diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 11b0fb6414d3..4a55fbb7e0da 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -172,7 +172,7 @@ static bool driver_deferred_probe_enable; * changes in the midst of a probe, then deferred processing should be triggered * again. */ -static void driver_deferred_probe_trigger(void) +void driver_deferred_probe_trigger(void) { if (!driver_deferred_probe_enable) return; @@ -580,7 +580,7 @@ static int really_probe(struct device *dev, struct device_driver *drv) { bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) && !drv->suppress_bind_attrs; - int ret; + int ret, link_ret; if (defer_all_probes) { /* @@ -592,9 +592,9 @@ static int really_probe(struct device *dev, struct device_driver *drv) return -EPROBE_DEFER; } - ret = device_links_check_suppliers(dev); - if (ret) - return ret; + link_ret = device_links_check_suppliers(dev); + if (link_ret == -EPROBE_DEFER) + return link_ret; pr_debug("bus: '%s': %s: probing driver %s with device %s\n", drv->bus->name, __func__, drv->name, dev_name(dev)); @@ -633,6 +633,15 @@ re_probe: ret = call_driver_probe(dev, drv); if (ret) { + /* + * If fw_devlink_best_effort is active (denoted by -EAGAIN), the + * device might actually probe properly once some of its missing + * suppliers have probed. So, treat this as if the driver + * returned -EPROBE_DEFER. + */ + if (link_ret == -EAGAIN) + ret = -EPROBE_DEFER; + /* * Return probe errors as positive values so that the callers * can distinguish them from other errors. diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 700453017e1c..2114d65b862f 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name, struct bus_type *bus); extern int driver_probe_done(void); extern void wait_for_device_probe(void); +void __init wait_for_init_devices_probe(void); /* sysfs interface for exporting driver attributes */ From dd429036e778a3d789e6f6df6d1684764048fb50 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:01 -0700 Subject: [PATCH 05/63] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs If there are network devices that could probe without some of their suppliers probing and those network devices are needed to mount a network rootfs, then fw_devlink=on might break that usecase by blocking the network devices from probing by the time IP auto config starts. So, if no network devices are available when IP auto config is enabled and we have a network rootfs, make sure fw_devlink doesn't block the probing of any device that has a driver and then retry finding a network device. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-6-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ipconfig.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index 9d41d5d5cd1e..2342debd7066 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -1434,6 +1434,7 @@ __be32 __init root_nfs_parse_addr(char *name) static int __init wait_for_devices(void) { int i; + bool try_init_devs = true; for (i = 0; i < DEVICE_WAIT_MAX; i++) { struct net_device *dev; @@ -1452,6 +1453,11 @@ static int __init wait_for_devices(void) rtnl_unlock(); if (found) return 0; + if (try_init_devs && + (ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS)) { + try_init_devs = false; + wait_for_init_devices_probe(); + } ssleep(1); } return -ENODEV; From f516d01b9df2782b9399c44fa1d21c3d09211f8a Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:02 -0700 Subject: [PATCH 06/63] Revert "driver core: Set default deferred_probe_timeout back to 0." This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61. Let's take another shot at getting deferred_probe_timeout=10 to work. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-7-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/dd.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 4a55fbb7e0da..335e71d3a618 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data) } DEFINE_SHOW_ATTRIBUTE(deferred_devs); +#ifdef CONFIG_MODULES +int driver_deferred_probe_timeout = 10; +#else int driver_deferred_probe_timeout; +#endif + EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout); static int __init deferred_probe_timeout_setup(char *str) From 71066545b48e4259f89481199a0bbc7c35457738 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:03 -0700 Subject: [PATCH 07/63] driver core: Set fw_devlink.strict=1 by default Now that deferred_probe_timeout is non-zero by default, fw_devlink will never permanently block the probing of devices. It'll try its best to probe the devices in the right order and then finally let devices probe even if their suppliers don't have any drivers. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-8-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 61fdfe99b348..977b379a495b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -1613,7 +1613,7 @@ static int __init fw_devlink_setup(char *arg) } early_param("fw_devlink", fw_devlink_setup); -static bool fw_devlink_strict; +static bool fw_devlink_strict = true; static int __init fw_devlink_strict_setup(char *arg) { return strtobool(arg, &fw_devlink_strict); From b09796d528bbf06e3e10a4a8f78038719da7ebc6 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:04 -0700 Subject: [PATCH 08/63] iommu/of: Delete usage of driver_deferred_probe_check_state() Now that fw_devlink=on and fw_devlink.strict=1 by default and fw_devlink supports iommu DT properties, the execution will never get to the point where driver_deferred_probe_check_state() is called before the supplier has probed successfully or before deferred probe timeout has expired. So, delete the call and replace it with -ENODEV. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-9-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/of_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 5696314ae69e..41f4eb005219 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev, * a proper probe-ordering dependency mechanism in future. */ if (!ops) - return driver_deferred_probe_check_state(dev); + return -ENODEV; if (!try_module_get(ops->owner)) return -ENODEV; From 9cbffc7a59561be950ecc675d19a3d2b45202b2b Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:05 -0700 Subject: [PATCH 09/63] driver core: Delete driver_deferred_probe_check_state() The function is no longer used. So delete it. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-10-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/dd.c | 30 ------------------------------ include/linux/device/driver.h | 1 - 2 files changed, 31 deletions(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 335e71d3a618..e600dd2afc35 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str) } __setup("deferred_probe_timeout=", deferred_probe_timeout_setup); -/** - * driver_deferred_probe_check_state() - Check deferred probe state - * @dev: device to check - * - * Return: - * * -ENODEV if initcalls have completed and modules are disabled. - * * -ETIMEDOUT if the deferred probe timeout was set and has expired - * and modules are enabled. - * * -EPROBE_DEFER in other cases. - * - * Drivers or subsystems can opt-in to calling this function instead of directly - * returning -EPROBE_DEFER. - */ -int driver_deferred_probe_check_state(struct device *dev) -{ - if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) { - dev_warn(dev, "ignoring dependency for device, assuming no driver\n"); - return -ENODEV; - } - - if (!driver_deferred_probe_timeout && initcalls_done) { - dev_warn(dev, "deferred probe timeout, ignoring dependency\n"); - return -ETIMEDOUT; - } - - return -EPROBE_DEFER; -} -EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state); - static void deferred_probe_timeout_work_func(struct work_struct *work) { struct device_private *p; fw_devlink_drivers_done(); - driver_deferred_probe_timeout = 0; driver_deferred_probe_trigger(); flush_work(&deferred_probe_work); diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 2114d65b862f..7acaabde5396 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -242,7 +242,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev) extern int driver_deferred_probe_timeout; void driver_deferred_probe_add(struct device *dev); -int driver_deferred_probe_check_state(struct device *dev); void driver_init(void); /** From 82b070beae1ef55b0049768c8dc91d87565bb191 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 10 Jun 2022 15:02:18 +0300 Subject: [PATCH 10/63] driver core: Introduce device_find_any_child() helper There are several places in the kernel where this kind of functionality is being used. Provide a generic helper for such cases. Reviewed-by: Rafael J. Wysocki Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220610120219.18988-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/core.c | 20 ++++++++++++++++++++ include/linux/device.h | 2 ++ 2 files changed, 22 insertions(+) diff --git a/drivers/base/core.c b/drivers/base/core.c index 977b379a495b..839f64485a55 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -3920,6 +3920,26 @@ struct device *device_find_child_by_name(struct device *parent, } EXPORT_SYMBOL_GPL(device_find_child_by_name); +static int match_any(struct device *dev, void *unused) +{ + return 1; +} + +/** + * device_find_any_child - device iterator for locating a child device, if any. + * @parent: parent struct device + * + * This is similar to the device_find_child() function above, but it + * returns a reference to a child device, if any. + * + * NOTE: you will need to drop the reference with put_device() after use. + */ +struct device *device_find_any_child(struct device *parent) +{ + return device_find_child(parent, NULL, match_any); +} +EXPORT_SYMBOL_GPL(device_find_any_child); + int __init devices_init(void) { devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL); diff --git a/include/linux/device.h b/include/linux/device.h index dc941997795c..424b55df0272 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -905,6 +905,8 @@ struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, const char *name); +struct device *device_find_any_child(struct device *parent); + int device_rename(struct device *dev, const char *new_name); int device_move(struct device *dev, struct device *new_parent, enum dpm_order dpm_order); From c21b0837983d3b00c4f73927dae8441bf478087f Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 10 Jun 2022 15:02:19 +0300 Subject: [PATCH 11/63] spi: Use device_find_any_child() instead of custom approach We have already a helper to get the first child device, use it and drop custom approach. Reviewed-by: Rafael J. Wysocki Acked-by: Mark Brown Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220610120219.18988-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index ea09d1b42bf6..b04be04dddfa 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -2613,11 +2613,6 @@ int spi_slave_abort(struct spi_device *spi) } EXPORT_SYMBOL_GPL(spi_slave_abort); -static int match_true(struct device *dev, void *data) -{ - return 1; -} - static ssize_t slave_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -2625,7 +2620,7 @@ static ssize_t slave_show(struct device *dev, struct device_attribute *attr, dev); struct device *child; - child = device_find_child(&ctlr->dev, NULL, match_true); + child = device_find_any_child(&ctlr->dev); return sprintf(buf, "%s\n", child ? to_spi_device(child)->modalias : NULL); } @@ -2644,7 +2639,7 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr, if (rc != 1 || !name[0]) return -EINVAL; - child = device_find_child(&ctlr->dev, NULL, match_true); + child = device_find_any_child(&ctlr->dev); if (child) { /* Remove registered slave */ device_unregister(child); From 77515ebaf01920e2db49e04672ef669a7c2907f2 Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Tue, 7 Jun 2022 11:26:25 +0800 Subject: [PATCH 12/63] devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm The dev_coredumpv() and dev_coredumpm() could not be used in atomic context, because they call kvasprintf_const() and kstrdup() with GFP_KERNEL parameter. The process is shown below: dev_coredumpv(.., gfp_t gfp) dev_coredumpm(.., gfp_t gfp) dev_set_name kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep This patch removes gfp_t parameter of dev_coredumpv() and dev_coredumpm() and changes the gfp_t parameter of kzalloc() in dev_coredumpm() to GFP_KERNEL in order to show they could not be used in atomic context. Fixes: 833c95456a70 ("device coredump: add new device coredump class") Reviewed-by: Brian Norris Reviewed-by: Johannes Berg Signed-off-by: Duoming Zhou Link: https://lore.kernel.org/r/df72af3b1862bac7d8e793d1f3931857d3779dfd.1654569290.git.duoming@zju.edu.cn Signed-off-by: Greg Kroah-Hartman --- drivers/base/devcoredump.c | 16 ++++++---------- drivers/bluetooth/btmrvl_sdio.c | 2 +- drivers/bluetooth/hci_qca.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_dump.c | 2 +- drivers/gpu/drm/msm/disp/msm_disp_snapshot.c | 4 ++-- drivers/gpu/drm/msm/msm_gpu.c | 4 ++-- drivers/media/platform/qcom/venus/core.c | 2 +- drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c | 2 +- drivers/net/wireless/ath/ath10k/coredump.c | 2 +- .../net/wireless/ath/wil6210/wil_crash_dump.c | 2 +- .../wireless/broadcom/brcm80211/brcmfmac/debug.c | 2 +- drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 6 ++---- drivers/net/wireless/marvell/mwifiex/main.c | 3 +-- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 3 +-- drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 3 +-- drivers/net/wireless/realtek/rtw88/main.c | 2 +- drivers/net/wireless/realtek/rtw89/ser.c | 2 +- drivers/remoteproc/qcom_q6v5_mss.c | 2 +- drivers/remoteproc/remoteproc_coredump.c | 8 ++++---- include/drm/drm_print.h | 2 +- include/linux/devcoredump.h | 13 ++++++------- sound/soc/intel/avs/apl.c | 2 +- sound/soc/intel/avs/skl.c | 2 +- sound/soc/intel/catpt/dsp.c | 2 +- 24 files changed, 40 insertions(+), 50 deletions(-) diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c index f4d794d6bb85..8535f0bd5dfb 100644 --- a/drivers/base/devcoredump.c +++ b/drivers/base/devcoredump.c @@ -173,15 +173,13 @@ static void devcd_freev(void *data) * @dev: the struct device for the crashed device * @data: vmalloc data containing the device coredump * @datalen: length of the data - * @gfp: allocation flags * * This function takes ownership of the vmalloc'ed data and will free * it when it is no longer used. See dev_coredumpm() for more information. */ -void dev_coredumpv(struct device *dev, void *data, size_t datalen, - gfp_t gfp) +void dev_coredumpv(struct device *dev, void *data, size_t datalen) { - dev_coredumpm(dev, NULL, data, datalen, gfp, devcd_readv, devcd_freev); + dev_coredumpm(dev, NULL, data, datalen, devcd_readv, devcd_freev); } EXPORT_SYMBOL_GPL(dev_coredumpv); @@ -236,7 +234,6 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset, * @owner: the module that contains the read/free functions, use %THIS_MODULE * @data: data cookie for the @read/@free functions * @datalen: length of the data - * @gfp: allocation flags * @read: function to read from the given buffer * @free: function to free the given buffer * @@ -246,7 +243,7 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset, * function will be called to free the data. */ void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, gfp_t gfp, + void *data, size_t datalen, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -268,7 +265,7 @@ void dev_coredumpm(struct device *dev, struct module *owner, if (!try_module_get(owner)) goto free; - devcd = kzalloc(sizeof(*devcd), gfp); + devcd = kzalloc(sizeof(*devcd), GFP_KERNEL); if (!devcd) goto put_module; @@ -318,7 +315,6 @@ EXPORT_SYMBOL_GPL(dev_coredumpm); * @dev: the struct device for the crashed device * @table: the dump data * @datalen: length of the data - * @gfp: allocation flags * * Creates a new device coredump for the given device. If a previous one hasn't * been read yet, the new coredump is discarded. The data lifetime is determined @@ -326,9 +322,9 @@ EXPORT_SYMBOL_GPL(dev_coredumpm); * it will free the data. */ void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen, gfp_t gfp) + size_t datalen) { - dev_coredumpm(dev, NULL, table, datalen, gfp, devcd_read_from_sgtable, + dev_coredumpm(dev, NULL, table, datalen, devcd_read_from_sgtable, devcd_free_sgtable); } EXPORT_SYMBOL_GPL(dev_coredumpsg); diff --git a/drivers/bluetooth/btmrvl_sdio.c b/drivers/bluetooth/btmrvl_sdio.c index b8ef66f89fc1..9b9728719db2 100644 --- a/drivers/bluetooth/btmrvl_sdio.c +++ b/drivers/bluetooth/btmrvl_sdio.c @@ -1515,7 +1515,7 @@ done: /* fw_dump_data will be free in device coredump release function * after 5 min */ - dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len, GFP_KERNEL); + dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len); BT_INFO("== btmrvl firmware dump to /sys/class/devcoredump end"); } diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index eab34e24d944..2e4074211ae9 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1120,7 +1120,7 @@ static void qca_controller_memdump(struct work_struct *work) qca_memdump->ram_dump_size); memdump_buf = qca_memdump->memdump_buf_head; dev_coredumpv(&hu->serdev->dev, memdump_buf, - qca_memdump->received_dump, GFP_KERNEL); + qca_memdump->received_dump); cancel_delayed_work(&qca->ctrl_memdump_timeout); kfree(qca->qca_memdump); qca->qca_memdump = NULL; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c index f418e0b75772..519fcb234b3e 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c @@ -225,5 +225,5 @@ void etnaviv_core_dump(struct etnaviv_gem_submit *submit) etnaviv_core_dump_header(&iter, ETDUMP_BUF_END, iter.data); - dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start, GFP_KERNEL); + dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start); } diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c index e75b97127c0d..f057d294c30b 100644 --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c @@ -74,8 +74,8 @@ static void _msm_disp_snapshot_work(struct kthread_work *work) * If there is a codedump pending for the device, the dev_coredumpm() * will also free new coredump state. */ - dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0, GFP_KERNEL, - disp_devcoredump_read, msm_disp_state_free); + dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0, + disp_devcoredump_read, msm_disp_state_free); } void msm_disp_snapshot_state(struct drm_device *drm_dev) diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index eb8a6663f309..30576ced0a0a 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -317,8 +317,8 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, gpu->crashstate = state; /* FIXME: Release the crashstate if this errors out? */ - dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL, - msm_gpu_devcoredump_read, msm_gpu_devcoredump_free); + dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, + msm_gpu_devcoredump_read, msm_gpu_devcoredump_free); } #else static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c index 877eca125803..db84dfb3fb11 100644 --- a/drivers/media/platform/qcom/venus/core.c +++ b/drivers/media/platform/qcom/venus/core.c @@ -49,7 +49,7 @@ static void venus_coredump(struct venus_core *core) memcpy(data, mem_va, mem_size); memunmap(mem_va); - dev_coredumpv(dev, data, mem_size, GFP_KERNEL); + dev_coredumpv(dev, data, mem_size); } static void venus_event_notify(struct venus_core *core, u32 event) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c index c991b30bc9f0..fa520ab7c960 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c @@ -281,5 +281,5 @@ void mcp251xfd_dump(const struct mcp251xfd_priv *priv) mcp251xfd_dump_end(priv, &iter); dev_coredumpv(&priv->spi->dev, iter.start, - iter.data - iter.start, GFP_KERNEL); + iter.data - iter.start); } diff --git a/drivers/net/wireless/ath/ath10k/coredump.c b/drivers/net/wireless/ath/ath10k/coredump.c index fe6b6f97a916..dc9237069921 100644 --- a/drivers/net/wireless/ath/ath10k/coredump.c +++ b/drivers/net/wireless/ath/ath10k/coredump.c @@ -1607,7 +1607,7 @@ int ath10k_coredump_submit(struct ath10k *ar) return -ENODATA; } - dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len), GFP_KERNEL); + dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len)); return 0; } diff --git a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c index 89c12cb2aaab..79299609dd62 100644 --- a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c +++ b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c @@ -117,6 +117,6 @@ void wil_fw_core_dump(struct wil6210_priv *wil) /* fw_dump_data will be free in device coredump release function * after 5 min */ - dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size, GFP_KERNEL); + dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size); wil_info(wil, "fw core dumped, size %d bytes\n", fw_dump_size); } diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c index eecf8a38d94a..87f3652ef3bd 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c @@ -37,7 +37,7 @@ int brcmf_debug_create_memdump(struct brcmf_bus *bus, const void *data, return err; } - dev_coredumpv(bus->dev, dump, len + ramsize, GFP_KERNEL); + dev_coredumpv(bus->dev, dump, len + ramsize); return 0; } diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c index abf49022edbe..f2f7cf494a8c 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c @@ -2601,8 +2601,7 @@ static void iwl_fw_error_dump(struct iwl_fw_runtime *fwrt, fw_error_dump.trans_ptr->data, fw_error_dump.trans_ptr->len, fw_error_dump.fwrt_len); - dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len, - GFP_KERNEL); + dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len); } vfree(fw_error_dump.fwrt_ptr); vfree(fw_error_dump.trans_ptr); @@ -2647,8 +2646,7 @@ static void iwl_fw_error_ini_dump(struct iwl_fw_runtime *fwrt, entry->data, entry->size, offs); offs += entry->size; } - dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len, - GFP_KERNEL); + dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len); } iwl_dump_ini_list_free(&dump_list); } diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index ace7371c4773..26fef0ab1b0d 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1115,8 +1115,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter) */ mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump start\n"); - dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len, - GFP_KERNEL); + dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len); mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump end\n"); diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c index bd687f7de628..5336fe8c668d 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c @@ -2421,6 +2421,5 @@ void mt7615_coredump_work(struct work_struct *work) dev_kfree_skb(skb); } - dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ, - GFP_KERNEL); + dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ); } diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c index a630ddbf19e5..cac284f95ce0 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c @@ -1630,8 +1630,7 @@ void mt7921_coredump_work(struct work_struct *work) } if (dump) - dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ, - GFP_KERNEL); + dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ); mt7921_reset(&dev->mt76); } diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c index efabd5b1bf5b..a276544cecdd 100644 --- a/drivers/net/wireless/realtek/rtw88/main.c +++ b/drivers/net/wireless/realtek/rtw88/main.c @@ -414,7 +414,7 @@ static void rtw_fwcd_dump(struct rtw_dev *rtwdev) * framework. Note that a new dump will be discarded if a previous one * hasn't been released yet. */ - dev_coredumpv(rtwdev->dev, desc->data, desc->size, GFP_KERNEL); + dev_coredumpv(rtwdev->dev, desc->data, desc->size); } static void rtw_fwcd_free(struct rtw_dev *rtwdev, bool free_self) diff --git a/drivers/net/wireless/realtek/rtw89/ser.c b/drivers/net/wireless/realtek/rtw89/ser.c index 9e95ed972710..d28fe01ad729 100644 --- a/drivers/net/wireless/realtek/rtw89/ser.c +++ b/drivers/net/wireless/realtek/rtw89/ser.c @@ -127,7 +127,7 @@ static void rtw89_ser_cd_send(struct rtw89_dev *rtwdev, * will be discarded if a previous one hasn't been released by * framework yet. */ - dev_coredumpv(rtwdev->dev, buf, sizeof(*buf), GFP_KERNEL); + dev_coredumpv(rtwdev->dev, buf, sizeof(*buf)); } static void rtw89_ser_cd_free(struct rtw89_dev *rtwdev, diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c index af217de75e4d..813d87faef6c 100644 --- a/drivers/remoteproc/qcom_q6v5_mss.c +++ b/drivers/remoteproc/qcom_q6v5_mss.c @@ -597,7 +597,7 @@ static void q6v5_dump_mba_logs(struct q6v5 *qproc) data = vmalloc(MBA_LOG_SIZE); if (data) { memcpy(data, mba_region, MBA_LOG_SIZE); - dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE, GFP_KERNEL); + dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE); } memunmap(mba_region); } diff --git a/drivers/remoteproc/remoteproc_coredump.c b/drivers/remoteproc/remoteproc_coredump.c index 4b093420d98a..cd55c2abd227 100644 --- a/drivers/remoteproc/remoteproc_coredump.c +++ b/drivers/remoteproc/remoteproc_coredump.c @@ -309,7 +309,7 @@ void rproc_coredump(struct rproc *rproc) phdr += elf_size_of_phdr(class); } if (dump_conf == RPROC_COREDUMP_ENABLED) { - dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL); + dev_coredumpv(&rproc->dev, data, data_size); return; } @@ -318,7 +318,7 @@ void rproc_coredump(struct rproc *rproc) dump_state.header = data; init_completion(&dump_state.dump_done); - dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL, + dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, rproc_coredump_read, rproc_coredump_free); /* @@ -449,7 +449,7 @@ void rproc_coredump_using_sections(struct rproc *rproc) } if (dump_conf == RPROC_COREDUMP_ENABLED) { - dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL); + dev_coredumpv(&rproc->dev, data, data_size); return; } @@ -458,7 +458,7 @@ void rproc_coredump_using_sections(struct rproc *rproc) dump_state.header = data; init_completion(&dump_state.dump_done); - dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL, + dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, rproc_coredump_read, rproc_coredump_free); /* Wait until the dump is read and free is called. Data is freed diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 22fabdeed297..b41850366bcc 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -162,7 +162,7 @@ struct drm_print_iterator { * void makecoredump(...) * { * ... - * dev_coredumpm(dev, THIS_MODULE, data, 0, GFP_KERNEL, + * dev_coredumpm(dev, THIS_MODULE, data, 0, * coredump_read, ...) * } * diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h index c008169ed2c6..c7d840d824c3 100644 --- a/include/linux/devcoredump.h +++ b/include/linux/devcoredump.h @@ -52,27 +52,26 @@ static inline void _devcd_free_sgtable(struct scatterlist *table) #ifdef CONFIG_DEV_COREDUMP -void dev_coredumpv(struct device *dev, void *data, size_t datalen, - gfp_t gfp); +void dev_coredumpv(struct device *dev, void *data, size_t datalen); void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, gfp_t gfp, + void *data, size_t datalen, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)); void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen, gfp_t gfp); + size_t datalen); #else static inline void dev_coredumpv(struct device *dev, void *data, - size_t datalen, gfp_t gfp) + size_t datalen) { vfree(data); } static inline void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, gfp_t gfp, + void *data, size_t datalen, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -81,7 +80,7 @@ dev_coredumpm(struct device *dev, struct module *owner, } static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen, gfp_t gfp) + size_t datalen) { _devcd_free_sgtable(table); } diff --git a/sound/soc/intel/avs/apl.c b/sound/soc/intel/avs/apl.c index b8e2b23c9f64..1ff57f1a483d 100644 --- a/sound/soc/intel/avs/apl.c +++ b/sound/soc/intel/avs/apl.c @@ -164,7 +164,7 @@ static int apl_coredump(struct avs_dev *adev, union avs_notify_msg *msg) } while (offset < msg->ext.coredump.stack_dump_size); exit: - dev_coredumpv(adev->dev, dump, dump_size, GFP_KERNEL); + dev_coredumpv(adev->dev, dump, dump_size); return 0; } diff --git a/sound/soc/intel/avs/skl.c b/sound/soc/intel/avs/skl.c index bda5ec7510fe..3413162768dc 100644 --- a/sound/soc/intel/avs/skl.c +++ b/sound/soc/intel/avs/skl.c @@ -88,7 +88,7 @@ static int skl_coredump(struct avs_dev *adev, union avs_notify_msg *msg) return -ENOMEM; memcpy_fromio(dump, avs_sram_addr(adev, AVS_FW_REGS_WINDOW), AVS_FW_REGS_SIZE); - dev_coredumpv(adev->dev, dump, AVS_FW_REGS_SIZE, GFP_KERNEL); + dev_coredumpv(adev->dev, dump, AVS_FW_REGS_SIZE); return 0; } diff --git a/sound/soc/intel/catpt/dsp.c b/sound/soc/intel/catpt/dsp.c index 346bec000306..d2afe9ff1e3a 100644 --- a/sound/soc/intel/catpt/dsp.c +++ b/sound/soc/intel/catpt/dsp.c @@ -539,7 +539,7 @@ int catpt_coredump(struct catpt_dev *cdev) pos += CATPT_DMA_REGS_SIZE; } - dev_coredumpv(cdev->dev, dump, dump_size, GFP_KERNEL); + dev_coredumpv(cdev->dev, dump, dump_size); return 0; } From a52ed4866d2b90dd5e4ae9dabd453f3ed8fa3cbc Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Tue, 7 Jun 2022 11:26:26 +0800 Subject: [PATCH 13/63] mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv There are sleep in atomic context bugs when uploading device dump data in mwifiex. The root cause is that dev_coredumpv could not be used in atomic contexts, because it calls dev_set_name which include operations that may sleep. The call tree shows execution paths that could lead to bugs: (Interrupt context) fw_dump_timer_fn mwifiex_upload_device_dump dev_coredumpv(..., GFP_KERNEL) dev_coredumpm() kzalloc(sizeof(*devcd), gfp); //may sleep dev_set_name kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep The corresponding fail log is shown below: [ 135.275938] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start [ 135.281029] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265 ... [ 135.293613] Call Trace: [ 135.293613] [ 135.293613] dump_stack_lvl+0x57/0x7d [ 135.293613] __might_resched.cold+0x138/0x173 [ 135.293613] ? dev_coredumpm+0xca/0x2e0 [ 135.293613] kmem_cache_alloc_trace+0x189/0x1f0 [ 135.293613] ? devcd_match_failing+0x30/0x30 [ 135.293613] dev_coredumpm+0xca/0x2e0 [ 135.293613] ? devcd_freev+0x10/0x10 [ 135.293613] dev_coredumpv+0x1c/0x20 [ 135.293613] ? devcd_match_failing+0x30/0x30 [ 135.293613] mwifiex_upload_device_dump+0x65/0xb0 [ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0 [ 135.293613] call_timer_fn+0x122/0x3d0 [ 135.293613] ? msleep_interruptible+0xb0/0xb0 [ 135.293613] ? lock_downgrade+0x3c0/0x3c0 [ 135.293613] ? __next_timer_interrupt+0x13c/0x160 [ 135.293613] ? lockdep_hardirqs_on_prepare+0xe/0x220 [ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0 [ 135.293613] __run_timers.part.0+0x3f8/0x540 [ 135.293613] ? call_timer_fn+0x3d0/0x3d0 [ 135.293613] ? arch_restore_msi_irqs+0x10/0x10 [ 135.293613] ? lapic_next_event+0x31/0x40 [ 135.293613] run_timer_softirq+0x4f/0xb0 [ 135.293613] __do_softirq+0x1c2/0x651 ... [ 135.293613] RIP: 0010:default_idle+0xb/0x10 [ 135.293613] RSP: 0018:ffff888006317e68 EFLAGS: 00000246 [ 135.293613] RAX: ffffffff82ad8d10 RBX: ffff888006301cc0 RCX: ffffffff82ac90e1 [ 135.293613] RDX: ffffed100d9ff1b4 RSI: ffffffff831ad140 RDI: ffffffff82ad8f20 [ 135.293613] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff88806cff8d9b [ 135.293613] R10: ffffed100d9ff1b3 R11: 0000000000000001 R12: ffffffff84593410 [ 135.293613] R13: 0000000000000000 R14: 0000000000000000 R15: 1ffff11000c62fd2 ... [ 135.389205] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end This patch uses delayed work to replace timer and moves the operations that may sleep into a delayed work in order to mitigate bugs, it was tested on Marvell 88W8801 chip whose port is usb and the firmware is usb8801_uapsta.bin. The following is the result after using delayed work to replace timer. [ 134.936453] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start [ 135.043344] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end As we can see, there is no bug now. Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface") Reviewed-by: Brian Norris Signed-off-by: Duoming Zhou Link: https://lore.kernel.org/r/b63b77fc84ed3e8a6bef02378e17c7c71a0bc3be.1654569290.git.duoming@zju.edu.cn Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/marvell/mwifiex/init.c | 9 +++++---- drivers/net/wireless/marvell/mwifiex/main.h | 3 ++- drivers/net/wireless/marvell/mwifiex/sta_event.c | 6 +++--- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c index 88c72d1827a0..fca3ab948f6c 100644 --- a/drivers/net/wireless/marvell/mwifiex/init.c +++ b/drivers/net/wireless/marvell/mwifiex/init.c @@ -63,9 +63,10 @@ static void wakeup_timer_fn(struct timer_list *t) adapter->if_ops.card_reset(adapter); } -static void fw_dump_timer_fn(struct timer_list *t) +static void fw_dump_work(struct work_struct *work) { - struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer); + struct mwifiex_adapter *adapter = + container_of(work, struct mwifiex_adapter, devdump_work.work); mwifiex_upload_device_dump(adapter); } @@ -321,7 +322,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter) adapter->active_scan_triggered = false; timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0); adapter->devdump_len = 0; - timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0); + INIT_DELAYED_WORK(&adapter->devdump_work, fw_dump_work); } /* @@ -400,7 +401,7 @@ static void mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter) { del_timer(&adapter->wakeup_timer); - del_timer_sync(&adapter->devdump_timer); + cancel_delayed_work_sync(&adapter->devdump_work); mwifiex_cancel_all_pending_cmd(adapter); wake_up_interruptible(&adapter->cmd_wait_q.wait); wake_up_interruptible(&adapter->hs_activate_wait_q); diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 332dd1c8db35..5d8646f16162 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -49,6 +49,7 @@ #include #include #include +#include #include "decl.h" #include "ioctl.h" @@ -1055,7 +1056,7 @@ struct mwifiex_adapter { /* Device dump data/length */ void *devdump_data; int devdump_len; - struct timer_list devdump_timer; + struct delayed_work devdump_work; bool ignore_btcoex_events; }; diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c index 7d42c5d2dbf6..4d93386494c5 100644 --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c @@ -623,8 +623,8 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, * transmission event get lost, in this cornel case, * user would still get partial of the dump. */ - mod_timer(&adapter->devdump_timer, - jiffies + msecs_to_jiffies(MWIFIEX_TIMER_10S)); + schedule_delayed_work(&adapter->devdump_work, + msecs_to_jiffies(MWIFIEX_TIMER_10S)); } /* Overflow check */ @@ -643,7 +643,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, return; upload_dump: - del_timer_sync(&adapter->devdump_timer); + cancel_delayed_work_sync(&adapter->devdump_work); mwifiex_upload_device_dump(adapter); } From 5f8954e099b8ae96e7de1bb95950e00c85bedd40 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 27 Jun 2022 16:35:59 +0200 Subject: [PATCH 14/63] Revert "mwifiex: fix sleep in atomic context bugs caused by dev_coredumpv" This reverts commit a52ed4866d2b90dd5e4ae9dabd453f3ed8fa3cbc as it causes build problems in linux-next. It needs to be reintroduced in a way that can allow the api to evolve and not require a "flag day" to catch all users. Link: https://lore.kernel.org/r/20220623160723.7a44b573@canb.auug.org.au Cc: Duoming Zhou Cc: Brian Norris Cc: Johannes Berg Reported-by: Stephen Rothwell Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/marvell/mwifiex/init.c | 9 ++++----- drivers/net/wireless/marvell/mwifiex/main.h | 3 +-- drivers/net/wireless/marvell/mwifiex/sta_event.c | 6 +++--- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c index fca3ab948f6c..88c72d1827a0 100644 --- a/drivers/net/wireless/marvell/mwifiex/init.c +++ b/drivers/net/wireless/marvell/mwifiex/init.c @@ -63,10 +63,9 @@ static void wakeup_timer_fn(struct timer_list *t) adapter->if_ops.card_reset(adapter); } -static void fw_dump_work(struct work_struct *work) +static void fw_dump_timer_fn(struct timer_list *t) { - struct mwifiex_adapter *adapter = - container_of(work, struct mwifiex_adapter, devdump_work.work); + struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer); mwifiex_upload_device_dump(adapter); } @@ -322,7 +321,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter) adapter->active_scan_triggered = false; timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0); adapter->devdump_len = 0; - INIT_DELAYED_WORK(&adapter->devdump_work, fw_dump_work); + timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0); } /* @@ -401,7 +400,7 @@ static void mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter) { del_timer(&adapter->wakeup_timer); - cancel_delayed_work_sync(&adapter->devdump_work); + del_timer_sync(&adapter->devdump_timer); mwifiex_cancel_all_pending_cmd(adapter); wake_up_interruptible(&adapter->cmd_wait_q.wait); wake_up_interruptible(&adapter->hs_activate_wait_q); diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 5d8646f16162..332dd1c8db35 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -49,7 +49,6 @@ #include #include #include -#include #include "decl.h" #include "ioctl.h" @@ -1056,7 +1055,7 @@ struct mwifiex_adapter { /* Device dump data/length */ void *devdump_data; int devdump_len; - struct delayed_work devdump_work; + struct timer_list devdump_timer; bool ignore_btcoex_events; }; diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c index 4d93386494c5..7d42c5d2dbf6 100644 --- a/drivers/net/wireless/marvell/mwifiex/sta_event.c +++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c @@ -623,8 +623,8 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, * transmission event get lost, in this cornel case, * user would still get partial of the dump. */ - schedule_delayed_work(&adapter->devdump_work, - msecs_to_jiffies(MWIFIEX_TIMER_10S)); + mod_timer(&adapter->devdump_timer, + jiffies + msecs_to_jiffies(MWIFIEX_TIMER_10S)); } /* Overflow check */ @@ -643,7 +643,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv, return; upload_dump: - cancel_delayed_work_sync(&adapter->devdump_work); + del_timer_sync(&adapter->devdump_timer); mwifiex_upload_device_dump(adapter); } From 38a523a2946d3a0961d141d477a1ee2b1f3bdbb1 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 27 Jun 2022 16:36:57 +0200 Subject: [PATCH 15/63] Revert "devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm" This reverts commit 77515ebaf01920e2db49e04672ef669a7c2907f2 as it causes build problems in linux-next. It needs to be reintroduced in a way that can allow the api to evolve and not require a "flag day" to catch all users. Link: https://lore.kernel.org/r/20220623160723.7a44b573@canb.auug.org.au Cc: Duoming Zhou Cc: Brian Norris Cc: Johannes Berg Reported-by: Stephen Rothwell Signed-off-by: Greg Kroah-Hartman --- drivers/base/devcoredump.c | 16 ++++++++++------ drivers/bluetooth/btmrvl_sdio.c | 2 +- drivers/bluetooth/hci_qca.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_dump.c | 2 +- drivers/gpu/drm/msm/disp/msm_disp_snapshot.c | 4 ++-- drivers/gpu/drm/msm/msm_gpu.c | 4 ++-- drivers/media/platform/qcom/venus/core.c | 2 +- drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c | 2 +- drivers/net/wireless/ath/ath10k/coredump.c | 2 +- .../net/wireless/ath/wil6210/wil_crash_dump.c | 2 +- .../wireless/broadcom/brcm80211/brcmfmac/debug.c | 2 +- drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 6 ++++-- drivers/net/wireless/marvell/mwifiex/main.c | 3 ++- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 3 ++- drivers/net/wireless/mediatek/mt76/mt7921/mac.c | 3 ++- drivers/net/wireless/realtek/rtw88/main.c | 2 +- drivers/net/wireless/realtek/rtw89/ser.c | 2 +- drivers/remoteproc/qcom_q6v5_mss.c | 2 +- drivers/remoteproc/remoteproc_coredump.c | 8 ++++---- include/drm/drm_print.h | 2 +- include/linux/devcoredump.h | 13 +++++++------ sound/soc/intel/avs/apl.c | 2 +- sound/soc/intel/avs/skl.c | 2 +- sound/soc/intel/catpt/dsp.c | 2 +- 24 files changed, 50 insertions(+), 40 deletions(-) diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c index 8535f0bd5dfb..f4d794d6bb85 100644 --- a/drivers/base/devcoredump.c +++ b/drivers/base/devcoredump.c @@ -173,13 +173,15 @@ static void devcd_freev(void *data) * @dev: the struct device for the crashed device * @data: vmalloc data containing the device coredump * @datalen: length of the data + * @gfp: allocation flags * * This function takes ownership of the vmalloc'ed data and will free * it when it is no longer used. See dev_coredumpm() for more information. */ -void dev_coredumpv(struct device *dev, void *data, size_t datalen) +void dev_coredumpv(struct device *dev, void *data, size_t datalen, + gfp_t gfp) { - dev_coredumpm(dev, NULL, data, datalen, devcd_readv, devcd_freev); + dev_coredumpm(dev, NULL, data, datalen, gfp, devcd_readv, devcd_freev); } EXPORT_SYMBOL_GPL(dev_coredumpv); @@ -234,6 +236,7 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset, * @owner: the module that contains the read/free functions, use %THIS_MODULE * @data: data cookie for the @read/@free functions * @datalen: length of the data + * @gfp: allocation flags * @read: function to read from the given buffer * @free: function to free the given buffer * @@ -243,7 +246,7 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset, * function will be called to free the data. */ void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, + void *data, size_t datalen, gfp_t gfp, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -265,7 +268,7 @@ void dev_coredumpm(struct device *dev, struct module *owner, if (!try_module_get(owner)) goto free; - devcd = kzalloc(sizeof(*devcd), GFP_KERNEL); + devcd = kzalloc(sizeof(*devcd), gfp); if (!devcd) goto put_module; @@ -315,6 +318,7 @@ EXPORT_SYMBOL_GPL(dev_coredumpm); * @dev: the struct device for the crashed device * @table: the dump data * @datalen: length of the data + * @gfp: allocation flags * * Creates a new device coredump for the given device. If a previous one hasn't * been read yet, the new coredump is discarded. The data lifetime is determined @@ -322,9 +326,9 @@ EXPORT_SYMBOL_GPL(dev_coredumpm); * it will free the data. */ void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen) + size_t datalen, gfp_t gfp) { - dev_coredumpm(dev, NULL, table, datalen, devcd_read_from_sgtable, + dev_coredumpm(dev, NULL, table, datalen, gfp, devcd_read_from_sgtable, devcd_free_sgtable); } EXPORT_SYMBOL_GPL(dev_coredumpsg); diff --git a/drivers/bluetooth/btmrvl_sdio.c b/drivers/bluetooth/btmrvl_sdio.c index 9b9728719db2..b8ef66f89fc1 100644 --- a/drivers/bluetooth/btmrvl_sdio.c +++ b/drivers/bluetooth/btmrvl_sdio.c @@ -1515,7 +1515,7 @@ done: /* fw_dump_data will be free in device coredump release function * after 5 min */ - dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len); + dev_coredumpv(&card->func->dev, fw_dump_data, fw_dump_len, GFP_KERNEL); BT_INFO("== btmrvl firmware dump to /sys/class/devcoredump end"); } diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 2e4074211ae9..eab34e24d944 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1120,7 +1120,7 @@ static void qca_controller_memdump(struct work_struct *work) qca_memdump->ram_dump_size); memdump_buf = qca_memdump->memdump_buf_head; dev_coredumpv(&hu->serdev->dev, memdump_buf, - qca_memdump->received_dump); + qca_memdump->received_dump, GFP_KERNEL); cancel_delayed_work(&qca->ctrl_memdump_timeout); kfree(qca->qca_memdump); qca->qca_memdump = NULL; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c index 519fcb234b3e..f418e0b75772 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c @@ -225,5 +225,5 @@ void etnaviv_core_dump(struct etnaviv_gem_submit *submit) etnaviv_core_dump_header(&iter, ETDUMP_BUF_END, iter.data); - dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start); + dev_coredumpv(gpu->dev, iter.start, iter.data - iter.start, GFP_KERNEL); } diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c index f057d294c30b..e75b97127c0d 100644 --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c @@ -74,8 +74,8 @@ static void _msm_disp_snapshot_work(struct kthread_work *work) * If there is a codedump pending for the device, the dev_coredumpm() * will also free new coredump state. */ - dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0, - disp_devcoredump_read, msm_disp_state_free); + dev_coredumpm(disp_state->dev, THIS_MODULE, disp_state, 0, GFP_KERNEL, + disp_devcoredump_read, msm_disp_state_free); } void msm_disp_snapshot_state(struct drm_device *drm_dev) diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index 30576ced0a0a..eb8a6663f309 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -317,8 +317,8 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, gpu->crashstate = state; /* FIXME: Release the crashstate if this errors out? */ - dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, - msm_gpu_devcoredump_read, msm_gpu_devcoredump_free); + dev_coredumpm(gpu->dev->dev, THIS_MODULE, gpu, 0, GFP_KERNEL, + msm_gpu_devcoredump_read, msm_gpu_devcoredump_free); } #else static void msm_gpu_crashstate_capture(struct msm_gpu *gpu, diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c index db84dfb3fb11..877eca125803 100644 --- a/drivers/media/platform/qcom/venus/core.c +++ b/drivers/media/platform/qcom/venus/core.c @@ -49,7 +49,7 @@ static void venus_coredump(struct venus_core *core) memcpy(data, mem_va, mem_size); memunmap(mem_va); - dev_coredumpv(dev, data, mem_size); + dev_coredumpv(dev, data, mem_size, GFP_KERNEL); } static void venus_event_notify(struct venus_core *core, u32 event) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c index fa520ab7c960..c991b30bc9f0 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c @@ -281,5 +281,5 @@ void mcp251xfd_dump(const struct mcp251xfd_priv *priv) mcp251xfd_dump_end(priv, &iter); dev_coredumpv(&priv->spi->dev, iter.start, - iter.data - iter.start); + iter.data - iter.start, GFP_KERNEL); } diff --git a/drivers/net/wireless/ath/ath10k/coredump.c b/drivers/net/wireless/ath/ath10k/coredump.c index dc9237069921..fe6b6f97a916 100644 --- a/drivers/net/wireless/ath/ath10k/coredump.c +++ b/drivers/net/wireless/ath/ath10k/coredump.c @@ -1607,7 +1607,7 @@ int ath10k_coredump_submit(struct ath10k *ar) return -ENODATA; } - dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len)); + dev_coredumpv(ar->dev, dump, le32_to_cpu(dump->len), GFP_KERNEL); return 0; } diff --git a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c index 79299609dd62..89c12cb2aaab 100644 --- a/drivers/net/wireless/ath/wil6210/wil_crash_dump.c +++ b/drivers/net/wireless/ath/wil6210/wil_crash_dump.c @@ -117,6 +117,6 @@ void wil_fw_core_dump(struct wil6210_priv *wil) /* fw_dump_data will be free in device coredump release function * after 5 min */ - dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size); + dev_coredumpv(wil_to_dev(wil), fw_dump_data, fw_dump_size, GFP_KERNEL); wil_info(wil, "fw core dumped, size %d bytes\n", fw_dump_size); } diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c index 87f3652ef3bd..eecf8a38d94a 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/debug.c @@ -37,7 +37,7 @@ int brcmf_debug_create_memdump(struct brcmf_bus *bus, const void *data, return err; } - dev_coredumpv(bus->dev, dump, len + ramsize); + dev_coredumpv(bus->dev, dump, len + ramsize, GFP_KERNEL); return 0; } diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c index f2f7cf494a8c..abf49022edbe 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c @@ -2601,7 +2601,8 @@ static void iwl_fw_error_dump(struct iwl_fw_runtime *fwrt, fw_error_dump.trans_ptr->data, fw_error_dump.trans_ptr->len, fw_error_dump.fwrt_len); - dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len); + dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len, + GFP_KERNEL); } vfree(fw_error_dump.fwrt_ptr); vfree(fw_error_dump.trans_ptr); @@ -2646,7 +2647,8 @@ static void iwl_fw_error_ini_dump(struct iwl_fw_runtime *fwrt, entry->data, entry->size, offs); offs += entry->size; } - dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len); + dev_coredumpsg(fwrt->trans->dev, sg_dump_data, file_len, + GFP_KERNEL); } iwl_dump_ini_list_free(&dump_list); } diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 26fef0ab1b0d..ace7371c4773 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1115,7 +1115,8 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter) */ mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump start\n"); - dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len); + dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len, + GFP_KERNEL); mwifiex_dbg(adapter, MSG, "== mwifiex dump information to /sys/class/devcoredump end\n"); diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c index 5336fe8c668d..bd687f7de628 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c @@ -2421,5 +2421,6 @@ void mt7615_coredump_work(struct work_struct *work) dev_kfree_skb(skb); } - dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ); + dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ, + GFP_KERNEL); } diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c index cac284f95ce0..a630ddbf19e5 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7921/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/mac.c @@ -1630,7 +1630,8 @@ void mt7921_coredump_work(struct work_struct *work) } if (dump) - dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ); + dev_coredumpv(dev->mt76.dev, dump, MT76_CONNAC_COREDUMP_SZ, + GFP_KERNEL); mt7921_reset(&dev->mt76); } diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c index a276544cecdd..efabd5b1bf5b 100644 --- a/drivers/net/wireless/realtek/rtw88/main.c +++ b/drivers/net/wireless/realtek/rtw88/main.c @@ -414,7 +414,7 @@ static void rtw_fwcd_dump(struct rtw_dev *rtwdev) * framework. Note that a new dump will be discarded if a previous one * hasn't been released yet. */ - dev_coredumpv(rtwdev->dev, desc->data, desc->size); + dev_coredumpv(rtwdev->dev, desc->data, desc->size, GFP_KERNEL); } static void rtw_fwcd_free(struct rtw_dev *rtwdev, bool free_self) diff --git a/drivers/net/wireless/realtek/rtw89/ser.c b/drivers/net/wireless/realtek/rtw89/ser.c index d28fe01ad729..9e95ed972710 100644 --- a/drivers/net/wireless/realtek/rtw89/ser.c +++ b/drivers/net/wireless/realtek/rtw89/ser.c @@ -127,7 +127,7 @@ static void rtw89_ser_cd_send(struct rtw89_dev *rtwdev, * will be discarded if a previous one hasn't been released by * framework yet. */ - dev_coredumpv(rtwdev->dev, buf, sizeof(*buf)); + dev_coredumpv(rtwdev->dev, buf, sizeof(*buf), GFP_KERNEL); } static void rtw89_ser_cd_free(struct rtw89_dev *rtwdev, diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c index 813d87faef6c..af217de75e4d 100644 --- a/drivers/remoteproc/qcom_q6v5_mss.c +++ b/drivers/remoteproc/qcom_q6v5_mss.c @@ -597,7 +597,7 @@ static void q6v5_dump_mba_logs(struct q6v5 *qproc) data = vmalloc(MBA_LOG_SIZE); if (data) { memcpy(data, mba_region, MBA_LOG_SIZE); - dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE); + dev_coredumpv(&rproc->dev, data, MBA_LOG_SIZE, GFP_KERNEL); } memunmap(mba_region); } diff --git a/drivers/remoteproc/remoteproc_coredump.c b/drivers/remoteproc/remoteproc_coredump.c index cd55c2abd227..4b093420d98a 100644 --- a/drivers/remoteproc/remoteproc_coredump.c +++ b/drivers/remoteproc/remoteproc_coredump.c @@ -309,7 +309,7 @@ void rproc_coredump(struct rproc *rproc) phdr += elf_size_of_phdr(class); } if (dump_conf == RPROC_COREDUMP_ENABLED) { - dev_coredumpv(&rproc->dev, data, data_size); + dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL); return; } @@ -318,7 +318,7 @@ void rproc_coredump(struct rproc *rproc) dump_state.header = data; init_completion(&dump_state.dump_done); - dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, + dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL, rproc_coredump_read, rproc_coredump_free); /* @@ -449,7 +449,7 @@ void rproc_coredump_using_sections(struct rproc *rproc) } if (dump_conf == RPROC_COREDUMP_ENABLED) { - dev_coredumpv(&rproc->dev, data, data_size); + dev_coredumpv(&rproc->dev, data, data_size, GFP_KERNEL); return; } @@ -458,7 +458,7 @@ void rproc_coredump_using_sections(struct rproc *rproc) dump_state.header = data; init_completion(&dump_state.dump_done); - dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, + dev_coredumpm(&rproc->dev, NULL, &dump_state, data_size, GFP_KERNEL, rproc_coredump_read, rproc_coredump_free); /* Wait until the dump is read and free is called. Data is freed diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index b41850366bcc..22fabdeed297 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -162,7 +162,7 @@ struct drm_print_iterator { * void makecoredump(...) * { * ... - * dev_coredumpm(dev, THIS_MODULE, data, 0, + * dev_coredumpm(dev, THIS_MODULE, data, 0, GFP_KERNEL, * coredump_read, ...) * } * diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h index c7d840d824c3..c008169ed2c6 100644 --- a/include/linux/devcoredump.h +++ b/include/linux/devcoredump.h @@ -52,26 +52,27 @@ static inline void _devcd_free_sgtable(struct scatterlist *table) #ifdef CONFIG_DEV_COREDUMP -void dev_coredumpv(struct device *dev, void *data, size_t datalen); +void dev_coredumpv(struct device *dev, void *data, size_t datalen, + gfp_t gfp); void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, + void *data, size_t datalen, gfp_t gfp, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)); void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen); + size_t datalen, gfp_t gfp); #else static inline void dev_coredumpv(struct device *dev, void *data, - size_t datalen) + size_t datalen, gfp_t gfp) { vfree(data); } static inline void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, + void *data, size_t datalen, gfp_t gfp, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -80,7 +81,7 @@ dev_coredumpm(struct device *dev, struct module *owner, } static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen) + size_t datalen, gfp_t gfp) { _devcd_free_sgtable(table); } diff --git a/sound/soc/intel/avs/apl.c b/sound/soc/intel/avs/apl.c index 1ff57f1a483d..b8e2b23c9f64 100644 --- a/sound/soc/intel/avs/apl.c +++ b/sound/soc/intel/avs/apl.c @@ -164,7 +164,7 @@ static int apl_coredump(struct avs_dev *adev, union avs_notify_msg *msg) } while (offset < msg->ext.coredump.stack_dump_size); exit: - dev_coredumpv(adev->dev, dump, dump_size); + dev_coredumpv(adev->dev, dump, dump_size, GFP_KERNEL); return 0; } diff --git a/sound/soc/intel/avs/skl.c b/sound/soc/intel/avs/skl.c index 3413162768dc..bda5ec7510fe 100644 --- a/sound/soc/intel/avs/skl.c +++ b/sound/soc/intel/avs/skl.c @@ -88,7 +88,7 @@ static int skl_coredump(struct avs_dev *adev, union avs_notify_msg *msg) return -ENOMEM; memcpy_fromio(dump, avs_sram_addr(adev, AVS_FW_REGS_WINDOW), AVS_FW_REGS_SIZE); - dev_coredumpv(adev->dev, dump, AVS_FW_REGS_SIZE); + dev_coredumpv(adev->dev, dump, AVS_FW_REGS_SIZE, GFP_KERNEL); return 0; } diff --git a/sound/soc/intel/catpt/dsp.c b/sound/soc/intel/catpt/dsp.c index d2afe9ff1e3a..346bec000306 100644 --- a/sound/soc/intel/catpt/dsp.c +++ b/sound/soc/intel/catpt/dsp.c @@ -539,7 +539,7 @@ int catpt_coredump(struct catpt_dev *cdev) pos += CATPT_DMA_REGS_SIZE; } - dev_coredumpv(cdev->dev, dump, dump_size); + dev_coredumpv(cdev->dev, dump, dump_size, GFP_KERNEL); return 0; } From 31c779f293b343577690c01369a5019ca6ec5de9 Mon Sep 17 00:00:00 2001 From: Yangxi Xiang Date: Mon, 27 Jun 2022 20:04:09 +0800 Subject: [PATCH 16/63] devtmpfs: fix the dangling pointer of global devtmpfsd thread When the devtmpfs fails to mount, a dangling pointer still remains in global. Specifically, the err variable is passed by a pointer to the devtmpfsd. When the devtmpfsd exits, it sets the error and completes the setup_done. In this situation, the thread pointer is not set to null. After the devtmpfsd exited, the devtmpfs can wakes up the destroyed devtmpfsd thread by wake_up_process if a device change event comes. Signed-off-by: Yangxi Xiang Link: https://lore.kernel.org/r/20220627120409.11174-1-xyangxi5@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/devtmpfs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c index 8a3ddbae3b70..e4bffeabf344 100644 --- a/drivers/base/devtmpfs.c +++ b/drivers/base/devtmpfs.c @@ -482,6 +482,7 @@ int __init devtmpfs_init(void) if (err) { printk(KERN_ERR "devtmpfs: unable to create devtmpfs %i\n", err); unregister_filesystem(&dev_fs_type); + thread = NULL; return err; } From 1d248d2302da5b96e06281a52056eeafec0d2e11 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 26 Jun 2022 10:32:21 +0100 Subject: [PATCH 17/63] ABI: testing/sysfs-devices-system-cpu: remove duplicated core_id This was already defined at stable/sysfs-devices-system-cpu with the same description, as pointed by get_abi.pl: Warning: /sys/devices/system/cpu/cpuX/topology/core_id is defined 2 times: Documentation/ABI/stable/sysfs-devices-system-cpu:38 Documentation/ABI/testing/sysfs-devices-system-cpu:69 Remove the duplicated one. Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/1e92337c1ef74f5eb9e1c1871e20b858b490d269.1656235926.git.mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-system-cpu | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 2ad01cad7f1c..7856f468e464 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -67,8 +67,7 @@ Description: Discover NUMA node a CPU belongs to /sys/devices/system/cpu/cpu42/node2 -> ../../node/node2 -What: /sys/devices/system/cpu/cpuX/topology/core_id - /sys/devices/system/cpu/cpuX/topology/core_siblings +What: /sys/devices/system/cpu/cpuX/topology/core_siblings /sys/devices/system/cpu/cpuX/topology/core_siblings_list /sys/devices/system/cpu/cpuX/topology/physical_package_id /sys/devices/system/cpu/cpuX/topology/thread_siblings @@ -84,10 +83,6 @@ Description: CPU topology files that describe a logical CPU's relationship Briefly, the files above are: - core_id: the CPU core ID of cpuX. Typically it is the - hardware platform's identifier (rather than the kernel's). - The actual value is architecture and platform dependent. - core_siblings: internal kernel map of cpuX's hardware threads within the same physical_package_id. From 70fe758352cafdee72a7b13bf9db065f9613ced8 Mon Sep 17 00:00:00 2001 From: Zhang Wensheng Date: Wed, 22 Jun 2022 15:43:27 +0800 Subject: [PATCH 18/63] driver core: fix potential deadlock in __driver_attach In __driver_attach function, There are also AA deadlock problem, like the commit b232b02bf3c2 ("driver core: fix deadlock in __device_attach"). stack like commit b232b02bf3c2 ("driver core: fix deadlock in __device_attach"). list below: In __driver_attach function, The lock holding logic is as follows: ... __driver_attach if (driver_allows_async_probing(drv)) device_lock(dev) // get lock dev async_schedule_dev(__driver_attach_async_helper, dev); // func async_schedule_node async_schedule_node_domain(func) entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC); /* when fail or work limit, sync to execute func, but __driver_attach_async_helper will get lock dev as will, which will lead to A-A deadlock. */ if (!entry || atomic_read(&entry_count) > MAX_WORK) { func; else queue_work_node(node, system_unbound_wq, &entry->work) device_unlock(dev) As above show, when it is allowed to do async probes, because of out of memory or work limit, async work is not be allowed, to do sync execute instead. it will lead to A-A deadlock because of __driver_attach_async_helper getting lock dev. Reproduce: and it can be reproduce by make the condition (if (!entry || atomic_read(&entry_count) > MAX_WORK)) untenable, like below: [ 370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.787154] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 370.788865] Call Trace: [ 370.789374] [ 370.789841] __schedule+0x482/0x1050 [ 370.790613] schedule+0x92/0x1a0 [ 370.791290] schedule_preempt_disabled+0x2c/0x50 [ 370.792256] __mutex_lock.isra.0+0x757/0xec0 [ 370.793158] __mutex_lock_slowpath+0x1f/0x30 [ 370.794079] mutex_lock+0x50/0x60 [ 370.794795] __device_driver_lock+0x2f/0x70 [ 370.795677] ? driver_probe_device+0xd0/0xd0 [ 370.796576] __driver_attach_async_helper+0x1d/0xd0 [ 370.797318] ? driver_probe_device+0xd0/0xd0 [ 370.797957] async_schedule_node_domain+0xa5/0xc0 [ 370.798652] async_schedule_node+0x19/0x30 [ 370.799243] __driver_attach+0x246/0x290 [ 370.799828] ? driver_allows_async_probing+0xa0/0xa0 [ 370.800548] bus_for_each_dev+0x9d/0x130 [ 370.801132] driver_attach+0x22/0x30 [ 370.801666] bus_add_driver+0x290/0x340 [ 370.802246] driver_register+0x88/0x140 [ 370.802817] ? virtio_scsi_init+0x116/0x116 [ 370.803425] scsi_register_driver+0x1a/0x30 [ 370.804057] init_sd+0x184/0x226 [ 370.804533] do_one_initcall+0x71/0x3a0 [ 370.805107] kernel_init_freeable+0x39a/0x43a [ 370.805759] ? rest_init+0x150/0x150 [ 370.806283] kernel_init+0x26/0x230 [ 370.806799] ret_from_fork+0x1f/0x30 To fix the deadlock, move the async_schedule_dev outside device_lock, as we can see, in async_schedule_node_domain, the parameter of queue_work_node is system_unbound_wq, so it can accept concurrent operations. which will also not change the code logic, and will not lead to deadlock. Fixes: ef0ff68351be ("driver core: Probe devices asynchronously instead of the driver") Signed-off-by: Zhang Wensheng Link: https://lore.kernel.org/r/20220622074327.497102-1-zhangwensheng5@huawei.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/dd.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index e600dd2afc35..70f79fc71539 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -1099,6 +1099,7 @@ static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie) static int __driver_attach(struct device *dev, void *data) { struct device_driver *drv = data; + bool async = false; int ret; /* @@ -1137,9 +1138,11 @@ static int __driver_attach(struct device *dev, void *data) if (!dev->driver && !dev->p->async_driver) { get_device(dev); dev->p->async_driver = drv; - async_schedule_dev(__driver_attach_async_helper, dev); + async = true; } device_unlock(dev); + if (async) + async_schedule_dev(__driver_attach_async_helper, dev); return 0; } From dcab8da13ff4886aab26348b925d20dca4f12bac Mon Sep 17 00:00:00 2001 From: Lin Feng Date: Fri, 17 Jun 2022 17:17:46 +0800 Subject: [PATCH 19/63] kernfs/file.c: remove redundant error return counter assignment Since previous 'rc = -EINVAL;', rc value doesn't change, so not necessary to re-assign it again. Signed-off-by: Lin Feng Link: https://lore.kernel.org/r/20220617091746.206515-1-linf@wangsu.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index e3abfa843879..54b2a13ac9a2 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -484,7 +484,6 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma) * It is not possible to successfully wrap close. * So error if someone is trying to use close. */ - rc = -EINVAL; if (vma->vm_ops && vma->vm_ops->close) goto out_put; From 086c00c71fc8d47db6983f419a45f9ee167de03f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:56 +1000 Subject: [PATCH 20/63] kernfs: make ->attr.open RCU protected. After removal of kernfs_open_node->refcnt in the previous patch, kernfs_open_node_lock can be removed as well by making ->attr.open RCU protected. kernfs_put_open_node can delegate freeing to ->attr.open to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock. Suggested by: Al Viro Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 147 ++++++++++++++++++++++++++++------------- include/linux/kernfs.h | 2 +- 2 files changed, 102 insertions(+), 47 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 54b2a13ac9a2..22e7481e7b63 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -23,16 +23,16 @@ * for each kernfs_node with one or more open files. * * kernfs_node->attr.open points to kernfs_open_node. attr.open is - * protected by kernfs_open_node_lock. + * RCU protected. * * filp->private_data points to seq_file whose ->private points to * kernfs_open_file. kernfs_open_files are chained at * kernfs_open_node->files, which is protected by kernfs_open_file_mutex. */ -static DEFINE_SPINLOCK(kernfs_open_node_lock); static DEFINE_MUTEX(kernfs_open_file_mutex); struct kernfs_open_node { + struct rcu_head rcu_head; atomic_t event; wait_queue_head_t poll; struct list_head files; /* goes through kernfs_open_file.list */ @@ -51,6 +51,52 @@ struct kernfs_open_node { static DEFINE_SPINLOCK(kernfs_notify_lock); static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL; +/** + * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn. + * + * @of: associated kernfs_open_file instance. + * @kn: target kernfs_node. + * + * Fetch and return ->attr.open of @kn if @of->list is non empty. + * If @of->list is not empty we can safely assume that @of is on + * @kn->attr.open->files list and this guarantees that @kn->attr.open + * will not vanish i.e. dereferencing outside RCU read-side critical + * section is safe here. + * + * The caller needs to make sure that @of->list is not empty. + */ +static struct kernfs_open_node * +kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn) +{ + struct kernfs_open_node *on; + + on = rcu_dereference_check(kn->attr.open, !list_empty(&of->list)); + + return on; +} + +/** + * kernfs_deref_open_node_protected - Get kernfs_open_node corresponding to @kn + * + * @kn: target kernfs_node. + * + * Fetch and return ->attr.open of @kn when caller holds the + * kernfs_open_file_mutex. + * + * Update of ->attr.open happens under kernfs_open_file_mutex. So when + * the caller guarantees that this mutex is being held, other updaters can't + * change ->attr.open and this means that we can safely deref ->attr.open + * outside RCU read-side critical section. + * + * The caller needs to make sure that kernfs_open_file_mutex is held. + */ +static struct kernfs_open_node * +kernfs_deref_open_node_protected(struct kernfs_node *kn) +{ + return rcu_dereference_protected(kn->attr.open, + lockdep_is_held(&kernfs_open_file_mutex)); +} + static struct kernfs_open_file *kernfs_of(struct file *file) { return ((struct seq_file *)file->private_data)->private; @@ -156,8 +202,12 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v) static int kernfs_seq_show(struct seq_file *sf, void *v) { struct kernfs_open_file *of = sf->private; + struct kernfs_open_node *on = kernfs_deref_open_node(of, of->kn); - of->event = atomic_read(&of->kn->attr.open->event); + if (!on) + return -EINVAL; + + of->event = atomic_read(&on->event); return of->kn->attr.ops->seq_show(sf, v); } @@ -180,6 +230,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) struct kernfs_open_file *of = kernfs_of(iocb->ki_filp); ssize_t len = min_t(size_t, iov_iter_count(iter), PAGE_SIZE); const struct kernfs_ops *ops; + struct kernfs_open_node *on; char *buf; buf = of->prealloc_buf; @@ -201,7 +252,15 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) goto out_free; } - of->event = atomic_read(&of->kn->attr.open->event); + on = kernfs_deref_open_node(of, of->kn); + if (!on) { + len = -EINVAL; + mutex_unlock(&of->mutex); + goto out_free; + } + + of->event = atomic_read(&on->event); + ops = kernfs_ops(of->kn); if (ops->read) len = ops->read(of, buf, len, iocb->ki_pos); @@ -518,36 +577,29 @@ static int kernfs_get_open_node(struct kernfs_node *kn, { struct kernfs_open_node *on, *new_on = NULL; - retry: mutex_lock(&kernfs_open_file_mutex); - spin_lock_irq(&kernfs_open_node_lock); - - if (!kn->attr.open && new_on) { - kn->attr.open = new_on; - new_on = NULL; - } - - on = kn->attr.open; - if (on) - list_add_tail(&of->list, &on->files); - - spin_unlock_irq(&kernfs_open_node_lock); - mutex_unlock(&kernfs_open_file_mutex); + on = kernfs_deref_open_node_protected(kn); if (on) { - kfree(new_on); + list_add_tail(&of->list, &on->files); + mutex_unlock(&kernfs_open_file_mutex); return 0; + } else { + /* not there, initialize a new one */ + new_on = kmalloc(sizeof(*new_on), GFP_KERNEL); + if (!new_on) { + mutex_unlock(&kernfs_open_file_mutex); + return -ENOMEM; + } + atomic_set(&new_on->event, 1); + init_waitqueue_head(&new_on->poll); + INIT_LIST_HEAD(&new_on->files); + list_add_tail(&of->list, &new_on->files); + rcu_assign_pointer(kn->attr.open, new_on); } + mutex_unlock(&kernfs_open_file_mutex); - /* not there, initialize a new one and retry */ - new_on = kmalloc(sizeof(*new_on), GFP_KERNEL); - if (!new_on) - return -ENOMEM; - - atomic_set(&new_on->event, 1); - init_waitqueue_head(&new_on->poll); - INIT_LIST_HEAD(&new_on->files); - goto retry; + return 0; } /** @@ -566,24 +618,25 @@ static int kernfs_get_open_node(struct kernfs_node *kn, static void kernfs_unlink_open_file(struct kernfs_node *kn, struct kernfs_open_file *of) { - struct kernfs_open_node *on = kn->attr.open; - unsigned long flags; + struct kernfs_open_node *on; mutex_lock(&kernfs_open_file_mutex); - spin_lock_irqsave(&kernfs_open_node_lock, flags); + + on = kernfs_deref_open_node_protected(kn); + if (!on) { + mutex_unlock(&kernfs_open_file_mutex); + return; + } if (of) list_del(&of->list); - if (list_empty(&on->files)) - kn->attr.open = NULL; - else - on = NULL; + if (list_empty(&on->files)) { + rcu_assign_pointer(kn->attr.open, NULL); + kfree_rcu(on, rcu_head); + } - spin_unlock_irqrestore(&kernfs_open_node_lock, flags); mutex_unlock(&kernfs_open_file_mutex); - - kfree(on); } static int kernfs_fop_open(struct inode *inode, struct file *file) @@ -773,17 +826,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn) * check under kernfs_open_file_mutex will ensure bailing out if * ->attr.open became NULL while waiting for the mutex. */ - if (!kn->attr.open) + if (!rcu_access_pointer(kn->attr.open)) return; mutex_lock(&kernfs_open_file_mutex); - if (!kn->attr.open) { + on = kernfs_deref_open_node_protected(kn); + if (!on) { mutex_unlock(&kernfs_open_file_mutex); return; } - on = kn->attr.open; - list_for_each_entry(of, &on->files, list) { struct inode *inode = file_inode(of->file); @@ -814,7 +866,10 @@ void kernfs_drain_open_files(struct kernfs_node *kn) __poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait) { struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry); - struct kernfs_open_node *on = kn->attr.open; + struct kernfs_open_node *on = kernfs_deref_open_node(of, kn); + + if (!on) + return EPOLLERR; poll_wait(of->file, &on->poll, wait); @@ -921,13 +976,13 @@ void kernfs_notify(struct kernfs_node *kn) return; /* kick poll immediately */ - spin_lock_irqsave(&kernfs_open_node_lock, flags); - on = kn->attr.open; + rcu_read_lock(); + on = rcu_dereference(kn->attr.open); if (on) { atomic_inc(&on->event); wake_up_interruptible(&on->poll); } - spin_unlock_irqrestore(&kernfs_open_node_lock, flags); + rcu_read_unlock(); /* schedule work to kick fsnotify */ spin_lock_irqsave(&kernfs_notify_lock, flags); diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index e2ae15a6225e..13f54f078a52 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -114,7 +114,7 @@ struct kernfs_elem_symlink { struct kernfs_elem_attr { const struct kernfs_ops *ops; - struct kernfs_open_node *open; + struct kernfs_open_node __rcu *open; loff_t size; struct kernfs_node *notify_next; /* for kernfs_notify() */ }; From b8f35fa1188b84035c59d4842826c4e93a1b1c9f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:57 +1000 Subject: [PATCH 21/63] kernfs: Change kernfs_notify_list to llist. At present kernfs_notify_list is implemented as a singly linked list of kernfs_node(s), where last element points to itself and value of ->attr.next tells if node is present on the list or not. Both addition and deletion to list happen under kernfs_notify_lock. Change kernfs_notify_list to llist so that addition to list can heppen locklessly. Suggested by: Al Viro Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 47 ++++++++++++++++++------------------------ include/linux/kernfs.h | 2 +- 2 files changed, 21 insertions(+), 28 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 22e7481e7b63..7d9138d2be3b 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -38,18 +38,16 @@ struct kernfs_open_node { struct list_head files; /* goes through kernfs_open_file.list */ }; -/* - * kernfs_notify() may be called from any context and bounces notifications - * through a work item. To minimize space overhead in kernfs_node, the - * pending queue is implemented as a singly linked list of kernfs_nodes. - * The list is terminated with the self pointer so that whether a - * kernfs_node is on the list or not can be determined by testing the next - * pointer for NULL. +/** + * attribute_to_node - get kernfs_node object corresponding to a kernfs attribute + * @ptr: &struct kernfs_elem_attr + * @type: struct kernfs_node + * @member: name of member (i.e attr) */ -#define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list) +#define attribute_to_node(ptr, type, member) \ + container_of(ptr, type, member) -static DEFINE_SPINLOCK(kernfs_notify_lock); -static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL; +static LLIST_HEAD(kernfs_notify_list); /** * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn. @@ -902,18 +900,16 @@ static void kernfs_notify_workfn(struct work_struct *work) struct kernfs_node *kn; struct kernfs_super_info *info; struct kernfs_root *root; + struct llist_node *free; + struct kernfs_elem_attr *attr; repeat: /* pop one off the notify_list */ - spin_lock_irq(&kernfs_notify_lock); - kn = kernfs_notify_list; - if (kn == KERNFS_NOTIFY_EOL) { - spin_unlock_irq(&kernfs_notify_lock); + free = llist_del_first(&kernfs_notify_list); + if (free == NULL) return; - } - kernfs_notify_list = kn->attr.notify_next; - kn->attr.notify_next = NULL; - spin_unlock_irq(&kernfs_notify_lock); + attr = llist_entry(free, struct kernfs_elem_attr, notify_next); + kn = attribute_to_node(attr, struct kernfs_node, attr); root = kernfs_root(kn); /* kick fsnotify */ down_write(&root->kernfs_rwsem); @@ -969,12 +965,14 @@ repeat: void kernfs_notify(struct kernfs_node *kn) { static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn); - unsigned long flags; struct kernfs_open_node *on; if (WARN_ON(kernfs_type(kn) != KERNFS_FILE)) return; + /* Because we are using llist for kernfs_notify_list */ + WARN_ON_ONCE(in_nmi()); + /* kick poll immediately */ rcu_read_lock(); on = rcu_dereference(kn->attr.open); @@ -985,14 +983,9 @@ void kernfs_notify(struct kernfs_node *kn) rcu_read_unlock(); /* schedule work to kick fsnotify */ - spin_lock_irqsave(&kernfs_notify_lock, flags); - if (!kn->attr.notify_next) { - kernfs_get(kn); - kn->attr.notify_next = kernfs_notify_list; - kernfs_notify_list = kn; - schedule_work(&kernfs_notify_work); - } - spin_unlock_irqrestore(&kernfs_notify_lock, flags); + kernfs_get(kn); + llist_add(&kn->attr.notify_next, &kernfs_notify_list); + schedule_work(&kernfs_notify_work); } EXPORT_SYMBOL_GPL(kernfs_notify); diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 13f54f078a52..2dd9c8df0f4f 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -116,7 +116,7 @@ struct kernfs_elem_attr { const struct kernfs_ops *ops; struct kernfs_open_node __rcu *open; loff_t size; - struct kernfs_node *notify_next; /* for kernfs_notify() */ + struct llist_node notify_next; /* for kernfs_notify() */ }; /* From 41448c614815965d1cdfa720df34257b84afbb9d Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:58 +1000 Subject: [PATCH 22/63] kernfs: Introduce interface to access global kernfs_open_file_mutex. This allows to change underlying mutex locking, without needing to change the users of the lock. For example next patch modifies this interface to use hashed mutexes in place of a single global kernfs_open_file_mutex. Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-4-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 56 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 38 insertions(+), 18 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 7d9138d2be3b..3b354caad6b5 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -49,6 +49,22 @@ struct kernfs_open_node { static LLIST_HEAD(kernfs_notify_list); +static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn) +{ + return &kernfs_open_file_mutex; +} + +static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn) +{ + struct mutex *lock; + + lock = kernfs_open_file_mutex_ptr(kn); + + mutex_lock(lock); + + return lock; +} + /** * kernfs_deref_open_node - Get kernfs_open_node corresponding to @kn. * @@ -79,9 +95,9 @@ kernfs_deref_open_node(struct kernfs_open_file *of, struct kernfs_node *kn) * @kn: target kernfs_node. * * Fetch and return ->attr.open of @kn when caller holds the - * kernfs_open_file_mutex. + * kernfs_open_file_mutex_ptr(kn). * - * Update of ->attr.open happens under kernfs_open_file_mutex. So when + * Update of ->attr.open happens under kernfs_open_file_mutex_ptr(kn). So when * the caller guarantees that this mutex is being held, other updaters can't * change ->attr.open and this means that we can safely deref ->attr.open * outside RCU read-side critical section. @@ -92,7 +108,7 @@ static struct kernfs_open_node * kernfs_deref_open_node_protected(struct kernfs_node *kn) { return rcu_dereference_protected(kn->attr.open, - lockdep_is_held(&kernfs_open_file_mutex)); + lockdep_is_held(kernfs_open_file_mutex_ptr(kn))); } static struct kernfs_open_file *kernfs_of(struct file *file) @@ -574,19 +590,20 @@ static int kernfs_get_open_node(struct kernfs_node *kn, struct kernfs_open_file *of) { struct kernfs_open_node *on, *new_on = NULL; + struct mutex *mutex = NULL; - mutex_lock(&kernfs_open_file_mutex); + mutex = kernfs_open_file_mutex_lock(kn); on = kernfs_deref_open_node_protected(kn); if (on) { list_add_tail(&of->list, &on->files); - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); return 0; } else { /* not there, initialize a new one */ new_on = kmalloc(sizeof(*new_on), GFP_KERNEL); if (!new_on) { - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); return -ENOMEM; } atomic_set(&new_on->event, 1); @@ -595,7 +612,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn, list_add_tail(&of->list, &new_on->files); rcu_assign_pointer(kn->attr.open, new_on); } - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); return 0; } @@ -617,12 +634,13 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn, struct kernfs_open_file *of) { struct kernfs_open_node *on; + struct mutex *mutex = NULL; - mutex_lock(&kernfs_open_file_mutex); + mutex = kernfs_open_file_mutex_lock(kn); on = kernfs_deref_open_node_protected(kn); if (!on) { - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); return; } @@ -634,7 +652,7 @@ static void kernfs_unlink_open_file(struct kernfs_node *kn, kfree_rcu(on, rcu_head); } - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); } static int kernfs_fop_open(struct inode *inode, struct file *file) @@ -772,11 +790,11 @@ static void kernfs_release_file(struct kernfs_node *kn, /* * @of is guaranteed to have no other file operations in flight and * we just want to synchronize release and drain paths. - * @kernfs_open_file_mutex is enough. @of->mutex can't be used + * @kernfs_open_file_mutex_ptr(kn) is enough. @of->mutex can't be used * here because drain path may be called from places which can * cause circular dependency. */ - lockdep_assert_held(&kernfs_open_file_mutex); + lockdep_assert_held(kernfs_open_file_mutex_ptr(kn)); if (!of->released) { /* @@ -793,11 +811,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp) { struct kernfs_node *kn = inode->i_private; struct kernfs_open_file *of = kernfs_of(filp); + struct mutex *mutex = NULL; if (kn->flags & KERNFS_HAS_RELEASE) { - mutex_lock(&kernfs_open_file_mutex); + mutex = kernfs_open_file_mutex_lock(kn); kernfs_release_file(kn, of); - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); } kernfs_unlink_open_file(kn, of); @@ -812,6 +831,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn) { struct kernfs_open_node *on; struct kernfs_open_file *of; + struct mutex *mutex = NULL; if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE))) return; @@ -821,16 +841,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn) * ->attr.open at this point of time. This check allows early bail out * if ->attr.open is already NULL. kernfs_unlink_open_file makes * ->attr.open NULL only while holding kernfs_open_file_mutex so below - * check under kernfs_open_file_mutex will ensure bailing out if + * check under kernfs_open_file_mutex_ptr(kn) will ensure bailing out if * ->attr.open became NULL while waiting for the mutex. */ if (!rcu_access_pointer(kn->attr.open)) return; - mutex_lock(&kernfs_open_file_mutex); + mutex = kernfs_open_file_mutex_lock(kn); on = kernfs_deref_open_node_protected(kn); if (!on) { - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); return; } @@ -844,7 +864,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn) kernfs_release_file(kn, of); } - mutex_unlock(&kernfs_open_file_mutex); + mutex_unlock(mutex); } /* From 1d25b84e444ad66313c473407979ea9cd33deb3f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:59 +1000 Subject: [PATCH 23/63] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes. In current kernfs design a single mutex, kernfs_open_file_mutex, protects the list of kernfs_open_file instances corresponding to a sysfs attribute. So even if different tasks are opening or closing different sysfs files they can contend on osq_lock of this mutex. The contention is more apparent in large scale systems with few hundred CPUs where most of the CPUs have running tasks that are opening, accessing or closing sysfs files at any point of time. Using hashed mutexes in place of a single global mutex, can significantly reduce contention around global mutex and hence can provide better scalability. Moreover as these hashed mutexes are not part of kernfs_node objects we will not see any singnificant change in memory utilization of kernfs based file systems like sysfs, cgroupfs etc. Modify interface introduced in previous patch to make use of hashed mutexes. Use kernfs_node address as hashing key. Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 17 ++--------- fs/kernfs/kernfs-internal.h | 4 +++ fs/kernfs/mount.c | 19 +++++++++++++ include/linux/kernfs.h | 57 +++++++++++++++++++++++++++++++++++++ 4 files changed, 83 insertions(+), 14 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 3b354caad6b5..bb933221b4ba 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -18,19 +18,6 @@ #include "kernfs-internal.h" -/* - * There's one kernfs_open_file for each open file and one kernfs_open_node - * for each kernfs_node with one or more open files. - * - * kernfs_node->attr.open points to kernfs_open_node. attr.open is - * RCU protected. - * - * filp->private_data points to seq_file whose ->private points to - * kernfs_open_file. kernfs_open_files are chained at - * kernfs_open_node->files, which is protected by kernfs_open_file_mutex. - */ -static DEFINE_MUTEX(kernfs_open_file_mutex); - struct kernfs_open_node { struct rcu_head rcu_head; atomic_t event; @@ -51,7 +38,9 @@ static LLIST_HEAD(kernfs_notify_list); static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn) { - return &kernfs_open_file_mutex; + int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS); + + return &kernfs_locks->open_file_mutex[idx]; } static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn) diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index eeaa779b929c..3ae214d02d44 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -164,4 +164,8 @@ void kernfs_drain_open_files(struct kernfs_node *kn); */ extern const struct inode_operations kernfs_symlink_iops; +/* + * kernfs locks + */ +extern struct kernfs_global_locks *kernfs_locks; #endif /* __KERNFS_INTERNAL_H */ diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index cfa79715fc1a..d0859f72d2d6 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -20,6 +20,7 @@ #include "kernfs-internal.h" struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache; +struct kernfs_global_locks *kernfs_locks; static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry) { @@ -387,6 +388,22 @@ void kernfs_kill_sb(struct super_block *sb) kfree(info); } +static void __init kernfs_mutex_init(void) +{ + int count; + + for (count = 0; count < NR_KERNFS_LOCKS; count++) + mutex_init(&kernfs_locks->open_file_mutex[count]); +} + +static void __init kernfs_lock_init(void) +{ + kernfs_locks = kmalloc(sizeof(struct kernfs_global_locks), GFP_KERNEL); + WARN_ON(!kernfs_locks); + + kernfs_mutex_init(); +} + void __init kernfs_init(void) { kernfs_node_cache = kmem_cache_create("kernfs_node_cache", @@ -397,4 +414,6 @@ void __init kernfs_init(void) kernfs_iattrs_cache = kmem_cache_create("kernfs_iattrs_cache", sizeof(struct kernfs_iattrs), 0, SLAB_PANIC, NULL); + + kernfs_lock_init(); } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 2dd9c8df0f4f..13e703f615f7 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -18,6 +18,7 @@ #include #include #include +#include struct file; struct dentry; @@ -34,6 +35,62 @@ struct kernfs_fs_context; struct kernfs_open_node; struct kernfs_iattrs; +/* + * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash + * table of locks. + * Having a small hash table would impact scalability, since + * more and more kernfs_node objects will end up using same lock + * and having a very large hash table would waste memory. + * + * At the moment size of hash table of locks is being set based on + * the number of CPUs as follows: + * + * NR_CPU NR_KERNFS_LOCK_BITS NR_KERNFS_LOCKS + * 1 1 2 + * 2-3 2 4 + * 4-7 4 16 + * 8-15 6 64 + * 16-31 8 256 + * 32 and more 10 1024 + * + * The above relation between NR_CPU and number of locks is based + * on some internal experimentation which involved booting qemu + * with different values of smp, performing some sysfs operations + * on all CPUs and observing how increase in number of locks impacts + * completion time of these sysfs operations on each CPU. + */ +#ifdef CONFIG_SMP +#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32))) +#else +#define NR_KERNFS_LOCK_BITS 1 +#endif + +#define NR_KERNFS_LOCKS (1 << NR_KERNFS_LOCK_BITS) + +/* + * There's one kernfs_open_file for each open file and one kernfs_open_node + * for each kernfs_node with one or more open files. + * + * filp->private_data points to seq_file whose ->private points to + * kernfs_open_file. + * + * kernfs_open_files are chained at kernfs_open_node->files, which is + * protected by kernfs_global_locks.open_file_mutex[i]. + * + * To reduce possible contention in sysfs access, arising due to single + * locks, use an array of locks (e.g. open_file_mutex) and use kernfs_node + * object address as hash keys to get the index of these locks. + * + * Hashed mutexes are safe to use here because operations using these don't + * rely on global exclusion. + * + * In future we intend to replace other global locks with hashed ones as well. + * kernfs_global_locks acts as a holder for all such hash tables. + */ +struct kernfs_global_locks { + struct mutex open_file_mutex[NR_KERNFS_LOCKS]; +}; + enum kernfs_node_type { KERNFS_DIR = 0x0001, KERNFS_FILE = 0x0002, From 8f486cab263ca1b761c1aab9b36975afd355b799 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Thu, 23 Jun 2022 01:03:42 -0700 Subject: [PATCH 24/63] driver core: fw_devlink: Allow firmware to mark devices as best effort When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode, fw_devlink will do a best effort ordering for that device where it'll only enforce the probe/suspend/resume ordering of that device with suppliers that have drivers. The driver of that device can then decide if it wants to defer probe or probe without the suppliers. This will be useful for avoid probe delays of the console device that were caused by commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default"). Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") Reported-by: Sascha Hauer Reported-by: Peng Fan Tested-by: Peng Fan Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220623080344.783549-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/core.c | 3 ++- include/linux/fwnode.h | 4 ++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 839f64485a55..ccdd5b4295de 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -968,7 +968,8 @@ static void device_links_missing_supplier(struct device *dev) static bool dev_is_best_effort(struct device *dev) { - return fw_devlink_best_effort && dev->can_match; + return (fw_devlink_best_effort && dev->can_match) || + (dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT)); } /** diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 9a81c4410b9f..89b9bdfca925 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -27,11 +27,15 @@ struct device; * driver needs its child devices to be bound with * their respective drivers as soon as they are * added. + * BEST_EFFORT: The fwnode/device needs to probe early and might be missing some + * suppliers. Only enforce ordering with suppliers that have + * drivers. */ #define FWNODE_FLAG_LINKS_ADDED BIT(0) #define FWNODE_FLAG_NOT_DEVICE BIT(1) #define FWNODE_FLAG_INITIALIZED BIT(2) #define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD BIT(3) +#define FWNODE_FLAG_BEST_EFFORT BIT(4) struct fwnode_handle { struct fwnode_handle *secondary; From a244ec3640e0dfe90f31750033433cb2c99446e8 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Thu, 23 Jun 2022 01:03:43 -0700 Subject: [PATCH 25/63] of: base: Avoid console probe delay when fw_devlink.strict=1 Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") enabled iommus and dmas dependency enforcement by default. On some systems, this caused the console device's probe to get delayed until the deferred_probe_timeout expires. We need consoles to work as soon as possible, so mark the console device node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay the probe of the console device for suppliers without drivers. The driver can then make the decision on where it can probe without those suppliers or defer its probe. Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") Reported-by: Sascha Hauer Reported-by: Peng Fan Tested-by: Peng Fan Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220623080344.783549-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- drivers/of/base.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/of/base.c b/drivers/of/base.c index d4f98c8469ed..a19cd0c73644 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align)) of_property_read_string(of_aliases, "stdout", &name); if (name) of_stdout = of_find_node_opts_by_path(name, &of_stdout_options); + if (of_stdout) + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT; } if (!of_aliases) From c882716b6d411495f221bdd73e9137357c16a3ea Mon Sep 17 00:00:00 2001 From: Liang He Date: Tue, 28 Jun 2022 10:16:40 +0800 Subject: [PATCH 26/63] firmware: Hold a reference for of_find_compatible_node() In of_register_trusted_foundations(), we need to hold the reference returned by of_find_compatible_node() and then use it to call of_node_put() for refcount balance. Signed-off-by: Liang He Link: https://lore.kernel.org/r/20220628021640.4015-1-windhl@126.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/trusted_foundations.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/include/linux/firmware/trusted_foundations.h b/include/linux/firmware/trusted_foundations.h index be5984bda592..931b6c5c72df 100644 --- a/include/linux/firmware/trusted_foundations.h +++ b/include/linux/firmware/trusted_foundations.h @@ -71,12 +71,16 @@ static inline void register_trusted_foundations( static inline void of_register_trusted_foundations(void) { + struct device_node *np = of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations"); + + if (!np) + return; + of_node_put(np); /* * If we find the target should enable TF but does not support it, * fail as the system won't be able to do much anyway */ - if (of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations")) - register_trusted_foundations(NULL); + register_trusted_foundations(NULL); } static inline bool trusted_foundations_registered(void) From 72b5d5aef246a0387cefa23121dd90901c7a691a Mon Sep 17 00:00:00 2001 From: Yushan Zhou Date: Thu, 30 Jun 2022 16:25:12 +0800 Subject: [PATCH 27/63] kernfs: fix potential NULL dereference in __kernfs_remove When lockdep is enabled, lockdep_assert_held_write would cause potential NULL pointer dereference. Fix the following smatch warnings: fs/kernfs/dir.c:1353 __kernfs_remove() warn: variable dereferenced before check 'kn' (see line 1346) Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Yushan Zhou Link: https://lore.kernel.org/r/20220630082512.3482581-1-zys.zljxml@gmail.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/dir.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 6eca72cfa1f2..1cc88ba6de90 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1343,14 +1343,17 @@ static void __kernfs_remove(struct kernfs_node *kn) { struct kernfs_node *pos; + /* Short-circuit if non-root @kn has already finished removal. */ + if (!kn) + return; + lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem); /* - * Short-circuit if non-root @kn has already finished removal. * This is for kernfs_remove_self() which plays with active ref * after removal. */ - if (!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb))) + if (kn->parent && RB_EMPTY_NODE(&kn->rb)) return; pr_debug("kernfs %s: removing\n", kn->name); From 0d4c331af4d169de26186170010c7b7acd49f266 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:45 +0100 Subject: [PATCH 28/63] ACPI: PPTT: Use table offset as fw_token instead of virtual address There is need to use the cache sharing information quite early during the boot before the secondary cores are up and running. The permanent memory map for all the ACPI tables(via acpi_permanent_mmap) is turned on in acpi_early_init() which is quite late for the above requirement. As a result there is possibility that the ACPI PPTT gets mapped to different virtual addresses. In such scenarios, using virtual address as fw_token before the acpi_permanent_mmap is enabled results in different fw_token for the same cache entity and hence wrong cache sharing information will be deduced based on the same. Instead of using virtual address, just use the table offset as the unique firmware token for the caches. The same offset is used as ACPI identifiers if the firmware has not set a valid one for other entries in the ACPI PPTT. Link: https://lore.kernel.org/r/20220704101605.1318280-2-sudeep.holla@arm.com Cc: linux-acpi@vger.kernel.org Tested-by: Ionela Voinescu Tested-by: Conor Dooley Acked-by: Rafael J. Wysocki Signed-off-by: Sudeep Holla --- drivers/acpi/pptt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index 701f61c01359..763f021d45e6 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -437,7 +437,8 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table, pr_debug("found = %p %p\n", found_cache, cpu_node); if (found_cache) update_cache_properties(this_leaf, found_cache, - cpu_node, table->revision); + ACPI_TO_POINTER(ACPI_PTR_DIFF(cpu_node, table)), + table->revision); index++; } From d4ec840baecbed280c7305f9103a10641d4d3799 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:46 +0100 Subject: [PATCH 29/63] cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node The of_cpu_device_node_get takes care of fetching the CPU'd device node either from cached cpu_dev->of_node if cpu_dev is initialised or uses of_get_cpu_node to parse and fetch node if cpu_dev isn't available yet. Just use of_cpu_device_node_get instead of getting the cpu device first and then using cpu_dev->of_node for two reasons: 1. There is no other use of cpu_dev and can be simplified 2. It enabled the use detect_cache_attributes and hence cache_setup_of_node much earlier before the CPUs are registered as devices. Link: https://lore.kernel.org/r/20220704101605.1318280-3-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index dad296229161..b0bde272e2ae 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -14,7 +14,7 @@ #include #include #include -#include +#include #include #include #include @@ -157,7 +157,6 @@ static int cache_setup_of_node(unsigned int cpu) { struct device_node *np; struct cacheinfo *this_leaf; - struct device *cpu_dev = get_cpu_device(cpu); struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); unsigned int index = 0; @@ -166,11 +165,7 @@ static int cache_setup_of_node(unsigned int cpu) return 0; } - if (!cpu_dev) { - pr_err("No cpu device for CPU %d\n", cpu); - return -ENODEV; - } - np = cpu_dev->of_node; + np = of_cpu_device_node_get(cpu); if (!np) { pr_err("Failed to find cpu%d device node\n", cpu); return -ENOENT; From b14e8d21f726f4ffeaf8833783eda68a1c152b15 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:47 +0100 Subject: [PATCH 30/63] cacheinfo: Add helper to access any cache index for a given CPU The cacheinfo for a given CPU at a given index is used at quite a few places by fetching the base point for index 0 using the helper per_cpu_cacheinfo(cpu) and offsetting it by the required index. Instead, add another helper to fetch the required pointer directly and use it to simplify and improve readability. Link: https://lore.kernel.org/r/20220704101605.1318280-4-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index b0bde272e2ae..e13ef41763e4 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -25,6 +25,8 @@ static DEFINE_PER_CPU(struct cpu_cacheinfo, ci_cpu_cacheinfo); #define ci_cacheinfo(cpu) (&per_cpu(ci_cpu_cacheinfo, cpu)) #define cache_leaves(cpu) (ci_cacheinfo(cpu)->num_leaves) #define per_cpu_cacheinfo(cpu) (ci_cacheinfo(cpu)->info_list) +#define per_cpu_cacheinfo_idx(cpu, idx) \ + (per_cpu_cacheinfo(cpu) + (idx)) struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu) { @@ -172,7 +174,7 @@ static int cache_setup_of_node(unsigned int cpu) } while (index < cache_leaves(cpu)) { - this_leaf = this_cpu_ci->info_list + index; + this_leaf = per_cpu_cacheinfo_idx(cpu, index); if (this_leaf->level != 1) np = of_find_next_cache_node(np); else @@ -231,7 +233,7 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) for (index = 0; index < cache_leaves(cpu); index++) { unsigned int i; - this_leaf = this_cpu_ci->info_list + index; + this_leaf = per_cpu_cacheinfo_idx(cpu, index); /* skip if shared_cpu_map is already populated */ if (!cpumask_empty(&this_leaf->shared_cpu_map)) continue; @@ -242,7 +244,7 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) if (i == cpu || !sib_cpu_ci->info_list) continue;/* skip if itself or no cacheinfo */ - sib_leaf = sib_cpu_ci->info_list + index; + sib_leaf = per_cpu_cacheinfo_idx(i, index); if (cache_leaves_are_shared(this_leaf, sib_leaf)) { cpumask_set_cpu(cpu, &sib_leaf->shared_cpu_map); cpumask_set_cpu(i, &this_leaf->shared_cpu_map); @@ -258,12 +260,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) static void cache_shared_cpu_map_remove(unsigned int cpu) { - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); struct cacheinfo *this_leaf, *sib_leaf; unsigned int sibling, index; for (index = 0; index < cache_leaves(cpu); index++) { - this_leaf = this_cpu_ci->info_list + index; + this_leaf = per_cpu_cacheinfo_idx(cpu, index); for_each_cpu(sibling, &this_leaf->shared_cpu_map) { struct cpu_cacheinfo *sib_cpu_ci; @@ -274,7 +275,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu) if (!sib_cpu_ci->info_list) continue; - sib_leaf = sib_cpu_ci->info_list + index; + sib_leaf = per_cpu_cacheinfo_idx(sibling, index); cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map); cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map); } @@ -609,7 +610,6 @@ static int cache_add_dev(unsigned int cpu) int rc; struct device *ci_dev, *parent; struct cacheinfo *this_leaf; - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); const struct attribute_group **cache_groups; rc = cpu_cache_sysfs_init(cpu); @@ -618,7 +618,7 @@ static int cache_add_dev(unsigned int cpu) parent = per_cpu_cache_dev(cpu); for (i = 0; i < cache_leaves(cpu); i++) { - this_leaf = this_cpu_ci->info_list + i; + this_leaf = per_cpu_cacheinfo_idx(cpu, i); if (this_leaf->disable_sysfs) continue; if (this_leaf->type == CACHE_TYPE_NOCACHE) From 9447eb0f1575572218267180b4edff937b3aec57 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:48 +0100 Subject: [PATCH 31/63] cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF cache_leaves_are_shared is already used even with ACPI and PPTT. It checks if the cache leaves are the shared based on fw_token pointer. However it is defined conditionally only if CONFIG_OF is enabled which is wrong. Move the function cache_leaves_are_shared out of CONFIG_OF and keep it generic. It also handles the case where both OF and ACPI is not defined. Link: https://lore.kernel.org/r/20220704101605.1318280-5-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index e13ef41763e4..2cea9201f31c 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -33,13 +33,21 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu) return ci_cacheinfo(cpu); } -#ifdef CONFIG_OF static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf, struct cacheinfo *sib_leaf) { + /* + * For non DT/ACPI systems, assume unique level 1 caches, + * system-wide shared caches for all other levels. This will be used + * only if arch specific code has not populated shared_cpu_map + */ + if (!(IS_ENABLED(CONFIG_OF) || IS_ENABLED(CONFIG_ACPI))) + return !(this_leaf->level == 1); + return sib_leaf->fw_token == this_leaf->fw_token; } +#ifdef CONFIG_OF /* OF properties to query for a given cache type */ struct cache_type_info { const char *size_prop; @@ -193,16 +201,6 @@ static int cache_setup_of_node(unsigned int cpu) } #else static inline int cache_setup_of_node(unsigned int cpu) { return 0; } -static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf, - struct cacheinfo *sib_leaf) -{ - /* - * For non-DT/ACPI systems, assume unique level 1 caches, system-wide - * shared caches for all other levels. This will be used only if - * arch specific code has not populated shared_cpu_map - */ - return !(this_leaf->level == 1); -} #endif int __weak cache_setup_acpi(unsigned int cpu) From cc1cfc47ea47187a21ec1f079b3c53264157fe15 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:49 +0100 Subject: [PATCH 32/63] cacheinfo: Add support to check if last level cache(LLC) is valid or shared It is useful to have helper to check if the given two CPUs share last level cache. We can do that check by comparing fw_token or by comparing the cache ID. Currently we check just for fw_token as the cache ID is optional. This helper can be used to build the llc_sibling during arch specific topology parsing and feeding information to the sched_domains. This also helps to get rid of llc_id in the CPU topology as it is sort of duplicate information. Also add helper to check if the llc information in cacheinfo is valid or not. Link: https://lore.kernel.org/r/20220704101605.1318280-6-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 26 ++++++++++++++++++++++++++ include/linux/cacheinfo.h | 2 ++ 2 files changed, 28 insertions(+) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index 2cea9201f31c..fdc1baa342f1 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -47,6 +47,32 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf, return sib_leaf->fw_token == this_leaf->fw_token; } +bool last_level_cache_is_valid(unsigned int cpu) +{ + struct cacheinfo *llc; + + if (!cache_leaves(cpu)) + return false; + + llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1); + + return !!llc->fw_token; +} + +bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y) +{ + struct cacheinfo *llc_x, *llc_y; + + if (!last_level_cache_is_valid(cpu_x) || + !last_level_cache_is_valid(cpu_y)) + return false; + + llc_x = per_cpu_cacheinfo_idx(cpu_x, cache_leaves(cpu_x) - 1); + llc_y = per_cpu_cacheinfo_idx(cpu_y, cache_leaves(cpu_y) - 1); + + return cache_leaves_are_shared(llc_x, llc_y); +} + #ifdef CONFIG_OF /* OF properties to query for a given cache type */ struct cache_type_info { diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 4ff37cb763ae..7e429bc5c1a4 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -82,6 +82,8 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu); int init_cache_level(unsigned int cpu); int populate_cache_leaves(unsigned int cpu); int cache_setup_acpi(unsigned int cpu); +bool last_level_cache_is_valid(unsigned int cpu); +bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y); #ifndef CONFIG_ACPI_PPTT /* * acpi_find_last_cache_level is only called on ACPI enabled From 36bbc5b4ffab33ccac0f4db27f619a6ba7a4fd32 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:50 +0100 Subject: [PATCH 33/63] cacheinfo: Allow early detection and population of cache attributes Some architecture/platforms may need to setup cache properties very early in the boot along with other cpu topologies so that all these information can be used to build sched_domains which is used by the scheduler. Allow detect_cache_attributes to be called quite early during the boot. Link: https://lore.kernel.org/r/20220704101605.1318280-7-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 55 ++++++++++++++++++++++++++------------- include/linux/cacheinfo.h | 1 + 2 files changed, 38 insertions(+), 18 deletions(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index fdc1baa342f1..4d21a1022fa9 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -193,14 +193,8 @@ static int cache_setup_of_node(unsigned int cpu) { struct device_node *np; struct cacheinfo *this_leaf; - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); unsigned int index = 0; - /* skip if fw_token is already populated */ - if (this_cpu_ci->info_list->fw_token) { - return 0; - } - np = of_cpu_device_node_get(cpu); if (!np) { pr_err("Failed to find cpu%d device node\n", cpu); @@ -236,6 +230,18 @@ int __weak cache_setup_acpi(unsigned int cpu) unsigned int coherency_max_size; +static int cache_setup_properties(unsigned int cpu) +{ + int ret = 0; + + if (of_have_populated_dt()) + ret = cache_setup_of_node(cpu); + else if (!acpi_disabled) + ret = cache_setup_acpi(cpu); + + return ret; +} + static int cache_shared_cpu_map_setup(unsigned int cpu) { struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); @@ -246,21 +252,21 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) if (this_cpu_ci->cpu_map_populated) return 0; - if (of_have_populated_dt()) - ret = cache_setup_of_node(cpu); - else if (!acpi_disabled) - ret = cache_setup_acpi(cpu); - - if (ret) - return ret; + /* + * skip setting up cache properties if LLC is valid, just need + * to update the shared cpu_map if the cache attributes were + * populated early before all the cpus are brought online + */ + if (!last_level_cache_is_valid(cpu)) { + ret = cache_setup_properties(cpu); + if (ret) + return ret; + } for (index = 0; index < cache_leaves(cpu); index++) { unsigned int i; this_leaf = per_cpu_cacheinfo_idx(cpu, index); - /* skip if shared_cpu_map is already populated */ - if (!cpumask_empty(&this_leaf->shared_cpu_map)) - continue; cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map); for_each_online_cpu(i) { @@ -330,17 +336,28 @@ int __weak populate_cache_leaves(unsigned int cpu) return -ENOENT; } -static int detect_cache_attributes(unsigned int cpu) +int detect_cache_attributes(unsigned int cpu) { int ret; + /* Since early detection of the cacheinfo is allowed via this + * function and this also gets called as CPU hotplug callbacks via + * cacheinfo_cpu_online, the initialisation can be skipped and only + * CPU maps can be updated as the CPU online status would be update + * if called via cacheinfo_cpu_online path. + */ + if (per_cpu_cacheinfo(cpu)) + goto update_cpu_map; + if (init_cache_level(cpu) || !cache_leaves(cpu)) return -ENOENT; per_cpu_cacheinfo(cpu) = kcalloc(cache_leaves(cpu), sizeof(struct cacheinfo), GFP_KERNEL); - if (per_cpu_cacheinfo(cpu) == NULL) + if (per_cpu_cacheinfo(cpu) == NULL) { + cache_leaves(cpu) = 0; return -ENOMEM; + } /* * populate_cache_leaves() may completely setup the cache leaves and @@ -349,6 +366,8 @@ static int detect_cache_attributes(unsigned int cpu) ret = populate_cache_leaves(cpu); if (ret) goto free_ci; + +update_cpu_map: /* * For systems using DT for cache hierarchy, fw_token * and shared_cpu_map will be set up here only if they are diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 7e429bc5c1a4..00b7a6ae8617 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -84,6 +84,7 @@ int populate_cache_leaves(unsigned int cpu); int cache_setup_acpi(unsigned int cpu); bool last_level_cache_is_valid(unsigned int cpu); bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y); +int detect_cache_attributes(unsigned int cpu); #ifndef CONFIG_ACPI_PPTT /* * acpi_find_last_cache_level is only called on ACPI enabled From f16d1becf96f0a95dc9e1a5a7f97feeec2b149d5 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:51 +0100 Subject: [PATCH 34/63] cacheinfo: Use cache identifiers to check if the caches are shared if available The cache identifiers is an optional property on most of the platforms. The presence of one must be indicated by the CACHE_ID valid bit in the attributes. We can use the cache identifiers provided by the firmware to check if any two cpus share the same cache instead of relying on the fw_token generated and set in the OS. Link: https://lore.kernel.org/r/20220704101605.1318280-8-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index 4d21a1022fa9..e331b399adeb 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -44,6 +44,10 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf, if (!(IS_ENABLED(CONFIG_OF) || IS_ENABLED(CONFIG_ACPI))) return !(this_leaf->level == 1); + if ((sib_leaf->attributes & CACHE_ID) && + (this_leaf->attributes & CACHE_ID)) + return sib_leaf->id == this_leaf->id; + return sib_leaf->fw_token == this_leaf->fw_token; } @@ -56,7 +60,8 @@ bool last_level_cache_is_valid(unsigned int cpu) llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1); - return !!llc->fw_token; + return (llc->attributes & CACHE_ID) || !!llc->fw_token; + } bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y) From 521103134a0d07774c8b17f25ff0ef70cbd56c9d Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:52 +0100 Subject: [PATCH 35/63] cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability The checks to skip the CPU itself or no cacheinfo case are implemented bit differently though the effect is exactly same. Just align the implementation in both cache_shared_cpu_map_{setup,remove} just for improved readability. No functional change. Link: https://lore.kernel.org/r/20220704101605.1318280-9-sudeep.holla@arm.com Tested-by: Conor Dooley Signed-off-by: Sudeep Holla --- drivers/base/cacheinfo.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index e331b399adeb..65d566ff24c4 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -279,6 +279,7 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) if (i == cpu || !sib_cpu_ci->info_list) continue;/* skip if itself or no cacheinfo */ + sib_leaf = per_cpu_cacheinfo_idx(i, index); if (cache_leaves_are_shared(this_leaf, sib_leaf)) { cpumask_set_cpu(cpu, &sib_leaf->shared_cpu_map); @@ -301,14 +302,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu) for (index = 0; index < cache_leaves(cpu); index++) { this_leaf = per_cpu_cacheinfo_idx(cpu, index); for_each_cpu(sibling, &this_leaf->shared_cpu_map) { - struct cpu_cacheinfo *sib_cpu_ci; + struct cpu_cacheinfo *sib_cpu_ci = + get_cpu_cacheinfo(sibling); - if (sibling == cpu) /* skip itself */ - continue; - - sib_cpu_ci = get_cpu_cacheinfo(sibling); - if (!sib_cpu_ci->info_list) - continue; + if (sibling == cpu || !sib_cpu_ci->info_list) + continue;/* skip if itself or no cacheinfo */ sib_leaf = per_cpu_cacheinfo_idx(sibling, index); cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map); From 38db9b95464f82fed28794afe0214d9439d86f7c Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:53 +0100 Subject: [PATCH 36/63] arch_topology: Add support to parse and detect cache attributes Currently ACPI populates just the minimum information about the last level cache from PPTT in order to feed the same to build sched_domains. Similar support for DT platforms is not present. In order to enable the same, the entire cache hierarchy information can be built as part of CPU topoplogy parsing both on ACPI and DT platforms. Note that this change builds the cacheinfo early even on ACPI systems, but the current mechanism of building llc_sibling mask remains unchanged. Link: https://lore.kernel.org/r/20220704101605.1318280-10-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 579c851a2bd7..e2f7d9ea558e 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -7,6 +7,7 @@ */ #include +#include #include #include #include @@ -780,15 +781,28 @@ __weak int __init parse_acpi_topology(void) #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV) void __init init_cpu_topology(void) { - reset_cpu_topology(); + int ret, cpu; - /* - * Discard anything that was parsed if we hit an error so we - * don't use partial information. - */ - if (parse_acpi_topology()) - reset_cpu_topology(); - else if (of_have_populated_dt() && parse_dt_topology()) + reset_cpu_topology(); + ret = parse_acpi_topology(); + if (!ret) + ret = of_have_populated_dt() && parse_dt_topology(); + + if (ret) { + /* + * Discard anything that was parsed if we hit an error so we + * don't use partial information. + */ reset_cpu_topology(); + return; + } + + for_each_possible_cpu(cpu) { + ret = detect_cache_attributes(cpu); + if (ret) { + pr_info("Early cacheinfo failed, ret = %d\n", ret); + break; + } + } } #endif From f027db2f9a09e76858d06828b9ff817272d64ccc Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:54 +0100 Subject: [PATCH 37/63] arch_topology: Use the last level cache information from the cacheinfo The cacheinfo is now initialised early along with the CPU topology initialisation. Instead of relying on the LLC ID information parsed separately only with ACPI PPTT elsewhere, migrate to use the similar information from the cacheinfo. This is generic for both DT and ACPI systems. The ACPI LLC ID information parsed separately can now be removed from arch specific code. Link: https://lore.kernel.org/r/20220704101605.1318280-11-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index e2f7d9ea558e..4f936c984fb6 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -668,7 +668,8 @@ const struct cpumask *cpu_coregroup_mask(int cpu) /* not numa in package, lets use the package siblings */ core_mask = &cpu_topology[cpu].core_sibling; } - if (cpu_topology[cpu].llc_id != -1) { + + if (last_level_cache_is_valid(cpu)) { if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask)) core_mask = &cpu_topology[cpu].llc_sibling; } @@ -699,7 +700,7 @@ void update_siblings_masks(unsigned int cpuid) for_each_online_cpu(cpu) { cpu_topo = &cpu_topology[cpu]; - if (cpu_topo->llc_id != -1 && cpuid_topo->llc_id == cpu_topo->llc_id) { + if (last_level_cache_is_shared(cpu, cpuid)) { cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling); cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling); } From 798eb5b4d41b282f021afc90b5187e91fc731930 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:55 +0100 Subject: [PATCH 38/63] arm64: topology: Remove redundant setting of llc_id in CPU topology Since the cacheinfo LLC information is used directly in arch_topology, there is no need to parse and fetch the LLC ID information only for ACPI systems. Just drop the redundant parsing and setting of llc_id in CPU topology from ACPI PPTT. Link: https://lore.kernel.org/r/20220704101605.1318280-12-sudeep.holla@arm.com Cc: Will Deacon Cc: Catalin Marinas Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Acked-by: Catalin Marinas Signed-off-by: Sudeep Holla --- arch/arm64/kernel/topology.c | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 9ab78ad826e2..869ffc4d4484 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -89,8 +89,6 @@ int __init parse_acpi_topology(void) return 0; for_each_possible_cpu(cpu) { - int i, cache_id; - topology_id = find_acpi_cpu_topology(cpu, 0); if (topology_id < 0) return topology_id; @@ -107,18 +105,6 @@ int __init parse_acpi_topology(void) cpu_topology[cpu].cluster_id = topology_id; topology_id = find_acpi_cpu_topology_package(cpu); cpu_topology[cpu].package_id = topology_id; - - i = acpi_find_last_cache_level(cpu); - - if (i > 0) { - /* - * this is the only part of cpu_topology that has - * a direct relationship with the cache topology - */ - cache_id = find_acpi_cpu_cache_topology(cpu, i); - if (cache_id > 0) - cpu_topology[cpu].llc_id = cache_id; - } } return 0; From 5b8dc787ce4a45c87254d1b0b22f161347ab7f81 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:56 +0100 Subject: [PATCH 39/63] arch_topology: Drop LLC identifier stash from the CPU topology Since the cacheinfo LLC information is used directly in arch_topology, there is no need to parse and store the LLC ID information only for ACPI systems in the CPU topology. Remove the redundant LLC ID from the generic CPU arch_topology information. Link: https://lore.kernel.org/r/20220704101605.1318280-13-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 1 - include/linux/arch_topology.h | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 4f936c984fb6..8206990c679f 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -752,7 +752,6 @@ void __init reset_cpu_topology(void) cpu_topo->core_id = -1; cpu_topo->cluster_id = -1; cpu_topo->package_id = -1; - cpu_topo->llc_id = -1; clear_cpu_topology(cpu); } diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index 58cbe18d825c..a07b510e7dc5 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -68,7 +68,6 @@ struct cpu_topology { int core_id; int cluster_id; int package_id; - int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; cpumask_t cluster_sibling; From 3f8283296b16c4e43fd79a5ac364ae8171fe1567 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:57 +0100 Subject: [PATCH 40/63] arch_topology: Set thread sibling cpumask only within the cluster Currently the cluster identifier is not set on the DT based platforms. The reset or default value is -1 for all the CPUs. Once we assign the cluster identifier values correctly that may result in getting the thread siblings wrong as the core identifiers can be same for 2 different CPUs belonging to 2 different cluster. So, in order to get the thread sibling cpumasks correct, we need to update them only if the cores they belong are in the same cluster within the socket. Let us skip updation of the thread sibling cpumaks if the cluster identifier doesn't match. This change won't affect even if the cluster identifiers are not set currently but will avoid any breakage once we set the same correctly. Link: https://lore.kernel.org/r/20220704101605.1318280-14-sudeep.holla@arm.com Tested-by: Gavin Shan Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 8206990c679f..6ab173caf1dc 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -708,15 +708,17 @@ void update_siblings_masks(unsigned int cpuid) if (cpuid_topo->package_id != cpu_topo->package_id) continue; - if (cpuid_topo->cluster_id == cpu_topo->cluster_id && - cpuid_topo->cluster_id != -1) { + cpumask_set_cpu(cpuid, &cpu_topo->core_sibling); + cpumask_set_cpu(cpu, &cpuid_topo->core_sibling); + + if (cpuid_topo->cluster_id != cpu_topo->cluster_id) + continue; + + if (cpuid_topo->cluster_id != -1) { cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling); cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling); } - cpumask_set_cpu(cpuid, &cpu_topo->core_sibling); - cpumask_set_cpu(cpu, &cpuid_topo->core_sibling); - if (cpuid_topo->core_id != cpu_topo->core_id) continue; From 9eb5e54f876dda1aae0aef10bdd61da4331509ba Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:58 +0100 Subject: [PATCH 41/63] arch_topology: Check for non-negative value rather than -1 for IDs validity Instead of just comparing the cpu topology IDs with -1 to check their validity, improve that by checking for a valid non-negative value. Link: https://lore.kernel.org/r/20220704101605.1318280-15-sudeep.holla@arm.com Suggested-by: Andy Shevchenko Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 6ab173caf1dc..c0b0ee64a79d 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -642,7 +642,7 @@ static int __init parse_dt_topology(void) * only mark cores described in the DT as possible. */ for_each_possible_cpu(cpu) - if (cpu_topology[cpu].package_id == -1) + if (cpu_topology[cpu].package_id < 0) ret = -EINVAL; out_map: @@ -714,7 +714,7 @@ void update_siblings_masks(unsigned int cpuid) if (cpuid_topo->cluster_id != cpu_topo->cluster_id) continue; - if (cpuid_topo->cluster_id != -1) { + if (cpuid_topo->cluster_id >= 0) { cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling); cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling); } From 5a01bb8efb5177236498fc57b147cabd2b792613 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:59 +0100 Subject: [PATCH 42/63] arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found There is no point in looping through all the CPU's physical package identifier to check if it is valid or not once a CPU which is outside the topology(i.e. outlier CPU) is found. Let us just break out of the loop early in such case. Link: https://lore.kernel.org/r/20220704101605.1318280-16-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index c0b0ee64a79d..8f6a964d2512 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -642,8 +642,10 @@ static int __init parse_dt_topology(void) * only mark cores described in the DT as possible. */ for_each_possible_cpu(cpu) - if (cpu_topology[cpu].package_id < 0) + if (cpu_topology[cpu].package_id < 0) { ret = -EINVAL; + break; + } out_map: of_node_put(map); From 26a2b73a7b15a51ec1409648c9b43882f66fbacf Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:00 +0100 Subject: [PATCH 43/63] arch_topology: Don't set cluster identifier as physical package identifier Currently as we parse the CPU topology from /cpu-map node from the device tree, we assign generated cluster count as the physical package identifier for each CPU which is wrong. The device tree bindings for CPU topology supports sockets to infer the socket or physical package identifier for a given CPU. Since it is fairly new and not supported on most of the old and existing systems, we can assume all such systems have single socket/physical package. Fix the physical package identifier to 0 by removing the assignment of cluster identifier to the same. Link: https://lore.kernel.org/r/20220704101605.1318280-17-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Ionela Voinescu Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 8f6a964d2512..e384afb6cac7 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -549,7 +549,6 @@ static int __init parse_cluster(struct device_node *cluster, int depth) bool leaf = true; bool has_cores = false; struct device_node *c; - static int package_id __initdata; int core_id = 0; int i, ret; @@ -588,7 +587,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth) } if (leaf) { - ret = parse_core(c, package_id, core_id++); + ret = parse_core(c, 0, core_id++); } else { pr_err("%pOF: Non-leaf cluster with core %s\n", cluster, name); @@ -605,9 +604,6 @@ static int __init parse_cluster(struct device_node *cluster, int depth) if (leaf && !has_cores) pr_warn("%pOF: empty cluster\n", cluster); - if (leaf) - package_id++; - return 0; } From bfcc4397435dc0407099b9a805391abc05c2313b Mon Sep 17 00:00:00 2001 From: Ionela Voinescu Date: Mon, 4 Jul 2022 11:16:01 +0100 Subject: [PATCH 44/63] arch_topology: Limit span of cpu_clustergroup_mask() Currently the cluster identifier is not set on DT based platforms. The reset or default value is -1 for all the CPUs. Once we assign the cluster identifier values correctly, the cluster_sibling mask will be populated and returned by cpu_clustergroup_mask() to contribute in the creation of the CLS scheduling domain level, if SCHED_CLUSTER is enabled. To avoid topologies that will result in questionable or incorrect scheduling domains, impose restrictions regarding the span of clusters, as presented to scheduling domains building code: cluster_sibling should not span more or the same CPUs as cpu_coregroup_mask(). This is needed in order to obtain a strict separation between the MC and CLS levels, and maintain the same domains for existing platforms in the presence of CONFIG_SCHED_CLUSTER, where the new cluster information is redundant and irrelevant for the scheduler. While previously the scheduling domain builder code would have removed MC as redundant and kept CLS if SCHED_CLUSTER was enabled and the cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs, now CLS will be removed and MC kept. Link: https://lore.kernel.org/r/20220704101605.1318280-18-sudeep.holla@arm.com Cc: Darren Hart Tested-by: Conor Dooley Acked-by: Vincent Guittot Signed-off-by: Ionela Voinescu Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index e384afb6cac7..591c1f8e15e2 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -686,6 +686,14 @@ const struct cpumask *cpu_coregroup_mask(int cpu) const struct cpumask *cpu_clustergroup_mask(int cpu) { + /* + * Forbid cpu_clustergroup_mask() to span more or the same CPUs as + * cpu_coregroup_mask(). + */ + if (cpumask_subset(cpu_coregroup_mask(cpu), + &cpu_topology[cpu].cluster_sibling)) + return get_cpu_mask(cpu); + return &cpu_topology[cpu].cluster_sibling; } From 556c9678a7d4456a677588ce308200a673b7eb1f Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:02 +0100 Subject: [PATCH 45/63] arch_topology: Set cluster identifier in each core/thread from /cpu-map Let us set the cluster identifier as parsed from the device tree cluster nodes within /cpu-map. We don't support nesting of clusters yet as there are no real hardware to support clusters of clusters. Link: https://lore.kernel.org/r/20220704101605.1318280-19-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Ionela Voinescu Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 591c1f8e15e2..217a91fc1f59 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -497,7 +497,7 @@ static int __init get_cpu_for_node(struct device_node *node) } static int __init parse_core(struct device_node *core, int package_id, - int core_id) + int cluster_id, int core_id) { char name[20]; bool leaf = true; @@ -513,6 +513,7 @@ static int __init parse_core(struct device_node *core, int package_id, cpu = get_cpu_for_node(t); if (cpu >= 0) { cpu_topology[cpu].package_id = package_id; + cpu_topology[cpu].cluster_id = cluster_id; cpu_topology[cpu].core_id = core_id; cpu_topology[cpu].thread_id = i; } else if (cpu != -ENODEV) { @@ -534,6 +535,7 @@ static int __init parse_core(struct device_node *core, int package_id, } cpu_topology[cpu].package_id = package_id; + cpu_topology[cpu].cluster_id = cluster_id; cpu_topology[cpu].core_id = core_id; } else if (leaf && cpu != -ENODEV) { pr_err("%pOF: Can't get CPU for leaf core\n", core); @@ -543,7 +545,8 @@ static int __init parse_core(struct device_node *core, int package_id, return 0; } -static int __init parse_cluster(struct device_node *cluster, int depth) +static int __init +parse_cluster(struct device_node *cluster, int cluster_id, int depth) { char name[20]; bool leaf = true; @@ -563,7 +566,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth) c = of_get_child_by_name(cluster, name); if (c) { leaf = false; - ret = parse_cluster(c, depth + 1); + ret = parse_cluster(c, i, depth + 1); of_node_put(c); if (ret != 0) return ret; @@ -587,7 +590,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth) } if (leaf) { - ret = parse_core(c, 0, core_id++); + ret = parse_core(c, 0, cluster_id, core_id++); } else { pr_err("%pOF: Non-leaf cluster with core %s\n", cluster, name); @@ -627,7 +630,7 @@ static int __init parse_dt_topology(void) if (!map) goto out; - ret = parse_cluster(map, 0); + ret = parse_cluster(map, -1, 0); if (ret != 0) goto out_map; From dea8c0b40fb500be29f4649cf01202e42a8a54f8 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:03 +0100 Subject: [PATCH 46/63] arch_topology: Add support for parsing sockets in /cpu-map Finally let us add support for socket nodes in /cpu-map in the device tree. Since this may not be present in all the old platforms and even most of the existing platforms, we need to assume absence of the socket node indicates that it is a single socket system and handle appropriately. Also it is likely that most single socket systems skip to as the node since it is optional. Link: https://lore.kernel.org/r/20220704101605.1318280-20-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Ionela Voinescu Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 37 +++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 217a91fc1f59..8719c4458df9 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -545,8 +545,8 @@ static int __init parse_core(struct device_node *core, int package_id, return 0; } -static int __init -parse_cluster(struct device_node *cluster, int cluster_id, int depth) +static int __init parse_cluster(struct device_node *cluster, int package_id, + int cluster_id, int depth) { char name[20]; bool leaf = true; @@ -566,7 +566,7 @@ parse_cluster(struct device_node *cluster, int cluster_id, int depth) c = of_get_child_by_name(cluster, name); if (c) { leaf = false; - ret = parse_cluster(c, i, depth + 1); + ret = parse_cluster(c, package_id, i, depth + 1); of_node_put(c); if (ret != 0) return ret; @@ -590,7 +590,8 @@ parse_cluster(struct device_node *cluster, int cluster_id, int depth) } if (leaf) { - ret = parse_core(c, 0, cluster_id, core_id++); + ret = parse_core(c, package_id, cluster_id, + core_id++); } else { pr_err("%pOF: Non-leaf cluster with core %s\n", cluster, name); @@ -610,6 +611,32 @@ parse_cluster(struct device_node *cluster, int cluster_id, int depth) return 0; } +static int __init parse_socket(struct device_node *socket) +{ + char name[20]; + struct device_node *c; + bool has_socket = false; + int package_id = 0, ret; + + do { + snprintf(name, sizeof(name), "socket%d", package_id); + c = of_get_child_by_name(socket, name); + if (c) { + has_socket = true; + ret = parse_cluster(c, package_id, -1, 0); + of_node_put(c); + if (ret != 0) + return ret; + } + package_id++; + } while (c); + + if (!has_socket) + ret = parse_cluster(socket, 0, -1, 0); + + return ret; +} + static int __init parse_dt_topology(void) { struct device_node *cn, *map; @@ -630,7 +657,7 @@ static int __init parse_dt_topology(void) if (!map) goto out; - ret = parse_cluster(map, -1, 0); + ret = parse_socket(map); if (ret != 0) goto out_map; From 00e66e37af0090f9ed95ca4bc3d8f5c6171daaf0 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:04 +0100 Subject: [PATCH 47/63] arch_topology: Warn that topology for nested clusters is not supported We don't support the topology for clusters of CPU clusters while the DT and ACPI bindings theoritcally support the same. Just warn about the same so that it is clear to the users of arch_topology that the nested clusters are not yet supported. Link: https://lore.kernel.org/r/20220704101605.1318280-21-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Signed-off-by: Sudeep Holla --- drivers/base/arch_topology.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 8719c4458df9..441e14ac33a4 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -567,6 +567,8 @@ static int __init parse_cluster(struct device_node *cluster, int package_id, if (c) { leaf = false; ret = parse_cluster(c, package_id, i, depth + 1); + if (depth > 0) + pr_warn("Topology for clusters of clusters not yet supported\n"); of_node_put(c); if (ret != 0) return ret; From 7128af87c7f1c30cd6cebe0b012cc25872c689e2 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:05 +0100 Subject: [PATCH 48/63] ACPI: Remove the unused find_acpi_cpu_cache_topology() The sole user of this find_acpi_cpu_cache_topology() was arm64 topology which is now consolidated into the generic arch_topology without the need of this function. Drop the unused function find_acpi_cpu_cache_topology(). Link: https://lore.kernel.org/r/20220704101605.1318280-22-sudeep.holla@arm.com Cc: Rafael J. Wysocki Reported-by: Ionela Voinescu Tested-by: Conor Dooley Acked-by: Rafael J. Wysocki Signed-off-by: Sudeep Holla --- drivers/acpi/pptt.c | 37 ------------------------------------- include/linux/acpi.h | 5 ----- 2 files changed, 42 deletions(-) diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index 763f021d45e6..dd3222a15c9c 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -691,43 +691,6 @@ int find_acpi_cpu_topology(unsigned int cpu, int level) return find_acpi_cpu_topology_tag(cpu, level, 0); } -/** - * find_acpi_cpu_cache_topology() - Determine a unique cache topology value - * @cpu: Kernel logical CPU number - * @level: The cache level for which we would like a unique ID - * - * Determine a unique ID for each unified cache in the system - * - * Return: -ENOENT if the PPTT doesn't exist, or the CPU cannot be found. - * Otherwise returns a value which represents a unique topological feature. - */ -int find_acpi_cpu_cache_topology(unsigned int cpu, int level) -{ - struct acpi_table_header *table; - struct acpi_pptt_cache *found_cache; - acpi_status status; - u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu); - struct acpi_pptt_processor *cpu_node = NULL; - int ret = -1; - - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); - return -ENOENT; - } - - found_cache = acpi_find_cache_node(table, acpi_cpu_id, - CACHE_TYPE_UNIFIED, - level, - &cpu_node); - if (found_cache) - ret = ACPI_PTR_DIFF(cpu_node, table); - - acpi_put_table(table); - - return ret; -} - /** * find_acpi_cpu_topology_package() - Determine a unique CPU package value * @cpu: Kernel logical CPU number diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4f82a5bc6d98..7b96a8bff6d2 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1429,7 +1429,6 @@ int find_acpi_cpu_topology(unsigned int cpu, int level); int find_acpi_cpu_topology_cluster(unsigned int cpu); int find_acpi_cpu_topology_package(unsigned int cpu); int find_acpi_cpu_topology_hetero_id(unsigned int cpu); -int find_acpi_cpu_cache_topology(unsigned int cpu, int level); #else static inline int acpi_pptt_cpu_is_thread(unsigned int cpu) { @@ -1451,10 +1450,6 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu) { return -EINVAL; } -static inline int find_acpi_cpu_cache_topology(unsigned int cpu, int level) -{ - return -EINVAL; -} #endif #ifdef CONFIG_ACPI_PCC From 2fd26970cf66bd52dc42843c46968040caa8c9a1 Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 6 Jul 2022 06:10:26 +1000 Subject: [PATCH 49/63] Revert "kernfs: Change kernfs_notify_list to llist." This reverts commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f. This is causing regression due to same kernfs_node getting added multiple times in kernfs_notify_list so revert it until safe way of using llist in this context is found. Reported-by: Nathan Chancellor Reported-by: Michael Walle Reported-by: Marek Szyprowski Signed-off-by: Imran Khan Cc: Tejun Heo Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 47 ++++++++++++++++++++++++------------------ include/linux/kernfs.h | 2 +- 2 files changed, 28 insertions(+), 21 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index bb933221b4ba..baff4b1d40c7 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -25,16 +25,18 @@ struct kernfs_open_node { struct list_head files; /* goes through kernfs_open_file.list */ }; -/** - * attribute_to_node - get kernfs_node object corresponding to a kernfs attribute - * @ptr: &struct kernfs_elem_attr - * @type: struct kernfs_node - * @member: name of member (i.e attr) +/* + * kernfs_notify() may be called from any context and bounces notifications + * through a work item. To minimize space overhead in kernfs_node, the + * pending queue is implemented as a singly linked list of kernfs_nodes. + * The list is terminated with the self pointer so that whether a + * kernfs_node is on the list or not can be determined by testing the next + * pointer for NULL. */ -#define attribute_to_node(ptr, type, member) \ - container_of(ptr, type, member) +#define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list) -static LLIST_HEAD(kernfs_notify_list); +static DEFINE_SPINLOCK(kernfs_notify_lock); +static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL; static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn) { @@ -909,16 +911,18 @@ static void kernfs_notify_workfn(struct work_struct *work) struct kernfs_node *kn; struct kernfs_super_info *info; struct kernfs_root *root; - struct llist_node *free; - struct kernfs_elem_attr *attr; repeat: /* pop one off the notify_list */ - free = llist_del_first(&kernfs_notify_list); - if (free == NULL) + spin_lock_irq(&kernfs_notify_lock); + kn = kernfs_notify_list; + if (kn == KERNFS_NOTIFY_EOL) { + spin_unlock_irq(&kernfs_notify_lock); return; + } + kernfs_notify_list = kn->attr.notify_next; + kn->attr.notify_next = NULL; + spin_unlock_irq(&kernfs_notify_lock); - attr = llist_entry(free, struct kernfs_elem_attr, notify_next); - kn = attribute_to_node(attr, struct kernfs_node, attr); root = kernfs_root(kn); /* kick fsnotify */ down_write(&root->kernfs_rwsem); @@ -974,14 +978,12 @@ repeat: void kernfs_notify(struct kernfs_node *kn) { static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn); + unsigned long flags; struct kernfs_open_node *on; if (WARN_ON(kernfs_type(kn) != KERNFS_FILE)) return; - /* Because we are using llist for kernfs_notify_list */ - WARN_ON_ONCE(in_nmi()); - /* kick poll immediately */ rcu_read_lock(); on = rcu_dereference(kn->attr.open); @@ -992,9 +994,14 @@ void kernfs_notify(struct kernfs_node *kn) rcu_read_unlock(); /* schedule work to kick fsnotify */ - kernfs_get(kn); - llist_add(&kn->attr.notify_next, &kernfs_notify_list); - schedule_work(&kernfs_notify_work); + spin_lock_irqsave(&kernfs_notify_lock, flags); + if (!kn->attr.notify_next) { + kernfs_get(kn); + kn->attr.notify_next = kernfs_notify_list; + kernfs_notify_list = kn; + schedule_work(&kernfs_notify_work); + } + spin_unlock_irqrestore(&kernfs_notify_lock, flags); } EXPORT_SYMBOL_GPL(kernfs_notify); diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 13e703f615f7..367044d7708c 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -173,7 +173,7 @@ struct kernfs_elem_attr { const struct kernfs_ops *ops; struct kernfs_open_node __rcu *open; loff_t size; - struct llist_node notify_next; /* for kernfs_notify() */ + struct kernfs_node *notify_next; /* for kernfs_notify() */ }; /* From 6c3c267e5fbc637c8fc4b1d075e3c43e328550a6 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Mon, 11 Jul 2022 11:10:58 -0700 Subject: [PATCH 50/63] Documentation/process: Add embargoed HW contact for LLVM Should the need for toolchain mitigations ever be necessary, add a group for toolchain ambassadors. Add Nick Desaulniers as LLVM's ambassador for the embargoed hardware issues process. Signed-off-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220711181101.1559558-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman --- Documentation/process/embargoed-hardware-issues.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/process/embargoed-hardware-issues.rst b/Documentation/process/embargoed-hardware-issues.rst index 95999302d279..48bd249d96b9 100644 --- a/Documentation/process/embargoed-hardware-issues.rst +++ b/Documentation/process/embargoed-hardware-issues.rst @@ -264,6 +264,9 @@ an involved disclosed party. The current ambassadors list: Amazon Google Kees Cook + + GCC + LLVM Nick Desaulniers ============= ======================================================== If you want your organization to be added to the ambassadors list, please From 80dd7ae16bea69a055d14b81e43317207de70f83 Mon Sep 17 00:00:00 2001 From: Lee Jones Date: Thu, 14 Jul 2022 12:25:29 +0100 Subject: [PATCH 51/63] docs: ABI: sysfs-class-pwm: Update Lee Jones' email address Going forward, I'll be using my kernel.org for upstream work. Cc: Greg Kroah-Hartman Cc: Mauro Carvalho Chehab Signed-off-by: Lee Jones Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220714112533.539910-5-lee@kernel.org Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-class-pwm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-class-pwm b/Documentation/ABI/testing/sysfs-class-pwm index 3d65285bcd5f..0638c94d01ef 100644 --- a/Documentation/ABI/testing/sysfs-class-pwm +++ b/Documentation/ABI/testing/sysfs-class-pwm @@ -81,7 +81,7 @@ Description: What: /sys/class/pwm/pwmchip/pwmX/capture Date: June 2016 KernelVersion: 4.8 -Contact: Lee Jones +Contact: Lee Jones Description: Capture information about a PWM signal. The output format is a pair unsigned integers (period and duty cycle), separated by a From 9f9c909095610b1fc11f0784cfa591c756f1cdc0 Mon Sep 17 00:00:00 2001 From: Lee Jones Date: Thu, 14 Jul 2022 12:25:30 +0100 Subject: [PATCH 52/63] docs: ABI: sysfs-devices-soc: Update Lee Jones' email address Going forward, I'll be using my kernel.org for upstream work. Cc: Greg Kroah-Hartman Cc: Mauro Carvalho Chehab Signed-off-by: Lee Jones Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220714112533.539910-6-lee@kernel.org Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-devices-soc | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-soc b/Documentation/ABI/testing/sysfs-devices-soc index ea999e292f11..5269808ec35f 100644 --- a/Documentation/ABI/testing/sysfs-devices-soc +++ b/Documentation/ABI/testing/sysfs-devices-soc @@ -1,6 +1,6 @@ What: /sys/devices/socX Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: The /sys/devices/ directory contains a sub-directory for each System-on-Chip (SoC) device on a running platform. Information @@ -14,14 +14,14 @@ Description: What: /sys/devices/socX/machine Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: Read-only attribute common to all SoCs. Contains the SoC machine name (e.g. Ux500). What: /sys/devices/socX/family Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: Read-only attribute common to all SoCs. Contains SoC family name (e.g. DB8500). @@ -59,7 +59,7 @@ Description: What: /sys/devices/socX/soc_id Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: Read-only attribute supported by most SoCs. In the case of ST-Ericsson's chips this contains the SoC serial number. @@ -72,21 +72,21 @@ Description: What: /sys/devices/socX/revision Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: Read-only attribute supported by most SoCs. Contains the SoC's manufacturing revision number. What: /sys/devices/socX/process Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: Read-only attribute supported ST-Ericsson's silicon. Contains the the process by which the silicon chip was manufactured. What: /sys/bus/soc Date: January 2012 -contact: Lee Jones +contact: Lee Jones Description: The /sys/bus/soc/ directory contains the usual sub-folders expected under most buses. /sys/bus/soc/devices is of particular From 4a4e8f7f625b48c79ec9f3b5c219a09a6c71fd83 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 12 Jul 2022 11:54:19 -0700 Subject: [PATCH 53/63] MAINTAINERS: Change mentions of mpm to olivia Following this mercurial changeset: https://www.mercurial-scm.org/repo/hg-stable/rev/d4ba4d51f85f update the MAINTAINERS entry to replace the now obsolete identity. Signed-off-by: Florian Fainelli Link: https://lore.kernel.org/r/20220712185419.45487-1-f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman --- MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index a6d3bd9d2a8d..93cf6e8e6700 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7305,7 +7305,7 @@ F: Documentation/admin-guide/media/em28xx* F: drivers/media/usb/em28xx/ EMBEDDED LINUX -M: Matt Mackall +M: Olivia Mackall M: David Woodhouse L: linux-embedded@vger.kernel.org S: Maintained @@ -8686,7 +8686,7 @@ F: include/trace/events/hwmon*.h K: (devm_)?hwmon_device_(un)?register(|_with_groups|_with_info) HARDWARE RANDOM NUMBER GENERATOR CORE -M: Matt Mackall +M: Olivia Mackall M: Herbert Xu L: linux-crypto@vger.kernel.org S: Odd fixes From 7ee951acd31a88f941fd6535fbdee3a1567f1d63 Mon Sep 17 00:00:00 2001 From: Phil Auld Date: Fri, 15 Jul 2022 09:49:24 -0400 Subject: [PATCH 54/63] drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist Using bin_attributes with a 0 size causes fstat and friends to return that 0 size. This breaks userspace code that retrieves the size before reading the file. Rather than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") let's put in a size value at compile time. For cpulist the maximum size is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1)/2 which for 8192 is 20480 (8192 * 5)/2. In order to get near that you'd need a system with every other CPU on one node. For example: (0,2,4,8, ... ). To simplify the math and support larger NR_CPUS in the future we are using (NR_CPUS * 7)/2. We also set it to a min of PAGE_SIZE to retain the older behavior for smaller NR_CPUS. The cpumap file the size works out to be NR_CPUS/4 + NR_CPUS/32 - 1 (or NR_CPUS * 9/32 - 1) including the ","s. Add a set of macros for these values to cpumask.h so they can be used in multiple places. Apply these to the handful of such files in drivers/base/topology.c as well as node.c. As an example, on an 80 cpu 4-node system (NR_CPUS == 8192): before: -r--r--r--. 1 root root 0 Jul 12 14:08 system/node/node0/cpulist -r--r--r--. 1 root root 0 Jul 11 17:25 system/node/node0/cpumap after: -r--r--r--. 1 root root 28672 Jul 13 11:32 system/node/node0/cpulist -r--r--r--. 1 root root 4096 Jul 13 11:31 system/node/node0/cpumap CONFIG_NR_CPUS = 16384 -r--r--r--. 1 root root 57344 Jul 13 14:03 system/node/node0/cpulist -r--r--r--. 1 root root 4607 Jul 13 14:02 system/node/node0/cpumap The actual number of cpus doesn't matter for the reported size since they are based on NR_CPUS. Fixes: 75bd50fa841d ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") Fixes: bb9ec13d156e ("topology: use bin_attribute to break the size limitation of cpumap ABI") Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Yury Norov Cc: stable@vger.kernel.org Acked-by: Yury Norov (for include/linux/cpumask.h) Signed-off-by: Phil Auld Link: https://lore.kernel.org/r/20220715134924.3466194-1-pauld@redhat.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/node.c | 4 ++-- drivers/base/topology.c | 32 ++++++++++++++++---------------- include/linux/cpumask.h | 18 ++++++++++++++++++ 3 files changed, 36 insertions(+), 18 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 0ac6376ef7a1..eb0f43784c2b 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -45,7 +45,7 @@ static inline ssize_t cpumap_read(struct file *file, struct kobject *kobj, return n; } -static BIN_ATTR_RO(cpumap, 0); +static BIN_ATTR_RO(cpumap, CPUMAP_FILE_MAX_BYTES); static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj, struct bin_attribute *attr, char *buf, @@ -66,7 +66,7 @@ static inline ssize_t cpulist_read(struct file *file, struct kobject *kobj, return n; } -static BIN_ATTR_RO(cpulist, 0); +static BIN_ATTR_RO(cpulist, CPULIST_FILE_MAX_BYTES); /** * struct node_access_nodes - Access class device to hold user visible diff --git a/drivers/base/topology.c b/drivers/base/topology.c index ac6ad9ab67f9..89f98be5c5b9 100644 --- a/drivers/base/topology.c +++ b/drivers/base/topology.c @@ -62,47 +62,47 @@ define_id_show_func(ppin, "0x%llx"); static DEVICE_ATTR_ADMIN_RO(ppin); define_siblings_read_func(thread_siblings, sibling_cpumask); -static BIN_ATTR_RO(thread_siblings, 0); -static BIN_ATTR_RO(thread_siblings_list, 0); +static BIN_ATTR_RO(thread_siblings, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(thread_siblings_list, CPULIST_FILE_MAX_BYTES); define_siblings_read_func(core_cpus, sibling_cpumask); -static BIN_ATTR_RO(core_cpus, 0); -static BIN_ATTR_RO(core_cpus_list, 0); +static BIN_ATTR_RO(core_cpus, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(core_cpus_list, CPULIST_FILE_MAX_BYTES); define_siblings_read_func(core_siblings, core_cpumask); -static BIN_ATTR_RO(core_siblings, 0); -static BIN_ATTR_RO(core_siblings_list, 0); +static BIN_ATTR_RO(core_siblings, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(core_siblings_list, CPULIST_FILE_MAX_BYTES); #ifdef TOPOLOGY_CLUSTER_SYSFS define_siblings_read_func(cluster_cpus, cluster_cpumask); -static BIN_ATTR_RO(cluster_cpus, 0); -static BIN_ATTR_RO(cluster_cpus_list, 0); +static BIN_ATTR_RO(cluster_cpus, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(cluster_cpus_list, CPULIST_FILE_MAX_BYTES); #endif #ifdef TOPOLOGY_DIE_SYSFS define_siblings_read_func(die_cpus, die_cpumask); -static BIN_ATTR_RO(die_cpus, 0); -static BIN_ATTR_RO(die_cpus_list, 0); +static BIN_ATTR_RO(die_cpus, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(die_cpus_list, CPULIST_FILE_MAX_BYTES); #endif define_siblings_read_func(package_cpus, core_cpumask); -static BIN_ATTR_RO(package_cpus, 0); -static BIN_ATTR_RO(package_cpus_list, 0); +static BIN_ATTR_RO(package_cpus, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(package_cpus_list, CPULIST_FILE_MAX_BYTES); #ifdef TOPOLOGY_BOOK_SYSFS define_id_show_func(book_id, "%d"); static DEVICE_ATTR_RO(book_id); define_siblings_read_func(book_siblings, book_cpumask); -static BIN_ATTR_RO(book_siblings, 0); -static BIN_ATTR_RO(book_siblings_list, 0); +static BIN_ATTR_RO(book_siblings, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(book_siblings_list, CPULIST_FILE_MAX_BYTES); #endif #ifdef TOPOLOGY_DRAWER_SYSFS define_id_show_func(drawer_id, "%d"); static DEVICE_ATTR_RO(drawer_id); define_siblings_read_func(drawer_siblings, drawer_cpumask); -static BIN_ATTR_RO(drawer_siblings, 0); -static BIN_ATTR_RO(drawer_siblings_list, 0); +static BIN_ATTR_RO(drawer_siblings, CPUMAP_FILE_MAX_BYTES); +static BIN_ATTR_RO(drawer_siblings_list, CPULIST_FILE_MAX_BYTES); #endif static struct bin_attribute *bin_attrs[] = { diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index fe29ac7cc469..4592d0845941 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -1071,4 +1071,22 @@ cpumap_print_list_to_buf(char *buf, const struct cpumask *mask, [0] = 1UL \ } } +/* + * Provide a valid theoretical max size for cpumap and cpulist sysfs files + * to avoid breaking userspace which may allocate a buffer based on the size + * reported by e.g. fstat. + * + * for cpumap NR_CPUS * 9/32 - 1 should be an exact length. + * + * For cpulist 7 is (ceil(log10(NR_CPUS)) + 1) allowing for NR_CPUS to be up + * to 2 orders of magnitude larger than 8192. And then we divide by 2 to + * cover a worst-case of every other cpu being on one of two nodes for a + * very large NR_CPUS. + * + * Use PAGE_SIZE as a minimum for smaller configurations. + */ +#define CPUMAP_FILE_MAX_BYTES ((((NR_CPUS * 9)/32 - 1) > PAGE_SIZE) \ + ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE) +#define CPULIST_FILE_MAX_BYTES (((NR_CPUS * 7)/2 > PAGE_SIZE) ? (NR_CPUS * 7)/2 : PAGE_SIZE) + #endif /* __LINUX_CPUMASK_H */ From 11969d698f8cda31bd176ec346833ef97ea7c67e Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 20 Jul 2022 13:55:38 +0100 Subject: [PATCH 55/63] cacheinfo: Use atomic allocation for percpu cache attributes On couple of architectures like RISC-V and ARM64, we need to detect cache attribues quite early during the boot when the secondary CPUs start. So we will call detect_cache_attributes in the atomic context and since use of normal allocation can sleep, we will end up getting "sleeping in the atomic context" bug splat. In order avoid that, move the allocation to use atomic version in preparation to move the actual detection of cache attributes in the CPU hotplug path which is atomic. Cc: Ionela Voinescu Tested-by: Conor Dooley Signed-off-by: Sudeep Holla Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-1-43d696288e84@arm.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/cacheinfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index 65d566ff24c4..4b5cd08c5a65 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -356,7 +356,7 @@ int detect_cache_attributes(unsigned int cpu) return -ENOENT; per_cpu_cacheinfo(cpu) = kcalloc(cache_leaves(cpu), - sizeof(struct cacheinfo), GFP_KERNEL); + sizeof(struct cacheinfo), GFP_ATOMIC); if (per_cpu_cacheinfo(cpu) == NULL) { cache_leaves(cpu) = 0; return -ENOMEM; From 0c80f9e165f8f9cca743d7b6cbdb54362da297e0 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 20 Jul 2022 13:55:39 +0100 Subject: [PATCH 56/63] ACPI: PPTT: Leave the table mapped for the runtime usage Currently, everytime an information needs to be fetched from the PPTT, the table is mapped via acpi_get_table() and unmapped after the use via acpi_put_table() which is fine. However we do this at runtime especially when the CPU is hotplugged out and plugged in back since we re-populate the cache topology and other information. However, with the support to fetch LLC information from the PPTT in the cpuhotplug path which is executed in the atomic context, it is preferred to avoid mapping and unmapping of the PPTT for every single use as the acpi_get_table() might sleep waiting for a mutex. In order to avoid the same, the table is needs to just mapped once on the boot CPU and is never unmapped allowing it to be used at runtime with out the hassle of mapping and unmapping the table. Reported-by: Guenter Roeck Cc: Rafael J. Wysocki Signed-off-by: Sudeep Holla -- Hi Rafael, Sorry to bother you again on this PPTT changes. Guenter reported an issue with lockdep enabled in -next that include my cacheinfo/arch_topology changes to utilise LLC from PPTT in the CPU hotplug path. Please ack the change once you are happy so that I can get it merged with other fixes via Greg's tree. Regards, Sudeep Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-2-43d696288e84@arm.com Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/pptt.c | 102 ++++++++++++++++++++------------------------ 1 file changed, 47 insertions(+), 55 deletions(-) diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index dd3222a15c9c..c91342dcbcd6 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -533,21 +533,37 @@ static int topology_get_acpi_cpu_tag(struct acpi_table_header *table, return -ENOENT; } + +static struct acpi_table_header *acpi_get_pptt(void) +{ + static struct acpi_table_header *pptt; + acpi_status status; + + /* + * PPTT will be used at runtime on every CPU hotplug in path, so we + * don't need to call acpi_put_table() to release the table mapping. + */ + if (!pptt) { + status = acpi_get_table(ACPI_SIG_PPTT, 0, &pptt); + if (ACPI_FAILURE(status)) + acpi_pptt_warn_missing(); + } + + return pptt; +} + static int find_acpi_cpu_topology_tag(unsigned int cpu, int level, int flag) { struct acpi_table_header *table; - acpi_status status; int retval; - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); + table = acpi_get_pptt(); + if (!table) return -ENOENT; - } + retval = topology_get_acpi_cpu_tag(table, cpu, level, flag); pr_debug("Topology Setup ACPI CPU %d, level %d ret = %d\n", cpu, level, retval); - acpi_put_table(table); return retval; } @@ -568,16 +584,13 @@ static int find_acpi_cpu_topology_tag(unsigned int cpu, int level, int flag) static int check_acpi_cpu_flag(unsigned int cpu, int rev, u32 flag) { struct acpi_table_header *table; - acpi_status status; u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu); struct acpi_pptt_processor *cpu_node = NULL; int ret = -ENOENT; - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); - return ret; - } + table = acpi_get_pptt(); + if (!table) + return -ENOENT; if (table->revision >= rev) cpu_node = acpi_find_processor_node(table, acpi_cpu_id); @@ -585,8 +598,6 @@ static int check_acpi_cpu_flag(unsigned int cpu, int rev, u32 flag) if (cpu_node) ret = (cpu_node->flags & flag) != 0; - acpi_put_table(table); - return ret; } @@ -605,18 +616,15 @@ int acpi_find_last_cache_level(unsigned int cpu) u32 acpi_cpu_id; struct acpi_table_header *table; int number_of_levels = 0; - acpi_status status; + + table = acpi_get_pptt(); + if (!table) + return -ENOENT; pr_debug("Cache Setup find last level CPU=%d\n", cpu); acpi_cpu_id = get_acpi_id_for_cpu(cpu); - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); - } else { - number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id); - acpi_put_table(table); - } + number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id); pr_debug("Cache Setup find last level level=%d\n", number_of_levels); return number_of_levels; @@ -638,20 +646,16 @@ int acpi_find_last_cache_level(unsigned int cpu) int cache_setup_acpi(unsigned int cpu) { struct acpi_table_header *table; - acpi_status status; + + table = acpi_get_pptt(); + if (!table) + return -ENOENT; pr_debug("Cache Setup ACPI CPU %d\n", cpu); - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); - return -ENOENT; - } - cache_setup_acpi_cpu(table, cpu); - acpi_put_table(table); - return status; + return 0; } /** @@ -730,50 +734,38 @@ int find_acpi_cpu_topology_package(unsigned int cpu) int find_acpi_cpu_topology_cluster(unsigned int cpu) { struct acpi_table_header *table; - acpi_status status; struct acpi_pptt_processor *cpu_node, *cluster_node; u32 acpi_cpu_id; int retval; int is_thread; - status = acpi_get_table(ACPI_SIG_PPTT, 0, &table); - if (ACPI_FAILURE(status)) { - acpi_pptt_warn_missing(); + table = acpi_get_pptt(); + if (!table) return -ENOENT; - } acpi_cpu_id = get_acpi_id_for_cpu(cpu); cpu_node = acpi_find_processor_node(table, acpi_cpu_id); - if (cpu_node == NULL || !cpu_node->parent) { - retval = -ENOENT; - goto put_table; - } + if (!cpu_node || !cpu_node->parent) + return -ENOENT; is_thread = cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD; cluster_node = fetch_pptt_node(table, cpu_node->parent); - if (cluster_node == NULL) { - retval = -ENOENT; - goto put_table; - } + if (!cluster_node) + return -ENOENT; + if (is_thread) { - if (!cluster_node->parent) { - retval = -ENOENT; - goto put_table; - } + if (!cluster_node->parent) + return -ENOENT; + cluster_node = fetch_pptt_node(table, cluster_node->parent); - if (cluster_node == NULL) { - retval = -ENOENT; - goto put_table; - } + if (!cluster_node) + return -ENOENT; } if (cluster_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) retval = cluster_node->acpi_processor_id; else retval = ACPI_PTR_DIFF(cluster_node, table); -put_table: - acpi_put_table(table); - return retval; } From 3fcbf1c77d089fcf0331fd8f3cbbe6c436a3edbd Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 20 Jul 2022 13:55:40 +0100 Subject: [PATCH 57/63] arch_topology: Fix cache attributes detection in the CPU hotplug path init_cpu_topology() is called only once at the boot and all the cache attributes are detected early for all the possible CPUs. However when the CPUs are hotplugged out, the cacheinfo gets removed. While the attributes are added back when the CPUs are hotplugged back in as part of CPU hotplug state machine, it ends up called quite late after the update_siblings_masks() are called in the secondary_start_kernel() resulting in wrong llc_sibling_masks. Move the call to detect_cache_attributes() inside update_siblings_masks() to ensure the cacheinfo is updated before the LLC sibling masks are updated. This will fix the incorrect LLC sibling masks generated when the CPUs are hotplugged out and hotplugged back in again. Reported-by: Ionela Voinescu Tested-by: Geert Uytterhoeven Tested-by: Ionela Voinescu Reviewed-by: Conor Dooley Reviewed-by: Ionela Voinescu Signed-off-by: Sudeep Holla Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-3-43d696288e84@arm.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/arch_topology.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 441e14ac33a4..0424b59b695e 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -732,7 +732,11 @@ const struct cpumask *cpu_clustergroup_mask(int cpu) void update_siblings_masks(unsigned int cpuid) { struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid]; - int cpu; + int cpu, ret; + + ret = detect_cache_attributes(cpuid); + if (ret) + pr_info("Early cacheinfo failed, ret = %d\n", ret); /* update core and thread sibling masks */ for_each_online_cpu(cpu) { @@ -821,7 +825,7 @@ __weak int __init parse_acpi_topology(void) #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV) void __init init_cpu_topology(void) { - int ret, cpu; + int ret; reset_cpu_topology(); ret = parse_acpi_topology(); @@ -836,13 +840,5 @@ void __init init_cpu_topology(void) reset_cpu_topology(); return; } - - for_each_possible_cpu(cpu) { - ret = detect_cache_attributes(cpu); - if (ret) { - pr_info("Early cacheinfo failed, ret = %d\n", ret); - break; - } - } } #endif From 321eaf317dec3710e7a4ad3b3c363d9314c15195 Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Thu, 21 Jul 2022 14:43:52 +1000 Subject: [PATCH 58/63] docs: driver-api: firmware: add driver firmware guidelines. (v3) A recent snafu where Intel ignored upstream feedback on a firmware change, led to a late rc6 fix being required. In order to avoid this in the future we should document some expectations around linux-firmware. I was originally going to write this for drm, but it seems quite generic advice. v2: rewritten with suggestions from Thorsten Leemhuis v3: rewritten with suggestions from Mauro Acked-by: Luis Chamberlain Acked-by: Rodrigo Vivi Acked-by: Daniel Vetter Acked-by: Harry Wentland Signed-off-by: Dave Airlie Link: https://lore.kernel.org/r/20220721044352.3110507-1-airlied@gmail.com Signed-off-by: Greg Kroah-Hartman --- Documentation/driver-api/firmware/core.rst | 1 + .../firmware/firmware-usage-guidelines.rst | 44 +++++++++++++++++++ 2 files changed, 45 insertions(+) create mode 100644 Documentation/driver-api/firmware/firmware-usage-guidelines.rst diff --git a/Documentation/driver-api/firmware/core.rst b/Documentation/driver-api/firmware/core.rst index 1d1688cbc078..803cd574bbd7 100644 --- a/Documentation/driver-api/firmware/core.rst +++ b/Documentation/driver-api/firmware/core.rst @@ -13,4 +13,5 @@ documents these features. direct-fs-lookup fallback-mechanisms lookup-order + firmware-usage-guidelines diff --git a/Documentation/driver-api/firmware/firmware-usage-guidelines.rst b/Documentation/driver-api/firmware/firmware-usage-guidelines.rst new file mode 100644 index 000000000000..fdcfce42c6d2 --- /dev/null +++ b/Documentation/driver-api/firmware/firmware-usage-guidelines.rst @@ -0,0 +1,44 @@ +=================== +Firmware Guidelines +=================== + +Users switching to a newer kernel should *not* have to install newer +firmware files to keep their hardware working. At the same time updated +firmware files must not cause any regressions for users of older kernel +releases. + +Drivers that use firmware from linux-firmware should follow the rules in +this guide. (Where there is limited control of the firmware, +i.e. company doesn't support Linux, firmwares sourced from misc places, +then of course these rules will not apply strictly.) + +* Firmware files shall be designed in a way that it allows checking for + firmware ABI version changes. It is recommended that firmware files be + versioned with at least a major/minor version. It is suggested that + the firmware files in linux-firmware be named with some device + specific name, and just the major version. The firmware version should + be stored in the firmware header, or as an exception, as part of the + firmware file name, in order to let the driver detact any non-ABI + fixes/changes. The firmware files in linux-firmware should be + overwritten with the newest compatible major version. Newer major + version firmware shall remain compatible with all kernels that load + that major number. + +* If the kernel support for the hardware is normally inactive, or the + hardware isn't available for public consumption, this can + be ignored, until the first kernel release that enables that hardware. + This means no major version bumps without the kernel retaining + backwards compatibility for the older major versions. Minor version + bumps should not introduce new features that newer kernels depend on + non-optionally. + +* If a security fix needs lockstep firmware and kernel fixes in order to + be successful, then all supported major versions in the linux-firmware + repo that are required by currently supported stable/LTS kernels, + should be updated with the security fix. The kernel patches should + detect if the firmware is new enough to declare if the security issue + is fixed. All communications around security fixes should point at + both the firmware and kernel fixes. If a security fix requires + deprecating old major versions, then this should only be done as a + last option, and be stated clearly in all communications. + From 3fe4076482789c2c4a772f6676b246a0d96c99c4 Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Fri, 22 Jul 2022 18:05:18 +0800 Subject: [PATCH 59/63] kernfs: Fix typo 'the the' in comment Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao Link: https://lore.kernel.org/r/20220722100518.79741-1-slark_xiao@163.com Signed-off-by: Greg Kroah-Hartman --- fs/kernfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index baff4b1d40c7..b3ec34386b43 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -307,7 +307,7 @@ static ssize_t kernfs_fop_read_iter(struct kiocb *iocb, struct iov_iter *iter) * There is no easy way for us to know if userspace is only doing a partial * write, so we don't support them. We expect the entire buffer to come on * the first write. Hint: if you're writing a value, first read the file, - * modify only the the value you're changing, then write entire buffer + * modify only the value you're changing, then write entire buffer * back. */ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) From b6c694740ea21620c2b86ad37be2c0dc7051a48c Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 14 Jul 2022 18:59:59 -0700 Subject: [PATCH 60/63] kobject: fix Kconfig.debug "its" grammar Use the possessive "its" instead of the contraction "it's" where appropriate. Cc: Russell King Cc: Greg Kroah-Hartman Signed-off-by: Randy Dunlap Link: https://lore.kernel.org/r/20220715015959.12657-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman --- lib/Kconfig.debug | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 2e24db4bff19..abb1b287b8f1 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1560,7 +1560,7 @@ config DEBUG_KOBJECT_RELEASE help kobjects are reference counted objects. This means that their last reference count put is not predictable, and the kobject can - live on past the point at which a driver decides to drop it's + live on past the point at which a driver decides to drop its initial reference to the kobject gained on allocation. An example of this would be a struct device which has just been unregistered. From b18ee4a44e3ff21936d35a9b215cfd6cd5f3af9a Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Thu, 21 Jul 2022 10:06:23 +0800 Subject: [PATCH 61/63] sysfs docs: ABI: Fix typo in comment Fix typo in the comment Signed-off-by: Slark Xiao Link: https://lore.kernel.org/r/20220721020623.20974-1-slark_xiao@163.com Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/stable/sysfs-module | 2 +- Documentation/ABI/testing/sysfs-class-rtrs-client | 2 +- Documentation/ABI/testing/sysfs-class-rtrs-server | 2 +- Documentation/ABI/testing/sysfs-devices-platform-ACPI-TAD | 2 +- Documentation/ABI/testing/sysfs-devices-power | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-module b/Documentation/ABI/stable/sysfs-module index 560b4a3278df..41b1f16e8795 100644 --- a/Documentation/ABI/stable/sysfs-module +++ b/Documentation/ABI/stable/sysfs-module @@ -38,7 +38,7 @@ What: /sys/module//srcversion Date: Jun 2005 Description: If the module source has MODULE_VERSION, this file will contain - the checksum of the the source code. + the checksum of the source code. What: /sys/module//version Date: Jun 2005 diff --git a/Documentation/ABI/testing/sysfs-class-rtrs-client b/Documentation/ABI/testing/sysfs-class-rtrs-client index 49a4157c7bf1..fecc59d1b96f 100644 --- a/Documentation/ABI/testing/sysfs-class-rtrs-client +++ b/Documentation/ABI/testing/sysfs-class-rtrs-client @@ -78,7 +78,7 @@ What: /sys/class/rtrs-client//paths//hca_name Date: Feb 2020 KernelVersion: 5.7 Contact: Jack Wang Danil Kipnis -Description: RO, Contains the the name of HCA the connection established on. +Description: RO, Contains the name of HCA the connection established on. What: /sys/class/rtrs-client//paths//hca_port Date: Feb 2020 diff --git a/Documentation/ABI/testing/sysfs-class-rtrs-server b/Documentation/ABI/testing/sysfs-class-rtrs-server index 3b6d5b067df0..b08601d80409 100644 --- a/Documentation/ABI/testing/sysfs-class-rtrs-server +++ b/Documentation/ABI/testing/sysfs-class-rtrs-server @@ -24,7 +24,7 @@ What: /sys/class/rtrs-server//paths//hca_name Date: Feb 2020 KernelVersion: 5.7 Contact: Jack Wang Danil Kipnis -Description: RO, Contains the the name of HCA the connection established on. +Description: RO, Contains the name of HCA the connection established on. What: /sys/class/rtrs-server//paths//hca_port Date: Feb 2020 diff --git a/Documentation/ABI/testing/sysfs-devices-platform-ACPI-TAD b/Documentation/ABI/testing/sysfs-devices-platform-ACPI-TAD index f7b360a61b21..bc44bc903bc8 100644 --- a/Documentation/ABI/testing/sysfs-devices-platform-ACPI-TAD +++ b/Documentation/ABI/testing/sysfs-devices-platform-ACPI-TAD @@ -74,7 +74,7 @@ Description: Reads also cause the AC alarm timer status to be reset. - Another way to reset the the status of the AC alarm timer is to + Another way to reset the status of the AC alarm timer is to write (the number) 0 to this file. If the status return value indicates that the timer has expired, diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index 1b2a2d41ff80..54195530e97a 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -303,5 +303,5 @@ Date: Apr 2010 Contact: Dominik Brodowski Description: Reports the runtime PM children usage count of a device, or - 0 if the the children will be ignored. + 0 if the children will be ignored. From f2d57765b79857264fb0ddc52679d661b60ecc21 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Fri, 15 Jul 2022 01:50:30 +0200 Subject: [PATCH 62/63] firmware_loader: Replace kmap() with kmap_local_page() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The use of kmap() is being deprecated in favor of kmap_local_page(). Two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) kmap() also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. kmap_local_page() is preferred over kmap() and kmap_atomic(). Where it cannot mechanically replace the latters, code refactor should be considered (special care must be taken if kernel virtual addresses are aliases in different contexts). With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). Call kmap_local_page() in firmware_loader wherever kmap() is currently used. In firmware_rw() use the helpers copy_{from,to}_page() instead of open coding the local mappings + memcpy(). Successfully tested with "firmware" selftests on a QEMU/KVM 32-bits VM with 4GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Greg Kroah-Hartman Cc: Luis Chamberlain Suggested-by: Ira Weiny Reviewed-by: Takashi Iwai Signed-off-by: Fabio M. De Francesco Link: https://lore.kernel.org/r/20220714235030.12732-1-fmdefrancesco@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/firmware_loader/main.c | 4 ++-- drivers/base/firmware_loader/sysfs.c | 10 ++++------ 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c index ac3f34e80194..7c3590fd97c2 100644 --- a/drivers/base/firmware_loader/main.c +++ b/drivers/base/firmware_loader/main.c @@ -435,11 +435,11 @@ static int fw_decompress_xz_pages(struct device *dev, struct fw_priv *fw_priv, /* decompress onto the new allocated page */ page = fw_priv->pages[fw_priv->nr_pages - 1]; - xz_buf.out = kmap(page); + xz_buf.out = kmap_local_page(page); xz_buf.out_pos = 0; xz_buf.out_size = PAGE_SIZE; xz_ret = xz_dec_run(xz_dec, &xz_buf); - kunmap(page); + kunmap_local(xz_buf.out); fw_priv->size += xz_buf.out_pos; /* partial decompression means either end or error */ if (xz_buf.out_pos != PAGE_SIZE) diff --git a/drivers/base/firmware_loader/sysfs.c b/drivers/base/firmware_loader/sysfs.c index 5b0b85b70b6f..77bad32c481a 100644 --- a/drivers/base/firmware_loader/sysfs.c +++ b/drivers/base/firmware_loader/sysfs.c @@ -242,19 +242,17 @@ static void firmware_rw(struct fw_priv *fw_priv, char *buffer, loff_t offset, size_t count, bool read) { while (count) { - void *page_data; int page_nr = offset >> PAGE_SHIFT; int page_ofs = offset & (PAGE_SIZE - 1); int page_cnt = min_t(size_t, PAGE_SIZE - page_ofs, count); - page_data = kmap(fw_priv->pages[page_nr]); - if (read) - memcpy(buffer, page_data + page_ofs, page_cnt); + memcpy_from_page(buffer, fw_priv->pages[page_nr], + page_ofs, page_cnt); else - memcpy(page_data + page_ofs, buffer, page_cnt); + memcpy_to_page(fw_priv->pages[page_nr], page_ofs, + buffer, page_cnt); - kunmap(fw_priv->pages[page_nr]); buffer += page_cnt; offset += page_cnt; count -= page_cnt; From 273aaa24369cb8d0f246bb16f7122b91a1ef5188 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 29 Jul 2022 15:45:17 +0200 Subject: [PATCH 63/63] docs: embargoed-hardware-issues: fix invalid AMD contact email The current AMD contact info email address is incorrect, so fix it up to use the correct one. Cc: Jonathan Corbet Cc: Alex Shi Cc: Yanteng Si Cc: Hu Haowen Cc: Tom Lendacky Acked-by: Tom Lendacky Link: https://lore.kernel.org/r/20220729134517.2284700-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- Documentation/process/embargoed-hardware-issues.rst | 2 +- .../translations/zh_CN/process/embargoed-hardware-issues.rst | 2 +- .../translations/zh_TW/process/embargoed-hardware-issues.rst | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/process/embargoed-hardware-issues.rst b/Documentation/process/embargoed-hardware-issues.rst index 48bd249d96b9..b6b4481e2474 100644 --- a/Documentation/process/embargoed-hardware-issues.rst +++ b/Documentation/process/embargoed-hardware-issues.rst @@ -244,7 +244,7 @@ disclosure of a particular issue, unless requested by a response team or by an involved disclosed party. The current ambassadors list: ============= ======================================================== - AMD Tom Lendacky + AMD Tom Lendacky Ampere Darren Hart ARM Catalin Marinas IBM Power Anton Blanchard diff --git a/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst b/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst index 88273ebe7823..cf5f1fca3d92 100644 --- a/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst +++ b/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst @@ -174,7 +174,7 @@ CVE分配 ============= ======================================================== ARM - AMD Tom Lendacky + AMD Tom Lendacky IBM Intel Tony Luck Qualcomm Trilok Soni diff --git a/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst b/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst index 6c76fc96131a..fbde3e26eda5 100644 --- a/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst +++ b/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst @@ -177,7 +177,7 @@ CVE分配 ============= ======================================================== ARM - AMD Tom Lendacky + AMD Tom Lendacky IBM Intel Tony Luck Qualcomm Trilok Soni