device.h has everything and the kitchen sink when it comes to struct
device things, so split out the struct bus things things to a separate
.h file to make things easier to maintain and manage over time.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20191209193303.1694546-5-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a helper to match the device name for device lookup. Also
reuse this generic exported helper for the existing bus_find_device_by_name().
and add similar variants for driver/class.
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: linux-leds@vger.kernel.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wpan@vger.kernel.org
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20190723221838.12024-2-suzuki.poulose@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is an arbitrary difference between the prototypes of
bus_find_device() and class_find_device() preventing their callers
from passing the same pair of data and match() arguments to both of
them, which is the const qualifier used in the prototype of
class_find_device(). If that qualifier is also used in the
bus_find_device() prototype, it will be possible to pass the same
match() callback function to both bus_find_device() and
class_find_device(), which will allow some optimizations to be made in
order to avoid code duplication going forward. Also with that, constify
the "data" parameter as it is passed as a const to the match function.
For this reason, change the prototype of bus_find_device() to match
the prototype of class_find_device() and adjust its callers to use the
const qualifier in accordance with the new prototype of it.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Kershner <david.kershner@unisys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Jamet <michael.jamet@intel.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Sebastian Ott <sebott@linux.ibm.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Yehezkel Bernat <YehezkelShB@gmail.com>
Cc: rafael@kernel.org
Acked-by: Corey Minyard <minyard@acm.org>
Acked-by: David Kershner <david.kershner@unisys.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> # for the I2C parts
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Probe devices asynchronously instead of the driver. This results in us
seeing the same behavior if the device is registered before the driver or
after. This way we can avoid serializing the initialization should the
driver not be loaded until after the devices have already been added.
The motivation behind this is that if we have a set of devices that
take a significant amount of time to load we can greatly reduce the time to
load by processing them in parallel instead of one at a time. In addition,
each device can exist on a different node so placing a single thread on one
CPU to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.
This approach can reduce the time needed to scan SCSI LUNs significantly.
The only way to realize that speedup is by enabling more concurrency which
is what is achieved with this patch.
To achieve this it was necessary to add a new member "async_driver" to the
device_private structure to store the driver pointer while we wait on the
deferred probe call.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Try to consolidate all of the locking and unlocking of both the parent and
device when attaching or removing a driver from a given device.
To do that I first consolidated the lock pattern into two functions
__device_driver_lock and __device_driver_unlock. After doing that I then
created functions specific to attaching and detaching the driver while
acquiring these locks. By doing this I was able to reduce the number of
spots where we touch need_parent_lock from 12 down to 4.
This patch should produce no functional changes, it is meant to be a code
clean-up/consolidation only.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() so drop the last user of it from
the tree. We had to "open code" it in order to prevent a function name
conflict due to the use of DEVICE_ATTR_WO() earlier in the file :(
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are trying to get rid of BUS_ATTR() and the usage of that in bus.c
can be trivially converted to use BUS_ATTR_WO and RW, so use those
macros instead.
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the much more correct fix for my earlier attempt at:
https://lkml.org/lkml/2018/12/10/118
Short recap:
- There's not actually a locking issue, it's just lockdep being a bit
too eager to complain about a possible deadlock.
- Contrary to what I claimed the real problem is recursion on
kn->count. Greg pointed me at sysfs_break_active_protection(), used
by the scsi subsystem to allow a sysfs file to unbind itself. That
would be a real deadlock, which isn't what's happening here. Also,
breaking the active protection means we'd need to manually handle
all the lifetime fun.
- With Rafael we discussed the task_work approach, which kinda works,
but has two downsides: It's a functional change for a lockdep
annotation issue, and it won't work for the bind file (which needs
to get the errno from the driver load function back to userspace).
- Greg also asked why this never showed up: To hit this you need to
unregister a 2nd driver from the unload code of your first driver. I
guess only gpus do that. The bug has always been there, but only
with a recent patch series did we add more locks so that lockdep
built a chain from unbinding the snd-hda driver to the
acpi_video_unregister call.
Full lockdep splat:
[12301.898799] ============================================
[12301.898805] WARNING: possible recursive locking detected
[12301.898811] 4.20.0-rc7+ #84 Not tainted
[12301.898815] --------------------------------------------
[12301.898821] bash/5297 is trying to acquire lock:
[12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
[12301.898841] but task is already holding lock:
[12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898856] other info that might help us debug this:
[12301.898862] Possible unsafe locking scenario:
[12301.898867] CPU0
[12301.898870] ----
[12301.898874] lock(kn->count#39);
[12301.898879] lock(kn->count#39);
[12301.898883] *** DEADLOCK ***
[12301.898891] May be due to missing lock nesting notation
[12301.898899] 5 locks held by bash/5297:
[12301.898903] #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
[12301.898915] #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
[12301.898925] #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
[12301.898936] #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
[12301.898950] #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
[12301.898960] stack backtrace:
[12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84
[12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[12301.898982] Call Trace:
[12301.898989] dump_stack+0x67/0x9b
[12301.898997] __lock_acquire+0x6ad/0x1410
[12301.899003] ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899010] ? find_held_lock+0x2d/0x90
[12301.899017] ? mutex_spin_on_owner+0xe4/0x150
[12301.899023] ? find_held_lock+0x2d/0x90
[12301.899030] ? lock_acquire+0x90/0x180
[12301.899036] lock_acquire+0x90/0x180
[12301.899042] ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899049] __kernfs_remove+0x296/0x310
[12301.899055] ? kernfs_remove_by_name_ns+0x3b/0x80
[12301.899060] ? kernfs_name_hash+0xd/0x80
[12301.899066] ? kernfs_find_ns+0x6c/0x100
[12301.899073] kernfs_remove_by_name_ns+0x3b/0x80
[12301.899080] bus_remove_driver+0x92/0xa0
[12301.899085] acpi_video_unregister+0x24/0x40
[12301.899127] i915_driver_unload+0x42/0x130 [i915]
[12301.899160] i915_pci_remove+0x19/0x30 [i915]
[12301.899169] pci_device_remove+0x36/0xb0
[12301.899176] device_release_driver_internal+0x185/0x240
[12301.899183] unbind_store+0xaf/0x180
[12301.899189] kernfs_fop_write+0x104/0x190
[12301.899195] __vfs_write+0x31/0x180
[12301.899203] ? rcu_read_lock_sched_held+0x6f/0x80
[12301.899209] ? rcu_sync_lockdep_assert+0x29/0x50
[12301.899216] ? __sb_start_write+0x13c/0x1a0
[12301.899221] ? vfs_write+0x17f/0x1b0
[12301.899227] vfs_write+0xb9/0x1b0
[12301.899233] ksys_write+0x50/0xc0
[12301.899239] do_syscall_64+0x4b/0x180
[12301.899247] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[12301.899253] RIP: 0033:0x7f452ac7f7a4
[12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
[12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
[12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
[12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
[12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
[12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d
Looking around I've noticed that usb and i2c already handle similar
recursion problems, where a sysfs file can unbind the same type of
sysfs somewhere else in the hierarchy. Relevant commits are:
commit 356c05d58a
Author: Alan Stern <stern@rowland.harvard.edu>
Date: Mon May 14 13:30:03 2012 -0400
sysfs: get rid of some lockdep false positives
commit e9b526fe70
Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Date: Fri May 17 14:56:35 2013 +0200
i2c: suppress lockdep warning on delete_device
Implement the same trick for driver bind/unbind.
v2: Put the macro into bus.c (Greg).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Propagate error code back to userspace if writing the /sys/.../uevent
file fails. Before, the write operation always returned with success,
even if we failed to recognize the input string or if we failed to
generate the uevent itself.
With the error codes properly propagated back to userspace, we are
able to react in userspace accordingly by not assuming and awaiting
a uevent that is not delivered.
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SoC have internal I/O buses that can't be proved for devices. The
devices on the buses can be accessed directly without additinal
configuration required. This type of bus is represented as
"simple-bus". In some platforms, we name "soc" with "simple-bus"
attribute and many devices are hooked under it described in DT
(device tree).
In commit bf74ad5bc4 ("Hold the device's parent's lock during
probe and remove") to solve USB subsystem lock sequence since
USB device's characteristic. Thus "soc" needs to be locked
whenever a device and driver's probing happen under "soc" bus.
During this period, an async driver tries to probe a device which
is under the "soc" bus would be blocked until previous driver
finish the probing and release "soc" lock. And the next probing
under the "soc" bus need to wait for async finish. Because of
that, driver's async probe for init time improvement will be
shadowed.
Since many devices don't have USB devices' characteristic, they
actually don't need parent's lock. Thus, we introduce a lock flag
in bus_type struct and driver core would lock the parent lock base
on the flag. For USB, we set this flag in usb_bus_type to keep
original lock behavior in driver core.
Async probe could have more benefit after this patch.
Signed-off-by: Martin Liu <liumartin@google.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When error happens, these interators return the error, no interation should
be continued, so make the change for getting out of while immediately.
Signed-off-by: Gimcuan Hui <gimcuan@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the SPDX tag is in all driver core files, that identifies the
license in a specific and legally-defined manner. So the extra GPL text
wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the driver core files files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The .release function of driver_ktype is 'driver_release()'.
This function frees the container_of this kobject.
So, this memory must not be freed explicitly in the error handling path of
'bus_add_driver()'. Otherwise a double free will occur.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that all in-kernel users of bus_type.dev_attrs have been converted
to use dev_groups instead, the dev_attrs field, and logic surrounding
it, can be removed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch makes it possible to pass additional arguments in addition
to uevent action name when writing /sys/.../uevent attribute. These
additional arguments are then inserted into generated synthetic uevent
as additional environment variables.
Before, we were not able to pass any additional uevent environment
variables for synthetic uevents. This made it hard to identify such uevents
properly in userspace to make proper distinction between genuine uevents
originating from kernel and synthetic uevents triggered from userspace.
Also, it was not possible to pass any additional information which would
make it possible to optimize and change the way the synthetic uevents are
processed back in userspace based on the originating environment of the
triggering action in userspace. With the extra additional variables, we are
able to pass through this extra information needed and also it makes it
possible to synchronize with such synthetic uevents as they can be clearly
identified back in userspace.
The format for writing the uevent attribute is following:
ACTION [UUID [KEY=VALUE ...]
There's no change in how "ACTION" is recognized - it stays the same
("add", "change", "remove"). The "ACTION" is the only argument required
to generate synthetic uevent, the rest of arguments, that this patch
adds support for, are optional.
The "UUID" is considered as transaction identifier so it's possible to
use the same UUID value for one or more synthetic uevents in which case
we logically group these uevents together for any userspace listeners.
The "UUID" is expected to be in "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
format where "x" is a hex digit. The value appears in uevent as
"SYNTH_UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" environment variable.
The "KEY=VALUE" pairs can contain alphanumeric characters only. It's
possible to define zero or more more pairs - each pair is then delimited
by a space character " ". Each pair appears in synthetic uevents as
"SYNTH_ARG_KEY=VALUE" environment variable. That means the KEY name gains
"SYNTH_ARG_" prefix to avoid possible collisions with existing variables.
To pass the "KEY=VALUE" pairs, it's also required to pass in the "UUID"
part for the synthetic uevent first.
If "UUID" is not passed in, the generated synthetic uevent gains
"SYNTH_UUID=0" environment variable automatically so it's possible to
identify this situation in userspace when reading generated uevent and so
we can still make a difference between genuine and synthetic uevents.
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use to_subsys_private() and to_device_private_bus() instead of open-coding.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use list_for_each_entry*() instead of list_for_each*() to simplify
the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some devices take a long time when initializing, and not all drivers are
suited to initialize their devices when they are open. For example,
input drivers need to interrogate their devices in order to publish
device's capabilities before userspace will open them. When such drivers
are compiled into kernel they may stall entire kernel initialization.
This change allows drivers request for their probe functions to be
called asynchronously during driver and device registration (manual
binding is still synchronous). Because async_schedule is used to perform
asynchronous calls module loading will still wait for the probing to
complete.
Note that the end goal is to make the probing asynchronous by default,
so annotating drivers with PROBE_PREFER_ASYNCHRONOUS is a temporary
measure that allows us to speed up boot process while we validating and
fixing the rest of the drivers and preparing userspace.
This change is based on earlier patch by "Luis R. Rodriguez"
<mcgrof@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not necessary to call device_remove_groups() when device_add_groups()
fails.
The group added by device_add_groups() should be removed if sysfs_create_link()
fails.
Fixes: fa6fdb33b4 ("driver core: bus_type: add dev_groups")
Signed-off-by: Junjie Mao <junjie_mao@yeah.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Avoid that bus_unregister() triggers a use-after-free with
CONFIG_DEBUG_KOBJECT_RELEASE=y. This patch avoids that the
following sequence triggers a kernel crash with memory poisoning
enabled:
* bus_register()
* driver_register()
* driver_unregister()
* bus_unregister()
The above sequence causes the bus private data to be freed from
inside the bus_unregister() call although it is not guaranteed in
that function that the reference count on the bus private data has
dropped to zero. As an example, with CONFIG_DEBUG_KOBJECT_RELEASE=y
the ${bus}/drivers kobject is still holding a reference on
bus->p->subsys.kobj via its parent pointer at the time the bus
private data is freed. Fix this by deferring freeing the bus private
data until the last kobject_put() call on bus->p->subsys.kobj.
The kernel oops triggered by the above sequence and with memory
poisoning enabled and that is fixed by this patch is as follows:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 2711 Comm: kworker/3:32 Tainted: G W O 3.13.0-rc4-debug+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: events kobject_delayed_cleanup
task: ffff880037f866d0 ti: ffff88003b638000 task.ti: ffff88003b638000
Call Trace:
[<ffffffff81263105>] ? kobject_get_path+0x25/0x100
[<ffffffff81264354>] kobject_uevent_env+0x134/0x600
[<ffffffff8126482b>] kobject_uevent+0xb/0x10
[<ffffffff81262fa2>] kobject_delayed_cleanup+0xc2/0x1b0
[<ffffffff8106c047>] process_one_work+0x217/0x700
[<ffffffff8106bfdb>] ? process_one_work+0x1ab/0x700
[<ffffffff8106c64b>] worker_thread+0x11b/0x3a0
[<ffffffff8106c530>] ? process_one_work+0x700/0x700
[<ffffffff81074b70>] kthread+0xf0/0x110
[<ffffffff81074a80>] ? insert_kthread_work+0x80/0x80
[<ffffffff815673bc>] ret_from_fork+0x7c/0xb0
[<ffffffff81074a80>] ? insert_kthread_work+0x80/0x80
Code: 89 f8 48 89 e5 f6 82 c0 27 63 81 20 74 15 0f 1f 44 00 00 48 83 c0 01 0f b6 10 f6 82 c0 27 63 81 20 75 f0 5d c3 66 0f 1f 44 00 00 <80> 3f 00 55 48 89 e5 74 15 48 89 f8 0f 1f 40 00 48 83 c0 01 80
RIP [<ffffffff81267ed0>] strlen+0x0/0x30
RSP <ffff88003b639c70>
---[ end trace 210f883ef80376aa ]---
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that all in-kernel users of bus_type.drv_attrs have been converted
to use drv_groups instead, the drv_attrs field, and logic surrounding
it, can be removed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that all in-kernel users of bus_type.bus_attrs have been converted
to use bus_groups instead, the bus_attrs field, and logic surrounding
it, can be removed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is needed to fix the build on sh systems.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two bus attributes that can better be defined using
DRIVER_ATTR_WO(), so convert them to the new macro, making it easier to
audit attribute permissions.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gotta love a macro that doesn't reduce the typing you have to do.
Also, only the driver core, and one network driver uses this. The
driver core functions will be going away soon, and I'll convert the
network driver soon to not need this as well, so delete it for now
before anyone else gets some bright ideas and wants to use it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These functions are being open-coded in 3 different places in the driver
core, and other driver subsystems will want to start doing this as well,
so move it to the sysfs core to keep it all in one place, where we know
it is written properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
attribute groups are much more flexible than just a list of attributes,
due to their support for visibility of the attributes, and binary
attributes. Add bus_groups to struct bus_type which should be used
instead of bus_attrs.
bus_attrs will be removed from the structure soon.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
attribute groups are much more flexible than just a list of attributes,
due to their support for visibility of the attributes, and binary
attributes. Add drv_groups to struct bus_type which should be used
instead of drv_attrs.
drv_attrs will be removed from the structure soon.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
attribute groups are much more flexible than just a list of attributes,
due to their support for visibility of the attributes, and binary
attributes. Add dev_groups to struct bus_type which should be used
instead of dev_attrs.
dev_attrs will be removed from the structure soon.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modules want to call this function, so it needs to be exported.
Reported-by: Daniel Mack <zonque@gmail.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull workqueue updates from Tejun Heo:
"A lot of activities on workqueue side this time. The changes achieve
the followings.
- WQ_UNBOUND workqueues - the workqueues which are per-cpu - are
updated to be able to interface with multiple backend worker pools.
This involved a lot of churning but the end result seems actually
neater as unbound workqueues are now a lot closer to per-cpu ones.
- The ability to interface with multiple backend worker pools are
used to implement unbound workqueues with custom attributes.
Currently the supported attributes are the nice level and CPU
affinity. It may be expanded to include cgroup association in
future. The attributes can be specified either by calling
apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
the workqueue in question is exported through sysfs.
The backend worker pools are keyed by the actual attributes and
shared by any workqueues which share the same attributes. When
attributes of a workqueue are changed, the workqueue binds to the
worker pool with the specified attributes while leaving the work
items which are already executing in its previous worker pools
alone.
This allows converting custom worker pool implementations which
want worker attribute tuning to use workqueues. The writeback pool
is already converted in block tree and there are a couple others
are likely to follow including btrfs io workers.
- WQ_UNBOUND's ability to bind to multiple worker pools is also used
to make it NUMA-aware. Because there's no association between work
item issuer and the specific worker assigned to execute it, before
this change, using unbound workqueue led to unnecessary cross-node
bouncing and it couldn't be helped by autonuma as it requires tasks
to have implicit node affinity and workers are assigned randomly.
After these changes, an unbound workqueue now binds to multiple
NUMA-affine worker pools so that queued work items are executed in
the same node. This is turned on by default but can be disabled
system-wide or for individual workqueues.
Crypto was requesting NUMA affinity as encrypting data across
different nodes can contribute noticeable overhead and doing it
per-cpu was too limiting for certain cases and IO throughput could
be bottlenecked by one CPU being fully occupied while others have
idle cycles.
While the new features required a lot of changes including
restructuring locking, it didn't complicate the execution paths much.
The unbound workqueue handling is now closer to per-cpu ones and the
new features are implemented by simply associating a workqueue with
different sets of backend worker pools without changing queue,
execution or flush paths.
As such, even though the amount of change is very high, I feel
relatively safe in that it isn't likely to cause subtle issues with
basic correctness of work item execution and handling. If something
is wrong, it's likely to show up as being associated with worker pools
with the wrong attributes or OOPS while workqueue attributes are being
changed or during CPU hotplug.
While this creates more backend worker pools, it doesn't add too many
more workers unless, of course, there are many workqueues with unique
combinations of attributes. Assuming everything else is the same,
NUMA awareness costs an extra worker pool per NUMA node with online
CPUs.
There are also a couple things which are being routed outside the
workqueue tree.
- block tree pulled in workqueue for-3.10 so that writeback worker
pool can be converted to unbound workqueue with sysfs control
exposed. This simplifies the code, makes writeback workers
NUMA-aware and allows tuning nice level and CPU affinity via sysfs.
- The conversion to workqueue means that there's no 1:1 association
between a specific worker, which makes writeback folks unhappy as
they want to be able to tell which filesystem caused a problem from
backtrace on systems with many filesystems mounted. This is
resolved by allowing work items to set debug info string which is
printed when the task is dumped. As this change involves unifying
implementations of dump_stack() and friends in arch codes, it's
being routed through Andrew's -mm tree."
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
workqueue: use kmem_cache_free() instead of kfree()
workqueue: avoid false negative WARN_ON() in destroy_workqueue()
workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
workqueue: implement NUMA affinity for unbound workqueues
workqueue: introduce put_pwq_unlocked()
workqueue: introduce numa_pwq_tbl_install()
workqueue: use NUMA-aware allocation for pool_workqueues
workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
workqueue: map an unbound workqueues to multiple per-node pool_workqueues
workqueue: move hot fields of workqueue_struct to the end
workqueue: make workqueue->name[] fixed len
workqueue: add workqueue->unbound_attrs
workqueue: determine NUMA node of workers accourding to the allowed cpumask
workqueue: drop 'H' from kworker names of unbound worker pools
workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
workqueue: fix memory leak in apply_workqueue_attrs()
workqueue: fix unbound workqueue attrs hashing / comparison
workqueue: fix race condition in unbound workqueue free path
workqueue: remove pwq_lock which is no longer used
...
Kay tells me the most appropriate place to expose workqueues to
userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
a way to do that outside of driver core as virtual_device_parent()
isn't exported and there's no inteface to conveniently create a
virtual subsystem.
This patch implements subsys_virtual_register() by factoring out
subsys_register() from subsys_system_register() and using it with
virtual_device_parent() as the origin directory. It's identical to
subsys_system_register() other than the origin directory but we aren't
gonna restrict the device names which should be used under it.
This will be used to expose workqueue attributes to userland.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().
The panic/reboot path has fixups that look up devices in pci_bus_type. If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:
mach_reboot_fixups
pci_get_device
..
bus_find_device(&pci_bus_type, ...)
bus->p is NULL
Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Inside bus_add_driver(), one device might be added(device_add()) into
the bus or probed which is triggered by deferred probe
just after completing of driver_attach() and before
'klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers)',
so the device won't be probed by this driver.
This patch moves the below line
'klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers)'
before driver_attach() inside bus_add_driver() to fix the
problem.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove conditional code based on CONFIG_HOTPLUG being false. It's
always on now in preparation of it going away as an option.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Device driver attribute groups are created after userspace is notified
via an add event. Fix this by moving the kobject_uevent call to
driver_register after the attribute groups are added.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit a15d49fd30 as that
patch broke the build.
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
klist_iter_init_node() takes a node as a start argument.
However, this node might not be valid anymore.
This patch updates the klist_iter_init_node() and
dependent functions to return an error if so.
All calling functions have been audited to check
for a return code here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Greg Kroah-Hartmann <gregkh@linuxfoundation.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Stable Kernel <stable@kernel.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The variable 'system_kset' is only referenced in this file and
should be marked static to prevent it from being exposed globally.
This quiets the sparse waring:
warning: symbol 'system_kset' was not declared. Should it be static?
Also, remove the comment since drivers/base/sys.c has now been
deleted.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was done to resolve a merge and build problem with the
drivers/acpi/processor_driver.c file.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Check if the sif is not NULL before de-referencing it
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix new kernel-doc warnings:
Warning(drivers/base/bus.c:925): No description found for parameter 'key'
Warning(drivers/base/bus.c:1241): No description found for parameter 'subsys'
Warning(drivers/base/bus.c:1241): No description found for parameter 'groups'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All sysdev classes and sysdev devices will converted to regular devices
and buses to properly hook userspace into the event processing.
There is no interesting difference between a 'sysdev' and 'device' which
would justify to roll an entire own subsystem with different userspace
export semantics. Userspace relies on events and generic sysfs subsystem
infrastructure from sysdev devices, which are currently not properly
available.
Every converted sysdev class will create a regular device with the class
name in /sys/devices/system and all registered devices will becom a children
of theses devices.
For compatibility reasons, the sysdev class-wide attributes are created
at this parent device. (Do not copy that logic for anything new, subsystem-
wide properties belong to the subsystem, not to some fake parent device
created in /sys/devices.)
Every sysdev driver is implemented as a simple subsystem interface now,
and no longer called a driver.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>