linux-sg2042/drivers/acpi/acpi_ipmi.c

/*
 * acpi_ipmi.c - ACPI IPMI opregion
 *
 * Copyright (C) 2010, 2013 Intel Corporation
 *   Author: Zhao Yakui <yakui.zhao@intel.com>
 *           Lv Zheng <lv.zheng@intel.com>
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or (at
 * your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
 *
 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/delay.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/io.h>
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>
#include <linux/ipmi.h>
#include <linux/device.h>
#include <linux/pnp.h>
MODULE_AUTHOR("Zhao Yakui");
MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
MODULE_LICENSE("GPL");
#define ACPI_IPMI_OK 0
#define ACPI_IPMI_TIMEOUT 0x10
#define ACPI_IPMI_UNKNOWN 0x07
/* the IPMI timeout is 5s */
#define IPMI_TIMEOUT (5000)
#define ACPI_IPMI_MAX_MSG_LENGTH 64
struct acpi_ipmi_device {
	/* list node linking this device into driver_data.ipmi_devices */
	struct list_head head;
	/* the IPMI request message list */
	struct list_head tx_msg_list;
	spinlock_t tx_msg_lock;
	acpi_handle handle;
	struct device *dev;
	ipmi_user_t user_interface;
	int ipmi_ifnum; /* IPMI interface number */
	long curr_msgid;
	bool dead;
	struct kref kref;
};
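struct acpi_ipmi_device embeds a struct kref, so the device is torn down only after its reference count drops to 0. The sketch below is a userspace approximation of the kref_init()/kref_get()/kref_put() contract using C11 atomics; struct ref, ref_init(), ref_get(), ref_put(), struct demo_device and demo_release() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kref. */
struct ref {
	atomic_int count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);	/* the creator owns the first reference */
}

static void ref_get(struct ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);		/* never revive an already released object */
	(void)old;
}

/* Drop one reference; returns 1 if it was the last one and release() ran. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical device: ref is the first member, so a cast recovers it. */
struct demo_device {
	struct ref ref;
	int ifnum;
};

static int demo_releases;

static void demo_release(struct ref *r)
{
	demo_releases++;
	free((struct demo_device *)r);
}
```

Each holder (for example an in-flight transfer) calls ref_get() and later ref_put(); only the final put frees the object, which mirrors kref_put() returning 1 when its release callback ran.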
struct ipmi_driver_data {
	struct list_head ipmi_devices;
	struct ipmi_smi_watcher bmc_events;
	struct ipmi_user_hndl ipmi_hndlrs;
	struct mutex ipmi_lock;
	/*
	 * NOTE: IPMI System Interface Selection
	 * There is no system interface specified by the IPMI operation
	 * region access. We try to select one system interface with an ACPI
	 * handle set. IPMI messages passed from the ACPI code are sent
	 * to this selected global IPMI system interface.
	 */
	struct acpi_ipmi_device *selected_smi;
};
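The NOTE says one system interface with an ACPI handle is chosen as the global target; the commit that introduced selected_smi describes the policy as taking the first registered interface that has an ACPI handle. A minimal sketch of that policy, assuming a simplified struct smi_candidate and a pure select_smi() helper (both hypothetical; the driver itself keeps the result in driver_data.selected_smi):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a registered IPMI interface. */
struct smi_candidate {
	int ifnum;
	void *acpi_handle;	/* NULL when not defined in the ACPI namespace */
};

/*
 * Keep an existing selection; otherwise accept the new interface only
 * if it has an ACPI handle. Returns the (possibly unchanged) selection.
 */
static const struct smi_candidate *
select_smi(const struct smi_candidate *selected,
	   const struct smi_candidate *new_smi)
{
	if (selected)
		return selected;
	if (new_smi && new_smi->acpi_handle)
		return new_smi;
	return NULL;
}
```

Feeding interfaces in registration order, an interface without a handle is skipped and the first one with a handle stays selected even when later ones arrive.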
struct acpi_ipmi_msg {
	struct list_head head;
	/*
	 * Generally speaking, the addr type should be SI_ADDR_TYPE and
	 * the addr channel should be BMC.
	 * In fact it can also be IPMB type, but then we would have to
	 * parse it from the NetFn command buffer. That is so complex
	 * that it is skipped.
	 */
	struct ipmi_addr addr;
	long tx_msgid;
	/* used to track whether the IPMI message is finished */
	struct completion tx_complete;
	struct kernel_ipmi_msg tx_message;
	int msg_done;
	/* tx/rx data, copied from/to the ACPI object buffer */
	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
	u8 rx_len;
	struct acpi_ipmi_device *device;
	struct kref kref;
};
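Each acpi_ipmi_msg carries a struct completion, tx_complete, that tracks whether the message has finished; a waiter is expected to give up after the 5 s IPMI_TIMEOUT defined above. The sketch below approximates complete() and wait_for_completion_timeout() in userspace with a POSIX mutex and condition variable. struct completion_sketch and its helpers are hypothetical, and the timeout is taken in milliseconds as an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace analogue of struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Signal side: mark the message finished and wake any waiter. */
static void completion_complete(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait side: returns true if completed, false if timeout_ms elapsed first. */
static bool completion_wait_ms(struct completion_sketch *c, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups; stop on timeout or any error. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

Checking the done flag under the lock, rather than trusting the wakeup alone, is what makes a completion signalled before the wait starts still count as finished.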
/* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
struct acpi_ipmi_buffer {
	u8 status;
	u8 length;
	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
};
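struct acpi_ipmi_buffer holds the status/length/data layout exchanged with ACPI. As a hypothetical illustration of filling it from a finished transfer, the helper below writes a payload only when the status is ACPI_IPMI_OK and refuses to overrun data[]. format_ipmi_response() is an illustrative name, and reporting an oversized response as ACPI_IPMI_UNKNOWN is an assumption of this sketch, not documented driver behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_IPMI_OK 0
#define ACPI_IPMI_TIMEOUT 0x10
#define ACPI_IPMI_UNKNOWN 0x07
#define ACPI_IPMI_MAX_MSG_LENGTH 64

typedef uint8_t u8;

/* Same layout as struct acpi_ipmi_buffer above. */
struct acpi_ipmi_buffer {
	u8 status;
	u8 length;
	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
};

/* Hypothetical: fill the ACPI buffer from a transfer status and payload. */
static void format_ipmi_response(struct acpi_ipmi_buffer *buf, u8 status,
				 const u8 *rx, size_t rx_len)
{
	memset(buf, 0, sizeof(*buf));
	if (status == ACPI_IPMI_OK && rx_len > ACPI_IPMI_MAX_MSG_LENGTH)
		status = ACPI_IPMI_UNKNOWN;	/* would overflow data[] */
	buf->status = status;
	if (status != ACPI_IPMI_OK)
		return;				/* failures carry no payload */
	buf->length = (u8)rx_len;
	if (rx_len)
		memcpy(buf->data, rx, rx_len);
}
```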
static void ipmi_register_bmc(int iface, struct device *dev);
static void ipmi_bmc_gone(int iface);
static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
static struct ipmi_driver_data driver_data = {
	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
	.bmc_events = {
		.owner = THIS_MODULE,
		.new_smi = ipmi_register_bmc,
		.smi_gone = ipmi_bmc_gone,
	},
	.ipmi_hndlrs = {
		.ipmi_recv_hndl = ipmi_msg_handler,
	},
	.ipmi_lock = __MUTEX_INITIALIZER(driver_data.ipmi_lock)
};
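driver_data binds its callbacks with designated initializers: ipmi_register_bmc() and ipmi_bmc_gone() are the watcher's new_smi and smi_gone hooks for interfaces appearing and disappearing, and ipmi_msg_handler() is the receive handler. The stripped-down sketch below shows the same static callback-table pattern with a hypothetical struct smi_watcher_sketch that tracks a single interface number; it is not the ipmi_smi_watcher API.

```c
#include <assert.h>

/* Hypothetical, stripped-down analogue of a watcher callback table. */
struct smi_watcher_sketch {
	void (*new_smi)(int iface);
	void (*smi_gone)(int iface);
};

static int registered_iface = -1;	/* -1 means no interface tracked */

static void on_new_smi(int iface)
{
	registered_iface = iface;
}

static void on_smi_gone(int iface)
{
	/* Only forget the interface we are actually tracking. */
	if (registered_iface == iface)
		registered_iface = -1;
}

/* Callbacks are bound at compile time, as in driver_data. */
static const struct smi_watcher_sketch watcher = {
	.new_smi = on_new_smi,
	.smi_gone = on_smi_gone,
};
```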
static struct acpi_ipmi_device *
ipmi_dev_alloc(int iface, struct device *dev, acpi_handle handle)
{
	struct acpi_ipmi_device *ipmi_device;
	int err;
	ipmi_user_t user;

	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
	if (!ipmi_device)
		return NULL;

	kref_init(&ipmi_device->kref);
	INIT_LIST_HEAD(&ipmi_device->head);
	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
	spin_lock_init(&ipmi_device->tx_msg_lock);
	ipmi_device->handle = handle;
	ipmi_device->dev = get_device(dev);
	ipmi_device->ipmi_ifnum = iface;

	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
			       ipmi_device, &user);
	if (err) {
		put_device(dev);
		kfree(ipmi_device);
		return NULL;
	}
	ipmi_device->user_interface = user;

	return ipmi_device;
}

static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
{
	ipmi_destroy_user(ipmi_device->user_interface);
	put_device(ipmi_device->dev);
	kfree(ipmi_device);
}

static void ipmi_dev_release_kref(struct kref *kref)
{
	struct acpi_ipmi_device *ipmi =
		container_of(kref, struct acpi_ipmi_device, kref);

	ipmi_dev_release(ipmi);
}
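ipmi_dev_release_kref() follows the standard kref pattern. The counter is embedded in the object, and the release callback recovers the enclosing structure with container_of() before tearing it down. The following is a minimal userspace sketch of that pattern, not the kernel's implementation. It assumes C11 atomics as a stand-in for struct kref and omits the real kref's overflow and underflow protection. It also uses a simplified container_of() without type checking. The names demo_device, demo_release_kref and demo_released are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal kref: just an atomic reference counter. */
struct kref {
	atomic_int refcount;
};

static void kref_init(struct kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_get(struct kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	assert(old > 0);	/* taking a ref on a dead object is a bug */
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int kref_put(struct kref *k, void (*release)(struct kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical object embedding its kref, like acpi_ipmi_device. */
struct demo_device {
	int id;
	struct kref kref;
};

static int demo_released;	/* counts release callbacks, for testing */

/* Release callback: map the kref back to its container, then free it. */
static void demo_release_kref(struct kref *kref)
{
	struct demo_device *dev = container_of(kref, struct demo_device, kref);

	demo_released++;
	free(dev);
}
```

As with the kernel's kref_put(), the sketch returns 1 only when the final reference was dropped and the release callback ran.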

static void __ipmi_dev_kill(struct acpi_ipmi_device *ipmi_device)
{
	list_del(&ipmi_device->head);
	if (driver_data.selected_smi == ipmi_device)
		driver_data.selected_smi = NULL;

	/*
	 * Always set the dead flag after deleting the device from the
	 * list; otherwise the list_for_each_entry() code must be changed.
	 */
	ipmi_device->dead = true;
}
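__ipmi_dev_kill() unlinks the device and clears the global selection if it pointed at this device. Only then does it set the dead flag, so entries still on the list are never dead. Code that is about to queue a new transfer checks the flag and backs off.

The sketch below models that sequence in userspace under these assumptions:
- A hand-rolled singly linked list replaces list_head.
- For simplicity, one pthread mutex guards both the registry and the per-device queue. The driver itself uses ipmi_lock for the device list and a separate per-device tx_msg_lock for tx_msg_list.
- All demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device entry; "pending" stands in for tx_msg_list. */
struct demo_dev {
	struct demo_dev *next;
	bool dead;
	int pending;
};

/* Hypothetical registry standing in for driver_data. */
struct demo_registry {
	pthread_mutex_t lock;		/* plays the role of ipmi_lock */
	struct demo_dev *head;
	struct demo_dev *selected;	/* like driver_data.selected_smi */
};

/* Caller must hold reg->lock, mirroring __ipmi_dev_kill(). */
static void demo_dev_kill_locked(struct demo_registry *reg,
				 struct demo_dev *dev)
{
	struct demo_dev **pp;

	assert(!dev->dead);
	/* Unlink the device from the registry list. */
	for (pp = &reg->head; *pp; pp = &(*pp)->next) {
		if (*pp == dev) {
			*pp = dev->next;
			break;
		}
	}
	if (reg->selected == dev)
		reg->selected = NULL;
	/* Set last: anything still on the list is never dead. */
	dev->dead = true;
}

/* Refuse new work once the device is dead, like the tx_msg "dead" check. */
static bool demo_queue_msg(struct demo_registry *reg, struct demo_dev *dev)
{
	bool ok;

	pthread_mutex_lock(&reg->lock);
	ok = !dev->dead;
	if (ok)
		dev->pending++;
	pthread_mutex_unlock(&reg->lock);
	return ok;
}
```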

static struct acpi_ipmi_device *acpi_ipmi_dev_get(void)
{
	struct acpi_ipmi_device *ipmi_device = NULL;

	mutex_lock(&driver_data.ipmi_lock);
	if (driver_data.selected_smi) {
		ipmi_device = driver_data.selected_smi;
		kref_get(&ipmi_device->kref);
	}
	mutex_unlock(&driver_data.ipmi_lock);

	return ipmi_device;
}

static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
{
	kref_put(&ipmi_device->kref, ipmi_dev_release_kref);
}
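acpi_ipmi_dev_get() reads driver_data.selected_smi and takes a reference within the same ipmi_lock critical section. __ipmi_dev_kill() is called with that lock held, so it cannot clear the pointer between the lookup and the kref_get(). As a result, the last reference cannot drop in that window. acpi_ipmi_dev_put() needs no lock at all.

Below is a userspace sketch of that get/put discipline. It assumes a pthread mutex in place of ipmi_lock and a plain atomic counter in place of the kref. The names demo_get, demo_put, demo_selected and the released flag are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object; "refs" stands in for struct kref. */
struct demo_obj {
	atomic_int refs;
	int released;	/* set by the last put, for demonstration only */
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_obj *demo_selected;	/* like driver_data.selected_smi */

/* Look up and take a reference while still holding the lock. */
static struct demo_obj *demo_get(void)
{
	struct demo_obj *obj = NULL;

	pthread_mutex_lock(&demo_lock);
	if (demo_selected) {
		obj = demo_selected;
		atomic_fetch_add(&obj->refs, 1);
	}
	pthread_mutex_unlock(&demo_lock);
	return obj;
}

/* Dropping a reference needs no lock; the last put runs the release. */
static void demo_put(struct demo_obj *obj)
{
	int old = atomic_fetch_sub(&obj->refs, 1);

	assert(old > 0);
	if (old == 1)
		obj->released = 1;	/* real code would destroy and free */
}
```

The key design choice is taking the reference inside the critical section. If the code read the pointer under the lock but called kref_get() after unlocking, the window that the driver's reference counting closes would reopen.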

static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
{
struct acpi_ipmi_device *ipmi;
struct acpi_ipmi_msg *ipmi_msg;
ipmi = acpi_ipmi_dev_get();
if (!ipmi)
return NULL;
ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
if (!ipmi_msg) {
acpi_ipmi_dev_put(ipmi);
return NULL;
}
kref_init(&ipmi_msg->kref);
init_completion(&ipmi_msg->tx_complete);
INIT_LIST_HEAD(&ipmi_msg->head);
ipmi_msg->device = ipmi;
ipmi_msg->msg_done = ACPI_IPMI_UNKNOWN;
return ipmi_msg;
}
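ipmi_msg_alloc() takes a device reference with acpi_ipmi_dev_get() and stores it in the new message. If kzalloc() fails, it gives that reference back before returning NULL. The userspace sketch below models only this ownership handoff. The sketch_* names, the plain int counter (the driver uses a kref), and the sketch_fail_alloc knob that simulates an allocation failure are all hypothetical, and no locking is modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct acpi_ipmi_device / acpi_ipmi_msg. */
struct sketch_dev {
	int refcount;
};

struct sketch_msg {
	struct sketch_dev *device;   /* the message owns one device reference */
};

static struct sketch_dev sketch_device = { .refcount = 1 };
static int sketch_fail_alloc;        /* test knob: simulate kzalloc() failure */

static struct sketch_dev *sketch_dev_get(void)
{
	sketch_device.refcount++;
	return &sketch_device;
}

static void sketch_dev_put(struct sketch_dev *d)
{
	assert(d->refcount > 0);     /* catch an unbalanced put */
	d->refcount--;
}

/* Same shape as ipmi_msg_alloc(): get device, allocate, hand over. */
static struct sketch_msg *sketch_msg_alloc(void)
{
	struct sketch_dev *d = sketch_dev_get();
	struct sketch_msg *m;

	if (!d)
		return NULL;
	m = sketch_fail_alloc ? NULL : calloc(1, sizeof(*m));
	if (!m) {
		sketch_dev_put(d);   /* undo the reference taken above */
		return NULL;
	}
	m->device = d;               /* reference now belongs to the message */
	return m;
}

/* Same shape as ipmi_msg_release(): drop the device reference, then free. */
static void sketch_msg_release(struct sketch_msg *m)
{
	sketch_dev_put(m->device);
	free(m);
}
```

Because the message, not the caller, holds the device reference, the device cannot be torn down while any message that points at it is still alive.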
static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
{
acpi_ipmi_dev_put(tx_msg->device);
kfree(tx_msg);
}
static void ipmi_msg_release_kref(struct kref *kref)
{
struct acpi_ipmi_msg *tx_msg =
container_of(kref, struct acpi_ipmi_msg, kref);
ipmi_msg_release(tx_msg);
}
static struct acpi_ipmi_msg *acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
{
kref_get(&tx_msg->kref);
return tx_msg;
}
static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
{
kref_put(&tx_msg->kref, ipmi_msg_release_kref);
}
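acpi_ipmi_msg_get() and acpi_ipmi_msg_put() follow the usual kref pattern. kref_put() invokes the release callback only when the last reference is dropped, and the callback recovers the enclosing acpi_ipmi_msg with container_of(). What follows is a minimal userspace approximation using C11 atomics. struct sketch_kref, sketch_container_of(), and the other sketch_* names are hypothetical stand-ins, not the kernel's implementation. The return value of sketch_kref_put() mimics kref_put(), which returns 1 when the object was released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct kref (assumption). */
struct sketch_kref {
	atomic_int refcount;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void sketch_kref_init(struct sketch_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void sketch_kref_get(struct sketch_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this put dropped the last reference and release() ran. */
static int sketch_kref_put(struct sketch_kref *k,
			   void (*release)(struct sketch_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct sketch_tx_msg {
	int payload;
	struct sketch_kref kref;     /* deliberately not the first member */
};

static int sketch_released;          /* counts release callbacks */

static void sketch_msg_release_kref(struct sketch_kref *k)
{
	struct sketch_tx_msg *m =
		sketch_container_of(k, struct sketch_tx_msg, kref);

	sketch_released++;
	free(m);
}

static struct sketch_tx_msg *sketch_msg_get(struct sketch_tx_msg *m)
{
	sketch_kref_get(&m->kref);
	return m;
}

static int sketch_msg_put(struct sketch_tx_msg *m)
{
	return sketch_kref_put(&m->kref, sketch_msg_release_kref);
}
```

The driver takes an extra reference before handing a message to another context, for example before ipmi_request_settime(). Whichever side drops the last reference then frees the message, regardless of which side finishes first.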
#define IPMI_OP_RGN_NETFN(offset) (((offset) >> 8) & 0xff)
#define IPMI_OP_RGN_CMD(offset) ((offset) & 0xff)
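These macros split the OpRegion address into the IPMI network function (bits 15:8) and the command (bits 7:0). As a worked example, take address 0x0601. The netfn is 0x06 (the IPMI Application NetFn) and the cmd is 0x01, which together form the standard Get Device ID request. The sketch uses hypothetical SKETCH_* copies of the macros with the argument parenthesized. That way an expression argument such as (0x0600 | 0x01) is shifted as a whole, instead of `>>` binding only to its right-hand operand. The helper that builds an address is also hypothetical and exists only for the test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of IPMI_OP_RGN_NETFN()/IPMI_OP_RGN_CMD(). */
#define SKETCH_OP_RGN_NETFN(offset) (((offset) >> 8) & 0xff)
#define SKETCH_OP_RGN_CMD(offset)   ((offset) & 0xff)

/* Inverse of the decoding above: netfn in bits 15:8, cmd in bits 7:0. */
static uint64_t sketch_op_rgn_address(uint8_t netfn, uint8_t cmd)
{
	return ((uint64_t)netfn << 8) | cmd;
}
```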
static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
acpi_physical_address address,
acpi_integer *value)
{
struct kernel_ipmi_msg *msg;
struct acpi_ipmi_buffer *buffer;
struct acpi_ipmi_device *device;
ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler() This patch fixes the issues indicated by the test results that ipmi_msg_handler() is invoked in atomic context. BUG: scheduling while atomic: kipmi0/18933/0x10000100 Modules linked in: ipmi_si acpi_ipmi ... CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8 Call Trace: <IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b [<ffffffff814bfbab>] __schedule_bug+0x46/0x54 [<ffffffff814c73e0>] __schedule+0x83/0x59c [<ffffffff81058853>] __cond_resched+0x22/0x2d [<ffffffff814c794b>] _cond_resched+0x14/0x1d [<ffffffff814c6d82>] mutex_lock+0x11/0x32 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si] [<ffffffff812bf6e4>] deliver_response+0x55/0x5a [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si] ... Also Tony Camuso says: We were getting occasional "Scheduling while atomic" call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. ================================= [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. 
ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (&ipmi_device->tx_msg_lock){+.?...}, at: [<ffffffff81337a27>] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ba11c>] __lock_acquire+0x63c/0x1570 [<ffffffff810bb0f4>] lock_acquire+0xa4/0x120 [<ffffffff815581cc>] __mutex_lock_common+0x4c/0x400 [<ffffffff815586ea>] mutex_lock_nested+0x4a/0x60 [<ffffffff8133789d>] acpi_ipmi_space_handler+0x11b/0x234 [<ffffffff81321c62>] acpi_ev_address_space_dispatch+0x170/0x1be The fix implemented by this change has been tested by Tony: Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Reported-and-tested-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-13 13:13:23 +08:00
unsigned long flags;
msg = &tx_msg->tx_message;
/*
* IPMI network function and command are encoded in the address
* within the IPMI OpRegion; see ACPI 4.0, sec 5.5.2.4.3.
*/
msg->netfn = IPMI_OP_RGN_NETFN(address);
msg->cmd = IPMI_OP_RGN_CMD(address);
msg->data = tx_msg->data;
/*
* value is the parameter passed in by the IPMI OpRegion space handler.
* It points to the IPMI request message buffer.
*/
buffer = (struct acpi_ipmi_buffer *)value;
/* copy the tx message data */
if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
dev_WARN_ONCE(tx_msg->device->dev, true,
"Unexpected request (msg len %d).\n",
buffer->length);
return -EINVAL;
}
msg->data_len = buffer->length;
memcpy(tx_msg->data, buffer->data, msg->data_len);
/*
* For now the address type is always SYSTEM_INTERFACE and the channel
* is the BMC. If the netfn is APP_REQUEST and the cmd is SEND_MESSAGE,
* the address type should be IPMB instead, which would require parsing
* the IPMI request message buffer to obtain the IPMB address.
* FIXME: that case is not handled yet.
*/
tx_msg->addr.addr_type = IPMI_SYSTEM_INTERFACE_ADDR_TYPE;
tx_msg->addr.channel = IPMI_BMC_CHANNEL;
tx_msg->addr.data[0] = 0;
/* Get the msgid */
device = tx_msg->device;
spin_lock_irqsave(&device->tx_msg_lock, flags);
device->curr_msgid++;
tx_msg->tx_msgid = device->curr_msgid;
spin_unlock_irqrestore(&device->tx_msg_lock, flags);
return 0;
}
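acpi_format_ipmi_request() checks the caller-supplied length before copying. An oversized ACPI buffer is therefore rejected with -EINVAL instead of overflowing tx_msg->data, and only a successfully formatted request consumes a message ID. The sketch below restates that order of operations in userspace. It makes three assumptions: ACPI_IPMI_MAX_MSG_LENGTH is 64, the ACPI buffer is laid out as status, length, data, and the struct and function names are hypothetical. It also omits the tx_msg_lock protection around the msgid increment.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_MSG_LENGTH 64   /* assumed ACPI_IPMI_MAX_MSG_LENGTH */

/* Hypothetical mirror of struct acpi_ipmi_buffer (status, length, data). */
struct sketch_req_buffer {
	uint8_t status;
	uint8_t length;
	uint8_t data[SKETCH_MAX_MSG_LENGTH];
};

struct sketch_request {
	uint8_t netfn;
	uint8_t cmd;
	uint16_t data_len;
	uint8_t data[SKETCH_MAX_MSG_LENGTH];
	long msgid;
};

static long sketch_curr_msgid;     /* the driver guards this with tx_msg_lock */

static int sketch_format_request(struct sketch_request *req,
				 uint64_t address,
				 const struct sketch_req_buffer *buf)
{
	req->netfn = (address >> 8) & 0xff;
	req->cmd = address & 0xff;

	/* Reject oversized requests before touching the copy buffer. */
	if (buf->length > SKETCH_MAX_MSG_LENGTH)
		return -EINVAL;
	req->data_len = buf->length;
	memcpy(req->data, buf->data, req->data_len);

	/* Only a well-formed request is assigned a new message ID. */
	req->msgid = ++sketch_curr_msgid;
	return 0;
}
```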
static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
acpi_integer *value)
{
struct acpi_ipmi_buffer *buffer;
/*
* value is also used as the output parameter: on return it holds the
* IPMI response message produced by the IPMI command.
*/
buffer = (struct acpi_ipmi_buffer *)value;
/*
* Report the completion status unconditionally. If msg_done is not
* ACPI_IPMI_OK, the IPMI command did not complete successfully and no
* response data is copied back.
*/
buffer->status = msg->msg_done;
if (msg->msg_done != ACPI_IPMI_OK)
return;
/*
* The response was received successfully (status ACPI_IPMI_OK), so
* copy its length and data back to the caller's buffer.
*/
buffer->length = msg->rx_len;
memcpy(buffer->data, msg->data, msg->rx_len);
}
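acpi_format_ipmi_response() always writes the status byte, but it copies the length and payload only when msg_done equals ACPI_IPMI_OK. On failure, the caller's length and data fields keep whatever they held before. The sketch assumes ACPI_IPMI_OK is 0 and uses 0x10 as a hypothetical failure code. The struct layouts are illustrative mirrors, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RESP_MAX_LENGTH 64      /* assumed ACPI_IPMI_MAX_MSG_LENGTH */
#define SKETCH_IPMI_OK         0x00    /* assumed value of ACPI_IPMI_OK */
#define SKETCH_IPMI_FAILED     0x10    /* hypothetical failure status */

struct sketch_resp_buffer {
	uint8_t status;
	uint8_t length;
	uint8_t data[SKETCH_RESP_MAX_LENGTH];
};

struct sketch_resp_msg {
	uint8_t msg_done;                  /* completion status */
	uint8_t rx_len;                    /* number of response bytes */
	uint8_t data[SKETCH_RESP_MAX_LENGTH];
};

static void sketch_format_response(const struct sketch_resp_msg *msg,
				   struct sketch_resp_buffer *buf)
{
	buf->status = msg->msg_done;       /* always report the status */
	if (msg->msg_done != SKETCH_IPMI_OK)
		return;                    /* leave length/data untouched */
	buf->length = msg->rx_len;
	memcpy(buf->data, msg->data, msg->rx_len);
}
```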
static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
{
struct acpi_ipmi_msg *tx_msg;
unsigned long flags;
/*
* NOTE: On-going ipmi_recv_msg
* ipmi_msg_handler() may still be invoked by ipmi_si after this flush.
* A fast flush on module_exit() is nevertheless safe without waiting
* for every ipmi_recv_msg to be completed by ipmi_msg_handler(),
* because ipmi_si guarantees that all ipmi_recv_msg(s) are freed once
* ipmi_destroy_user() has been invoked.
*/
spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
while (!list_empty(&ipmi->tx_msg_list)) {
tx_msg = list_first_entry(&ipmi->tx_msg_list,
struct acpi_ipmi_msg,
head);
list_del(&tx_msg->head);
spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
/* wake up the thread sleeping on this tx_msg */
complete(&tx_msg->tx_complete);
acpi_ipmi_msg_put(tx_msg);
spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
}
spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
}
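ipmi_flush_tx_msg() unlinks one message at a time while holding tx_msg_lock. It then drops the lock before calling complete() and acpi_ipmi_msg_put(), so tx_msg_lock stays a leaf lock. The userspace sketch keeps that shape with three hypothetical stand-ins: a spinlock built on C11 atomic_flag in place of spinlock_t, a bool in place of struct completion, and an int in place of the kref. The test starts each message with two references, one owned by the list and one by the waiting caller. That split is an assumption for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spinlock built on atomic_flag (stand-in for spinlock_t). */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

static void sketch_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&sketch_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void sketch_spin_unlock(void)
{
	atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}

struct sketch_flush_msg {
	struct sketch_flush_msg *next;   /* singly linked pending list */
	bool completed;                  /* stand-in for struct completion */
	int refs;                        /* stand-in for the kref */
};

static struct sketch_flush_msg *sketch_pending;   /* list head */

/* Called without sketch_lock held: the real complete() takes its own
 * wait-queue lock, so calling it here keeps sketch_lock a leaf lock. */
static void sketch_complete(struct sketch_flush_msg *m)
{
	m->completed = true;
}

static void sketch_flush(void)
{
	sketch_spin_lock();
	while (sketch_pending) {
		struct sketch_flush_msg *m = sketch_pending;

		sketch_pending = m->next;        /* unlink under the lock */
		m->next = NULL;
		sketch_spin_unlock();

		sketch_complete(m);              /* wake the waiter */
		m->refs--;                       /* drop the list's reference */

		sketch_spin_lock();              /* re-read the list head */
	}
	sketch_spin_unlock();
}
```

Re-reading the list head under the lock on every iteration, instead of walking with a saved next pointer, is what makes it safe to release the lock mid-walk. An entry removed concurrently (for example by a cancel path) is never touched through a stale pointer.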
static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
struct acpi_ipmi_msg *msg)
{
struct acpi_ipmi_msg *tx_msg, *temp;
bool msg_found = false;
unsigned long flags;
spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
if (msg == tx_msg) {
msg_found = true;
list_del(&tx_msg->head);
break;
}
}
spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);

	if (msg_found)
		acpi_ipmi_msg_put(tx_msg);
}

static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
{
	struct acpi_ipmi_device *ipmi_device = user_msg_data;
	bool msg_found = false;
	struct acpi_ipmi_msg *tx_msg, *temp;
	struct device *dev = ipmi_device->dev;
	unsigned long flags;

	if (msg->user != ipmi_device->user_interface) {
		dev_warn(dev, "Unexpected response is returned. "
			 "returned user %p, expected user %p\n",
			 msg->user, ipmi_device->user_interface);
		goto out_msg;
	}

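	/*
	 * Unlink the matching tx_msg under tx_msg_lock; complete() is
	 * called later, outside the lock, so tx_msg_lock stays a leaf lock.
	 */
	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);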
	list_for_each_entry_safe(tx_msg, temp, &ipmi_device->tx_msg_list, head) {
		if (msg->msgid == tx_msg->tx_msgid) {
			msg_found = true;
			list_del(&tx_msg->head);
			break;
		}
	}
	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);

	if (!msg_found) {
		dev_warn(dev, "Unexpected response (msg id %ld) is "
			 "returned.\n", msg->msgid);
		goto out_msg;
	}

	/* reject a response that would overflow the tx_msg data buffer */
	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
		dev_WARN_ONCE(dev, true,
			      "Unexpected response (msg len %d).\n",
			      msg->msg.data_len);
		goto out_comp;
	}

	/* response msg is an error msg */
	msg->recv_type = IPMI_RESPONSE_RECV_TYPE;
	if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE &&
	    msg->msg.data_len == 1) {
		if (msg->msg.data[0] == IPMI_TIMEOUT_COMPLETION_CODE) {
			dev_WARN_ONCE(dev, true,
				      "Unexpected response (timeout).\n");
			tx_msg->msg_done = ACPI_IPMI_TIMEOUT;
		}
		goto out_comp;
	}

	/* copy the response data to the Rx data buffer */
	tx_msg->rx_len = msg->msg.data_len;
	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
	tx_msg->msg_done = ACPI_IPMI_OK;

out_comp:
	complete(&tx_msg->tx_complete);
	acpi_ipmi_msg_put(tx_msg);
out_msg:
	ipmi_free_recv_msg(msg);
}

static void ipmi_register_bmc(int iface, struct device *dev)
{
	struct acpi_ipmi_device *ipmi_device, *temp;
	int err;
	struct ipmi_smi_info smi_data;
	acpi_handle handle;

	err = ipmi_get_smi_info(iface, &smi_data);
	if (err)
		return;

	if (smi_data.addr_src != SI_ACPI)
		goto err_ref;
	handle = smi_data.addr_info.acpi_info.acpi_handle;
	if (!handle)
		goto err_ref;

	ipmi_device = ipmi_dev_alloc(iface, smi_data.dev, handle);
if (!ipmi_device) {
dev_warn(smi_data.dev, "Can't create IPMI user interface\n");
goto err_ref;
}
mutex_lock(&driver_data.ipmi_lock);
list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
/*
* If the corresponding ACPI handle has already been added
* to the device list, don't add it again.
*/
if (temp->handle == handle)
goto err_lock;
}
ACPI / IPMI: Use global IPMI operation region handler It is found on a real machine, in its ACPI namespace, the IPMI OperationRegions (in the ACPI000D - ACPI power meter) are not defined under the IPMI system interface device (the IPI0001 with KCS type returned from _IFT control method): Device (PMI0) { Name (_HID, "ACPI000D") // _HID: Hardware ID OperationRegion (SYSI, IPMI, 0x0600, 0x0100) Field (SYSI, BufferAcc, Lock, Preserve) { AccessAs (BufferAcc, 0x01), Offset (0x58), SCMD, 8, GCMD, 8 } OperationRegion (POWR, IPMI, 0x3000, 0x0100) Field (POWR, BufferAcc, Lock, Preserve) { AccessAs (BufferAcc, 0x01), Offset (0xB3), GPMM, 8 } } Device (PCI0) { Device (ISA) { Device (NIPM) { Name (_HID, EisaId ("IPI0001")) // _HID: Hardware ID Method (_IFT, 0, NotSerialized) // _IFT: IPMI Interface Type { Return (0x01) } } } } Current ACPI_IPMI code registers IPMI operation region handler on a per-device basis, so for the above namespace the IPMI operation region handler is registered only under the scope of \_SB.PCI0.ISA.NIPM. Thus when an IPMI operation region field of \PMI0 is accessed, there are errors reported on such platform: ACPI Error: No handlers for Region [IPMI] ACPI Error: Region IPMI(7) has no handler The solution is to install an IPMI operation region handler from root node so that every object that defines IPMI OperationRegion can get an address space handler registered. When an IPMI operation region field is accessed, the Network Function (0x06 for SYSI and 0x30 for POWR) and the Command (SCMD, GCMD, GPMM) are passed to the operation region handler, there is no system interface specified by the BIOS. The patch tries to select one system interface by monitoring the system interface notification. IPMI messages passed from the ACPI codes are sent to this selected global IPMI system interface. The ACPI_IPMI will always select the first registered IPMI interface with an ACPI handle (i.e., defined in the ACPI namespace). 
It's hard to determine the selection when there are multiple IPMI system interfaces defined in the ACPI namespace. According to the IPMI specification: A BMC device may make available multiple system interfaces, but only one management controller is allowed to be 'active' BMC that provides BMC functionality for the system (in case of a 'partitioned' system, there can be only one active BMC per partition). Only the system interface(s) for the active BMC allowed to respond to the 'Get Device Id' command. According to the ipmi_si desigin: The ipmi_si registeration notifications can only happen after a successful "Get Device ID" command. Thus it should be OK for non-partitioned systems to do such selection. However, we do not have much knowledge on 'partitioned' systems. References: https://bugzilla.kernel.org/show_bug.cgi?id=46741 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-13 13:14:02 +08:00
if (!driver_data.selected_smi)
driver_data.selected_smi = ipmi_device;
list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
mutex_unlock(&driver_data.ipmi_lock);
put_device(smi_data.dev);
return;
err_lock:
mutex_unlock(&driver_data.ipmi_lock);
ipmi_dev_release(ipmi_device);
err_ref:
put_device(smi_data.dev);
return;
}
static void ipmi_bmc_gone(int iface)
{
struct acpi_ipmi_device *ipmi_device, *temp;
bool dev_found = false;
mutex_lock(&driver_data.ipmi_lock);
list_for_each_entry_safe(ipmi_device, temp,
&driver_data.ipmi_devices, head) {
/* Tear down only the device bound to the departing interface. */
if (ipmi_device->ipmi_ifnum == iface) {
dev_found = true;
__ipmi_dev_kill(ipmi_device);
break;
}
}
if (!driver_data.selected_smi)
driver_data.selected_smi = list_first_entry_or_null(
&driver_data.ipmi_devices,
struct acpi_ipmi_device, head);
mutex_unlock(&driver_data.ipmi_lock);

	if (dev_found) {
		ipmi_flush_tx_msg(ipmi_device);
		acpi_ipmi_dev_put(ipmi_device);
	}
}

/* --------------------------------------------------------------------------
 *                          Address Space Management
 * -------------------------------------------------------------------------- */

/*
 * This is the IPMI operation region space handler.
 * @function: indicates a read or a write. Because IPMI transfers are
 *            driven by commands, only writes are meaningful.
 * @address: contains the netfn/command of the IPMI request message.
 * @bits: not used.
 * @value: an in/out parameter pointing to the IPMI message buffer.
 *         Before the IPMI message is sent, it holds the request
 *         message; after the transfer completes, it holds the
 *         response message returned by the IPMI command.
 * @handler_context: IPMI device context.
 */
static acpi_status
acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
			u32 bits, acpi_integer *value,
			void *handler_context, void *region_context)
{
	struct acpi_ipmi_msg *tx_msg;
	struct acpi_ipmi_device *ipmi_device;
	int err;
	acpi_status status;
	unsigned long flags;

	/*
	 * IPMI opregion message.
	 * An IPMI message is first written to the BMC, and system software
	 * then receives the response, so a read access to the IPMI opregion
	 * is meaningless.
	 */
	if ((function & ACPI_IO_MASK) == ACPI_READ)
		return AE_TYPE;

	tx_msg = ipmi_msg_alloc();
	if (!tx_msg)
		return AE_NOT_EXIST;
	ipmi_device = tx_msg->device;

	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
		ipmi_msg_release(tx_msg);
		return AE_TYPE;
	}

	acpi_ipmi_msg_get(tx_msg);
	mutex_lock(&driver_data.ipmi_lock);
	/* Do not add a tx_msg that cannot be flushed. */
	if (ipmi_device->dead) {
		mutex_unlock(&driver_data.ipmi_lock);
		ipmi_msg_release(tx_msg);
		return AE_NOT_EXIST;
	}
	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
	mutex_unlock(&driver_data.ipmi_lock);

	err = ipmi_request_settime(ipmi_device->user_interface,
				   &tx_msg->addr,
				   tx_msg->tx_msgid,
				   &tx_msg->tx_message,
				   NULL, 0, 0, IPMI_TIMEOUT);
	if (err) {
		status = AE_ERROR;
		goto out_msg;
	}
	wait_for_completion(&tx_msg->tx_complete);

	acpi_format_ipmi_response(tx_msg, value);
	status = AE_OK;

out_msg:
	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
	acpi_ipmi_msg_put(tx_msg);
	return status;
}

static int __init acpi_ipmi_init(void)
{
	int result;
	acpi_status status;

	if (acpi_disabled)
		return 0;

	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
						    ACPI_ADR_SPACE_IPMI,
						    &acpi_ipmi_space_handler,
						    NULL, NULL);
	if (ACPI_FAILURE(status)) {
		pr_warn("Can't register IPMI opregion space handle\n");
		return -EINVAL;
	}
	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
	if (result)
		pr_err("Can't register IPMI system interface watcher\n");

	return result;
}

static void __exit acpi_ipmi_exit(void)
{
	struct acpi_ipmi_device *ipmi_device;

	if (acpi_disabled)
		return;

	ipmi_smi_watcher_unregister(&driver_data.bmc_events);

	/*
	 * When an smi_watcher is unregistered, it is only deleted from
	 * the smi_watcher list; the smi_gone() callback is not called.
	 * So explicitly kill each ACPI IPMI device here, flush its
	 * pending messages and drop its reference. The global IPMI
	 * opregion handler is removed afterwards.
	 */
	mutex_lock(&driver_data.ipmi_lock);
	while (!list_empty(&driver_data.ipmi_devices)) {
		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
					       struct acpi_ipmi_device,
					       head);
		__ipmi_dev_kill(ipmi_device);
		mutex_unlock(&driver_data.ipmi_lock);

		ipmi_flush_tx_msg(ipmi_device);
		acpi_ipmi_dev_put(ipmi_device);

		mutex_lock(&driver_data.ipmi_lock);
	}
	mutex_unlock(&driver_data.ipmi_lock);
	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
					  ACPI_ADR_SPACE_IPMI,
					  &acpi_ipmi_space_handler);
}

module_init(acpi_ipmi_init);
module_exit(acpi_ipmi_exit);