From 585dc0c2f68981c02a0bb6fc8fe191a3f513959c Mon Sep 17 00:00:00 2001 From: Gernot Vormayr Date: Fri, 24 May 2013 15:55:03 -0700 Subject: [PATCH 01/30] drivers/block/xsysace.c: fix id with missing port-number If the port number is missing from the device-tree the device gets named xs` instead of xsa. This fixes the check for missing ids. Tested on ml507 board. Signed-off-by: Gernot Vormayr Cc: Greg Kroah-Hartman Cc: Jens Axboe Cc: Grant Likely Cc: Rob Herring Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/xsysace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/block/xsysace.c b/drivers/block/xsysace.c index f8ef15f37c5e..3fd130fdfbc1 100644 --- a/drivers/block/xsysace.c +++ b/drivers/block/xsysace.c @@ -1160,8 +1160,7 @@ static int ace_probe(struct platform_device *dev) dev_dbg(&dev->dev, "ace_probe(%p)\n", dev); /* device id and bus width */ - of_property_read_u32(dev->dev.of_node, "port-number", &id); - if (id < 0) + if (of_property_read_u32(dev->dev.of_node, "port-number", &id)) id = 0; if (of_find_property(dev->dev.of_node, "8-bit", NULL)) bus_width = ACE_BUS_WIDTH_8; From a11650e11093ed57dca78bf16e7836517c599098 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 24 May 2013 15:55:05 -0700 Subject: [PATCH 02/30] rapidio: make enumeration/discovery configurable Systems that use RapidIO fabric may need to implement their own enumeration and discovery methods which are better suitable for needs of a target application. The following set of patches is intended to simplify process of introduction of new RapidIO fabric enumeration/discovery methods. The first patch offers ability to add new RapidIO enumeration/discovery methods using kernel configuration options. This new configuration option mechanism allows to select statically linked or modular enumeration/discovery method(s) from the list of existing methods or use external module(s). This patch also updates the currently existing enumeration/discovery code to be used as a statically linked or modular method. The corresponding configuration option is named "Basic enumeration/discovery" method. This is the only one configuration option available today but new methods are expected to be introduced after adoption of provided patches. The second patch address a long time complaint of RapidIO subsystem users regarding fabric enumeration/discovery start sequence. Existing implementation offers only a boot-time enumeration/discovery start which requires synchronized boot of all endpoints in RapidIO network. While it works for small closed configurations with limited number of endpoints, using this approach in systems with large number of endpoints is quite challenging. To eliminate requirement for synchronized start the second patch introduces RapidIO enumeration/discovery start from user space. For compatibility with the existing RapidIO subsystem implementation, automatic boot time enumeration/discovery start can be configured in by specifying "rio-scan.scan=1" command line parameter if statically linked basic enumeration method is selected. This patch: Rework to implement RapidIO enumeration/discovery method selection combined with ability to use enumeration/discovery as a kernel module. This patch adds ability to introduce new RapidIO enumeration/discovery methods using kernel configuration options. Configuration option mechanism allows to select statically linked or modular enumeration/discovery method from the list of existing methods or use external modules. If a modular enumeration/discovery is selected each RapidIO mport device can have its own method attached to it. The existing enumeration/discovery code was updated to be used as statically linked or modular method. This configuration option is named "Basic enumeration/discovery" method. Several common routines have been moved from rio-scan.c to make them available to other enumeration methods and reduce number of exported symbols. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Li Yang Cc: Kumar Gala Cc: Andre van Herk Cc: Micha Nelissen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rapidio/Kconfig | 20 ++++ drivers/rapidio/Makefile | 3 +- drivers/rapidio/rio-driver.c | 7 ++ drivers/rapidio/rio-scan.c | 166 ++++++-------------------- drivers/rapidio/rio.c | 222 +++++++++++++++++++++++++++++++++-- drivers/rapidio/rio.h | 11 +- include/linux/rio.h | 13 +- include/linux/rio_drv.h | 1 + 8 files changed, 304 insertions(+), 139 deletions(-) diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig index 6194d35ebb97..5ab056494bbe 100644 --- a/drivers/rapidio/Kconfig +++ b/drivers/rapidio/Kconfig @@ -47,4 +47,24 @@ config RAPIDIO_DEBUG If you are unsure about this, say N here. +choice + prompt "Enumeration method" + depends on RAPIDIO + default RAPIDIO_ENUM_BASIC + help + There are different enumeration and discovery mechanisms offered + for RapidIO subsystem. You may select single built-in method or + or any number of methods to be built as modules. + Selecting a built-in method disables use of loadable methods. + + If unsure, select Basic built-in. + +config RAPIDIO_ENUM_BASIC + tristate "Basic" + help + This option includes basic RapidIO fabric enumeration and discovery + mechanism similar to one described in RapidIO specification Annex 1. + +endchoice + source "drivers/rapidio/switches/Kconfig" diff --git a/drivers/rapidio/Makefile b/drivers/rapidio/Makefile index ec3fb8121004..3036702ffe8b 100644 --- a/drivers/rapidio/Makefile +++ b/drivers/rapidio/Makefile @@ -1,7 +1,8 @@ # # Makefile for RapidIO interconnect services # -obj-y += rio.o rio-access.o rio-driver.o rio-scan.o rio-sysfs.o +obj-y += rio.o rio-access.o rio-driver.o rio-sysfs.o +obj-$(CONFIG_RAPIDIO_ENUM_BASIC) += rio-scan.o obj-$(CONFIG_RAPIDIO) += switches/ obj-$(CONFIG_RAPIDIO) += devices/ diff --git a/drivers/rapidio/rio-driver.c b/drivers/rapidio/rio-driver.c index 0f4a53bdaa3c..55850bb21480 100644 --- a/drivers/rapidio/rio-driver.c +++ b/drivers/rapidio/rio-driver.c @@ -164,6 +164,13 @@ void rio_unregister_driver(struct rio_driver *rdrv) driver_unregister(&rdrv->driver); } +void rio_attach_device(struct rio_dev *rdev) +{ + rdev->dev.bus = &rio_bus_type; + rdev->dev.parent = &rio_bus; +} +EXPORT_SYMBOL_GPL(rio_attach_device); + /** * rio_match_bus - Tell if a RIO device structure has a matching RIO driver device id structure * @dev: the standard device structure to match against diff --git a/drivers/rapidio/rio-scan.c b/drivers/rapidio/rio-scan.c index a965acd3c0e4..7bdc67419cc3 100644 --- a/drivers/rapidio/rio-scan.c +++ b/drivers/rapidio/rio-scan.c @@ -37,12 +37,8 @@ #include "rio.h" -LIST_HEAD(rio_devices); - static void rio_init_em(struct rio_dev *rdev); -DEFINE_SPINLOCK(rio_global_list_lock); - static int next_destid = 0; static int next_comptag = 1; @@ -326,127 +322,6 @@ static int rio_is_switch(struct rio_dev *rdev) return 0; } -/** - * rio_switch_init - Sets switch operations for a particular vendor switch - * @rdev: RIO device - * @do_enum: Enumeration/Discovery mode flag - * - * Searches the RIO switch ops table for known switch types. If the vid - * and did match a switch table entry, then call switch initialization - * routine to setup switch-specific routines. - */ -static void rio_switch_init(struct rio_dev *rdev, int do_enum) -{ - struct rio_switch_ops *cur = __start_rio_switch_ops; - struct rio_switch_ops *end = __end_rio_switch_ops; - - while (cur < end) { - if ((cur->vid == rdev->vid) && (cur->did == rdev->did)) { - pr_debug("RIO: calling init routine for %s\n", - rio_name(rdev)); - cur->init_hook(rdev, do_enum); - break; - } - cur++; - } - - if ((cur >= end) && (rdev->pef & RIO_PEF_STD_RT)) { - pr_debug("RIO: adding STD routing ops for %s\n", - rio_name(rdev)); - rdev->rswitch->add_entry = rio_std_route_add_entry; - rdev->rswitch->get_entry = rio_std_route_get_entry; - rdev->rswitch->clr_table = rio_std_route_clr_table; - } - - if (!rdev->rswitch->add_entry || !rdev->rswitch->get_entry) - printk(KERN_ERR "RIO: missing routing ops for %s\n", - rio_name(rdev)); -} - -/** - * rio_add_device- Adds a RIO device to the device model - * @rdev: RIO device - * - * Adds the RIO device to the global device list and adds the RIO - * device to the RIO device list. Creates the generic sysfs nodes - * for an RIO device. - */ -static int rio_add_device(struct rio_dev *rdev) -{ - int err; - - err = device_add(&rdev->dev); - if (err) - return err; - - spin_lock(&rio_global_list_lock); - list_add_tail(&rdev->global_list, &rio_devices); - spin_unlock(&rio_global_list_lock); - - rio_create_sysfs_dev_files(rdev); - - return 0; -} - -/** - * rio_enable_rx_tx_port - enable input receiver and output transmitter of - * given port - * @port: Master port associated with the RIO network - * @local: local=1 select local port otherwise a far device is reached - * @destid: Destination ID of the device to check host bit - * @hopcount: Number of hops to reach the target - * @port_num: Port (-number on switch) to enable on a far end device - * - * Returns 0 or 1 from on General Control Command and Status Register - * (EXT_PTR+0x3C) - */ -inline int rio_enable_rx_tx_port(struct rio_mport *port, - int local, u16 destid, - u8 hopcount, u8 port_num) { -#ifdef CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS - u32 regval; - u32 ext_ftr_ptr; - - /* - * enable rx input tx output port - */ - pr_debug("rio_enable_rx_tx_port(local = %d, destid = %d, hopcount = " - "%d, port_num = %d)\n", local, destid, hopcount, port_num); - - ext_ftr_ptr = rio_mport_get_physefb(port, local, destid, hopcount); - - if (local) { - rio_local_read_config_32(port, ext_ftr_ptr + - RIO_PORT_N_CTL_CSR(0), - ®val); - } else { - if (rio_mport_read_config_32(port, destid, hopcount, - ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), ®val) < 0) - return -EIO; - } - - if (regval & RIO_PORT_N_CTL_P_TYP_SER) { - /* serial */ - regval = regval | RIO_PORT_N_CTL_EN_RX_SER - | RIO_PORT_N_CTL_EN_TX_SER; - } else { - /* parallel */ - regval = regval | RIO_PORT_N_CTL_EN_RX_PAR - | RIO_PORT_N_CTL_EN_TX_PAR; - } - - if (local) { - rio_local_write_config_32(port, ext_ftr_ptr + - RIO_PORT_N_CTL_CSR(0), regval); - } else { - if (rio_mport_write_config_32(port, destid, hopcount, - ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), regval) < 0) - return -EIO; - } -#endif - return 0; -} - /** * rio_setup_device- Allocates and sets up a RIO device * @net: RIO network @@ -587,8 +462,7 @@ static struct rio_dev *rio_setup_device(struct rio_net *net, rdev->destid); } - rdev->dev.bus = &rio_bus_type; - rdev->dev.parent = &rio_bus; + rio_attach_device(rdev); device_initialize(&rdev->dev); rdev->dev.release = rio_release_dev; @@ -1421,3 +1295,41 @@ enum_done: bail: return -EBUSY; } + +static struct rio_scan rio_scan_ops = { + .enumerate = rio_enum_mport, + .discover = rio_disc_mport, +}; + +static bool scan; +module_param(scan, bool, 0); +MODULE_PARM_DESC(scan, "Start RapidIO network enumeration/discovery " + "(default = 0)"); + +/** + * rio_basic_attach: + * + * When this enumeration/discovery method is loaded as a module this function + * registers its specific enumeration and discover routines for all available + * RapidIO mport devices. The "scan" command line parameter controls ability of + * the module to start RapidIO enumeration/discovery automatically. + * + * Returns 0 for success or -EIO if unable to register itself. + * + * This enumeration/discovery method cannot be unloaded and therefore does not + * provide a matching cleanup_module routine. + */ + +static int __init rio_basic_attach(void) +{ + if (rio_register_scan(RIO_MPORT_ANY, &rio_scan_ops)) + return -EIO; + if (scan) + rio_init_mports(); + return 0; +} + +late_initcall(rio_basic_attach); + +MODULE_DESCRIPTION("Basic RapidIO enumeration/discovery"); +MODULE_LICENSE("GPL"); diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c index d553b5d13722..6e75dda34799 100644 --- a/drivers/rapidio/rio.c +++ b/drivers/rapidio/rio.c @@ -31,7 +31,11 @@ #include "rio.h" +static LIST_HEAD(rio_devices); +static DEFINE_SPINLOCK(rio_global_list_lock); + static LIST_HEAD(rio_mports); +static DEFINE_MUTEX(rio_mport_list_lock); static unsigned char next_portid; static DEFINE_SPINLOCK(rio_mmap_lock); @@ -52,6 +56,32 @@ u16 rio_local_get_device_id(struct rio_mport *port) return (RIO_GET_DID(port->sys_size, result)); } +/** + * rio_add_device- Adds a RIO device to the device model + * @rdev: RIO device + * + * Adds the RIO device to the global device list and adds the RIO + * device to the RIO device list. Creates the generic sysfs nodes + * for an RIO device. + */ +int rio_add_device(struct rio_dev *rdev) +{ + int err; + + err = device_add(&rdev->dev); + if (err) + return err; + + spin_lock(&rio_global_list_lock); + list_add_tail(&rdev->global_list, &rio_devices); + spin_unlock(&rio_global_list_lock); + + rio_create_sysfs_dev_files(rdev); + + return 0; +} +EXPORT_SYMBOL_GPL(rio_add_device); + /** * rio_request_inb_mbox - request inbound mailbox service * @mport: RIO master port from which to allocate the mailbox resource @@ -489,6 +519,7 @@ rio_mport_get_physefb(struct rio_mport *port, int local, return ext_ftr_ptr; } +EXPORT_SYMBOL_GPL(rio_mport_get_physefb); /** * rio_get_comptag - Begin or continue searching for a RIO device by component tag @@ -521,6 +552,7 @@ exit: spin_unlock(&rio_global_list_lock); return rdev; } +EXPORT_SYMBOL_GPL(rio_get_comptag); /** * rio_set_port_lockout - Sets/clears LOCKOUT bit (RIO EM 1.3) for a switch port. @@ -545,6 +577,107 @@ int rio_set_port_lockout(struct rio_dev *rdev, u32 pnum, int lock) regval); return 0; } +EXPORT_SYMBOL_GPL(rio_set_port_lockout); + +/** + * rio_switch_init - Sets switch operations for a particular vendor switch + * @rdev: RIO device + * @do_enum: Enumeration/Discovery mode flag + * + * Searches the RIO switch ops table for known switch types. If the vid + * and did match a switch table entry, then call switch initialization + * routine to setup switch-specific routines. + */ +void rio_switch_init(struct rio_dev *rdev, int do_enum) +{ + struct rio_switch_ops *cur = __start_rio_switch_ops; + struct rio_switch_ops *end = __end_rio_switch_ops; + + while (cur < end) { + if ((cur->vid == rdev->vid) && (cur->did == rdev->did)) { + pr_debug("RIO: calling init routine for %s\n", + rio_name(rdev)); + cur->init_hook(rdev, do_enum); + break; + } + cur++; + } + + if ((cur >= end) && (rdev->pef & RIO_PEF_STD_RT)) { + pr_debug("RIO: adding STD routing ops for %s\n", + rio_name(rdev)); + rdev->rswitch->add_entry = rio_std_route_add_entry; + rdev->rswitch->get_entry = rio_std_route_get_entry; + rdev->rswitch->clr_table = rio_std_route_clr_table; + } + + if (!rdev->rswitch->add_entry || !rdev->rswitch->get_entry) + printk(KERN_ERR "RIO: missing routing ops for %s\n", + rio_name(rdev)); +} +EXPORT_SYMBOL_GPL(rio_switch_init); + +/** + * rio_enable_rx_tx_port - enable input receiver and output transmitter of + * given port + * @port: Master port associated with the RIO network + * @local: local=1 select local port otherwise a far device is reached + * @destid: Destination ID of the device to check host bit + * @hopcount: Number of hops to reach the target + * @port_num: Port (-number on switch) to enable on a far end device + * + * Returns 0 or 1 from on General Control Command and Status Register + * (EXT_PTR+0x3C) + */ +int rio_enable_rx_tx_port(struct rio_mport *port, + int local, u16 destid, + u8 hopcount, u8 port_num) +{ +#ifdef CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS + u32 regval; + u32 ext_ftr_ptr; + + /* + * enable rx input tx output port + */ + pr_debug("rio_enable_rx_tx_port(local = %d, destid = %d, hopcount = " + "%d, port_num = %d)\n", local, destid, hopcount, port_num); + + ext_ftr_ptr = rio_mport_get_physefb(port, local, destid, hopcount); + + if (local) { + rio_local_read_config_32(port, ext_ftr_ptr + + RIO_PORT_N_CTL_CSR(0), + ®val); + } else { + if (rio_mport_read_config_32(port, destid, hopcount, + ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), ®val) < 0) + return -EIO; + } + + if (regval & RIO_PORT_N_CTL_P_TYP_SER) { + /* serial */ + regval = regval | RIO_PORT_N_CTL_EN_RX_SER + | RIO_PORT_N_CTL_EN_TX_SER; + } else { + /* parallel */ + regval = regval | RIO_PORT_N_CTL_EN_RX_PAR + | RIO_PORT_N_CTL_EN_TX_PAR; + } + + if (local) { + rio_local_write_config_32(port, ext_ftr_ptr + + RIO_PORT_N_CTL_CSR(0), regval); + } else { + if (rio_mport_write_config_32(port, destid, hopcount, + ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), regval) < 0) + return -EIO; + } +#endif + return 0; +} +EXPORT_SYMBOL_GPL(rio_enable_rx_tx_port); + /** * rio_chk_dev_route - Validate route to the specified device. @@ -610,6 +743,7 @@ rio_mport_chk_dev_access(struct rio_mport *mport, u16 destid, u8 hopcount) return 0; } +EXPORT_SYMBOL_GPL(rio_mport_chk_dev_access); /** * rio_chk_dev_access - Validate access to the specified device. @@ -941,6 +1075,7 @@ rio_mport_get_efb(struct rio_mport *port, int local, u16 destid, return RIO_GET_BLOCK_ID(reg_val); } } +EXPORT_SYMBOL_GPL(rio_mport_get_efb); /** * rio_mport_get_feature - query for devices' extended features @@ -997,6 +1132,7 @@ rio_mport_get_feature(struct rio_mport * port, int local, u16 destid, return 0; } +EXPORT_SYMBOL_GPL(rio_mport_get_feature); /** * rio_get_asm - Begin or continue searching for a RIO device by vid/did/asm_vid/asm_did @@ -1246,6 +1382,71 @@ EXPORT_SYMBOL_GPL(rio_dma_prep_slave_sg); #endif /* CONFIG_RAPIDIO_DMA_ENGINE */ +/** + * rio_register_scan - enumeration/discovery method registration interface + * @mport_id: mport device ID for which fabric scan routine has to be set + * (RIO_MPORT_ANY = set for all available mports) + * @scan_ops: enumeration/discovery control structure + * + * Assigns enumeration or discovery method to the specified mport device (or all + * available mports if RIO_MPORT_ANY is specified). + * Returns error if the mport already has an enumerator attached to it. + * In case of RIO_MPORT_ANY ignores ports with valid scan routines and returns + * an error if was unable to find at least one available mport. + */ +int rio_register_scan(int mport_id, struct rio_scan *scan_ops) +{ + struct rio_mport *port; + int rc = -EBUSY; + + mutex_lock(&rio_mport_list_lock); + list_for_each_entry(port, &rio_mports, node) { + if (port->id == mport_id || mport_id == RIO_MPORT_ANY) { + if (port->nscan && mport_id == RIO_MPORT_ANY) + continue; + else if (port->nscan) + break; + + port->nscan = scan_ops; + rc = 0; + + if (mport_id != RIO_MPORT_ANY) + break; + } + } + mutex_unlock(&rio_mport_list_lock); + + return rc; +} +EXPORT_SYMBOL_GPL(rio_register_scan); + +/** + * rio_unregister_scan - removes enumeration/discovery method from mport + * @mport_id: mport device ID for which fabric scan routine has to be + * unregistered (RIO_MPORT_ANY = set for all available mports) + * + * Removes enumeration or discovery method assigned to the specified mport + * device (or all available mports if RIO_MPORT_ANY is specified). + */ +int rio_unregister_scan(int mport_id) +{ + struct rio_mport *port; + + mutex_lock(&rio_mport_list_lock); + list_for_each_entry(port, &rio_mports, node) { + if (port->id == mport_id || mport_id == RIO_MPORT_ANY) { + if (port->nscan) + port->nscan = NULL; + if (mport_id != RIO_MPORT_ANY) + break; + } + } + mutex_unlock(&rio_mport_list_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(rio_unregister_scan); + static void rio_fixup_device(struct rio_dev *dev) { } @@ -1274,7 +1475,7 @@ static void disc_work_handler(struct work_struct *_work) work = container_of(_work, struct rio_disc_work, work); pr_debug("RIO: discovery work for mport %d %s\n", work->mport->id, work->mport->name); - rio_disc_mport(work->mport); + work->mport->nscan->discover(work->mport); } int rio_init_mports(void) @@ -1290,12 +1491,15 @@ int rio_init_mports(void) * First, run enumerations and check if we need to perform discovery * on any of the registered mports. */ + mutex_lock(&rio_mport_list_lock); list_for_each_entry(port, &rio_mports, node) { - if (port->host_deviceid >= 0) - rio_enum_mport(port); - else + if (port->host_deviceid >= 0) { + if (port->nscan) + port->nscan->enumerate(port); + } else n++; } + mutex_unlock(&rio_mport_list_lock); if (!n) goto no_disc; @@ -1322,14 +1526,16 @@ int rio_init_mports(void) } n = 0; + mutex_lock(&rio_mport_list_lock); list_for_each_entry(port, &rio_mports, node) { - if (port->host_deviceid < 0) { + if (port->host_deviceid < 0 && port->nscan) { work[n].mport = port; INIT_WORK(&work[n].work, disc_work_handler); queue_work(rio_wq, &work[n].work); n++; } } + mutex_unlock(&rio_mport_list_lock); flush_workqueue(rio_wq); pr_debug("RIO: destroy discovery workqueue\n"); @@ -1342,8 +1548,6 @@ no_disc: return 0; } -device_initcall_sync(rio_init_mports); - static int hdids[RIO_MAX_MPORTS + 1]; static int rio_get_hdid(int index) @@ -1371,7 +1575,10 @@ int rio_register_mport(struct rio_mport *port) port->id = next_portid++; port->host_deviceid = rio_get_hdid(port->id); + port->nscan = NULL; + mutex_lock(&rio_mport_list_lock); list_add_tail(&port->node, &rio_mports); + mutex_unlock(&rio_mport_list_lock); return 0; } @@ -1386,3 +1593,4 @@ EXPORT_SYMBOL_GPL(rio_request_inb_mbox); EXPORT_SYMBOL_GPL(rio_release_inb_mbox); EXPORT_SYMBOL_GPL(rio_request_outb_mbox); EXPORT_SYMBOL_GPL(rio_release_outb_mbox); +EXPORT_SYMBOL_GPL(rio_init_mports); diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h index b1af414f15e6..0afdf482517e 100644 --- a/drivers/rapidio/rio.h +++ b/drivers/rapidio/rio.h @@ -15,6 +15,7 @@ #include #define RIO_MAX_CHK_RETRY 3 +#define RIO_MPORT_ANY (-1) /* Functions internal to the RIO core code */ @@ -27,8 +28,6 @@ extern u32 rio_mport_get_efb(struct rio_mport *port, int local, u16 destid, extern int rio_mport_chk_dev_access(struct rio_mport *mport, u16 destid, u8 hopcount); extern int rio_create_sysfs_dev_files(struct rio_dev *rdev); -extern int rio_enum_mport(struct rio_mport *mport); -extern int rio_disc_mport(struct rio_mport *mport); extern int rio_std_route_add_entry(struct rio_mport *mport, u16 destid, u8 hopcount, u16 table, u16 route_destid, u8 route_port); @@ -39,10 +38,16 @@ extern int rio_std_route_clr_table(struct rio_mport *mport, u16 destid, u8 hopcount, u16 table); extern int rio_set_port_lockout(struct rio_dev *rdev, u32 pnum, int lock); extern struct rio_dev *rio_get_comptag(u32 comp_tag, struct rio_dev *from); +extern int rio_add_device(struct rio_dev *rdev); +extern void rio_switch_init(struct rio_dev *rdev, int do_enum); +extern int rio_enable_rx_tx_port(struct rio_mport *port, int local, u16 destid, + u8 hopcount, u8 port_num); +extern int rio_register_scan(int mport_id, struct rio_scan *scan_ops); +extern int rio_unregister_scan(int mport_id); +extern void rio_attach_device(struct rio_dev *rdev); /* Structures internal to the RIO core code */ extern struct device_attribute rio_dev_attrs[]; -extern spinlock_t rio_global_list_lock; extern struct rio_switch_ops __start_rio_switch_ops[]; extern struct rio_switch_ops __end_rio_switch_ops[]; diff --git a/include/linux/rio.h b/include/linux/rio.h index a3e784278667..cd3796ee7410 100644 --- a/include/linux/rio.h +++ b/include/linux/rio.h @@ -83,7 +83,6 @@ extern struct bus_type rio_bus_type; extern struct device rio_bus; -extern struct list_head rio_devices; /* list of all devices */ struct rio_mport; struct rio_dev; @@ -237,6 +236,7 @@ enum rio_phy_type { * @name: Port name string * @priv: Master port private data * @dma: DMA device associated with mport + * @nscan: RapidIO network enumeration/discovery operations */ struct rio_mport { struct list_head dbells; /* list of doorbell events */ @@ -262,6 +262,7 @@ struct rio_mport { #ifdef CONFIG_RAPIDIO_DMA_ENGINE struct dma_device dma; #endif + struct rio_scan *nscan; }; struct rio_id_table { @@ -460,6 +461,16 @@ static inline struct rio_mport *dma_to_mport(struct dma_device *ddev) } #endif /* CONFIG_RAPIDIO_DMA_ENGINE */ +/** + * struct rio_scan - RIO enumeration and discovery operations + * @enumerate: Callback to perform RapidIO fabric enumeration. + * @discover: Callback to perform RapidIO fabric discovery. + */ +struct rio_scan { + int (*enumerate)(struct rio_mport *mport); + int (*discover)(struct rio_mport *mport); +}; + /* Architecture and hardware-specific functions */ extern int rio_register_mport(struct rio_mport *); extern int rio_open_inb_mbox(struct rio_mport *, void *, int, int); diff --git a/include/linux/rio_drv.h b/include/linux/rio_drv.h index b75c05920ab5..5059994fe297 100644 --- a/include/linux/rio_drv.h +++ b/include/linux/rio_drv.h @@ -433,5 +433,6 @@ extern u16 rio_local_get_device_id(struct rio_mport *port); extern struct rio_dev *rio_get_device(u16 vid, u16 did, struct rio_dev *from); extern struct rio_dev *rio_get_asm(u16 vid, u16 did, u16 asm_vid, u16 asm_did, struct rio_dev *from); +extern int rio_init_mports(void); #endif /* LINUX_RIO_DRV_H */ From bc8fcfea18249640f2728c46d70999dcb7e4dc49 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 24 May 2013 15:55:06 -0700 Subject: [PATCH 03/30] rapidio: add enumeration/discovery start from user space Add RapidIO enumeration/discovery start from user space. User space start allows to defer RapidIO fabric scan until the moment when all participating endpoints are initialized avoiding mandatory synchronized start of all endpoints (which may be challenging in systems with large number of RapidIO endpoints). Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Li Yang Cc: Kumar Gala Cc: Andre van Herk Cc: Micha Nelissen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rapidio/rio-driver.c | 1 + drivers/rapidio/rio-scan.c | 24 ++++++++++++++++--- drivers/rapidio/rio-sysfs.c | 45 ++++++++++++++++++++++++++++++++++++ drivers/rapidio/rio.c | 28 ++++++++++++++++++++-- drivers/rapidio/rio.h | 2 ++ include/linux/rio.h | 9 ++++++-- 6 files changed, 102 insertions(+), 7 deletions(-) diff --git a/drivers/rapidio/rio-driver.c b/drivers/rapidio/rio-driver.c index 55850bb21480..a0c875563d76 100644 --- a/drivers/rapidio/rio-driver.c +++ b/drivers/rapidio/rio-driver.c @@ -207,6 +207,7 @@ struct bus_type rio_bus_type = { .name = "rapidio", .match = rio_match_bus, .dev_attrs = rio_dev_attrs, + .bus_attrs = rio_bus_attrs, .probe = rio_device_probe, .remove = rio_device_remove, }; diff --git a/drivers/rapidio/rio-scan.c b/drivers/rapidio/rio-scan.c index 7bdc67419cc3..4c15dbf81087 100644 --- a/drivers/rapidio/rio-scan.c +++ b/drivers/rapidio/rio-scan.c @@ -1134,19 +1134,30 @@ static void rio_pw_enable(struct rio_mport *port, int enable) /** * rio_enum_mport- Start enumeration through a master port * @mport: Master port to send transactions + * @flags: Enumeration control flags * * Starts the enumeration process. If somebody has enumerated our * master port device, then give up. If not and we have an active * link, then start recursive peer enumeration. Returns %0 if * enumeration succeeds or %-EBUSY if enumeration fails. */ -int rio_enum_mport(struct rio_mport *mport) +int rio_enum_mport(struct rio_mport *mport, u32 flags) { struct rio_net *net = NULL; int rc = 0; printk(KERN_INFO "RIO: enumerate master port %d, %s\n", mport->id, mport->name); + + /* + * To avoid multiple start requests (repeat enumeration is not supported + * by this method) check if enumeration/discovery was performed for this + * mport: if mport was added into the list of mports for a net exit + * with error. + */ + if (mport->nnode.next || mport->nnode.prev) + return -EBUSY; + /* If somebody else enumerated our master port device, bail. */ if (rio_enum_host(mport) < 0) { printk(KERN_INFO @@ -1236,14 +1247,16 @@ static void rio_build_route_tables(struct rio_net *net) /** * rio_disc_mport- Start discovery through a master port * @mport: Master port to send transactions + * @flags: discovery control flags * * Starts the discovery process. If we have an active link, - * then wait for the signal that enumeration is complete. + * then wait for the signal that enumeration is complete (if wait + * is allowed). * When enumeration completion is signaled, start recursive * peer discovery. Returns %0 if discovery succeeds or %-EBUSY * on failure. */ -int rio_disc_mport(struct rio_mport *mport) +int rio_disc_mport(struct rio_mport *mport, u32 flags) { struct rio_net *net = NULL; unsigned long to_end; @@ -1253,6 +1266,11 @@ int rio_disc_mport(struct rio_mport *mport) /* If master port has an active link, allocate net and discover peers */ if (rio_mport_is_active(mport)) { + if (rio_enum_complete(mport)) + goto enum_done; + else if (flags & RIO_SCAN_ENUM_NO_WAIT) + return -EAGAIN; + pr_debug("RIO: wait for enumeration to complete...\n"); to_end = jiffies + CONFIG_RAPIDIO_DISC_TIMEOUT * HZ; diff --git a/drivers/rapidio/rio-sysfs.c b/drivers/rapidio/rio-sysfs.c index 4dbe360989be..66d4acd5e18f 100644 --- a/drivers/rapidio/rio-sysfs.c +++ b/drivers/rapidio/rio-sysfs.c @@ -285,3 +285,48 @@ void rio_remove_sysfs_dev_files(struct rio_dev *rdev) rdev->rswitch->sw_sysfs(rdev, RIO_SW_SYSFS_REMOVE); } } + +static ssize_t bus_scan_store(struct bus_type *bus, const char *buf, + size_t count) +{ + long val; + struct rio_mport *port = NULL; + int rc; + + if (kstrtol(buf, 0, &val) < 0) + return -EINVAL; + + if (val == RIO_MPORT_ANY) { + rc = rio_init_mports(); + goto exit; + } + + if (val < 0 || val >= RIO_MAX_MPORTS) + return -EINVAL; + + port = rio_find_mport((int)val); + + if (!port) { + pr_debug("RIO: %s: mport_%d not available\n", + __func__, (int)val); + return -EINVAL; + } + + if (!port->nscan) + return -EINVAL; + + if (port->host_deviceid >= 0) + rc = port->nscan->enumerate(port, 0); + else + rc = port->nscan->discover(port, RIO_SCAN_ENUM_NO_WAIT); +exit: + if (!rc) + rc = count; + + return rc; +} + +struct bus_attribute rio_bus_attrs[] = { + __ATTR(scan, (S_IWUSR|S_IWGRP), NULL, bus_scan_store), + __ATTR_NULL +}; diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c index 6e75dda34799..cb1c08996fbb 100644 --- a/drivers/rapidio/rio.c +++ b/drivers/rapidio/rio.c @@ -1382,6 +1382,30 @@ EXPORT_SYMBOL_GPL(rio_dma_prep_slave_sg); #endif /* CONFIG_RAPIDIO_DMA_ENGINE */ +/** + * rio_find_mport - find RIO mport by its ID + * @mport_id: number (ID) of mport device + * + * Given a RIO mport number, the desired mport is located + * in the global list of mports. If the mport is found, a pointer to its + * data structure is returned. If no mport is found, %NULL is returned. + */ +struct rio_mport *rio_find_mport(int mport_id) +{ + struct rio_mport *port; + + mutex_lock(&rio_mport_list_lock); + list_for_each_entry(port, &rio_mports, node) { + if (port->id == mport_id) + goto found; + } + port = NULL; +found: + mutex_unlock(&rio_mport_list_lock); + + return port; +} + /** * rio_register_scan - enumeration/discovery method registration interface * @mport_id: mport device ID for which fabric scan routine has to be set @@ -1475,7 +1499,7 @@ static void disc_work_handler(struct work_struct *_work) work = container_of(_work, struct rio_disc_work, work); pr_debug("RIO: discovery work for mport %d %s\n", work->mport->id, work->mport->name); - work->mport->nscan->discover(work->mport); + work->mport->nscan->discover(work->mport, 0); } int rio_init_mports(void) @@ -1495,7 +1519,7 @@ int rio_init_mports(void) list_for_each_entry(port, &rio_mports, node) { if (port->host_deviceid >= 0) { if (port->nscan) - port->nscan->enumerate(port); + port->nscan->enumerate(port, 0); } else n++; } diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h index 0afdf482517e..c14f864dea5c 100644 --- a/drivers/rapidio/rio.h +++ b/drivers/rapidio/rio.h @@ -45,9 +45,11 @@ extern int rio_enable_rx_tx_port(struct rio_mport *port, int local, u16 destid, extern int rio_register_scan(int mport_id, struct rio_scan *scan_ops); extern int rio_unregister_scan(int mport_id); extern void rio_attach_device(struct rio_dev *rdev); +extern struct rio_mport *rio_find_mport(int mport_id); /* Structures internal to the RIO core code */ extern struct device_attribute rio_dev_attrs[]; +extern struct bus_attribute rio_bus_attrs[]; extern struct rio_switch_ops __start_rio_switch_ops[]; extern struct rio_switch_ops __end_rio_switch_ops[]; diff --git a/include/linux/rio.h b/include/linux/rio.h index cd3796ee7410..18e099342e6f 100644 --- a/include/linux/rio.h +++ b/include/linux/rio.h @@ -265,6 +265,11 @@ struct rio_mport { struct rio_scan *nscan; }; +/* + * Enumeration/discovery control flags + */ +#define RIO_SCAN_ENUM_NO_WAIT 0x00000001 /* Do not wait for enum completed */ + struct rio_id_table { u16 start; /* logical minimal id */ u32 max; /* max number of IDs in table */ @@ -467,8 +472,8 @@ static inline struct rio_mport *dma_to_mport(struct dma_device *ddev) * @discover: Callback to perform RapidIO fabric discovery. */ struct rio_scan { - int (*enumerate)(struct rio_mport *mport); - int (*discover)(struct rio_mport *mport); + int (*enumerate)(struct rio_mport *mport, u32 flags); + int (*discover)(struct rio_mport *mport, u32 flags); }; /* Architecture and hardware-specific functions */ From 5eeb929390de7d5219483a1ca10cce4a84066099 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 24 May 2013 15:55:07 -0700 Subject: [PATCH 04/30] rapidio: documentation update for enumeration changes Update RapidIO documentation to reflect changes made to enumeration/discovery build configuration and user space triggering mechanism. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Li Yang Cc: Kumar Gala Cc: Andre van Herk Cc: Micha Nelissen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/rapidio/rapidio.txt | 128 +++++++++++++++++++++++++++--- Documentation/rapidio/sysfs.txt | 17 ++++ 2 files changed, 134 insertions(+), 11 deletions(-) diff --git a/Documentation/rapidio/rapidio.txt b/Documentation/rapidio/rapidio.txt index c75694b35d08..a9c16c979da2 100644 --- a/Documentation/rapidio/rapidio.txt +++ b/Documentation/rapidio/rapidio.txt @@ -79,20 +79,63 @@ master port that is used to communicate with devices within the network. In order to initialize the RapidIO subsystem, a platform must initialize and register at least one master port within the RapidIO network. To register mport within the subsystem controller driver initialization code calls function -rio_register_mport() for each available master port. After all active master -ports are registered with a RapidIO subsystem, the rio_init_mports() routine -is called to perform enumeration and discovery. +rio_register_mport() for each available master port. -In the current PowerPC-based implementation a subsys_initcall() is specified to -perform controller initialization and mport registration. At the end it directly -calls rio_init_mports() to execute RapidIO enumeration and discovery. +RapidIO subsystem uses subsys_initcall() or device_initcall() to perform +controller initialization (depending on controller device type). + +After all active master ports are registered with a RapidIO subsystem, +an enumeration and/or discovery routine may be called automatically or +by user-space command. 4. Enumeration and Discovery ---------------------------- -When rio_init_mports() is called it scans a list of registered master ports and -calls an enumeration or discovery routine depending on the configured role of a -master port: host or agent. +4.1 Overview +------------ + +RapidIO subsystem configuration options allow users to specify enumeration and +discovery methods as statically linked components or loadable modules. +An enumeration/discovery method implementation and available input parameters +define how any given method can be attached to available RapidIO mports: +simply to all available mports OR individually to the specified mport device. + +Depending on selected enumeration/discovery build configuration, there are +several methods to initiate an enumeration and/or discovery process: + + (a) Statically linked enumeration and discovery process can be started + automatically during kernel initialization time using corresponding module + parameters. This was the original method used since introduction of RapidIO + subsystem. Now this method relies on enumerator module parameter which is + 'rio-scan.scan' for existing basic enumeration/discovery method. + When automatic start of enumeration/discovery is used a user has to ensure + that all discovering endpoints are started before the enumerating endpoint + and are waiting for enumeration to be completed. + Configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines time that discovering + endpoint waits for enumeration to be completed. If the specified timeout + expires the discovery process is terminated without obtaining RapidIO network + information. NOTE: a timed out discovery process may be restarted later using + a user-space command as it is described later if the given endpoint was + enumerated successfully. + + (b) Statically linked enumeration and discovery process can be started by + a command from user space. This initiation method provides more flexibility + for a system startup compared to the option (a) above. After all participating + endpoints have been successfully booted, an enumeration process shall be + started first by issuing a user-space command, after an enumeration is + completed a discovery process can be started on all remaining endpoints. + + (c) Modular enumeration and discovery process can be started by a command from + user space. After an enumeration/discovery module is loaded, a network scan + process can be started by issuing a user-space command. + Similar to the option (b) above, an enumerator has to be started first. + + (d) Modular enumeration and discovery process can be started by a module + initialization routine. In this case an enumerating module shall be loaded + first. + +When a network scan process is started it calls an enumeration or discovery +routine depending on the configured role of a master port: host or agent. Enumeration is performed by a master port if it is configured as a host port by assigning a host device ID greater than or equal to zero. A host device ID is @@ -104,8 +147,58 @@ for it. The enumeration and discovery routines use RapidIO maintenance transactions to access the configuration space of devices. -The enumeration process is implemented according to the enumeration algorithm -outlined in the RapidIO Interconnect Specification: Annex I [1]. +4.2 Automatic Start of Enumeration and Discovery +------------------------------------------------ + +Automatic enumeration/discovery start method is applicable only to built-in +enumeration/discovery RapidIO configuration selection. To enable automatic +enumeration/discovery start by existing basic enumerator method set use boot +command line parameter "rio-scan.scan=1". + +This configuration requires synchronized start of all RapidIO endpoints that +form a network which will be enumerated/discovered. Discovering endpoints have +to be started before an enumeration starts to ensure that all RapidIO +controllers have been initialized and are ready to be discovered. Configuration +parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which +a discovering endpoint will wait for enumeration to be completed. + +When automatic enumeration/discovery start is selected, basic method's +initialization routine calls rio_init_mports() to perform enumeration or +discovery for all known mport devices. + +Depending on RapidIO network size and configuration this automatic +enumeration/discovery start method may be difficult to use due to the +requirement for synchronized start of all endpoints. + +4.3 User-space Start of Enumeration and Discovery +------------------------------------------------- + +User-space start of enumeration and discovery can be used with built-in and +modular build configurations. For user-space controlled start RapidIO subsystem +creates the sysfs write-only attribute file '/sys/bus/rapidio/scan'. To initiate +an enumeration or discovery process on specific mport device, a user needs to +write mport_ID (not RapidIO destination ID) into that file. The mport_ID is a +sequential number (0 ... RIO_MAX_MPORTS) assigned during mport device +registration. For example for machine with single RapidIO controller, mport_ID +for that controller always will be 0. + +To initiate RapidIO enumeration/discovery on all available mports a user may +write '-1' (or RIO_MPORT_ANY) into the scan attribute file. + +4.4 Basic Enumeration Method +---------------------------- + +This is an original enumeration/discovery method which is available since +first release of RapidIO subsystem code. The enumeration process is +implemented according to the enumeration algorithm outlined in the RapidIO +Interconnect Specification: Annex I [1]. + +This method can be configured as statically linked or loadable module. +The method's single parameter "scan" allows to trigger the enumeration/discovery +process from module initialization routine. + +This enumeration/discovery method can be started only once and does not support +unloading if it is built as a module. The enumeration process traverses the network using a recursive depth-first algorithm. When a new device is found, the enumerator takes ownership of that @@ -160,6 +253,19 @@ time period. If this wait time period expires before enumeration is completed, an agent skips RapidIO discovery and continues with remaining kernel initialization. +4.5 Adding New Enumeration/Discovery Method +------------------------------------------- + +RapidIO subsystem code organization allows addition of new enumeration/discovery +methods as new configuration options without significant impact to to the core +RapidIO code. + +A new enumeration/discovery method has to be attached to one or more mport +devices before an enumeration/discovery process can be started. Normally, +method's module initialization routine calls rio_register_scan() to attach +an enumerator to a specified mport device (or devices). The basic enumerator +implementation demonstrates this process. + 5. References ------------- diff --git a/Documentation/rapidio/sysfs.txt b/Documentation/rapidio/sysfs.txt index 97f71ce575d6..19878179da4c 100644 --- a/Documentation/rapidio/sysfs.txt +++ b/Documentation/rapidio/sysfs.txt @@ -88,3 +88,20 @@ that exports additional attributes. IDT_GEN2: errlog - reads contents of device error log until it is empty. + + +5. RapidIO Bus Attributes +------------------------- + +RapidIO bus subdirectory /sys/bus/rapidio implements the following bus-specific +attribute: + + scan - allows to trigger enumeration discovery process from user space. This + is a write-only attribute. To initiate an enumeration or discovery + process on specific mport device, a user needs to write mport_ID (not + RapidIO destination ID) into this file. The mport_ID is a sequential + number (0 ... RIO_MAX_MPORTS) assigned to the mport device. + For example, for a machine with a single RapidIO controller, mport_ID + for that controller always will be 0. + To initiate RapidIO enumeration/discovery on all available mports + a user must write '-1' (or RIO_MPORT_ANY) into this attribute file. From 7b92d03c3239f43e5b86c9cc9630f026d36ee995 Mon Sep 17 00:00:00 2001 From: OGAWA Hirofumi Date: Fri, 24 May 2013 15:55:08 -0700 Subject: [PATCH 05/30] fat: fix possible overflow for fat_clusters Intermediate value of fat_clusters can be overflowed on 32bits arch. Reported-by: Krzysztof Strasburger Signed-off-by: OGAWA Hirofumi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/fat/inode.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/fat/inode.c b/fs/fat/inode.c index dfce656ddb33..5d4513cb1b3c 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -1229,6 +1229,19 @@ static int fat_read_root(struct inode *inode) return 0; } +static unsigned long calc_fat_clusters(struct super_block *sb) +{ + struct msdos_sb_info *sbi = MSDOS_SB(sb); + + /* Divide first to avoid overflow */ + if (sbi->fat_bits != 12) { + unsigned long ent_per_sec = sb->s_blocksize * 8 / sbi->fat_bits; + return ent_per_sec * sbi->fat_length; + } + + return sbi->fat_length * sb->s_blocksize * 8 / sbi->fat_bits; +} + /* * Read the super block of an MS-DOS FS. */ @@ -1434,7 +1447,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, sbi->dirty = b->fat16.state & FAT_STATE_DIRTY; /* check that FAT table does not overflow */ - fat_clusters = sbi->fat_length * sb->s_blocksize * 8 / sbi->fat_bits; + fat_clusters = calc_fat_clusters(sb); total_clusters = min(total_clusters, fat_clusters - FAT_START_ENT); if (total_clusters > MAX_FAT(sb)) { if (!silent) From 4c663cfc523a88d97a8309b04a089c27dc57fd7e Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Fri, 24 May 2013 15:55:09 -0700 Subject: [PATCH 06/30] wait: fix false timeouts when using wait_event_timeout() Many callers of the wait_event_timeout() and wait_event_interruptible_timeout() expect that the return value will be positive if the specified condition becomes true before the timeout elapses. However, at the moment this isn't guaranteed. If the wake-up handler is delayed enough, the time remaining until timeout will be calculated as 0 - and passed back as a return value - even if the condition became true before the timeout has passed. Fix this by returning at least 1 if the condition becomes true. This semantic is in line with what wait_for_condition_timeout() does; see commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious failure under heavy load"). Daniel said "We have 3 instances of this bug in drm/i915. One case even where we switch between the interruptible and not interruptible wait_event_timeout variants, foolishly presuming they have the same semantics. I very much like this." One such bug is reported at https://bugs.freedesktop.org/show_bug.cgi?id=64133 Signed-off-by: Imre Deak Acked-by: Daniel Vetter Acked-by: David Howells Acked-by: Jens Axboe Cc: "Paul E. McKenney" Cc: Dave Jones Cc: Lukas Czerner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/wait.h | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index ac38be2692d8..1133695eb067 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -217,6 +217,8 @@ do { \ if (!ret) \ break; \ } \ + if (!ret && (condition)) \ + ret = 1; \ finish_wait(&wq, &__wait); \ } while (0) @@ -233,8 +235,9 @@ do { \ * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * - * The function returns 0 if the @timeout elapsed, and the remaining - * jiffies if the condition evaluated to true before the timeout elapsed. + * The function returns 0 if the @timeout elapsed, or the remaining + * jiffies (at least 1) if the @condition evaluated to %true before + * the @timeout elapsed. */ #define wait_event_timeout(wq, condition, timeout) \ ({ \ @@ -302,6 +305,8 @@ do { \ ret = -ERESTARTSYS; \ break; \ } \ + if (!ret && (condition)) \ + ret = 1; \ finish_wait(&wq, &__wait); \ } while (0) @@ -318,9 +323,10 @@ do { \ * wake_up() has to be called after changing any variable that could * change the result of the wait condition. * - * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it - * was interrupted by a signal, and the remaining jiffies otherwise - * if the condition evaluated to true before the timeout elapsed. + * Returns: + * 0 if the @timeout elapsed, -%ERESTARTSYS if it was interrupted by + * a signal, or the remaining jiffies (at least 1) if the @condition + * evaluated to %true before the @timeout elapsed. */ #define wait_event_interruptible_timeout(wq, condition, timeout) \ ({ \ From d34883d4e35c0a994e91dd847a82b4c9e0c31d83 Mon Sep 17 00:00:00 2001 From: Xiao Guangrong Date: Fri, 24 May 2013 15:55:11 -0700 Subject: [PATCH 07/30] mm: mmu_notifier: re-fix freed page still mapped in secondary MMU Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and multiple ->release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier: fix freed page still mapped in secondary MMU"). Since hlist_for_each_entry_rcu() is changed now, we can not revert that patch directly, so this patch reverts the commit and simply fix the bug spotted by that patch This bug spotted by commit 751efd8610d3 is: There is a race condition between mmu_notifier_unregister() and __mmu_notifier_release(). Assume two tasks, one calling mmu_notifier_unregister() as a result of a filp_close() ->flush() callout (task A), and the other calling mmu_notifier_release() from an mmput() (task B). A B t1 srcu_read_lock() t2 if (!hlist_unhashed()) t3 srcu_read_unlock() t4 srcu_read_lock() t5 hlist_del_init_rcu() t6 synchronize_srcu() t7 srcu_read_unlock() t8 hlist_del_rcu() <--- NULL pointer deref. This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu. The another issue spotted in the commit is "multiple ->release() callouts", we needn't care it too much because it is really rare (e.g, can not happen on kvm since mmu-notify is unregistered after exit_mmap()) and the later call of multiple ->release should be fast since all the pages have already been released by the first call. Anyway, this issue should be fixed in a separate patch. -stable suggestions: Any version that has commit 751efd8610d3 need to be backported. I find the oldest version has this commit is 3.0-stable. [akpm@linux-foundation.org: tweak comments] Signed-off-by: Xiao Guangrong Tested-by: Robin Holt Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mmu_notifier.c | 85 +++++++++++++++++++++++------------------------ 1 file changed, 42 insertions(+), 43 deletions(-) diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index be04122fb277..6725ff183374 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -40,48 +40,44 @@ void __mmu_notifier_release(struct mm_struct *mm) int id; /* - * srcu_read_lock() here will block synchronize_srcu() in - * mmu_notifier_unregister() until all registered - * ->release() callouts this function makes have - * returned. + * SRCU here will block mmu_notifier_unregister until + * ->release returns. */ id = srcu_read_lock(&srcu); + hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) + /* + * If ->release runs before mmu_notifier_unregister it must be + * handled, as it's the only way for the driver to flush all + * existing sptes and stop the driver from establishing any more + * sptes before all the pages in the mm are freed. + */ + if (mn->ops->release) + mn->ops->release(mn, mm); + srcu_read_unlock(&srcu, id); + spin_lock(&mm->mmu_notifier_mm->lock); while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { mn = hlist_entry(mm->mmu_notifier_mm->list.first, struct mmu_notifier, hlist); - /* - * Unlink. This will prevent mmu_notifier_unregister() - * from also making the ->release() callout. + * We arrived before mmu_notifier_unregister so + * mmu_notifier_unregister will do nothing other than to wait + * for ->release to finish and for mmu_notifier_unregister to + * return. */ hlist_del_init_rcu(&mn->hlist); - spin_unlock(&mm->mmu_notifier_mm->lock); - - /* - * Clear sptes. (see 'release' description in mmu_notifier.h) - */ - if (mn->ops->release) - mn->ops->release(mn, mm); - - spin_lock(&mm->mmu_notifier_mm->lock); } spin_unlock(&mm->mmu_notifier_mm->lock); /* - * All callouts to ->release() which we have done are complete. - * Allow synchronize_srcu() in mmu_notifier_unregister() to complete - */ - srcu_read_unlock(&srcu, id); - - /* - * mmu_notifier_unregister() may have unlinked a notifier and may - * still be calling out to it. Additionally, other notifiers - * may have been active via vmtruncate() et. al. Block here - * to ensure that all notifier callouts for this mm have been - * completed and the sptes are really cleaned up before returning - * to exit_mmap(). + * synchronize_srcu here prevents mmu_notifier_release from returning to + * exit_mmap (which would proceed with freeing all pages in the mm) + * until the ->release method returns, if it was invoked by + * mmu_notifier_unregister. + * + * The mmu_notifier_mm can't go away from under us because one mm_count + * is held by exit_mmap. */ synchronize_srcu(&srcu); } @@ -292,31 +288,34 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) { BUG_ON(atomic_read(&mm->mm_count) <= 0); - spin_lock(&mm->mmu_notifier_mm->lock); if (!hlist_unhashed(&mn->hlist)) { + /* + * SRCU here will force exit_mmap to wait for ->release to + * finish before freeing the pages. + */ int id; - /* - * Ensure we synchronize up with __mmu_notifier_release(). - */ id = srcu_read_lock(&srcu); - - hlist_del_rcu(&mn->hlist); - spin_unlock(&mm->mmu_notifier_mm->lock); - + /* + * exit_mmap will block in mmu_notifier_release to guarantee + * that ->release is called before freeing the pages. + */ if (mn->ops->release) mn->ops->release(mn, mm); - - /* - * Allow __mmu_notifier_release() to complete. - */ srcu_read_unlock(&srcu, id); - } else + + spin_lock(&mm->mmu_notifier_mm->lock); + /* + * Can not use list_del_rcu() since __mmu_notifier_release + * can delete it before we hold the lock. + */ + hlist_del_init_rcu(&mn->hlist); spin_unlock(&mm->mmu_notifier_mm->lock); + } /* - * Wait for any running method to finish, including ->release() if it - * was run by __mmu_notifier_release() instead of us. + * Wait for any running method to finish, of course including + * ->release if it was run by mmu_notifier_relase instead of us. */ synchronize_srcu(&srcu); From afe1bb73f8ed588ab6268c27c5a447fe0484e48f Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Fri, 24 May 2013 15:55:12 -0700 Subject: [PATCH 08/30] ocfs2: unlock rw lock if inode lock failed In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then ocfs2_inode_lock(). But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking rw lock. This will cause a bug in ocfs2_lock_res_free() when testing res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and decreased in __ocfs2_cluster_unlock(). Signed-off-by: Joseph Qi Cc: Joel Becker Cc: Mark Fasheh Cc: Li Zefan Cc: "Duyongfeng (B)" Acked-by: Sunil Mushran Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 8a7509f9e6f5..ff54014a24ec 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2288,7 +2288,7 @@ relock: ret = ocfs2_inode_lock(inode, NULL, 1); if (ret < 0) { mlog_errno(ret); - goto out_sems; + goto out; } ocfs2_inode_unlock(inode, 1); From 26549c8d36a64d9130e4c0f32412be7ba6180923 Mon Sep 17 00:00:00 2001 From: Stephen Warren Date: Fri, 24 May 2013 15:55:13 -0700 Subject: [PATCH 09/30] drivers/video: implement a simple framebuffer driver A simple frame-buffer describes a raw memory region that may be rendered to, with the assumption that the display hardware has already been set up to scan out from that buffer. This is useful in cases where a bootloader exists and has set up the display hardware, but a Linux driver doesn't yet exist for the display hardware. Examples use-cases include: * The built-in LCD panels on the Samsung ARM chromebook, and Tegra devices, and likely many other ARM or embedded systems. These cannot yet be supported using a full graphics driver, since the panel control should be provided by the CDF (Common Display Framework), which has been stuck in design/review for quite some time. One could support these panels using custom SoC-specific code, but there is a desire to use common infra-structure rather than having each SoC vendor invent their own code, hence the desire to wait for CDF. * Hardware for which a full graphics driver is not yet available, and the path to obtain one upstream isn't yet clear. For example, the Raspberry Pi. * Any hardware in early stages of upstreaming, before a full graphics driver has been tackled. This driver can provide a graphical boot console (even full X support) much earlier in the upstreaming process, thus making new SoC or board support more generally useful earlier. [akpm@linux-foundation.org: make simplefb_formats[] static] Signed-off-by: Stephen Warren Cc: Arnd Bergmann Acked-by: Olof Johansson Cc: Rob Clark Cc: Florian Tobias Schandinat Cc: Tomasz Figa Cc: Laurent Pinchart Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .../bindings/video/simple-framebuffer.txt | 25 ++ drivers/video/Kconfig | 17 ++ drivers/video/Makefile | 1 + drivers/video/simplefb.c | 234 ++++++++++++++++++ 4 files changed, 277 insertions(+) create mode 100644 Documentation/devicetree/bindings/video/simple-framebuffer.txt create mode 100644 drivers/video/simplefb.c diff --git a/Documentation/devicetree/bindings/video/simple-framebuffer.txt b/Documentation/devicetree/bindings/video/simple-framebuffer.txt new file mode 100644 index 000000000000..3ea460583111 --- /dev/null +++ b/Documentation/devicetree/bindings/video/simple-framebuffer.txt @@ -0,0 +1,25 @@ +Simple Framebuffer + +A simple frame-buffer describes a raw memory region that may be rendered to, +with the assumption that the display hardware has already been set up to scan +out from that buffer. + +Required properties: +- compatible: "simple-framebuffer" +- reg: Should contain the location and size of the framebuffer memory. +- width: The width of the framebuffer in pixels. +- height: The height of the framebuffer in pixels. +- stride: The number of bytes in each line of the framebuffer. +- format: The format of the framebuffer surface. Valid values are: + - r5g6b5 (16-bit pixels, d[15:11]=r, d[10:5]=g, d[4:0]=b). + +Example: + + framebuffer { + compatible = "simple-framebuffer"; + reg = <0x1d385000 (1600 * 1200 * 2)>; + width = <1600>; + height = <1200>; + stride = <(1600 * 2)>; + format = "r5g6b5"; + }; diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig index d71d60f94fc1..62460561077e 100644 --- a/drivers/video/Kconfig +++ b/drivers/video/Kconfig @@ -2453,6 +2453,23 @@ config FB_HYPERV help This framebuffer driver supports Microsoft Hyper-V Synthetic Video. +config FB_SIMPLE + bool "Simple framebuffer support" + depends on (FB = y) && OF + select FB_CFB_FILLRECT + select FB_CFB_COPYAREA + select FB_CFB_IMAGEBLIT + help + Say Y if you want support for a simple frame-buffer. + + This driver assumes that the display hardware has been initialized + before the kernel boots, and the kernel will simply render to the + pre-allocated frame buffer surface. + + Configuration re: surface address, size, and format must be provided + through device tree, or potentially plain old platform data in the + future. + source "drivers/video/omap/Kconfig" source "drivers/video/omap2/Kconfig" source "drivers/video/exynos/Kconfig" diff --git a/drivers/video/Makefile b/drivers/video/Makefile index 7234e4a959e8..e8bae8dd4804 100644 --- a/drivers/video/Makefile +++ b/drivers/video/Makefile @@ -166,6 +166,7 @@ obj-$(CONFIG_FB_MX3) += mx3fb.o obj-$(CONFIG_FB_DA8XX) += da8xx-fb.o obj-$(CONFIG_FB_MXS) += mxsfb.o obj-$(CONFIG_FB_SSD1307) += ssd1307fb.o +obj-$(CONFIG_FB_SIMPLE) += simplefb.o # the test framebuffer is last obj-$(CONFIG_FB_VIRTUAL) += vfb.o diff --git a/drivers/video/simplefb.c b/drivers/video/simplefb.c new file mode 100644 index 000000000000..e2e9e3e61b72 --- /dev/null +++ b/drivers/video/simplefb.c @@ -0,0 +1,234 @@ +/* + * Simplest possible simple frame-buffer driver, as a platform device + * + * Copyright (c) 2013, Stephen Warren + * + * Based on q40fb.c, which was: + * Copyright (C) 2001 Richard Zidlicky + * + * Also based on offb.c, which was: + * Copyright (C) 1997 Geert Uytterhoeven + * Copyright (C) 1996 Paul Mackerras + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#include +#include +#include +#include +#include + +static struct fb_fix_screeninfo simplefb_fix = { + .id = "simple", + .type = FB_TYPE_PACKED_PIXELS, + .visual = FB_VISUAL_TRUECOLOR, + .accel = FB_ACCEL_NONE, +}; + +static struct fb_var_screeninfo simplefb_var = { + .height = -1, + .width = -1, + .activate = FB_ACTIVATE_NOW, + .vmode = FB_VMODE_NONINTERLACED, +}; + +static int simplefb_setcolreg(u_int regno, u_int red, u_int green, u_int blue, + u_int transp, struct fb_info *info) +{ + u32 *pal = info->pseudo_palette; + u32 cr = red >> (16 - info->var.red.length); + u32 cg = green >> (16 - info->var.green.length); + u32 cb = blue >> (16 - info->var.blue.length); + u32 value; + + if (regno >= 16) + return -EINVAL; + + value = (cr << info->var.red.offset) | + (cg << info->var.green.offset) | + (cb << info->var.blue.offset); + if (info->var.transp.length > 0) { + u32 mask = (1 << info->var.transp.length) - 1; + mask <<= info->var.transp.offset; + value |= mask; + } + pal[regno] = value; + + return 0; +} + +static struct fb_ops simplefb_ops = { + .owner = THIS_MODULE, + .fb_setcolreg = simplefb_setcolreg, + .fb_fillrect = cfb_fillrect, + .fb_copyarea = cfb_copyarea, + .fb_imageblit = cfb_imageblit, +}; + +struct simplefb_format { + const char *name; + u32 bits_per_pixel; + struct fb_bitfield red; + struct fb_bitfield green; + struct fb_bitfield blue; + struct fb_bitfield transp; +}; + +static struct simplefb_format simplefb_formats[] = { + { "r5g6b5", 16, {11, 5}, {5, 6}, {0, 5}, {0, 0} }, +}; + +struct simplefb_params { + u32 width; + u32 height; + u32 stride; + struct simplefb_format *format; +}; + +static int simplefb_parse_dt(struct platform_device *pdev, + struct simplefb_params *params) +{ + struct device_node *np = pdev->dev.of_node; + int ret; + const char *format; + int i; + + ret = of_property_read_u32(np, "width", ¶ms->width); + if (ret) { + dev_err(&pdev->dev, "Can't parse width property\n"); + return ret; + } + + ret = of_property_read_u32(np, "height", ¶ms->height); + if (ret) { + dev_err(&pdev->dev, "Can't parse height property\n"); + return ret; + } + + ret = of_property_read_u32(np, "stride", ¶ms->stride); + if (ret) { + dev_err(&pdev->dev, "Can't parse stride property\n"); + return ret; + } + + ret = of_property_read_string(np, "format", &format); + if (ret) { + dev_err(&pdev->dev, "Can't parse format property\n"); + return ret; + } + params->format = NULL; + for (i = 0; i < ARRAY_SIZE(simplefb_formats); i++) { + if (strcmp(format, simplefb_formats[i].name)) + continue; + params->format = &simplefb_formats[i]; + break; + } + if (!params->format) { + dev_err(&pdev->dev, "Invalid format value\n"); + return -EINVAL; + } + + return 0; +} + +static int simplefb_probe(struct platform_device *pdev) +{ + int ret; + struct simplefb_params params; + struct fb_info *info; + struct resource *mem; + + if (fb_get_options("simplefb", NULL)) + return -ENODEV; + + ret = simplefb_parse_dt(pdev, ¶ms); + if (ret) + return ret; + + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!mem) { + dev_err(&pdev->dev, "No memory resource\n"); + return -EINVAL; + } + + info = framebuffer_alloc(sizeof(u32) * 16, &pdev->dev); + if (!info) + return -ENOMEM; + platform_set_drvdata(pdev, info); + + info->fix = simplefb_fix; + info->fix.smem_start = mem->start; + info->fix.smem_len = resource_size(mem); + info->fix.line_length = params.stride; + + info->var = simplefb_var; + info->var.xres = params.width; + info->var.yres = params.height; + info->var.xres_virtual = params.width; + info->var.yres_virtual = params.height; + info->var.bits_per_pixel = params.format->bits_per_pixel; + info->var.red = params.format->red; + info->var.green = params.format->green; + info->var.blue = params.format->blue; + info->var.transp = params.format->transp; + + info->fbops = &simplefb_ops; + info->flags = FBINFO_DEFAULT; + info->screen_base = devm_ioremap(&pdev->dev, info->fix.smem_start, + info->fix.smem_len); + if (!info->screen_base) { + framebuffer_release(info); + return -ENODEV; + } + info->pseudo_palette = (void *)(info + 1); + + ret = register_framebuffer(info); + if (ret < 0) { + dev_err(&pdev->dev, "Unable to register simplefb: %d\n", ret); + framebuffer_release(info); + return ret; + } + + dev_info(&pdev->dev, "fb%d: simplefb registered!\n", info->node); + + return 0; +} + +static int simplefb_remove(struct platform_device *pdev) +{ + struct fb_info *info = platform_get_drvdata(pdev); + + unregister_framebuffer(info); + framebuffer_release(info); + + return 0; +} + +static const struct of_device_id simplefb_of_match[] = { + { .compatible = "simple-framebuffer", }, + { }, +}; +MODULE_DEVICE_TABLE(of, simplefb_of_match); + +static struct platform_driver simplefb_driver = { + .driver = { + .name = "simple-framebuffer", + .owner = THIS_MODULE, + .of_match_table = simplefb_of_match, + }, + .probe = simplefb_probe, + .remove = simplefb_remove, +}; +module_platform_driver(simplefb_driver); + +MODULE_AUTHOR("Stephen Warren "); +MODULE_DESCRIPTION("Simple framebuffer driver"); +MODULE_LICENSE("GPL v2"); From 28ccddf7952c496df2a51ce5aee4f2a058a98bab Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 24 May 2013 15:55:15 -0700 Subject: [PATCH 10/30] mm: memcg: remove incorrect VM_BUG_ON for swap cache pages in uncharge Commit 0c59b89c81ea ("mm: memcg: push down PageSwapCache check into uncharge entry functions") added a VM_BUG_ON() on PageSwapCache in the uncharge path after checking that page flag once, assuming that the state is stable in all paths, but this is not the case and the condition triggers in user environments. An uncharge after the last page table reference to the page goes away can race with reclaim adding the page to swap cache. Swap cache pages are usually uncharged when they are freed after swapout, from a path that also handles swap usage accounting and memcg lifetime management. However, since the last page table reference is gone and thus no references to the swap slot left, the swap slot will be freed shortly when reclaim attempts to write the page to disk. The whole swap accounting is not even necessary. So while the race condition for which this VM_BUG_ON was added is real and actually existed all along, there are no negative effects. Remove the VM_BUG_ON again. Reported-by: Heiko Carstens Reported-by: Lingzhu Xiang Signed-off-by: Johannes Weiner Acked-by: Hugh Dickins Acked-by: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index cb1c9dedf9b6..010d6c14129a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4108,8 +4108,6 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype, if (mem_cgroup_disabled()) return NULL; - VM_BUG_ON(PageSwapCache(page)); - if (PageTransHuge(page)) { nr_pages <<= compound_order(page); VM_BUG_ON(!PageTransHuge(page)); @@ -4205,6 +4203,18 @@ void mem_cgroup_uncharge_page(struct page *page) if (page_mapped(page)) return; VM_BUG_ON(page->mapping && !PageAnon(page)); + /* + * If the page is in swap cache, uncharge should be deferred + * to the swap path, which also properly accounts swap usage + * and handles memcg lifetime. + * + * Note that this check is not stable and reclaim may add the + * page to swap cache at any time after this. However, if the + * page is not in swap cache by the time page->mapcount hits + * 0, there won't be any page table references to the swap + * slot, and reclaim will free it and not actually write the + * page to disk. + */ if (PageSwapCache(page)) return; __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_ANON, false); From fb09c3733a94b5f1dba50359d09c9e217c763fb9 Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Fri, 24 May 2013 15:55:16 -0700 Subject: [PATCH 11/30] hfs: avoid crash in hfs_bnode_create Commit 634725a92938 ("hfs: cleanup HFS+ prints") removed the BUG_ON in hfs_bnode_create in hfsplus. This patch removes it from the hfs version and avoids an fsfuzzer crash. Signed-off-by: Jeff Mahoney Acked-by: Jeff Mahoney Signed-off-by: Jiri Slaby Cc: Vyacheslav Dubeyko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hfs/bnode.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index f3b1a15ccd59..d3fa6bd9503e 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -415,7 +415,11 @@ struct hfs_bnode *hfs_bnode_create(struct hfs_btree *tree, u32 num) spin_lock(&tree->hash_lock); node = hfs_bnode_findhash(tree, num); spin_unlock(&tree->hash_lock); - BUG_ON(node); + if (node) { + pr_crit("new node %u already hashed?\n", num); + WARN_ON(1); + return node; + } node = __hfs_bnode_create(tree, num); if (!node) return ERR_PTR(-ENOMEM); From 1ccc819da6fda9bee10ab8b72e9adbb5ad3e4959 Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Fri, 24 May 2013 15:55:17 -0700 Subject: [PATCH 12/30] rapidio/tsi721: fix bug in MSI interrupt handling Fix bug in MSI interrupt handling which causes loss of event notifications. Typical indication of lost MSI interrupts are stalled message and doorbell transfers between RapidIO endpoints. To avoid loss of MSI interrupts all interrupts from the device must be disabled on entering the interrupt handler routine and re-enabled when exiting it. Re-enabling device interrupts will trigger new MSI message(s) if Tsi721 registered new events since entering interrupt handler routine. This patch is applicable to kernel versions starting from v3.2. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rapidio/devices/tsi721.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c index 6faba406b6e9..a8b2c23a7ef4 100644 --- a/drivers/rapidio/devices/tsi721.c +++ b/drivers/rapidio/devices/tsi721.c @@ -471,6 +471,10 @@ static irqreturn_t tsi721_irqhandler(int irq, void *ptr) u32 intval; u32 ch_inte; + /* For MSI mode disable all device-level interrupts */ + if (priv->flags & TSI721_USING_MSI) + iowrite32(0, priv->regs + TSI721_DEV_INTE); + dev_int = ioread32(priv->regs + TSI721_DEV_INT); if (!dev_int) return IRQ_NONE; @@ -560,6 +564,14 @@ static irqreturn_t tsi721_irqhandler(int irq, void *ptr) } } #endif + + /* For MSI mode re-enable device-level interrupts */ + if (priv->flags & TSI721_USING_MSI) { + dev_int = TSI721_DEV_INT_SR2PC_CH | TSI721_DEV_INT_SRIO | + TSI721_DEV_INT_SMSG_CH | TSI721_DEV_INT_BDMA_CH; + iowrite32(dev_int, priv->regs + TSI721_DEV_INTE); + } + return IRQ_HANDLED; } From c2cc499c5bcf9040a738f49e8051b42078205748 Mon Sep 17 00:00:00 2001 From: Leonid Yegoshin Date: Fri, 24 May 2013 15:55:18 -0700 Subject: [PATCH 13/30] mm compaction: fix of improper cache flush in migration code Page 'new' during MIGRATION can't be flushed with flush_cache_page(). Using flush_cache_page(vma, addr, pfn) is justified only if the page is already placed in process page table, and that is done right after flush_cache_page(). But without it the arch function has no knowledge of process PTE and does nothing. Besides that, flush_cache_page() flushes an application cache page, but the kernel has a different page virtual address and dirtied it. Replace it with flush_dcache_page(new) which is the proper usage. The old page is flushed in try_to_unmap_one() before migration. This bug takes place in Sead3 board with M14Kc MIPS CPU without cache aliasing (but Harvard arch - separate I and D cache) in tight memory environment (128MB) each 1-3days on SOAK test. It fails in cc1 during kernel build (SIGILL, SIGBUS, SIGSEG) if CONFIG_COMPACTION is switched ON. Signed-off-by: Leonid Yegoshin Cc: Leonid Yegoshin Acked-by: Rik van Riel Cc: Michal Hocko Acked-by: Mel Gorman Cc: Ralf Baechle Cc: Russell King Cc: David Miller Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 27ed22579fd9..b1f57501de9c 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -165,7 +165,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma, pte = arch_make_huge_pte(pte, vma, new, 0); } #endif - flush_cache_page(vma, addr, pte_pfn(pte)); + flush_dcache_page(new); set_pte_at(mm, addr, ptep, pte); if (PageHuge(new)) { From 7450231fb35492951e78a91b833fd935171f4e66 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 24 May 2013 15:55:20 -0700 Subject: [PATCH 14/30] linux/kernel.h: fix kernel-doc warning Fix kernel-doc warning in : Warning(include/linux/kernel.h:590): No description found for parameter 'ip' scripts/kernel-doc cannot handle macros, functions, or function prototypes between the function or macro that is being documented and its definition, so move these prototypes above the function that is being documented. Signed-off-by: Randy Dunlap Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kernel.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index e96329ceb28c..e9ef6d6b51d5 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -562,6 +562,9 @@ int __trace_bprintk(unsigned long ip, const char *fmt, ...); extern __printf(2, 3) int __trace_printk(unsigned long ip, const char *fmt, ...); +extern int __trace_bputs(unsigned long ip, const char *str); +extern int __trace_puts(unsigned long ip, const char *str, int size); + /** * trace_puts - write a string into the ftrace buffer * @str: the string to record @@ -587,8 +590,6 @@ int __trace_printk(unsigned long ip, const char *fmt, ...); * (1 when __trace_bputs is used, strlen(str) when __trace_puts is used) */ -extern int __trace_bputs(unsigned long ip, const char *str); -extern int __trace_puts(unsigned long ip, const char *str, int size); #define trace_puts(str) ({ \ static const char *trace_printk_fmt \ __attribute__((section("__trace_printk_fmt"))) = \ From 7c3425123ddfdc5f48e7913ff59d908789712b18 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Fri, 24 May 2013 15:55:21 -0700 Subject: [PATCH 15/30] mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer We should not use set_pmd_at to update pmd_t with pgtable_t pointer. set_pmd_at is used to set pmd with huge pte entries and architectures like ppc64, clear few flags from the pte when saving a new entry. Without this change we observe bad pte errors like below on ppc64 with THP enabled. BUG: Bad page map in process ld mm=0xc000001ee39f4780 pte:7fc3f37848000001 pmd:c000001ec0000000 Signed-off-by: Aneesh Kumar K.V Cc: Hugh Dickins Cc: Benjamin Herrenschmidt Reviewed-by: Andrea Arcangeli Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/huge_memory.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 03a89a2f464b..362c329b83fe 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2325,7 +2325,12 @@ static void collapse_huge_page(struct mm_struct *mm, pte_unmap(pte); spin_lock(&mm->page_table_lock); BUG_ON(!pmd_none(*pmd)); - set_pmd_at(mm, address, pmd, _pmd); + /* + * We can only use set_pmd_at when establishing + * hugepmds and never for establishing regular pmds that + * points to regular pagetables. Use pmd_populate for that + */ + pmd_populate(mm, pmd, pmd_pgtable(_pmd)); spin_unlock(&mm->page_table_lock); anon_vma_unlock_write(vma->anon_vma); goto out; From 4b949b8af12e24b8a48fa5bb775a13b558d9f4da Mon Sep 17 00:00:00 2001 From: Christian Gmeiner Date: Fri, 24 May 2013 15:55:22 -0700 Subject: [PATCH 16/30] drivers/leds/leds-ot200.c: fix error caused by shifted mask During the development of this driver an in-house register documentation was used. The last week some integration tests were done and this problem was found. It turned out that the released register documentation is wrong. The fix is very simple: shift all masks by one. Signed-off-by: Christian Gmeiner Cc: Bryan Wu Cc: Sebastian Andrzej Siewior Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/leds/leds-ot200.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/leds/leds-ot200.c b/drivers/leds/leds-ot200.c index ee14662ed5ce..98cae529373f 100644 --- a/drivers/leds/leds-ot200.c +++ b/drivers/leds/leds-ot200.c @@ -47,37 +47,37 @@ static struct ot200_led leds[] = { { .name = "led_1", .port = 0x49, - .mask = BIT(7), + .mask = BIT(6), }, { .name = "led_2", .port = 0x49, - .mask = BIT(6), + .mask = BIT(5), }, { .name = "led_3", .port = 0x49, - .mask = BIT(5), + .mask = BIT(4), }, { .name = "led_4", .port = 0x49, - .mask = BIT(4), + .mask = BIT(3), }, { .name = "led_5", .port = 0x49, - .mask = BIT(3), + .mask = BIT(2), }, { .name = "led_6", .port = 0x49, - .mask = BIT(2), + .mask = BIT(1), }, { .name = "led_7", .port = 0x49, - .mask = BIT(1), + .mask = BIT(0), } }; From 97c9266b11967e6401866b0111af59fa894180bf Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 24 May 2013 15:55:23 -0700 Subject: [PATCH 17/30] revert "selftest: add simple test for soft-dirty bit" Revert commit 58c7be84fec8 ("selftest: add simple test for soft-dirty bit"). This is the self test for Pavel's pagemap2 patches which didn't actually get merged. Reported-by: Pavel Emelyanov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/selftests/Makefile | 1 - tools/testing/selftests/soft-dirty/Makefile | 10 -- .../testing/selftests/soft-dirty/soft-dirty.c | 114 ------------------ 3 files changed, 125 deletions(-) delete mode 100644 tools/testing/selftests/soft-dirty/Makefile delete mode 100644 tools/testing/selftests/soft-dirty/soft-dirty.c diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index d4abc59ce1d9..0a63658065f0 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -6,7 +6,6 @@ TARGETS += memory-hotplug TARGETS += mqueue TARGETS += net TARGETS += ptrace -TARGETS += soft-dirty TARGETS += vm all: diff --git a/tools/testing/selftests/soft-dirty/Makefile b/tools/testing/selftests/soft-dirty/Makefile deleted file mode 100644 index a9cdc823d6e0..000000000000 --- a/tools/testing/selftests/soft-dirty/Makefile +++ /dev/null @@ -1,10 +0,0 @@ -CFLAGS += -iquote../../../../include/uapi -Wall -soft-dirty: soft-dirty.c - -all: soft-dirty - -clean: - rm -f soft-dirty - -run_tests: all - @./soft-dirty || echo "soft-dirty selftests: [FAIL]" diff --git a/tools/testing/selftests/soft-dirty/soft-dirty.c b/tools/testing/selftests/soft-dirty/soft-dirty.c deleted file mode 100644 index aba4f87f87f0..000000000000 --- a/tools/testing/selftests/soft-dirty/soft-dirty.c +++ /dev/null @@ -1,114 +0,0 @@ -#include -#include -#include -#include -#include -#include - -typedef unsigned long long u64; - -#define PME_PRESENT (1ULL << 63) -#define PME_SOFT_DIRTY (1Ull << 55) - -#define PAGES_TO_TEST 3 -#ifndef PAGE_SIZE -#define PAGE_SIZE 4096 -#endif - -static void get_pagemap2(char *mem, u64 *map) -{ - int fd; - - fd = open("/proc/self/pagemap2", O_RDONLY); - if (fd < 0) { - perror("Can't open pagemap2"); - exit(1); - } - - lseek(fd, (unsigned long)mem / PAGE_SIZE * sizeof(u64), SEEK_SET); - read(fd, map, sizeof(u64) * PAGES_TO_TEST); - close(fd); -} - -static inline char map_p(u64 map) -{ - return map & PME_PRESENT ? 'p' : '-'; -} - -static inline char map_sd(u64 map) -{ - return map & PME_SOFT_DIRTY ? 'd' : '-'; -} - -static int check_pte(int step, int page, u64 *map, u64 want) -{ - if ((map[page] & want) != want) { - printf("Step %d Page %d has %c%c, want %c%c\n", - step, page, - map_p(map[page]), map_sd(map[page]), - map_p(want), map_sd(want)); - return 1; - } - - return 0; -} - -static void clear_refs(void) -{ - int fd; - char *v = "4"; - - fd = open("/proc/self/clear_refs", O_WRONLY); - if (write(fd, v, 3) < 3) { - perror("Can't clear soft-dirty bit"); - exit(1); - } - close(fd); -} - -int main(void) -{ - char *mem, x; - u64 map[PAGES_TO_TEST]; - - mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE, - PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, 0, 0); - - x = mem[0]; - mem[2 * PAGE_SIZE] = 'c'; - get_pagemap2(mem, map); - - if (check_pte(1, 0, map, PME_PRESENT)) - return 1; - if (check_pte(1, 1, map, 0)) - return 1; - if (check_pte(1, 2, map, PME_PRESENT | PME_SOFT_DIRTY)) - return 1; - - clear_refs(); - get_pagemap2(mem, map); - - if (check_pte(2, 0, map, PME_PRESENT)) - return 1; - if (check_pte(2, 1, map, 0)) - return 1; - if (check_pte(2, 2, map, PME_PRESENT)) - return 1; - - mem[0] = 'a'; - mem[PAGE_SIZE] = 'b'; - x = mem[2 * PAGE_SIZE]; - get_pagemap2(mem, map); - - if (check_pte(3, 0, map, PME_PRESENT | PME_SOFT_DIRTY)) - return 1; - if (check_pte(3, 1, map, PME_PRESENT | PME_SOFT_DIRTY)) - return 1; - if (check_pte(3, 2, map, PME_PRESENT)) - return 1; - - (void)x; /* gcc warn */ - - printf("PASS\n"); - return 0; -} From 6900807c6b95dcb004902302b8ac5dbfbf6feb89 Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Fri, 24 May 2013 15:55:24 -0700 Subject: [PATCH 18/30] aio: fix io_getevents documentation In reviewing man pages, I noticed that io_getevents is documented to update the timeout that gets passed into the library call. This doesn't happen in kernel space or in the library (even though it's documented to do so in both places). Unless there is objection, I'd like to fix the comments/docs to match the code (I will also update the man page upon consensus). Signed-off-by: Jeff Moyer Signed-off-by: Benjamin LaHaise Acked-by: Cyril Hrubis Acked-by: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/aio.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index c5b1a8c10411..3fcdd73b6f1b 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1299,8 +1299,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, * < min_nr if the timeout specified by timeout has elapsed * before sufficient events are available, where timeout == NULL * specifies an infinite timeout. Note that the timeout pointed to by - * timeout is relative and will be updated if not NULL and the - * operation blocks. Will fail with -ENOSYS if not implemented. + * timeout is relative. Will fail with -ENOSYS if not implemented. */ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id, long, min_nr, From 387b8b3e37cb1c257fb607787f73815c30d22859 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 24 May 2013 15:55:25 -0700 Subject: [PATCH 19/30] auditfilter.c: fix kernel-doc warnings Fix kernel-doc warnings in kernel/auditfilter.c: Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter' Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter' Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter' Signed-off-by: Randy Dunlap Cc: Eric Paris Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/auditfilter.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c index 83a2970295d1..6bd4a90d1991 100644 --- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -1021,9 +1021,6 @@ static void audit_log_rule_change(char *action, struct audit_krule *rule, int re * @seq: netlink audit message sequence (serial) number * @data: payload data * @datasz: size of payload data - * @loginuid: loginuid of sender - * @sessionid: sessionid for netlink audit message - * @sid: SE Linux Security ID of sender */ int audit_receive_filter(int type, int pid, int seq, void *data, size_t datasz) { From cac29af6bd6bc5c53499f39ef1eade193295b2f1 Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Fri, 24 May 2013 15:55:26 -0700 Subject: [PATCH 20/30] drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq() free_irq() expects the same pointer that was passed to request_irq(), otherwise the IRQ is not freed. The issue was found using the following coccinelle script: @r1@ type T; T devid; @@ request_irq(..., devid) @r2@ type r1.T; T devid; position p; @@ free_irq@p(..., devid) @@ position p != r2.p; @@ *free_irq@p(...) Signed-off-by: Lars-Peter Clausen Cc: Srinidhi Kasagar Cc: Linus Walleij Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-pl031.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-pl031.c b/drivers/rtc/rtc-pl031.c index 8900ea784817..0f0609b1aa2c 100644 --- a/drivers/rtc/rtc-pl031.c +++ b/drivers/rtc/rtc-pl031.c @@ -306,7 +306,7 @@ static int pl031_remove(struct amba_device *adev) struct pl031_local *ldata = dev_get_drvdata(&adev->dev); amba_set_drvdata(adev, NULL); - free_irq(adev->irq[0], ldata->rtc); + free_irq(adev->irq[0], ldata); rtc_device_unregister(ldata->rtc); iounmap(ldata->base); kfree(ldata); From e5ee7305ae03e43dbe2b0e346232975f793ad0eb Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 24 May 2013 15:55:27 -0700 Subject: [PATCH 21/30] fbdev: FB_GOLDFISH should depend on HAS_DMA If NO_DMA=y: drivers/built-in.o: In function `goldfish_fb_remove': drivers/video/goldfishfb.c:301: undefined reference to `dma_free_coherent' drivers/built-in.o: In function `goldfish_fb_probe': drivers/video/goldfishfb.c:247: undefined reference to `dma_alloc_coherent' drivers/video/goldfishfb.c:280: undefined reference to `dma_free_coherent' Signed-off-by: Geert Uytterhoeven Cc: Florian Tobias Schandinat Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig index 62460561077e..2e937bdace6f 100644 --- a/drivers/video/Kconfig +++ b/drivers/video/Kconfig @@ -2199,7 +2199,7 @@ config FB_XILINX config FB_GOLDFISH tristate "Goldfish Framebuffer" - depends on FB + depends on FB && HAS_DMA select FB_CFB_FILLRECT select FB_CFB_COPYAREA select FB_CFB_IMAGEBLIT From dfd20b2b174d3a9b258ea3b7a35ead33576587b1 Mon Sep 17 00:00:00 2001 From: Brian Behlendorf Date: Fri, 24 May 2013 15:55:28 -0700 Subject: [PATCH 22/30] drivers/block/brd.c: fix brd_lookup_page() race The index on the page must be set before it is inserted in the radix tree. Otherwise there is a small race which can occur during lookup where the page can be found with the incorrect index. This will trigger the BUG_ON() in brd_lookup_page(). Signed-off-by: Brian Behlendorf Reported-by: Chris Wedgwood Cc: Jens Axboe Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/brd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index f1a29f8e9d33..9bf4371755f2 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -117,13 +117,13 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector) spin_lock(&brd->brd_lock); idx = sector >> PAGE_SECTORS_SHIFT; + page->index = idx; if (radix_tree_insert(&brd->brd_pages, idx, page)) { __free_page(page); page = radix_tree_lookup(&brd->brd_pages, idx); BUG_ON(!page); BUG_ON(page->index != idx); - } else - page->index = idx; + } spin_unlock(&brd->brd_lock); radix_tree_preload_end(); From 136e8770cd5d1fe38b3c613100dd6dc4db6d4fa6 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Fri, 24 May 2013 15:55:29 -0700 Subject: [PATCH 23/30] nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary DESCRIPTION: There are use-cases when NILFS2 file system (formatted with block size lesser than 4 KB) can be remounted in RO mode because of encountering of "broken bmap" issue. The issue was reported by Anthony Doggett : "The machine I've been trialling nilfs on is running Debian Testing, Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've also reproduced it (identically) with Debian Unstable amd64 and Debian Experimental (using the 3.8-trunk kernel). The problematic partitions were formatted with "mkfs.nilfs2 -b 1024 -B 8192"." SYMPTOMS: (1) System log contains error messages likewise: [63102.496756] nilfs_direct_assign: invalid pointer: 0 [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28) [63102.496798] [63102.524403] Remounting filesystem read-only (2) The NILFS2 file system is remounted in RO mode. REPRODUSING PATH: (1) Create volume group with name "unencrypted" by means of vgcreate utility. (2) Run script (prepared by Anthony Doggett ): ----------------[BEGIN SCRIPT]-------------------- VG=unencrypted lvcreate --size 2G --name ntest $VG mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest mkdir /var/tmp/n mkdir /var/tmp/n/ntest mount /dev/mapper/$VG-ntest /var/tmp/n/ntest mkdir /var/tmp/n/ntest/thedir cd /var/tmp/n/ntest/thedir sleep 2 date darcs init sleep 2 dmesg|tail -n 5 date darcs whatsnew || true date sleep 2 dmesg|tail -n 5 ----------------[END SCRIPT]-------------------- REPRODUCIBILITY: 100% INVESTIGATION: As it was discovered, the issue takes place during segment construction after executing such sequence of user-space operations: open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 ftruncate(7, 60) The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)" takes place because of trying to get block number for third block of the file with logical offset #3072 bytes. As it is possible to see from above output, the file has 60 bytes of the whole size. So, it is enough one block (1 KB in size) allocation for the whole file. Trying to operate with several blocks instead of one takes place because of discovering several dirty buffers for this file in nilfs_segctor_scan_file() method. The root cause of this issue is in nilfs_set_page_dirty function which is called just before writing to an mmapped page. When nilfs_page_mkwrite function handles a page at EOF boundary, it fills hole blocks only inside EOF through __block_page_mkwrite(). The __block_page_mkwrite() function calls set_page_dirty() after filling hole blocks, thus nilfs_set_page_dirty function (= a_ops->set_page_dirty) is called. However, the current implementation of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page at EOF boundary. As a result, buffers outside EOF are inconsistently marked dirty and queued for write even though they are not mapped with nilfs_get_block function. FIX: This modifies nilfs_set_page_dirty() not to mark hole blocks dirty. Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals for this issue. Signed-off-by: Ryusuke Konishi Reported-by: Anthony Doggett Reported-by: Vyacheslav Dubeyko Cc: Vyacheslav Dubeyko Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/nilfs2/inode.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index 689fb608648e..bccfec8343c5 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -219,13 +219,32 @@ static int nilfs_writepage(struct page *page, struct writeback_control *wbc) static int nilfs_set_page_dirty(struct page *page) { - int ret = __set_page_dirty_buffers(page); + int ret = __set_page_dirty_nobuffers(page); - if (ret) { + if (page_has_buffers(page)) { struct inode *inode = page->mapping->host; - unsigned nr_dirty = 1 << (PAGE_SHIFT - inode->i_blkbits); + unsigned nr_dirty = 0; + struct buffer_head *bh, *head; - nilfs_set_file_dirty(inode, nr_dirty); + /* + * This page is locked by callers, and no other thread + * concurrently marks its buffers dirty since they are + * only dirtied through routines in fs/buffer.c in + * which call sites of mark_buffer_dirty are protected + * by page lock. + */ + bh = head = page_buffers(page); + do { + /* Do not mark hole blocks dirty */ + if (buffer_dirty(bh) || !buffer_mapped(bh)) + continue; + + set_buffer_dirty(bh); + nr_dirty++; + } while (bh = bh->b_this_page, bh != head); + + if (nr_dirty) + nilfs_set_file_dirty(inode, nr_dirty); } return ret; } From 348f9f05e0266822fa048f7fb3b039692a0cafbc Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 24 May 2013 15:55:30 -0700 Subject: [PATCH 24/30] mm/memory_hotplug.c: fix printk format warnings Fix printk format warnings in mm/memory_hotplug.c by using "%pa": mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat] mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat] Signed-off-by: Randy Dunlap Reported-by: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory_hotplug.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a221fac1f47d..1ad92b46753e 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -720,9 +720,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, start = phys_start_pfn << PAGE_SHIFT; size = nr_pages * PAGE_SIZE; ret = release_mem_region_adjustable(&iomem_resource, start, size); - if (ret) - pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n", - start, start + size - 1, ret); + if (ret) { + resource_size_t endres = start + size - 1; + + pr_warn("Unable to release resource <%pa-%pa> (%d)\n", + &start, &endres, ret); + } sections_to_remove = nr_pages / PAGES_PER_SECTION; for (i = 0; i < sections_to_remove; i++) { From 1e7e2e05c179a68aaf8830fe91547a87f4589e53 Mon Sep 17 00:00:00 2001 From: Jarod Wilson Date: Fri, 24 May 2013 15:55:31 -0700 Subject: [PATCH 25/30] drivers/char/random.c: fix priming of last_data Commit ec8f02da9ea5 ("random: prime last_data value per fips requirements") added priming of last_data per fips requirements. Unfortuantely, it did so in a way that can lead to multiple threads all incrementing nbytes, but only one actually doing anything with the extra data, which leads to some fun random corruption and panics. The fix is to simply do everything needed to prime last_data in a single shot, so there's no window for multiple cpus to increment nbytes -- in fact, we won't even increment or decrement nbytes anymore, we'll just extract the needed EXTRACT_SIZE one time per pool and then carry on with the normal routine. All these changes have been tested across multiple hosts and architectures where panics were previously encoutered. The code changes are are strictly limited to areas only touched when when booted in fips mode. This change should also go into 3.8-stable, to make the myriads of fips users on 3.8.x happy. Signed-off-by: Jarod Wilson Tested-by: Jan Stancek Tested-by: Jan Stodola Cc: Herbert Xu Acked-by: Neil Horman Cc: "David S. Miller" Cc: Matt Mackall Cc: "Theodore Ts'o" Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/random.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index cd9a6211dcad..73e52b7796f9 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -957,10 +957,23 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf, { ssize_t ret = 0, i; __u8 tmp[EXTRACT_SIZE]; + unsigned long flags; /* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */ - if (fips_enabled && !r->last_data_init) - nbytes += EXTRACT_SIZE; + if (fips_enabled) { + spin_lock_irqsave(&r->lock, flags); + if (!r->last_data_init) { + r->last_data_init = true; + spin_unlock_irqrestore(&r->lock, flags); + trace_extract_entropy(r->name, EXTRACT_SIZE, + r->entropy_count, _RET_IP_); + xfer_secondary_pool(r, EXTRACT_SIZE); + extract_buf(r, tmp); + spin_lock_irqsave(&r->lock, flags); + memcpy(r->last_data, tmp, EXTRACT_SIZE); + } + spin_unlock_irqrestore(&r->lock, flags); + } trace_extract_entropy(r->name, nbytes, r->entropy_count, _RET_IP_); xfer_secondary_pool(r, nbytes); @@ -970,19 +983,6 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf, extract_buf(r, tmp); if (fips_enabled) { - unsigned long flags; - - - /* prime last_data value if need be, per fips 140-2 */ - if (!r->last_data_init) { - spin_lock_irqsave(&r->lock, flags); - memcpy(r->last_data, tmp, EXTRACT_SIZE); - r->last_data_init = true; - nbytes -= EXTRACT_SIZE; - spin_unlock_irqrestore(&r->lock, flags); - extract_buf(r, tmp); - } - spin_lock_irqsave(&r->lock, flags); if (!memcmp(tmp, r->last_data, EXTRACT_SIZE)) panic("Hardware RNG duplicated output!\n"); From 10b3a32d292c21ea5b3ad5ca5975e88bb20b8d68 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Fri, 24 May 2013 15:55:33 -0700 Subject: [PATCH 26/30] random: fix accounting race condition with lockless irq entropy_count update Commit 902c098a3663 ("random: use lockless techniques in the interrupt path") turned IRQ path from being spinlock protected into lockless cmpxchg-retry update. That commit removed r->lock serialization between crediting entropy bits from IRQ context and accounting when extracting entropy on userspace read path, but didn't turn the r->entropy_count reads/updates in account() to use cmpxchg as well. It has been observed, that under certain circumstances this leads to read() on /dev/urandom to return 0 (EOF), as r->entropy_count gets corrupted and becomes negative, which in turn results in propagating 0 all the way from account() to the actual read() call. Convert the accounting code to be the proper lockless counterpart of what has been partially done by 902c098a3663. Signed-off-by: Jiri Kosina Cc: Theodore Ts'o Cc: Greg KH Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/random.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 73e52b7796f9..35487e8ded59 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -865,16 +865,24 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min, if (r->entropy_count / 8 < min + reserved) { nbytes = 0; } else { + int entropy_count, orig; +retry: + entropy_count = orig = ACCESS_ONCE(r->entropy_count); /* If limited, never pull more than available */ - if (r->limit && nbytes + reserved >= r->entropy_count / 8) - nbytes = r->entropy_count/8 - reserved; + if (r->limit && nbytes + reserved >= entropy_count / 8) + nbytes = entropy_count/8 - reserved; - if (r->entropy_count / 8 >= nbytes + reserved) - r->entropy_count -= nbytes*8; - else - r->entropy_count = reserved; + if (entropy_count / 8 >= nbytes + reserved) { + entropy_count -= nbytes*8; + if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig) + goto retry; + } else { + entropy_count = reserved; + if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig) + goto retry; + } - if (r->entropy_count < random_write_wakeup_thresh) + if (entropy_count < random_write_wakeup_thresh) wakeup_write = 1; } From b4ca2b4b577c3530e34dcfaafccb2cc680ce95d1 Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Fri, 24 May 2013 15:55:34 -0700 Subject: [PATCH 27/30] ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and then we did a thorough search for all lock resources in ocfs2_inode_info, including rw, inode and open lockres and found this bug. My kernel version is 3.0.13, and it is also in the lastest version 3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should goto out_unlock instead of out, because we need release buffer head, up read alloc sem and unlock inode. Signed-off-by: Joseph Qi Reviewed-by: Jie Liu Cc: Mark Fasheh Cc: Joel Becker Acked-by: Sunil Mushran Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/extent_map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c index 1c39efb71bab..2487116d0d33 100644 --- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -790,7 +790,7 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, &hole_size, &rec, &is_last); if (ret) { mlog_errno(ret); - goto out; + goto out_unlock; } if (rec.e_blkno == 0ULL) { From 43c523bff7c3b47506d536c10637be8399dfd85f Mon Sep 17 00:00:00 2001 From: Tomasz Figa Date: Fri, 24 May 2013 15:55:35 -0700 Subject: [PATCH 28/30] drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing Currently the driver can crash with a NULL pointer dereference if no pdata is provided, despite of successful registration of the MFD part. This patch fixes the problem by adding a NULL check before dereferencing the pdata pointer. Signed-off-by: Tomasz Figa Signed-off-by: Kyungmin Park Cc: Sachin Kamat Reviewed-by: Jingoo Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-max8998.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-max8998.c b/drivers/rtc/rtc-max8998.c index 48b6612fae7f..d5af7baa48b5 100644 --- a/drivers/rtc/rtc-max8998.c +++ b/drivers/rtc/rtc-max8998.c @@ -285,7 +285,7 @@ static int max8998_rtc_probe(struct platform_device *pdev) info->irq, ret); dev_info(&pdev->dev, "RTC CHIP NAME: %s\n", pdev->id_entry->name); - if (pdata->rtc_delay) { + if (pdata && pdata->rtc_delay) { info->lp3974_bug_workaround = true; dev_warn(&pdev->dev, "LP3974 with RTC REGERR option." " RTC updates will be extremely slow.\n"); From a9ff785e4437c83d2179161e012f5bdfbd6381f0 Mon Sep 17 00:00:00 2001 From: Cliff Wickman Date: Fri, 24 May 2013 15:55:36 -0700 Subject: [PATCH 29/30] mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas A panic can be caused by simply cat'ing /proc//smaps while an application has a VM_PFNMAP range. It happened in-house when a benchmarker was trying to decipher the memory layout of his program. /proc//smaps and similar walks through a user page table should not be looking at VM_PFNMAP areas. Certain tests in walk_page_range() (specifically split_huge_page_pmd()) assume that all the mapped PFN's are backed with page structures. And this is not usually true for VM_PFNMAP areas. This can result in panics on kernel page faults when attempting to address those page structures. There are a half dozen callers of walk_page_range() that walk through a task's entire page table (as N. Horiguchi pointed out). So rather than change all of them, this patch changes just walk_page_range() to ignore VM_PFNMAP areas. The logic of hugetlb_vma() is moved back into walk_page_range(), as we want to test any vma in the range. VM_PFNMAP areas are used by: - graphics memory manager gpu/drm/drm_gem.c - global reference unit sgi-gru/grufile.c - sgi special memory char/mspec.c - and probably several out-of-tree modules [akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub] Signed-off-by: Cliff Wickman Reviewed-by: Naoya Horiguchi Cc: Mel Gorman Cc: Andrea Arcangeli Cc: Dave Hansen Cc: David Sterba Cc: Johannes Weiner Cc: KOSAKI Motohiro Cc: "Kirill A. Shutemov" Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/pagewalk.c | 72 ++++++++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 35 deletions(-) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 35aa294656cd..5da2cbcfdbb5 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -127,28 +127,7 @@ static int walk_hugetlb_range(struct vm_area_struct *vma, return 0; } -static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk) -{ - struct vm_area_struct *vma; - - /* We don't need vma lookup at all. */ - if (!walk->hugetlb_entry) - return NULL; - - VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem)); - vma = find_vma(walk->mm, addr); - if (vma && vma->vm_start <= addr && is_vm_hugetlb_page(vma)) - return vma; - - return NULL; -} - #else /* CONFIG_HUGETLB_PAGE */ -static struct vm_area_struct* hugetlb_vma(unsigned long addr, struct mm_walk *walk) -{ - return NULL; -} - static int walk_hugetlb_range(struct vm_area_struct *vma, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -198,30 +177,53 @@ int walk_page_range(unsigned long addr, unsigned long end, if (!walk->mm) return -EINVAL; + VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem)); + pgd = pgd_offset(walk->mm, addr); do { - struct vm_area_struct *vma; + struct vm_area_struct *vma = NULL; next = pgd_addr_end(addr, end); /* - * handle hugetlb vma individually because pagetable walk for - * the hugetlb page is dependent on the architecture and - * we can't handled it in the same manner as non-huge pages. + * This function was not intended to be vma based. + * But there are vma special cases to be handled: + * - hugetlb vma's + * - VM_PFNMAP vma's */ - vma = hugetlb_vma(addr, walk); + vma = find_vma(walk->mm, addr); if (vma) { - if (vma->vm_end < next) - next = vma->vm_end; /* - * Hugepage is very tightly coupled with vma, so - * walk through hugetlb entries within a given vma. + * There are no page structures backing a VM_PFNMAP + * range, so do not allow split_huge_page_pmd(). */ - err = walk_hugetlb_range(vma, addr, next, walk); - if (err) - break; - pgd = pgd_offset(walk->mm, next); - continue; + if ((vma->vm_start <= addr) && + (vma->vm_flags & VM_PFNMAP)) { + next = vma->vm_end; + pgd = pgd_offset(walk->mm, next); + continue; + } + /* + * Handle hugetlb vma individually because pagetable + * walk for the hugetlb page is dependent on the + * architecture and we can't handled it in the same + * manner as non-huge pages. + */ + if (walk->hugetlb_entry && (vma->vm_start <= addr) && + is_vm_hugetlb_page(vma)) { + if (vma->vm_end < next) + next = vma->vm_end; + /* + * Hugepage is very tightly coupled with vma, + * so walk through hugetlb entries within a + * given vma. + */ + err = walk_hugetlb_range(vma, addr, next, walk); + if (err) + break; + pgd = pgd_offset(walk->mm, next); + continue; + } } if (pgd_none_or_clear_bad(pgd)) { From 03e04f048d2774aabd126fbad84729d4ba9dc40a Mon Sep 17 00:00:00 2001 From: Benjamin LaHaise Date: Fri, 24 May 2013 15:55:38 -0700 Subject: [PATCH 30/30] aio: fix kioctx not being freed after cancellation at exit time The recent changes overhauling fs/aio.c introduced a bug that results in the kioctx not being freed when outstanding kiocbs are cancelled at exit_aio() time. Specifically, a kiocb that is cancelled has its completion events discarded by batch_complete_aio(), which then fails to wake up the process stuck in free_ioctx(). Fix this by modifying the wait_event() condition in free_ioctx() appropriately. This patch was tested with the cancel operation in the thread based code posted yesterday. [akpm@linux-foundation.org: fix build] Signed-off-by: Benjamin LaHaise Signed-off-by: Kent Overstreet Cc: Kent Overstreet Cc: Josh Boyer Cc: Zach Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/aio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/aio.c b/fs/aio.c index 3fcdd73b6f1b..7fe5bdee1630 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -307,7 +307,9 @@ static void free_ioctx(struct kioctx *ctx) kunmap_atomic(ring); while (atomic_read(&ctx->reqs_active) > 0) { - wait_event(ctx->wait, head != ctx->tail); + wait_event(ctx->wait, + head != ctx->tail || + atomic_read(&ctx->reqs_active) <= 0); avail = (head <= ctx->tail ? ctx->tail : ctx->nr_events) - head;