linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Eric Anholt	cb290d827e	irqchip/bcm2836: Fix compiler warning on 64-bit build Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:56 +01:00
Eric Anholt	0dc17be876	irqchip/bcm2836: Drop smp_set_ops on arm64 builds For arm64, the bootloader will instead be implementing the spin-table enable method. Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:45 +01:00
Jon Hunter	d6490461a1	irqchip/gic: Add helper functions for GIC setup and teardown Move the code that sets-up a GIC via device-tree into it's own function and add a generic function for GIC teardown that can be used for both device-tree and ACPI to unmap the GIC memory. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:44 +01:00
Jon Hunter	f673b9b5cb	irqchip/gic: Store GIC configuration parameters Store the GIC configuration parameters in the GIC chip data structure. This will allow us to simplify the code by reducing the number of parameters passed between functions. Update the __gic_init_bases() function so that we only need to pass a pointer to the GIC chip data structure and no longer need to pass the GIC index in order to look-up the chip data. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:44 +01:00
Jon Hunter	6e5b5924d9	irqchip/gic: Pass GIC pointer to save/restore functions Instead of passing the GIC index to the save/restore functions pass a pointer to the GIC chip data. This will allow these save/restore functions to be re-used by a platform driver where the GIC chip data structure is allocated dynamically and so there is no applicable index for identifying the GIC. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:43 +01:00
Jon Hunter	dc9722cc57	irqchip/gic: Return an error if GIC initialisation fails If the GIC initialisation fails, then currently we do not return an error or clean-up afterwards. Although for root controllers, this failure may be fatal anyway, for secondary controllers, it may not be fatal and so return an error on failure and clean-up. Update the functions gic_cpu_init() and gic_pm_init() to return an error instead of calling BUG() and perform any necessary clean-up. For non-banked GIC controllers, make sure that we free any memory allocated if we fail to initialise the IRQ domain. Please note that free_percpu() only frees memory if the pointer passed to it is not NULL and so it is unnecessary to check if both pointers are valid or not. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:43 +01:00
Jon Hunter	c2baa2f3f4	irqchip/gic: Remove static irq_chip definition for eoimode1 There are only 3 differences (not including the name) in the definitions of the gic_chip and gic_eoimode1_chip structures. Instead of statically defining the gic_eoimode1_chip structure, remove it and populate the eoimode1 functions dynamically for the appropriate GIC irqchips. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:43 +01:00
Jon Hunter	26acfe7463	irqchip/gic: Don't initialise chip if mapping IO space fails If we fail to map the address space for the GIC distributor or CPU interface, then don't attempt to initialise the chip, just WARN and return. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:42 +01:00
Jon Hunter	992345a58e	irqchip/gic: WARN if setting the interrupt type for a PPI fails Setting the interrupt type for private peripheral interrupts (PPIs) may not be supported by a given GIC because it is IMPLEMENTATION DEFINED whether this is allowed. There is no way to know if setting the type is supported for a given GIC and so the value written is read back to verify it matches the desired configuration. If it does not match then an error is return. There are cases where the interrupt configuration read from firmware (such as a device-tree blob), has been incorrect and hence gic_configure_irq() has returned an error. This error has gone undetected because the error code returned was ignored but the interrupt still worked fine because the configuration for the interrupt could not be overwritten. Given that this has done undetected and that failing to set the configuration for a PPI may not be a catastrophic, don't return an error but WARN if we fail to configure a PPI. This will allows us to fix up any places in the kernel where we should be checking the return status and maintain backward compatibility with firmware images that may have incorrect PPI configurations. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:42 +01:00
Jon Hunter	ec1a454d61	irqchip/gic: Don't unnecessarily write the IRQ configuration If the interrupt configuration matches the current configuration, then don't bother writing the configuration again. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:41 +01:00
Jon Hunter	a2a8fa5563	irqchip: Mask the non-type/sense bits when translating an IRQ The firmware parameter that contains the IRQ sense bits may also contain other data. When return the IRQ type, bits outside of these sense bits should be masked. If these bits are not masked and irq_create_fwspec_mapping() is called to map an IRQ, then the comparison of the type returned from irq_domain_translate() will never match that returned by irq_get_trigger_type() (because this function masks the none sense bits) and so we will always call irq_set_irq_type() to program the type even if it was not really necessary. Currently, the downside to this is unnecessarily re-programmming the type but nevertheless this should be avoided. The Tegra LIC and TI Crossbar irqchips all have client instances (from reviewing the device-tree sources) where bits outside the IRQ sense bits are set, but do not mask these bits. Therefore, ensure these bits are masked for these irqchips. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:41 +01:00
Jon Hunter	9b5d585d14	genirq: Ensure IRQ descriptor is valid when setting-up the IRQ In the function, setup_irq(), we don't check that the descriptor returned from irq_to_desc() is valid before we start using it. For example chip_bus_lock() called from setup_irq(), assumes that the descriptor pointer is valid and doesn't check before dereferencing it. In many other functions including setup/free_percpu_irq() we do check that the descriptor returned is not NULL and therefore add the same test to setup_irq() to ensure the descriptor returned is valid. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:41 +01:00
Marc Zyngier	7c9b973061	irqchip/gic-v3: Configure all interrupts as non-secure Group-1 The GICv3 driver wrongly assumes that it runs on the non-secure side of a secure-enabled system, while it could be on a system with a single security state, or a GICv3 with GICD_CTLR.DS set. Either way, it is important to configure this properly, or interrupts will simply not be delivered on this HW. Cc: stable@vger.kernel.org Reported-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:40 +01:00
Ray Jui	74c967aaff	irqchip/gic-v2m: Add workaround for Broadcom NS2 GICv2m erratum Alex Barba <alex.barba@broadcom.com> discovered Broadcom NS2 GICv2m implementation has an erratum where the MSI data needs to be the SPI number subtracted by an offset of 32, for the correct MSI interrupt to be triggered. Here we are adding the workaround based on readings from the MSI_IIDR register, which contains a value unique to Broadcom NS2 GICv2m Reported-by: Alex Barba <alex.barba@broadcom.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:40 +01:00
Christoph Hellwig	1228d53d3d	irqchip/irq-alpine-msi: Don't use <asm-generic/msi.h> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:25 +01:00
Dan Carpenter	086eec2de0	irqchip/mbigen: Checking for IS_ERR() instead of NULL of_platform_device_create() returns NULL on error, it never returns error pointers. Fixes: `ed2a1002d2` ('irqchip/mbigen: Handle multiple device nodes in a mbigen module') Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:13 +01:00
Marc Zyngier	074f23b675	irqchip/gic-v3: Remove inexistant register definition The GICv3 include file defines GICR_ISACTIVER and GICR_ICACTIVER in the RD_base page. News flash, they do not exist (probably a copy/paste brain fart). Just drop them. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:12 +01:00
Shanker Donthineni	466b7d1688	irqchip/gicv3-its: Don't allow devices whose ID is outside range We are not checking whether the requested device identifier fits into the device table memory or not. The function its_create_device() assumes that enough memory has been allocated for whole DevID space (reported by ITS_TYPER.Devbits) during the ITS probe() and continues to initialize ITS hardware. This assumption is not perfect, sometimes we reduce memory size either because of its size crossing MAX_ORDER-1 or BASERn max size limit. The MAPD command fails if 'Device ID' is outside of device table range. Add a simple validation check to avoid MAPD failures since we are not handling ITS command errors. This change also helps to return an error -ENOMEM instead of success to caller. Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:12 +01:00
Vladimir Zapolskiy	8cb17b5ed0	irqchip: Add LPC32xx interrupt controller driver The change adds improved support of NXP LPC32xx MIC, SIC1 and SIC2 interrupt controllers. This is a list of new features in comparison to the legacy driver: * irq types are taken from device tree settings, no more need to hardcode them, * old driver is based on irq_domain_add_legacy, which causes problems with handling MIC hardware interrupt 0 produced by SIC1, * there is one driver for MIC, SIC1 and SIC2, no more need to handle them separately, e.g. have two separate handlers for SIC1 and SIC2, * the driver does not have any dependencies on hardcoded register offsets, * the driver is much simpler for maintenance, * SPARSE_IRQS option is supported. Legacy LPC32xx interrupt controller driver was broken since commit `76ba59f836` ("genirq: Add irq_domain-aware core IRQ handler"), which requires a private interrupt handler, otherwise any SIC1 generated interrupt (mapped to MIC hwirq 0) breaks the kernel with the message "unexpected IRQ trap at vector 00". The change disables compilation of a legacy driver found at arch/arm/mach-lpc32xx/irq.c, the file will be removed in a separate commit. Fixes: `76ba59f836` ("genirq: Add irq_domain-aware core IRQ handler") Tested-by: Sylvain Lemieux <slemieux.tyco@gmail.com> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:12:11 +01:00
Will Deacon	f86c4fbd93	irqchip/gic: Ensure ordering between read of INTACK and shared data When an IPI is generated by a CPU, the pattern looks roughly like: <write shared data> smp_wmb(); <write to GIC to signal SGI> On the receiving CPU we rely on the fact that, once we've taken the interrupt, then the freshly written shared data must be visible to us. Put another way, the CPU isn't going to speculate taking an interrupt. Unfortunately, this assumption turns out to be broken. Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy to read some shared_data. Before CPUx has done anything, a random peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised. CPUy then takes the IRQ and starts executing the entry code, heading towards gic_handle_irq. Furthermore, let's assume that a bunch of the previous interrupts handled by CPUy were SGIs, so the branch predictor kicks in and speculates that irqnr will be <16 and we're likely to head into handle_IPI. The prefetcher then grabs a speculative copy of shared_data which contains a stale value. Meanwhile, CPUx gets round to updating shared_data and asking the GIC to send an SGI to CPUy. Internally, the GIC decides that the SGI is more important than the peripheral interrupt (which hasn't yet been ACKed) but doesn't need to do anything to CPUy, because the IRQ line is already raised. CPUy then reads the ACK register on the GIC, sees the SGI value which confirms the branch prediction and we end up with a stale shared_data value. This patch fixes the problem by adding an smp_rmb() to the IPI entry code in gic_handle_irq. As it turns out, the combination of a control dependency and an ISB instruction from the EOI in the GICv3 driver is enough to provide the ordering we need, so we add a comment there justifying the absence of an explicit smp_rmb(). Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-11 10:11:51 +01:00
Minghuan Lian	b8f3ebe630	irqchip: Add Layerscape SCFG MSI controller support Some kind of Freescale Layerscape SoC provides a MSI implementation which uses two SCFG registers MSIIR and MSIR to support 32 MSI interrupts for each PCIe controller. The patch is to support it. Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com> Tested-by: Alexander Stein <alexander.stein@systec-electronic.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-04 09:58:04 +01:00
Minghuan Lian	5e79cb29dd	dt/bindings: Add bindings for Layerscape SCFG MSI Some Layerscape SoCs use a simple MSI controller implementation. It contains only two SCFG register to trigger and describe a group 32 MSI interrupts. The patch adds bindings to describe the controller. Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-04 09:54:21 +01:00
Marc Zyngier	287e9357ab	DT/arm,gic-v3: Documment PPI partition support Add a decription of the PPI partitioning support. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1460365075-7316-6-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:51 +02:00
Marc Zyngier	e3825ba1af	irqchip/gic-v3: Add support for partitioned PPIs Plug the partitioning layer into the GICv3 PPI code, parsing the DT and building the partition affinities and providing the generic code with partition data and callbacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-5-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:51 +02:00
Marc Zyngier	9e2c986cb4	irqchip: Add per-cpu interrupt partitioning library We've unfortunately started seeing a situation where percpu interrupts are partitioned in the system: one arbitrary set of CPUs has an interrupt connected to a type of device, while another disjoint set of CPUs has the same interrupt connected to another type of device. This makes it impossible to have a device driver requesting this interrupt using the current percpu-interrupt abstraction, as the same interrupt number is now potentially claimed by at least two drivers, and we forbid interrupt sharing on per-cpu interrupt. A solution to this is to turn things upside down. Let's assume that our system describes all the possible partitions for a given interrupt, and give each of them a unique identifier. It is then possible to create a namespace where the affinity identifier itself is a form of interrupt number. At this point, it becomes easy to implement a set of partitions as a cascaded irqchip, each affinity identifier being the HW irq. This allows us to keep a number of nice properties: - Each partition results in a separate percpu-interrupt (with a restrictied affinity), which keeps drivers happy. - Because the underlying interrupt is still per-cpu, the overhead of the indirection can be kept pretty minimal. - The core code can ignore most of that crap. For that purpose, we implement a small library that deals with some of the boilerplate code, relying on platform-specific drivers to provide a description of the affinity sets and a set of callbacks. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-4-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:51 +02:00
Marc Zyngier	222df54fd8	genirq: Allow the affinity of a percpu interrupt to be set/retrieved In order to prepare the genirq layer for the concept of partitionned percpu interrupts, let's allow an affinity to be associated with such an interrupt. We introduce: - irq_set_percpu_devid_partition: flag an interrupt as a percpu-devid interrupt, and associate it with an affinity - irq_get_percpu_devid_partition: allow the affinity of that interrupt to be retrieved. This will allow a driver to discover which CPUs the per-cpu interrupt can actually fire on. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:51 +02:00
Marc Zyngier	651e8b54ab	irqdomain: Allow domain matching on irq_fwspec When iterating over the irq domain list, we try to match a domain either by calling a match() function or by comparing a number of fields passed as parameters. Both approaches are a bit restrictive: - match() is DT specific and only takes a device node - the fallback case only deals with the fwnode_handle It would be useful if we had a per-domain function that would actually perform the matching check on the whole of the irq_fwspec structure. This would allow for a domain to triage matching attempts that need to extend beyond the fwnode. Let's introduce irq_find_matching_fwspec(), which takes a full blown irq_fwspec structure, and call into a select() function implemented by the irqdomain. irq_find_matching_fwnode() is made a wrapper around irq_find_matching_fwspec in order to preserve compatibility. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: http://lkml.kernel.org/r/1460365075-7316-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:50 +02:00
Matt Redfearn	7cec18a390	genirq: Add error code reporting to irq_{reserve,destroy}_ipi Make these functions return appropriate error codes when something goes wrong. Previously irq_destroy_ipi returned void making it impossible to notify the caller if the request could not be fulfilled. Patch 1 in the series added another condition in which this could fail in addition to the existing ones. irq_reserve_ipi returned an unsigned int meaning it could only return 0 on failure and give the caller no indication as to why the request failed. As time goes on there are likely to be further conditions added in which these functions can fail. These APIs and the IPI IRQ domain are new in 4.6 and the number of existing call sites are low, changing the API now has little impact on the code, while making it easier for these functions to grow over time. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: ralf@linux-mips.org Cc: Qais Yousef <qsyousef@gmail.com> Cc: lisa.parratt@imgtec.com Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1461568464-31701-2-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:50 +02:00
Matt Redfearn	01292cea0d	genirq: Make irq_destroy_ipi take a cpumask of IPIs to destroy Previously irq_destroy_ipi() would destroy IPIs to all CPUs that were configured by irq_reserve_ipi(). This change makes it possible to destroy just a subset of the IPIs. This may be useful to remove IPIs to CPUs that have been hot removed so that the IRQ numbers allocated within the IPI domain can be re-used. The original behaviour is restored by passing the complete mask that the IPI was created with. There are currently no users of this function that would break from the API change. Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: ralf@linux-mips.org Cc: Qais Yousef <qsyousef@gmail.com> Cc: lisa.parratt@imgtec.com Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1461568464-31701-1-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-02 13:42:50 +02:00
Linus Torvalds	b75a2bf899	Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "So, it turns out we had a silly bug in the most fundamental part of workqueue for a very long time. AFAICS, this dates back to pre-git era and has quite likely been there from the time workqueue was first introduced. A work item uses its PENDING bit to synchronize multiple queuers. Anyone who wins the PENDING bit owns the pending state of the work item. Whether a queuer wins or loses the race, one thing should be guaranteed - there will soon be at least one execution of the work item - where "after" means that the execution instance would be able to see all the changes that the queuer has made prior to the queueing attempt. Unfortunately, we were missing a smp_mb() after clearing PENDING for execution, so nothing guaranteed visibility of the changes that a queueing loser has made, which manifested as a reproducible blk-mq stall. Lots of kudos to Roman for debugging the problem. The patch for -stable is the minimal one. For v3.7, Peter is working on a patch to make the code path slightly more efficient and less fragile" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix ghost PENDING flag while doing MQ IO	2016-04-27 12:03:59 -07:00
Linus Torvalds	763cfc86ee	Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two patches to fix a deadlock which can be easily triggered if memcg charge moving is used. This bug was introduced while converting threadgroup locking to a global percpu_rwsem and is caused by cgroup controller task migration path depending on the ability to create new kthreads. cpuset had a similar issue which was fixed by performing heavy-lifting operations asynchronous to task migration. The two patches fix the same issue in memcg in a similar way. The first patch makes the mechanism generic and the second relocates memcg charge moving outside the migration path. Given that we don't want to perform heavy operations while writelocking threadgroup lock anyway, moving them out of the way is a desirable solution. One thing to note is that the problem was difficult to debug because lockdep couldn't figure out the deadlock condition. Looking into how to improve that" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: memcg: relocate charge moving from ->attach to ->post_attach cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback	2016-04-27 11:41:14 -07:00
Linus Torvalds	3118e5f966	Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID' patches" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared i2c: cpm: Fix build break due to incompatible pointer types i2c: ismt: Add Intel DNV PCI ID i2c: xlp9xx: add support for Broadcom Vulcan i2c: rk3x: add support for rk3228	2016-04-27 11:34:45 -07:00
Linus Torvalds	24131a61ec	ARC fixes for 4.6-rc6 - LOCKDEP now words for ARCv2 builds - Enabling DT reserved-memory binding to work (for forthcoming HDMI driver) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc 4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ 0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl E2sBDIJl2Xu2E7amTJss =rlDa -----END PGP SIGNATURE----- Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - lockdep now works for ARCv2 builds - enable DT reserved-memory binding (for forthcoming HDMI driver) * tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: add support for reserved memory defined by device tree ARC: support generic per-device coherent dma mem Documentation: dt: arc: fix spelling mistakes ARCv2: Enable LOCKDEP	2016-04-27 09:46:21 -07:00
Linus Torvalds	508fea71c6	nios2 fix for v4.6 nios2: memset: use the right constraint modifier for the %4 output operand -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXIH+FAAoJEFWoEK+e3syCfRoQAMPXIiWR/V/dLn3OX8f8CeA6 I9duVqMrKrVh/a+bwxzmVJkumm0xzqYnOyhOpX5fZd3Nx44Q4NynJakwgWpMDVAI +xXxNtHZUhjcRC4EuqJW677plR0Uq8bWY2UibpARPHfB9d0arJOCuL11vdGCAjkg lWeVzUFg7iB9n0tRFwvsN29EcRZDo7+WbJh3cGIfTYNbcihJfiAAlmNyXS7XFiBY DqSYyTXIc8scH1q66gArTnyDryvM7cEZ+zyYoX9v8/E/+xPLLLhogthtqf3u4opn J/70k1LBgzgHCYrlEG8vvd1kCr114PLo7RlgkwJqdAtVhtMAtcGZFfkVQkSo7R3h gHQHf8f0exg3JKp0VesB443FyaIvpCNkth3eGdNMWunhCPB4bXE5W+hg5J5gZBNi 1Ft9VB9Ug/8sh9Es4muinNX1kR5Fc8IWQqIa2U/OCt4O2wR1aFanJvaRqeCozbES SpbRAOoXtzOZ0xRPZGPQpqP6ggfizq9Zil2ZTeXbNPBRybFvmEpgewdJYqvrNwZj pbgB1+7zVcsfGTiMhJ+d1rLUX/oeMuUWT+eHY1jM1k+gTQVUo6k8KNVoaiNKa9z5 cZh+XqgWKE6Qv4DVRnj+ouvDuglhwqIAyI3oXElghrYmCWxjvVKCXowQhnSq727M th2iyocnFMjIVWd5u3b7 =PRJ4 -----END PGP SIGNATURE----- Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 Pull arch/nios2 fix from Ley Foon Tan: "memset: use the right constraint modifier for the %4 output operand" * tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: memset: use the right constraint modifier for the %4 output operand	2016-04-27 09:33:24 -07:00
Linus Torvalds	9453203bf8	platform-drivers-x86 for 4.6-3 toshiba_acpi: - Fix regression caused by hotkey enabling value -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXIEZSAAoJEKbMaAwKp364mLwH/3j01EDn0JF1FIIP+kxVgeeL g8xI+0tlFzxmdcBqW3n4q0apzVuCmHr0pbOik289l3dv7hQ5PEvdmK/VhVPYmJDL 2u/4EWmW7cvYMUAVhGB499pKac38fMUN5y97dkmoikiTQO6VaWsvdczvXuhuz/dP OcQzRR/UttCLMe/ERxz3xh4R9kbY5Hzh4slW8Ay/sGDRrgOUFRLT8Zg3Uo7MY27i Kq++SrH96edL1dW6XkWFIqO7NzWGlbBxTMlTlh+xmGUkOtVxUyzAID3NEDIaw6zC 7QU61eyfIJToa2SxHZ/mT9bEFNHNbJR4KoLREG6K2LbRyMhsQfMxaTym8MNzT/Q= =+IXa -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fix from Darren Hart: "Fix regression caused by hotkey enabling value in toshiba_acpi" * tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: toshiba_acpi: Fix regression caused by hotkey enabling value	2016-04-27 08:57:11 -07:00
Alexey Brodkin	1b10cb21d8	ARC: add support for reserved memory defined by device tree Enable reserved memory initialization from device tree. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-04-27 17:06:56 +05:30
Alexey Brodkin	32ed9a0e0d	ARC: support generic per-device coherent dma mem Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-04-27 17:06:55 +05:30
Romain Perier	a8950e49bd	nios2: memset: use the right constraint modifier for the %4 output operand Depending on the size of the area to be memset'ed, the nios2 memset implementation either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores rather than 1-byte stores to speed up memset. However, we discovered that on our nios2 platform, memset() was not properly setting the buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to: 0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ... Which is obviously incorrect. Our investigation has revealed that the problem lies in the incorrect constraints used in the inline assembly. The following piece of assembly, from the nios2 memset implementation, is supposed to create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument: /* fill8 %3, %5 (c & 0xff) */ " slli %4, %5, 8\n" " or %4, %4, %5\n" " slli %3, %4, 16\n" " or %3, %3, %4\n" However, depending on the compiler and optimization level, this code might be compiled as: 34: 280a923a slli r5,r5,8 38: 294ab03a or r5,r5,r5 3c: 2808943a slli r4,r5,16 40: 2148b03a or r4,r4,r5 This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff. %4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from using the same register for an output operand (%4) and input operand (%5). By using the constraint modifier '&', we indicate that the register should be used for output only. With this change, we get the following assembly output: 34: 2810923a slli r8,r5,8 38: 4150b03a or r8,r8,r5 3c: 400e943a slli r7,r8,16 40: 3a0eb03a or r7,r7,r8 Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern. It is worth mentioning the observed consequence of this bug: we were hitting the kernel BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right after the initialization was finding some bits to be set to 0, which didn't make sense since the bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the bitmap was in fact initialized to 0xff00ff00. Thanks to Marek Vasut for his help and feedback. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Marek Vasut <marex@denx.de> Acked-by: Ley Foon Tan <lftan@altera.com>	2016-04-27 16:35:55 +08:00
Linus Torvalds	f28f20da70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig Gallak. 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks, missing locking around netdev list traversal, etc.) from Sabrina Dubroca. 3) Fix handling of host routes on ifdown in ipv6, from David Ahern. 4) Fix double-fdput in bpf verifier. From Jann Horn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) bpf: fix double-fdput in replace_map_fd_with_map_ptr() net: ipv6: Delete host routes on an ifdown Revert "ipv6: Revert optional address flusing on ifdown." net/mlx4_en: fix spurious timestamping callbacks net: dummy: remove note about being Y by default cxgbi: fix uninitialized flowi6 ipv6: Revert optional address flusing on ifdown. ipv4/fib: don't warn when primary address is missing if in_dev is dead net/mlx5: Add pci shutdown callback net/mlx5_core: Remove static from local variable net/mlx5e: Use vport MTU rather than physical port MTU net/mlx5e: Fix minimum MTU net/mlx5e: Device's mtu field is u16 and not int net/mlx5_core: Add ConnectX-5 to list of supported devices net/mlx5e: Fix MLX5E_100BASE_T define net/mlx5_core: Fix soft lockup in steering error flow qlcnic: Update version to 5.3.64 net: stmmac: socfpga: Remove re-registration of reset controller macsec: fix netlink attribute validation macsec: add missing macsec prefix in uapi ...	2016-04-26 16:25:51 -07:00
Linus Torvalds	91ea692f87	Here are the latest bug fixes for ARM SoCs, mostly addressing recent regressions. Changes are across several platforms, so I'm listing every change separately here. Regressions since 4.5: - A correction of the psci firmware DT binding, to prevent users from relying on unintended semantics - Actually getting the newly merged clock driver for some OMAP platforms to work - A revert of patches for the Qualcomm BAM, these need to be reworked for 4.7 to avoid breaking boards other than the one they were intended for - A correction for the I2C device nodes on the Socionext Uniphier platform - i.MX SDHCI was broken for non-DT platforms due to a change with the setting of the DMA mask - A revert of a patch that accidentally added a nonexisting clock on the Rensas "Porter" board - A couple of OMAP fixes that are all related to suspend after the power domain changes for dra7 - On Mediatek, revert part of the power domain initialization changes that broke mt8173-evb Fixes for older bugs: - Workaround for an "external abort" in the omap34xx suspend/resume code. - The USB1/eSATA should not be listed as an excon device on am57xx-beagle-x15 (broken since v4.0) - A v4.5 regression in the TI AM33xx and AM43XX DT specifying incorrect DMA request lines for the GPMC - The jiffies calibration on Renesas platforms was incorrect for some modern CPU cores. - A hardware errata woraround for clockdomains on TI DRA7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVx+5v2CrR//JCVInAQJ/ZBAArI3ZiR+Jj2dZCm9c7+PjlDWngJpBME3V o4aF9CeyuA/eyx+QtKAq1ScG2eRIbfab03XGBMEHXpKmmiTXYFIcLFHewwSGBYsy XUsNO+ZKsw92ImSdcX9p45BjkAADJvUwX5BzDlfOQ5mNX+o0Godb/8Mi2Y6RIqTK 5C0xQ0YE8ZN7xtyNzFylaI+CL6wsVLy6PUKig7UIrOOXQK3Tzt4mEz2ksrSBJzON RiG7kPLf+Zd013WyF/ZUdC3VErDOP7C1Z+YRcK+2rxjlL+4oJUznsoaBYJgLUV+T GmcD0TZNwt6x6FWF6cSiUa+gl+6oWRZwTGfUooS1zEcuLHBsONdMtVat4Z01RYos rdMvFgZ6bxG7n4tajI2jg1gokGfyMfYuKwnHuA8Ynzn4N/VcnnbfxPRyV/RMLN0W ad/e12SlLMX1XahrD9uo/oH/X73gHPnbHlLLzWfDfnyvNGvWiW3SNklFT03q/Yn+ fgfB0OnzG8+a3c/LHZbtAo/yYYLdqIuOg8I40AizN3CKHamUWPAjgFfdHdQADVV8 yC5ugVB6x7RYID/49IPT1C3n/SjoypYyRbo30ipqyz2dTf6kz35SY/YjYNSaIYvY QfnGFuywsKsTprGAzI+x/fGo61Ve0/XkK9RPt0opU1+WdYr3sE+ufGVLVn4g4Cw3 wfd20UTVwGs= =YgL2 -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Here are the latest bug fixes for ARM SoCs, mostly addressing recent regressions. Changes are across several platforms, so I'm listing every change separately here. Regressions since 4.5: - A correction of the psci firmware DT binding, to prevent users from relying on unintended semantics - Actually getting the newly merged clock driver for some OMAP platforms to work - A revert of patches for the Qualcomm BAM, these need to be reworked for 4.7 to avoid breaking boards other than the one they were intended for - A correction for the I2C device nodes on the Socionext Uniphier platform - i.MX SDHCI was broken for non-DT platforms due to a change with the setting of the DMA mask - A revert of a patch that accidentally added a nonexisting clock on the Rensas "Porter" board - A couple of OMAP fixes that are all related to suspend after the power domain changes for dra7 - On Mediatek, revert part of the power domain initialization changes that broke mt8173-evb Fixes for older bugs: - Workaround for an "external abort" in the omap34xx suspend/resume code. - The USB1/eSATA should not be listed as an excon device on am57xx-beagle-x15 (broken since v4.0) - A v4.5 regression in the TI AM33xx and AM43XX DT specifying incorrect DMA request lines for the GPMC - The jiffies calibration on Renesas platforms was incorrect for some modern CPU cores. - A hardware errata woraround for clockdomains on TI DRA7" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems arm64: dts: uniphier: fix I2C nodes of PH1-LD20 ARM: shmobile: timer: Fix preset_lpj leading to too short delays Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins" ARM: dts: r8a7791: Don't disable referenced optional clocks Revert "ARM: OMAP: Catch callers of revision information prior to it being populated" ARM: OMAP3: Fix external abort on 36xx waking from off mode idle ARM: dts: am57xx-beagle-x15: remove extcon_usb1 ARM: dts: am437x: Fix GPMC dma properties ARM: dts: am33xx: Fix GPMC dma properties Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators" ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask ARM: DRA7: clockdomain: Implement timer workaround for errata i874 ARM: OMAP: Catch callers of revision information prior to it being populated ARM: dts: dra7: Correct clock tree for sys_32k_ck ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node" Revert "dts: msm8974: Add blsp2_bam dma node" ARM: dts: Add clocks for dm814x ADPLL	2016-04-26 16:17:01 -07:00
Linus Torvalds	8ead9dd547	devpts: more pty driver interface cleanups This is more prep-work for the upcoming pty changes. Still just code cleanup with no actual semantic changes. This removes a bunch pointless complexity by just having the slave pty side remember the dentry associated with the devpts slave rather than the inode. That allows us to remove all the "look up the dentry" code for when we want to remove it again. Together with moving the tty pointer from "inode->i_private" to "dentry->d_fsdata" and getting rid of pointless inode locking, this removes about 30 lines of code. Not only is the end result smaller, it's simpler and easier to understand. The old code, for example, depended on the d_find_alias() to not just find the dentry, but also to check that it is still hashed, which in turn validated the tty pointer in the inode. That is a _very_ roundabout way to say "invalidate the cached tty pointer when the dentry is removed". The new code just does dentry->d_fsdata = NULL; in devpts_pty_kill() instead, invalidating the tty pointer rather more directly and obviously. Don't do something complex and subtle when the obvious straightforward approach will do. The rest of the patch (ie apart from code deletion and the above tty pointer clearing) is just switching the calling convention to pass the dentry or file pointer around instead of the inode. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-26 15:47:32 -07:00
Jann Horn	8358b02bf6	bpf: fix double-fdput in replace_map_fd_with_map_ptr() When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode references a non-map file descriptor as a map file descriptor, the error handling code called fdput() twice instead of once (in __bpf_map_get() and in replace_map_fd_with_map_ptr()). If the file descriptor table of the current task is shared, this causes f_count to be decremented too much, allowing the struct file to be freed while it is still in use (use-after-free). This can be exploited to gain root privileges by an unprivileged user. This bug was introduced in commit `0246e64d9a` ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only exploitable since commit `1be7f75d16` ("bpf: enable non-root eBPF programs") because previously, CAP_SYS_ADMIN was required to reach the vulnerable code. (posted publicly according to request by maintainer) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 17:37:21 -04:00
David Ahern	38bd10c447	net: ipv6: Delete host routes on an ifdown It was a simple idea -- save IPv6 configured addresses on a link down so that IPv6 behaves similar to IPv4. As always the devil is in the details and the IPv6 stack as too many behavioral differences from IPv4 making the simple idea more complicated than it needs to be. The current implementation for keeping IPv6 addresses can panic or spit out a warning in one of many paths: 1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in rt6_fill_node while handling a route dump request. 2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in fib6_del 3. Panic in fib6_purge_rt because rt6i_ref count is not 1. The root cause of all these is references related to the host route for an address that is retained. So, this patch deletes the host route every time the ifdown loop runs. Since the host route is deleted and will be re-generated an up there is no longer a need for the l3mdev fix up. On the 'admin up' side move addrconf_permanent_addr into the NETDEV_UP event handling so that it runs only once versus on UP and CHANGE events. All of the current panics and warnings appear to be related to addresses on the loopback device, but given the catastrophic nature when a bug is triggered this patch takes the conservative approach and evicts all host routes rather than trying to determine when it can be re-used and when it can not. That can be a later optimizaton if desired. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 11:48:26 -04:00
David S. Miller	6a923934c3	Revert "ipv6: Revert optional address flusing on ifdown." This reverts commit `841645b5f2`. Ok, this puts the feature back. I've decided to apply David A.'s bug fix and run with that rather than make everyone wait another whole release for this feature. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 11:47:41 -04:00
Roman Pen	346c09f804	workqueue: fix ghost PENDING flag while doing MQ IO The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O 4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000 [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000 [ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0 [ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38 [ 601.350965] Call Trace: [ 601.351203] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.351444] [<ffffffff815b01d5>] schedule+0x35/0x80 [ 601.351709] [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230 [ 601.351958] [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [<ffffffff810bd737>] ? ktime_get+0x37/0xa0 [ 601.352446] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.352688] [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110 [ 601.352951] [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [<ffffffff815b093b>] bit_wait_io+0x1b/0x70 [ 601.353440] [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90 [ 601.353689] [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40 [ 601.354200] [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50 [ 601.355193] [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0 [ 601.355432] [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100 [ 601.355679] [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0 [ 601.355925] [<ffffffff81198379>] vfs_write+0xa9/0x1a0 [ 601.356164] [<ffffffff811c59d8>] kernel_write+0x38/0x50 The underlying device is a null_blk, with default parameters: queue_mode = MQ submit_queues = 1 Verification that nullb0 has something inflight: root@pserver8:~# cat /sys/block/nullb0/inflight 0 1 root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \; ... /sys/block/nullb0/mq/0/cpu2/rq_list CTX pending: ffff8838038e2400 ... During debug it became clear that stalled request is always inserted in the rq_list from the following path: save_stack_trace_tsk + 34 blk_mq_insert_requests + 231 blk_mq_flush_plug_list + 281 blk_flush_plug_list + 199 wait_on_page_bit + 192 __filemap_fdatawait_range + 228 filemap_fdatawait_range + 20 filemap_write_and_wait_range + 63 blkdev_fsync + 27 vfs_fsync_range + 73 blkdev_write_iter + 202 __vfs_write + 170 vfs_write + 169 kernel_write + 56 So blk_flush_plug_list() was called with from_schedule == true. If from_schedule is true, that means that finally blk_mq_insert_requests() offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue, i.e. it calls kblockd_schedule_delayed_work_on(). That means, that we race with another CPU, which is about to execute __blk_mq_run_hw_queue() work. Further debugging shows the following traces from different CPUs: CPU#0 CPU#1 ---------------------------------- ------------------------------- reqeust A inserted STORE hctx->ctx_map[0] bit marked kblockd_schedule...() returns 1 <schedule to kblockd workqueue> request B inserted STORE hctx->ctx_map[1] bit marked kblockd_schedule...() returns 0 * WORK PENDING bit is cleared * flush_busy_ctxs() is executed, but bit 1, set by CPU#1, is not observed As a result request B pended forever. This behaviour can be explained by speculative LOAD of hctx->ctx_map on CPU#0, which is reordered with clear of PENDING bit and executed _before_ actual STORE of bit 1 on CPU#1. The proper fix is an explicit full barrier <mfence>, which guarantees that clear of PENDING bit is to be executed before all possible speculative LOADS or STORES inside actual work function. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2016-04-26 11:23:22 -04:00
Sudeep Holla	978fa43623	drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems Currently ARM CPUs DT bindings allows different enable-method value for PSCI based systems. On ARM 64-bit this property is required and must be "psci" while on ARM 32-bit systems this property is optional and must be "arm,psci" if present. However, "arm,psci" has always been the compatible string for the PSCI node, and was never intended to be the enable-method. So this is a bug in the binding and not a deliberate attempt at specifying 32-bit differently. This is problematic if 32-bit OS is run on 64-bit system which has "psci" as enable-method rather than the expected "arm,psci". So let's unify the value into "psci" and remove support for "arm,psci" before it finds any users. Reported-by: Soby Mathew <Soby.Mathew@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2016-04-26 12:46:08 +02:00
Eric Dumazet	fc96256c90	net/mlx4_en: fix spurious timestamping callbacks When multiple skb are TX-completed in a row, we might incorrectly keep a timestamp of a prior skb and cause extra work. Fixes: `ec693d4701` ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 01:13:18 -04:00
Ivan Babrou	9f5db53507	net: dummy: remove note about being Y by default Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 01:11:55 -04:00
Jiri Benc	3d6d30d60a	cxgbi: fix uninitialized flowi6 ip6_route_output looks into different fields in the passed flowi6 structure, yet cxgbi passes garbage in nearly all those fields. Zero the structure out first. Fixes: `fc8d0590d9` ("libcxgbi: Add ipv6 api to driver") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-25 16:20:49 -04:00
Tejun Heo	264a0ae164	memcg: relocate charge moving from ->attach to ->post_attach Hello, So, this ended up a lot simpler than I originally expected. I tested it lightly and it seems to work fine. Petr, can you please test these two patches w/o the lru drain drop patch and see whether the problem is gone? Thanks. ------ 8< ------ If charge moving is used, memcg performs relabeling of the affected pages from its ->attach callback which is called under both cgroup_threadgroup_rwsem and thus can't create new kthreads. This is fragile as various operations may depend on workqueues making forward progress which relies on the ability to create new kthreads. There's no reason to perform charge moving from ->attach which is deep in the task migration path. Move it to ->post_attach which is called after the actual migration is finished and cgroup_threadgroup_rwsem is dropped. * move_charge_struct->mm is added and ->can_attach is now responsible for pinning and recording the target mm. mem_cgroup_clear_mc() is updated accordingly. This also simplifies mem_cgroup_move_task(). * mem_cgroup_move_task() is now called from ->post_attach instead of ->attach. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Debugged-and-tested-by: Petr Mladek <pmladek@suse.com> Reported-by: Cyril Hrubis <chrubis@suse.cz> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Fixes: `1ed1328792` ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem") Cc: <stable@vger.kernel.org> # 4.4+	2016-04-25 15:45:14 -04:00

1 2 3 4 5 ...

589600 Commits All Branches Search

589600 Commits

All Branches