Register address in name of the node is wrong
Signed-off-by: Vishal Mahaveer <vishalm@ti.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
One of the lines from PCF857x is connected to the vdd line of MMC1
in DRA74x and DRA72x EVMs and is modelled as a regulator. If PCF857x
is not made as built-in, the regulator_get in omap_hsmmc fails making
it difficult to use MMC1 as rootfs.
Make PCF857x built-in.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Use platform specific compatible strings instead of the common
"ti,pbias-omap" compatible string.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull ARM fixes from Russell King:
"A number of fixes for the merge window, fixing a number of cases
missed when testing the uaccess code, particularly cases which only
show up with certain compiler versions"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8431/1: fix alignement of __bug_table section entries
arm/xen: Enable user access to the kernel before issuing a privcmd call
ARM: domains: add memory dependencies to get_domain/set_domain
ARM: domains: thread_info.h no longer needs asm/domains.h
ARM: uaccess: fix undefined instruction on ARMv7M/noMMU
ARM: uaccess: remove unneeded uaccess_save_and_disable macro
ARM: swpan: fix nwfpe for uaccess changes
ARM: 8429/1: disable GCC SRA optimization
OMAP5 SoC has Cortex-A15 which does not use TWD timer. It uses
ARCH_TIMER instead, clean up unwanted configuration and enable
OMAP_INTERCONNECT and OPP which is necessary for expected functionality
on the SoC.
Reported-by: Carlos Hernandez <ceh@ti.com>
Reported-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
DRA7 does use OPP, uses OMAP interconnect and also does require SCU.
These are missing in the SoC only build of DRA7 breaking various PM
features in DRA7 only build.
Reported-by: Carlos Hernandez <ceh@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
When commit c4082d499f ("ARM: omap2+: board-generic: clean up the
irq data from board file") cleaned up the direct usage of gic_of_init
and omap_intc_of_init, it failed to clean up the macros properly.
Since these macros are no longer used, lets just remove them.
Fixes: c4082d499f ("ARM: omap2+: board-generic: clean up the irq data from board file")
Reported-by: Carlos Hernandez <ceh@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
OMAP5 and DRA7 reuse the same pm44xx_erratum variable so, enable the
same, else PM features such as Suspend to ram is broken in a SoC only
build configuration.
Reported-by: Carlos Hernandez <ceh@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Only the IGEPv2 boards have a LAN9221i chip connected to the GPMC
so the pinmux configuration for the GPIO connected to the IRQ line
of the LAN chip should not be defined in the IGEP common dtsi but
in the one common to the IGEPv2 boards.
While there, use the OMAP3_CORE1_IOPAD() macro for the padconf reg.
Suggested-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
With the support in the generic PM framework for wakeirq and capability
added to the rtc-ds1307 driver to support this, we can now define the
optional wakeup irq to allow the RTC to wakeup the system from low power
modes as part of suspend.
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix the mpu voltage as it is set too low for the silicon
revision 2.1.
Signed-off-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Tony Lindgren <tony@atomide.com>
For beagle x15, both the vdd and io lines are connected to the
same regulator (ldo1_reg). However vmmc_aux is populated to vdd_3v3.
Remove it.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
[tony@atomide.com: updated to apply]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Looks like I made a typo on the control base, all the 81xx
SoCs have it at 0x48140000 base. We've just gotten away with
the typo as the Ethernet phy was configured by the bootloader
on my test system and we're not yet using the pinctrl.
In addition to fixing the contol base, we need to also use the
right Ethernet phy flags to initialize it. And we are still
missing the PLL driver for dm814x and only relying on the
divider and mux clocks.
Fixes: f3d953ea37 ("ARM: dts: Add minimal dm814x support")
Cc: Matthijs van Duin <matthijsvanduin@gmail.com>
Cc: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio.
According to the OMAP35x Technical Reference Manual
CONTROL_PADCONF_I2C3_SDA[15:0] 0x480021C4 mode0: i2c3_sda
CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170
the pinmux address of gpio 170 must be 0x480021C6.
The former wrong address broke i2c3 (used by hdmi ddc), resulting in
kernel message:
omap_i2c 48060000.i2c: controller timed out
Fixes: 8cecf52bef ("ARM: omap3-beagle.dts: add display information")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Carl Frederik Werner <frederik@cfbw.eu>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.
The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.
xAPIC:
"1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
x2APIC:
"To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the
APIC registers. Thus, system software should not use 'WRMSR to APIC
registers in x2APIC mode' as a serializing instruction. Read and write
accesses to the APIC registers will occur in program order. A WRMSR to
an APIC register may complete before all preceding stores are globally
visible; software can prevent this by inserting a serializing
instruction, an SFENCE, or an MFENCE before the WRMSR."
The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.
Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.
[ tglx: Massaged the changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The recent ioapic cleanups changed the affinity setting in
setup_ioapic_dest() from a direct write to the hardware to the delayed
affinity setup via irq_set_affinity().
That results in a warning from chained_irq_exit():
WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
[<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
The reason is that irq_set_affinity() does not write directly to the
hardware. It marks the affinity setting as pending and executes it
from the next interrupt. The chained handler infrastructure does not
take the irq descriptor lock for performance reasons because such a
chained interrupt is not visible to any interfaces. So the delayed
affinity setting triggers the warning in irq_move_masked_irq().
Restore the old behaviour by calling the set_affinity function of the
ioapic chip in setup_ioapic_dest(). This is safe as none of the
interrupts can be on the fly at this point.
Fixes: aa5cb97f14 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: jarkko.nikula@linux.intel.com
When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.
This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.
Cc: <stable@vger.kernel.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- Fix timer interrupt injection after the rework
that went in during the merge window
- Reset the timer to zero on reboot
- Make sure the TCR_EL2 RES1 bits are really set to 1
- Fix a PSCI affinity bug for non-existing vcpus
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV9VQYAAoJECPQ0LrRPXpDYFsQAIp+nIxv6LijQdq310EF7Z95
1j16vx9NfsAsNH0pwiRmx4gfNOHlPp6f30Kb1HJf2TN08Rc7TS8j4Dr3zYnZEdQj
9eNGTjCjlz93VQwKnTBvBvYkHsCEWC76pOxiYoFmnLsGc4m/OilHhohOzOZTAMFC
Le07VXJwrhGHQ5bjdIMmFj+5rMj4eWfT2bYg8uVl5EUNFNkDgAMtT/Nbml90friC
x+r2I9H+G8hHmemOv6hefW7l2JScLSYpqLeGdxZUtnGy9LNP3+jTD1QyqUftUPuS
HtcB4zCtRk62nMupgFNV514Kf26wcwpS53vhd8aM2kxSkpr5xkUFoL+uJNT8ssIH
Y+FwAOeTD0osWbmxw8Gf9d3qYzPBQSCRP3E4mCbNNNsTlMRU+Hu9au0VNBKggkfF
fZ4UphCVX2C+cWaQcB40950h1F76Vx3boAfcMgiXoO+8LJx9OogDtr8+j5cBBbKU
+tMngLhelsMtSCTam9c0jS6BsQfHDy3W1CQNl30cqd2RFjvygOSAqzdCMlBk0lLG
4Sq0eib1zJM+rE2OQs8SaAftyQ0kbElmM3XZB3ZiitaHgkUKV7kt9dMUG9Vfn2SE
/ycgR9eoBq5P6FKhyROmL+0g824nYJmn4fwpecw1rHoRr8jyZjvpbE3VSL0aI5XW
w4UYsAQSTcqM6bHpNYuu
=G02h
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM changes for 4.3-rc2
- Fix timer interrupt injection after the rework
that went in during the merge window
- Reset the timer to zero on reboot
- Make sure the TCR_EL2 RES1 bits are really set to 1
- Fix a PSCI affinity bug for non-existing vcpus
Depending on CONFIG_ARM64_HW_AFDBM, we use either bit 57 or 51 of the
pte to represent PTE_WRITE. Given that bit 51 is reserved prior to
ARMv8.1, we can just use that bit regardless of the config option. That
also matches what happens if a kernel configured with ARM64_HW_AFDBM=y
is run on a CPU without the DBM functionality.
Cc: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The pte_modify() function with hardware AF/DBM enabled must transfer the
hardware dirty information to the software PTE_DIRTY bit. However, it
was setting this bit in newprot and the mask does not cover such bit.
This patch sets PTE_DIRTY on the original pte which will be preserved in
the returned value.
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 2f4b829c62 ("arm64: Add support for hardware updates of the
access and dirty pte bits") introduced support for handling hardware
updates of the access flag and dirty status. The PTE is automatically
dirtied in hardware (if supported) by clearing the PTE_RDONLY bit when
the PTE_DBM/PTE_WRITE bit is set. The pte_hw_dirty() macro was added to
detect a hardware dirtied pte. The pte_dirty() macro checks for both
software PTE_DIRTY and pte_hw_dirty().
Functions like pte_modify() clear the PTE_RDONLY bit since it is meant
to be set in set_pte_at() when written to memory. In such cases,
pte_hw_dirty() would return true even though such pte is clean. This
patch changes pte_hw_dirty() to test the PTE_DBM/PTE_WRITE bit together
with PTE_RDONLY.
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reported-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If CMA is turned on and CMA size is set to zero, kernel should
behave as if CMA was not enabled at compile time.
Every dma allocation should check existence of cma area
before requesting memory.
Arm has done this by commit e464ef16c4 ("arm: dma-mapping: add
checking cma area initialized"), also do this for arm64.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
While the following commit:
37868fe113 ("x86/ldt: Make modify_ldt synchronous")
added a nice comment explaining that Xen needs page-aligned
whole page chunks for guest descriptor tables, it then
nevertheless used kzalloc() on the small size path.
As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
page-aligned memory blocks, I believe this needs to be switched
back to __get_free_page() (or better get_zeroed_page()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CONFIG_VM86 Kconfig help text is actively misleading, so fix it:
- Don't mark it 'obsolete' in the text as we'll support the ABI as long as CPUs
support it.
- Qualify the part about software emulation and mention that for some apps you
want a real vm86 mode.
- Don't scare users away from the option, instead explain what it does.
Reported-by: Stas Sergeev <stsp@list.ru>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
The ddc-i2c-bus property was missing from the veyron dtsi file since
downstream the ddc-i2c-bus was still being specified in rk3288.dtsi and
nobody noticed when the veyron dtsi was sent upstream. Add it.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Commit 03fbf488ce ("spi: pxa2xx: Differentiate Intel LPSS types") caused
build error here because it removed the type LPSS_SSP and I didn't notice
the type was used here too.
I believe commit a6e56c28a1 ("ARM: pxa: ssp: add DT bindings") added it
accidentally by copying all enum pxa_ssp_type types from
include/linux/pxa2xx_ssp.h even LPSS_SSP was for Intel LPSS SPI devices.
Fix the build error by removing this incorrect binding.
Fixes: 03fbf488ce ("spi: pxa2xx: Differentiate Intel LPSS types")
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Axel Lin <axel.lin@ingics.com>
Cc: <stable@vger.kernel.org> # 4.2
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
After the conversion of pxa architecture to common clock framework, the
NAND clock can be disabled on startup if no nand driver claims it.
In this case, it happens that if the bootloader used the NAND and set
the DFI arbitration bit, the next access to a static memory controller
area, such as an ethernet card, the system bus will stall, and the core
will be stalled forever.
Fix this by clearing the DFI arbritration bit in pxa3xx startup. The bit
will be enabled the pxa3xx-nand driver on need anyway. The only left
requirement is that upon pxa3xx-nand removal, the bit should be cleared
before the clock is disabled.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Sasha reported that we can get here with .idx==-1, and
cpuc->event_constraints unallocated.
Suggested-by: Stephane Eranian <eranian@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Fixes: b371b59431 ("perf/x86: Fix event/group validation")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
924e101a7a ("x86/debug: Dump family, model, stepping of the
boot CPU") had its good intentions to dump the exact F/M/S as an
aid during debugging sessions but its output can be ambiguous.
Fix that:
-smpboot: CPU0: Intel Core Processor (Broadwell) (fam: 06, model: 47, stepping: 02)
+smpboot: CPU0: Intel Core Processor (Broadwell) (family: 0x6, model: 0x47, stepping: 0x2)
Also, spell out "family".
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441914927-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEABECAAYFAlXqI2YACgkQ31LbvUHyf1eV9QCdH1QQrv3ze1j+5ut3hVGQFC7F
s0oAnRWfi65O5J6Ns4gEGfbjSXvF2aNf
=5RcU
-----END PGP SIGNATURE-----
Merge tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris
Pull CRIS updates from Jesper Nilsson:
"Mostly removal of old cruft of which we can use a generic version, or
fixes for code not commonly run in the cris port, but also additions
to enable some good debug"
* tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: (25 commits)
CRISv10: delete unused lib/dmacopy.c
CRISv10: delete unused lib/old_checksum.c
CRIS: fix switch_mm() lockdep splat
CRISv32: enable LOCKDEP_SUPPORT
CRIS: add STACKTRACE_SUPPORT
CRISv32: annotate irq enable in idle loop
CRISv32: add support for irqflags tracing
CRIS: UAPI: use generic types.h
CRIS: UAPI: use generic shmbuf.h
CRIS: UAPI: use generic msgbuf.h
CRIS: UAPI: use generic socket.h
CRIS: UAPI: use generic sembuf.h
CRIS: UAPI: use generic sockios.h
CRIS: UAPI: use generic auxvec.h
CRIS: UAPI: use generic headers via Kbuild
CRIS: UAPI: fix elf.h export
CRIS: don't make asm/elf.h depend on asm/user.h
CRIS: UAPI: fix ptrace.h
CRISv32: Squash compile warnings for axisflashmap
CRISv32: Add GPIO driver to the default configs
...
Merge fourth patch-bomb from Andrew Morton:
- sys_membarier syscall
- seq_file interface changes
- a few misc fixups
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"
mm/early_ioremap: add explicit #include of asm/early_ioremap.h
fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void
selftests: enhance membarrier syscall test
selftests: add membarrier syscall test
sys_membarrier(): system-wide memory barrier (generic, x86)
MODSIGN: fix a compilation warning in extract-cert
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system. It is
implemented by calling synchronize_sched(). It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier. For synchronization primitives
that distinguish between read-side and write-side (e.g. userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.
The existing applications of which I am aware that would be improved by
this system call are as follows:
* Through Userspace RCU library (http://urcu.so)
- DNS server (Knot DNS) https://www.knot-dns.cz/
- Network sniffer (http://netsniff-ng.org/)
- Distributed object storage (https://sheepdog.github.io/sheepdog/)
- User-space tracing (http://lttng.org)
- Network storage system (https://www.gluster.org/)
- Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
- Financial software (https://lkml.org/lkml/2015/3/23/189)
Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().
* Direct users of sys_membarrier
- core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux. They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.
To explain the benefit of this scheme, let's introduce two example threads:
Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())
In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".
Before the change, we had, for each smp_mb() pairs:
Thread A Thread B
previous mem accesses previous mem accesses
smp_mb() smp_mb()
following mem accesses following mem accesses
After the change, these pairs become:
Thread A Thread B
prev mem accesses prev mem accesses
sys_membarrier() barrier()
follow mem accesses follow mem accesses
As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).
1) Non-concurrent Thread A vs Thread B accesses:
Thread A Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
prev mem accesses
barrier()
follow mem accesses
In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.
2) Concurrent Thread A vs Thread B accesses
Thread A Thread B
prev mem accesses prev mem accesses
sys_membarrier() barrier()
follow mem accesses follow mem accesses
In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().
* Benchmarks
On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)
1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call.
* User-space user of this system call: Userspace RCU library
Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every process
threads (as we currently do), we diminish the number of unnecessary wake
ups and only issue the memory barriers on active threads. Non-running
threads do not need to execute such barrier anyway, because these are
implied by the scheduler context switches.
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
memory barriers in reader: 1701557485 reads, 2202847 writes
signal-based scheme: 9830061167 reads, 6700 writes
sys_membarrier: 9952759104 reads, 425 writes
sys_membarrier (dyn. check): 7970328887 reads, 425 writes
The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than signal and memory barrier schemes.
Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes injected into for tracing purposes, for which we
cannot expect that signals will be unused by the application.
An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.
This patch adds the system call to x86 and to asm-generic.
[1] http://urcu.so
membarrier(2) man page:
MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2)
NAME
membarrier - issue memory barriers on a set of threads
SYNOPSIS
#include <linux/membarrier.h>
int membarrier(int cmd, int flags);
DESCRIPTION
The cmd argument is one of the following:
MEMBARRIER_CMD_QUERY
Query the set of supported commands. It returns a bitmask of
supported commands.
MEMBARRIER_CMD_SHARED
Execute a memory barrier on all threads running on the system.
Upon return from system call, the caller thread is ensured that
all running threads have passed through a state where all memory
accesses to user-space addresses match program order between
entry to and return from the system call (non-running threads
are de facto in such a state). This covers threads from all pro=E2=80=90
cesses running on the system. This command returns 0.
The flags argument needs to be 0. For future extensions.
All memory accesses performed in program order from each targeted
thread is guaranteed to be ordered with respect to sys_membarrier(). If
we use the semantic "barrier()" to represent a compiler barrier forcing
memory accesses to be performed in program order across the barrier,
and smp_mb() to represent explicit memory barriers forcing full memory
ordering across the barrier, we have the following ordering table for
each pair of barrier(), sys_membarrier() and smp_mb():
The pair ordering is detailed as (O: ordered, X: not ordered):
barrier() smp_mb() sys_membarrier()
barrier() X X O
smp_mb() X O O
sys_membarrier() O O O
RETURN VALUE
On success, these system calls return zero. On error, -1 is returned,
and errno is set appropriately. For a given command, with flags
argument set to 0, this system call is guaranteed to always return the
same value until reboot.
ERRORS
ENOSYS System call is not implemented.
EINVAL Invalid arguments.
Linux 2015-04-15 MEMBARRIER(2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On old ARM chips, unaligned accesses to memory are not trapped and
fixed. On module load, symbols are relocated, and the relocation of
__bug_table symbols is done on a u32 basis. Yet the section is not
aligned to a multiple of 4 address, but to a multiple of 2.
This triggers an Oops on pxa architecture, where address 0xbf0021ea
is the first relocation in the __bug_table section :
apply_relocate(): pxa3xx_nand: section 13 reloc 0 sym ''
Unable to handle kernel paging request at virtual address bf0021ea
pgd = e1cd0000
[bf0021ea] *pgd=c1cce851, *pte=c1cde04f, *ppte=c1cde01f
Internal error: Oops: 23 [#1] ARM
Modules linked in:
CPU: 0 PID: 606 Comm: insmod Not tainted 4.2.0-rc8-next-20150828-cm-x300+ #887
Hardware name: CM-X300 module
task: e1c68700 ti: e1c3e000 task.ti: e1c3e000
PC is at apply_relocate+0x2f4/0x3d4
LR is at 0xbf0021ea
pc : [<c000e7c8>] lr : [<bf0021ea>] psr: 80000013
sp : e1c3fe30 ip : 60000013 fp : e49e8c60
r10: e49e8fa8 r9 : 00000000 r8 : e49e7c58
r7 : e49e8c38 r6 : e49e8a58 r5 : e49e8920 r4 : e49e8918
r3 : bf0021ea r2 : bf007034 r1 : 00000000 r0 : bf000000
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 0000397f Table: c1cd0018 DAC: 00000051
Process insmod (pid: 606, stack limit = 0xe1c3e198)
[<c000e7c8>] (apply_relocate) from [<c005ce5c>] (load_module+0x1248/0x1f5c)
[<c005ce5c>] (load_module) from [<c005dc54>] (SyS_init_module+0xe4/0x170)
[<c005dc54>] (SyS_init_module) from [<c000a420>] (ret_fast_syscall+0x0/0x38)
Fix this by ensuring entries in __bug_table are all aligned to at least
of multiple of 4. This transforms a module section __bug_table as :
- [12] __bug_table PROGBITS 00000000 002232 000018 00 A 0 0 1
+ [12] __bug_table PROGBITS 00000000 002232 000018 00 A 0 0 4
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When Xen is copying data to/from the guest it will check if the kernel
has the right to do the access. If not, the hypercall will return an
error.
After the commit a5e090acbf "ARM:
software-based privileged-no-access support", the kernel can't access
any longer the user space by default. This will result to fail on every
hypercall made by the userspace (i.e via privcmd).
We have to enable the userspace access and then restore the correct
permission every time the privcmd is used to made an hypercall.
I didn't find generic helpers to do a these operations, so the change
is only arm32 specific.
Reported-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We need to have memory dependencies on get_domain/set_domain to avoid
the compiler over-optimising these inline assembly instructions.
Loads/stores must not be reordered across a set_domain(), so introduce
a compiler barrier for that assembly.
The value of get_domain() must not be cached across a set_domain(), but
we still want to allow the compiler to optimise it away. Introduce a
dependency on current_thread_info()->cpu_domain to avoid this; the new
memory clobber in set_domain() should therefore cause the compiler to
re-load this. The other advantage of using this is we should have its
address in the register set already, or very soon after at most call
sites.
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As of 1eef5d2f1b ("ARM: domains: switch to keeping domain value in
register") we no longer need to include asm/domains.h into
asm/thread_info.h. Remove it.
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of
the trace, do so also in the intel_bts driver.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Only emit the test-and-set fallback for Hypervisors lacking
PARAVIRT_SPINLOCKS support when building for guests.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 4.2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS
set and Linus noted that the test-and-set implementation was retarded.
One should spin on the variable with a load, not a RMW.
While there, remove 'queued' from the name, as the lock isn't queued
at all, but a simple test-and-set.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lkml.kernel.org/r/20150904152523.GR18673@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge third patch-bomb from Andrew Morton:
- even more of the rest of MM
- lib/ updates
- checkpatch updates
- small changes to a few scruffy filesystems
- kmod fixes/cleanups
- kexec updates
- a dma-mapping cleanup series from hch
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
dma-mapping: consolidate dma_set_mask
dma-mapping: consolidate dma_supported
dma-mapping: cosolidate dma_mapping_error
dma-mapping: consolidate dma_{alloc,free}_noncoherent
dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
mm: make sure all file VMAs have ->vm_ops set
mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
mm: mark most vm_operations_struct const
namei: fix warning while make xmldocs caused by namei.c
ipc: convert invalid scenarios to use WARN_ON
zlib_deflate/deftree: remove bi_reverse()
lib/decompress_unlzma: Do a NULL check for pointer
lib/decompressors: use real out buf size for gunzip with kernel
fs/affs: make root lookup from blkdev logical size
sysctl: fix int -> unsigned long assignments in INT_MIN case
kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
kexec: align crash_notes allocation to make it be inside one physical page
kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
kexec: split kexec_load syscall from kexec core code
...
This is a collection of a few late fixes and other misc. stuff that
had dependencies on things being merged from other trees.
The bulk of the changes are for samsung/exynos SoCs for some changes
that needed a few minor reworks so ended up a bit late. The others
are mainly for qcom SoCs: a couple fixes and some DTS updates.
There's one conflict with drivers/cpufreq/exynos-cpufreq.c because
it's now been completely removed, but there were some fixes that hit
mainline in the meantime.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIbBAABAgAGBQJV8fkAAAoJEFk3GJrT+8Zllf8P9jj3+TnvbJS/8bWoQoB7BRUZ
LZPgi2+sBXylrBV60uQdyodiTHQUMZhbL7GvgEVG0z6yyin7nyijqNkulTbQbWmg
WhumLNCNcs8vlZegA/corbwgcVC7FkjOP97HveTe2mgwZ+GaXj9qMRQzBsMqSXEo
4890ZeP1nWBTP42oXOQHkNyKWFBjuERK0dTw2MXj7WE0/Ag8i7ERp76uJQdQ7V5O
BpNRwxp3vSCky8rxbpD/avWdlspv1yZGBQyLeIreVq2YQFojvT36K8wHcf6iWBT/
pzGGV/uZM7MnrGZdqSfVEMDHl7Z2s7Ls+sv5F6Md7ErnVDerHGRjw/6lJDjbeH7u
trucpsuhz5yhTjpZssGHH2NT8pWxxh8M+AaNOiiH8PuYESAbPAmWLpWkn+648bIn
P++Z90DIyfNEqnNSMHkcQYpVt8zc4g75gZfTIIsXLB+DLgzgSK8ergrfyRN/O0zj
oFY35g3wHdgisnGve+BAW30zTZtP19TMT36OltWjIkjuRZC29PS2vkH8eTETdVXx
01f/qcpbB1L1rXfIBjNkjx81j89XPd68JIBfctTF5QBOAGm/Dix6tj2mU/N7TNu5
TrBmD3CXdOQbCPaoK/qWX7/b11IN/XOlGL06hhYwbSdvCVy7rNccApXvfnrDWziK
Ly923FP1OB7h0Kk1cmo=
=85Ck
-----END PGP SIGNATURE-----
Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull late ARM SoC updates from Kevin Hilman:
"This is a collection of a few late fixes and other misc stuff that had
dependencies on things being merged from other trees.
The bulk of the changes are for samsung/exynos SoCs for some changes
that needed a few minor reworks so ended up a bit late. The others
are mainly for qcom SoCs: a couple fixes and some DTS updates"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (37 commits)
ARM: multi_v7_defconfig: Enable PBIAS regulator
soc: qcom: smd: Correct fBLOCKREADINTR handling
soc: qcom: smd: Use correct remote processor ID
soc: qcom: smem: Fix errant private access
ARM: dts: qcom: msm8974-sony-xperia-honami: Use stdout-path
ARM: dts: qcom: msm8960-cdp: Use stdout-path
ARM: dts: qcom: msm8660-surf: Use stdout-path
ARM: dts: qcom: ipq8064-ap148: Use stdout-path
ARM: dts: qcom: apq8084-mtp: Use stdout-path
ARM: dts: qcom: apq8084-ifc6540: Use stdout-path
ARM: dts: qcom: apq8074-dragonboard: Use stdout-path
ARM: dts: qcom: apq8064-ifc6410: Use stdout-path
ARM: dts: qcom: apq8064-cm-qs600: Use stdout-path
ARM: dts: qcom: Label serial nodes for aliasing and stdout-path
reset: ath79: Fix missing spin_lock_init
reset: Add (devm_)reset_control_get stub functions
ARM: EXYNOS: switch to using generic cpufreq driver for exynos4x12
cpufreq: exynos: Remove unselectable rule for arm-exynos-cpufreq.o
ARM: dts: add iommu property to JPEG device for exynos4
ARM: dts: enable SPI1 for exynos4412-odroidu3
...
- Use the correct GFN/BFN terms more consistently.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe
O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep
4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj
lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV
BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM
3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4=
=JDia
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen terminology fixes from David Vrabel:
"Use the correct GFN/BFN terms more consistently"
* tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
xen/privcmd: Further s/MFN/GFN/ clean-up
hvc/xen: Further s/MFN/GFN clean-up
video/xen-fbfront: Further s/MFN/GFN clean-up
xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
xen: Use correctly the Xen memory terminologies
arm/xen: implement correctly pfn_to_mfn
xen: Make clear that swiotlb and biomerge are dealing with DMA address
Pull hexagon updates from Richard Kuo:
"Just two fixes -- one for a uapi header and one for a timer interface"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
Revert "Hexagon: fix signal.c compile error"
hexagon/time: Migrate to new 'set-state' interface
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.
This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation. Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work. h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.
Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures just call into ->dma_supported, but some also return 1
if the method is not present, or 0 if no dma ops are present (although
that should never happeb). Consolidate this more broad version into
common code.
Also fix h8300 which inorrectly always returned 0, which would have been
a problem if it's dma_set_mask implementation wasn't a similarly buggy
noop.
As a few architectures have much more elaborate implementations, we
still allow for arch overrides.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there are three valid implementations of dma_mapping_error:
(1) call ->mapping_error
(2) check for a hardcoded error code
(3) always return 0
This patch provides a common implementation that calls ->mapping_error
if present, then checks for DMA_ERROR_CODE if defined or otherwise
returns 0.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures do not support non-coherent allocations and either
define dma_{alloc,free}_noncoherent to their coherent versions or stub
them out.
Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
implements them directly.
This patch moves the Openrisc version to common code, and handles the
DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.
Note that actual non-coherent allocations require a dma_cache_sync
implementation, so if non-coherent allocations didn't work on
an architecture before this patch they still won't work after it.
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.
This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.
This patch (of 5):
The coherent DMA allocator works the same over all architectures supporting
dma_map operations.
This patch consolidates them and converges the minor differences:
- the debug_dma helpers are now called from all architectures, including
those that were previously missing them
- dma_alloc_from_coherent and dma_release_from_coherent are now always
called from the generic alloc/free routines instead of the ops
dma-mapping-common.h always includes dma-coherent.h to get the defintions
for them, or the stubs if the architecture doesn't support this feature
- checks for ->alloc / ->free presence are removed. There is only one
magic instead of dma_map_ops without them (mic_dma_ops) and that one
is x86 only anyway.
Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags. An optional arch hook is provided
for that.
[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(),
rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple
wrapper on top of do_mmap(). Perhaps we should update the callers of
do_mmap_pgoff() and kill it later.
This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not
play with vm internals.
After this change mmap_region() has a single user outside of mmap.c,
arch/tile/mm/elf.c:arch_setup_additional_pages(). It would be nice to
change arch/tile/ and unexport mmap_region().
[kirill@shutemov.name: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
gunzip error.
| early console in decompress_kernel
| decompress_kernel:
| input: [0x807f2143b4-0x807ff61aee]
| output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
| boot via startup_64
| KASLR using RDTSC...
| new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
| decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
|
| Decompressing Linux... gz...
|
| uncompression error
|
| -- System halted
the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
0xffffffb901ffffff as out_len. gunzip in lib/zlib_inflate/inflate.c cap
that len to 0x01ffffff and decompress fails later.
We could hit this problem with crashkernel booting that uses kexec loading
kernel above 4GiB.
We have decompress_* support:
1. inbuf[]/outbuf[] for kernel preboot.
2. inbuf[]/flush() for initramfs
3. fill()/flush() for initrd.
This bug only affect kernel preboot path that use outbuf[].
Add __decompress and take real out_buf_len for gunzip instead of guessing
wrong buf size.
Fixes: 1431574a1c (lib/decompressors: fix "no limit" output buffer length)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.
And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.
The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.
Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.
Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts. The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.
The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use. Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using). CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.
To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping(). Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.
The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d7 ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.
Fixes: 05af7bd2d7 ("[POWERPC] MPIC U3/U4 MSI backend")
Cc: stable@vger.kernel.org # v2.6.22+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The linux/audit.h header uses EM_MICROBLAZE in order to define
AUDIT_ARCH_MICROBLAZE, but it's only available in the microblaze
asm headers. Move it to the common elf-em.h header so that the
define can be used on non-microblaze systems. Otherwise we get
build errors that EM_MICROBLAZE isn't defined when we try to use
the AUDIT_ARCH_MICROBLAZE symbol.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
PBIAS regulator is required for MMC module in OMAP2, OMAP3, OMAP4,
OMAP5 and DRA7 SoCs. Enable it here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
The use of get_domain() in copy_thread() results in an oops on
ARMv7M/noMMU systems. The thread cpu_domain value is only used when
CONFIG_CPU_USE_DOMAINS is enabled, so there's no need to save the
value in copy_thread() except when this is enabled, and this option
will never be enabled on these platforms.
Unhandled exception: IPSR = 00000006 LR = fffffff1
CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-next-20150909-00001-gb8ec5ad #41
Hardware name: NXP LPC18xx/43xx (Device Tree)
task: 2823fbe0 ti: 2823c000 task.ti: 2823c000
PC is at copy_thread+0x18/0x92
LR is at copy_thread+0x19/0x92
pc : [<2800a46e>] lr : [<2800a46f>] psr: 4100000b
sp : 2823df00 ip : 00000000 fp : 287c81c0
r10: 00000000 r9 : 00800300 r8 : 287c8000
r7 : 287c8000 r6 : 2818908d r5 : 00000000 r4 : 287ca000
r3 : 00000000 r2 : 00000000 r1 : fffffff0 r0 : 287ca048
xPSR: 4100000b
Reported-by: Ariel D'Alessandro <ariel@vanguardiasur.com.ar>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
NWFPE needs to access userspace to check whether the next instruction
is another FP instruction. Allow userspace access for this read.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Use stdout-path so that we don't have to put the console on the
kernel command line.
Cc: Tim Bird <tim.bird@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Use stdout-path so that we don't have to put the console on the
kernel command line.
Cc: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Use stdout-path so that we don't have to put the console on the
kernel command line.
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
* Switch to use pinctrl compatible for GPIOs
* Add RPM regulators for MSM8960
* Add SPI Ethernet support on MSM8960 CDP
* Add SMEM support along with dependencies
* Add PM8921 support for GPIO and MPP
* Fix GSBI cell index
* Switch to use real regulators on APQ8064 w/ SDCC
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJVuR9GAAoJEFKiBbHx2RXV9Q0P/2TXIsDmgFyyIA7edlqag3ch
m0coBGVv20bP5wIrYHjQywBJqfCuQPm70jVsl2Voxzw/69jCvzHlzia6Aqm6f+sL
jX941CpV7DVVACm0ltec3FqV33OZKRh9Eyz0Mq1jerrPsvwMnwGuAziH7zkzCNnr
b9aLdWqWr68HZve/8naRAFWpm5mvWJhADnbKh2UryH6ZWaT8ueWdqnhy4oteo6MK
sf0CWva9DdkH9GNE8F+N5ksOdV/a5IDdUVoujsaAaXJI/otqAoAn0PrzRTp1sV43
iAJKEAbAlhPKx/0YJIW1dpp8viyOcjAtNRpMLjA79HXF2yaYUlieLWYVhL851dKX
L4MNyYRauQf+JIR6fGwSN7Tg+/G8ZF5m70EZXkoya5JGBTnXbgZ/e0tytu8MXs3E
abiUeaYWFn4r1ta0V8glbk/iDlmNWcBUZz9NInrLzcjLOwq3+PJYdoq4mav5Qkrf
cpByhFRHEV2jToZ7t/uyz8NndwLKnioFHzKb+o8L+B9qzB0r/pEdF143gRWlWV6W
2WGIUf2wCOBEMmaWhbGvGdYP1RghtjSKRGoaN4d3mxSD2lU9OvkARYETKtg4hwo4
y8cWOhx3sNsrAxs0GdzN4bh06LcnKYGUnpuqQb3o3GAjxY3xTFlhK0FDpXULCU2W
c7A8yM5yN+csyAGl0r1/
=kyj6
-----END PGP SIGNATURE-----
Merge tag 'qcom-dt-for-4.3' into v4.2-rc2
Qualcomm ARM Based Device Tree Updates for v4.3
* Switch to use pinctrl compatible for GPIOs
* Add RPM regulators for MSM8960
* Add SPI Ethernet support on MSM8960 CDP
* Add SMEM support along with dependencies
* Add PM8921 support for GPIO and MPP
* Fix GSBI cell index
* Switch to use real regulators on APQ8064 w/ SDCC
Just a couple of changes for v4.3-rc1. A preparatory IRQ patch to
prepare for moving irq_data struct members, and a tweak to
Documentation/features since Meta2 could support THP.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJV7WemAAoJEGwLaZPeOHZ6mG8P/12NgxgdGFfv1DYeoxFTpzxO
KCT9M5u7h6jA0cg1fNNdCfh/xTjtDHOrJhFK1c/ZFJWbDBemEek73/1aLL2Yagj2
Tx7RqW83YWYIdKIkRs5XRz988WJOqsyserY9hvb9D7LdjMtM8rMzOFA5txGMYrJo
STHRbqhVZX6p2eA8qLPtKtb+Xh4I6tcamVWIEqpNqMhA/vU8YmLsXqZLmwclxyOz
pbWvRTfPNa/MWsO1smEPnc+B4M/FEuyQGSpJ4oyyGVEwCfKm34jUR458F1oTmSA6
GRZHTVJFMTVuRzQ7EeTLo+l08pMHwOjtISMl3JWMOW+o0rCiYVq+kmN3qYTnQA5D
rT8K51ty8GYhMvO2DvLJ5/G+lBUBvUNDHSTRIH8Gkyk+OyRAARaMpksJzoSam4RZ
ToSCxIlrrGIxOrlmaVxdPWrMVDjmvLrAzx/PfbVedxrryClzwiLbMsZKVQP5r3f7
GMMWxhhRxtckAnIhIItj+tK238OOUlSNYusfj1FsJ7cabolQFbyyGJyr9nENcZBm
5a6lVCQoyGUGI+O6wLKrigQUS/IGT4tELQ8vpPGZA6cdTgcOfx4HHnqgsIEBJLZF
y8i0vzll20LfpkHDMBSIV0mP1Y7lBULDl2SSUUez/vfN/MfUphOlzqGNIOxViccL
zKPrYR2QAuO7fzNy33YG
=skwh
-----END PGP SIGNATURE-----
Merge tag 'metag-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull metag updates from James Hogan:
"Metag architecture changes for v4.3.
Just a couple of changes for v4.3-rc1. A preparatory IRQ patch to
prepare for moving irq_data struct members, and a tweak to
Documentation/features since Meta2 could support THP"
* tag 'metag-for-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
Documentation/features/vm: Meta2 is capable of THP
metag/irq: Use access helper irq_data_get_affinity_mask()
This reverts commit f3f601c1d2.
UAPI headers cannot use "uapi/" in their paths by design -- when they're
installed, they do not have the uapi/ prefix. Otherwise doing so breaks
userland badly.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Merge second patch-bomb from Andrew Morton:
"Almost all of the rest of MM. There was an unusually large amount of
MM material this time"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
zpool: remove no-op module init/exit
mm: zbud: constify the zbud_ops
mm: zpool: constify the zpool_ops
mm: swap: zswap: maybe_preload & refactoring
zram: unify error reporting
zsmalloc: remove null check from destroy_handle_cache()
zsmalloc: do not take class lock in zs_shrinker_count()
zsmalloc: use class->pages_per_zspage
zsmalloc: consider ZS_ALMOST_FULL as migrate source
zsmalloc: partial page ordering within a fullness_list
zsmalloc: use shrinker to trigger auto-compaction
zsmalloc: account the number of compacted pages
zsmalloc/zram: introduce zs_pool_stats api
zsmalloc: cosmetic compaction code adjustments
zsmalloc: introduce zs_can_compact() function
zsmalloc: always keep per-class stats
zsmalloc: drop unused variable `nr_to_migrate'
mm/memblock.c: fix comment in __next_mem_range()
mm/page_alloc.c: fix type information of memoryless node
memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
...
Pull parisc updates from Helge Deller:
"The most important changes in this patchset are:
- re-enable 64bit PCI bus addresses which were temporarily disabled
for PA-RISC in kernel 4.2
- fix the 64bit CAS operation in the LWS path which now enables us to
enable the 64bit gcc atomic builtins even on 32bit userspace with
64bit kernel
- fix a long-standing bug which sometimes crashed kernel at bootup
while serial interrupt wasn't registered yet"
* 'parisc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Use platform_device_register_simple("rtc-generic")
parisc: Drop CONFIG_SMP around update_cr16_clocksource()
parisc: Use double word condition in 64bit CAS operation
parisc: Filter out spurious interrupts in PA-RISC irq handler
parisc: Additionally check for in_atomic() in page fault handler
PCI,parisc: Enable 64-bit bus addresses on PA-RISC
parisc: Define ioremap_uc and ioremap_wc
Migrate hexagon driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
We weren't doing anything in the ->set_mode() callback. So, this patch
doesn't provide any set-state callbacks.
Cc: linux-hexagon@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Core:
- use is_visible() to control sysfs attributes
- switch wakealarm attribute to DEVICE_ATTR_RW
- make rtc_does_wakealarm() return boolean
- properly manage lifetime of dev and cdev in rtc device
- remove unnecessary device_get() in rtc_device_unregister
- fix double free in rtc_register_device() error path
New drivers:
- NXP LPC24xx
- Xilinx Zynq MP
- Dialog DA9062
Subsystem wide cleanups:
- fix drivers that consider 0 as a valid IRQ in client->irq
- Drop (un)likely before IS_ERR(_OR_NULL)
- drop the remaining owner assignment for i2c_driver and platform_driver
- module autoload fixes
Drivers:
- 88pm80x: add device tree support
- abx80x: fix RTC write bit
- ab8500: Add a sentinel to ab85xx_rtc_ids[]
- armada38x: Align RTC set time procedure with the official errata
- as3722: correct month value
- at91sam9: cleanups
- at91rm9200: get and use slow clock and cleanups
- bq32k: remove redundant check
- cmos: century support, proper fix for the spurious wakeup
- ds1307: cleanups and wakeup irq support
- ds1374: Remove unused variable
- ds1685: Use module_platform_driver
- ds3232: fix WARNING trace in resume function
- gemini: fix ptr_ret.cocci warnings
- mt6397: implement suspend/resume
- omap: support internal and external clock enabling
- opal: Enable alarms only when opal supports tpo
- pcf2127: use OFS flag to detect unreliable date and warn the user
- pl031: fix typo for author email
- rx8025: huge cleanup and fixes
- sa1100/pxa: share common code
- s5m: fix to update ctrl register
- s3c: fix clocks and wakeup, cleanup
- sirfsoc: use regmap
- nvram_read()/nvram_write() functions for cmos, ds1305, ds1307, ds1343,
ds1511, ds1553, ds1742, m48t59, rp5c01, stk17ta8, tx4939
- use rtc_valid_tm() error code when reading date/time instead of 0 for
isl12022, pcf2123, pcf2127
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJV7ARpAAoJEKbNnwlvZCyzo40P/09fca3C9HBEI5hgJO7PmQX8
+3wkacMjEi58yMTzWoDCFh2H1Vvajm1d6bJ6H//5/3h/KPKrjIsCTFlMfuv8gfbe
xFoeXPMdiojjPxFjbjnpM+0GD9SKTIjwpP4d8HgoUDwGRJ6dxaTOVuJXRmVGcmas
e4Ih6hDy2YBiRXLzLHSvSNcTJDGznwMvEEg1AfqUjAtNm4BUzhUnvq0CYlSdwMzy
xK6w0LmOr67l2QpjiQNCvIn04OWdDQ35f0GcRHYEY66cZPkvPp3OCC/UbeChU6uK
He/v1n+QdBTbkKYeYmNnBSr42KPTLONNaOFXf79PpAz/WtiYxLxJDbf8DYW9gGQj
mC3rMNWXf4rEFHkHYH4KvbvxplYd6/pCByMFJa76pKtcE2FG7pVAQZgLaLGemct4
q0MW73n7yBR1wDlTHqqKu1pAnz1CW3Eui96mDJJptG3CtNqe9a8G48Yzy5OuYAcI
wxBWCbxeT2z+EuePOBGlQvPreyqyTjHXL87NqMEDrzFrdQd02rn90xqIPlCbcr/P
qyRwIrDGFyV8PX9I4ewrtt9WfgttayHgZ4juaSi9F479oT6006z6vArrTg2Q/saA
b0X+hG8oLGSZuSobASjjCtjTE+jYhr+4hzGuHJw0ky0CzLQOqf1APMKDwLN4m5Pz
Ex+JKalozL+LElGMCS3q
=cKuu
-----END PGP SIGNATURE-----
Merge tag 'rtc-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Core:
- use is_visible() to control sysfs attributes
- switch wakealarm attribute to DEVICE_ATTR_RW
- make rtc_does_wakealarm() return boolean
- properly manage lifetime of dev and cdev in rtc device
- remove unnecessary device_get() in rtc_device_unregister
- fix double free in rtc_register_device() error path
New drivers:
- NXP LPC24xx
- Xilinx Zynq MP
- Dialog DA9062
Subsystem wide cleanups:
- fix drivers that consider 0 as a valid IRQ in client->irq
- Drop (un)likely before IS_ERR(_OR_NULL)
- drop the remaining owner assignment for i2c_driver and
platform_driver
- module autoload fixes
Drivers:
- 88pm80x: add device tree support
- abx80x: fix RTC write bit
- ab8500: Add a sentinel to ab85xx_rtc_ids[]
- armada38x: Align RTC set time procedure with the official errata
- as3722: correct month value
- at91sam9: cleanups
- at91rm9200: get and use slow clock and cleanups
- bq32k: remove redundant check
- cmos: century support, proper fix for the spurious wakeup
- ds1307: cleanups and wakeup irq support
- ds1374: Remove unused variable
- ds1685: Use module_platform_driver
- ds3232: fix WARNING trace in resume function
- gemini: fix ptr_ret.cocci warnings
- mt6397: implement suspend/resume
- omap: support internal and external clock enabling
- opal: Enable alarms only when opal supports tpo
- pcf2127: use OFS flag to detect unreliable date and warn the user
- pl031: fix typo for author email
- rx8025: huge cleanup and fixes
- sa1100/pxa: share common code
- s5m: fix to update ctrl register
- s3c: fix clocks and wakeup, cleanup
- sirfsoc: use regmap
- nvram_read()/nvram_write() functions for cmos, ds1305, ds1307,
ds1343, ds1511, ds1553, ds1742, m48t59, rp5c01, stk17ta8, tx4939
- use rtc_valid_tm() error code when reading date/time instead of 0
for isl12022, pcf2123, pcf2127"
* tag 'rtc-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (90 commits)
rtc: abx80x: fix RTC write bit
rtc: ab8500: Add a sentinel to ab85xx_rtc_ids[]
rtc: ds1374: Remove unused variable
rtc: Fix module autoload for OF platform drivers
rtc: Fix module autoload for rtc-{ab8500,max8997,s5m} drivers
rtc: omap: Add external clock enabling support
rtc: omap: Add internal clock enabling support
ARM: dts: AM437x: Add the internal and external clock nodes for rtc
rtc: s5m: fix to update ctrl register
rtc: add xilinx zynqmp rtc driver
devicetree: bindings: rtc: add bindings for xilinx zynqmp rtc
rtc: as3722: correct month value
ARM: config: Switch PXA27x platforms to use PXA RTC driver
ARM: mmp: remove unused RTC register definitions
ARM: sa1100: remove unused RTC register definitions
rtc: sa1100/pxa: convert to run-time register mapping
ARM: pxa: add memory resource to SA1100 RTC device
rtc: pxa: convert to use shared sa1100 functions
rtc: sa1100: prepare to share sa1100_rtc_ops
rtc: ds3232: fix WARNING trace in resume function
...
alloc_pages_exact_node() was introduced in commit 6484eb3e2a ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fallback to current node for nid == NUMA_NO_NODE. Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise. In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.
The misleading name has lead to mistakes in the past, see for example
commits 5265047ac3 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f ("mm, mempolicy:
migrate_to_node should only migrate to node").
Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.
To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage. Both functions get described in comments.
It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly. The number of users would be small
anyway.
Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead. This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.
Both differences will be rectified by the next patch.
To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers. Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The early_ioremap library now has a generic copy_from_early_mem()
function. Use the generic copy function for x86 relocate_initrd().
[akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
Signed-off-by: Mark Salter <msalter@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The use of mem= could leave part or all of the initrd outside of the
kernel linear map. This will lead to an error when unpacking the initrd
and a probable failure to boot. This patch catches that situation and
relocates the initrd to be fully within the linear map.
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When parsing SRAT, all memory ranges are added into numa_meminfo. In
numa_init(), before entering numa_cleanup_meminfo(), all possible memory
ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all
ranges over max_pfn or empty.
But, this only works if the nodes are continuous. Let's have a look at
the following example:
We have an SRAT like this:
SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
On boot, only node 0,1,2,3 exist.
And the numa_meminfo will look like this:
numa_meminfo.nr_blks = 9
1. on node 0: [0, 60000000]
2. on node 0: [100000000, 20000000000]
3. on node 1: [20000000000, 40000000000]
4. on node 4: [40000000000, 60000000000]
5. on node 5: [60000000000, 80000000000]
6. on node 2: [80000000000, a0000000000]
7. on node 3: [a0000000000, a0800000000]
8. on node 6: [c0000000000, a0800000000]
9. on node 7: [e0000000000, a0800000000]
And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the
end address is over max_pfn, which is a0800000000. But 4 and 5 are not
removed because their end addresses are less then max_pfn. But in fact,
node 4 and 5 don't exist.
In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
Since memory ranges in node 4 and 5 are in numa_meminfo, in
numa_register_memblks(), node 4 and 5 will be mistakenly set to online.
If you run lscpu, it will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220
In this patch, we use memblock_overlaps_region() to check if ranges in
numa_meminfo overlap with ranges in memory_block. Since memory_block
contains all available memory at boot time, if they overlap, it means the
ranges exist. If not, then remove them from numa_meminfo.
After this patch, lscpu will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"memcg: export struct mem_cgroup" will add includes into
linux/memcontrol.h which lead to further header dependency issues as
reported by Guenter Roeck:
In file included from include/linux/highmem.h:7:0,
from include/linux/bio.h:23,
from include/linux/writeback.h:192,
from include/linux/memcontrol.h:30,
from include/linux/swap.h:8,
from ./arch/sparc/include/asm/pgtable_32.h:17,
from ./arch/sparc/include/asm/pgtable.h:6,
from arch/sparc/kernel/traps_32.c:23:
include/linux/mm.h: In function 'is_vmalloc_addr':
include/linux/mm.h:371:17: error: 'VMALLOC_START' undeclared (first use in this function)
include/linux/mm.h:371:17: note: each undeclared identifier is reported only once for each function it appears in
include/linux/mm.h:371:41: error: 'VMALLOC_END' undeclared (first use in this function)
include/linux/mm.h: In function 'maybe_mkwrite':
include/linux/mm.h:556:3: error: implicit declaration of function 'pte_mkwrite'
The issue is that pgtable_32.h depends on swap.h to get swap_entry_t but
that goes all the way down to linux/mm.h which wants to have VMALLOC_*
which is defined later in pgtable_32.h, though.
swap_entry_t is defined in include/mm_types.h so it should be sufficient
to include this header without more dependencies.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
The changes with more meat are:
o Allowing the trace event filters to filter on CPU number and process ids
o Two new markers for trace output latency were added
(10 and 100 msec latencies)
o Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future
work, and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression
as well as the code to revert that change without touching the other
changes that were made on top of it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
=baZ3
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV7wIdAAoJEFxbo/MsZsTR0hEH/04HTKLKGnSJpZ5WbMPxqZxE
UqGlvhvVWNAmFocZmbPcEi9T1qtcFrX5pM55JQr6UmAp3ovYsT2q1Q1kKaOaawks
pSfc/YEH3oQW5VUQ9Lm9Ru5Z8Btox0WrzRREO92OF36UOgUOBOLkGsUfOwDinNIM
lSk2djbYwDYAsoeC3PHB32wwMI//Lz6B/9ZVXcyL6ULynt1ULdspETjGnptRPZa7
JTB5L4/soioKOn18HDwwOhKmvaFUPQv9Odnv7dc85XwZreajhM/KMu3qFbMDaF/d
WVB1NMeCBdQYgjOrUjrmpyr5uTMySiQEG54cplrEKinfeZgKlEyjKvjcAfJfiac=
=Ktjl
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from David Vrabel:
"Xen features and fixes for 4.3:
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests)"
* tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits)
xen: switch extra memory accounting to use pfns
xen: limit memory to architectural maximum
xen: avoid another early crash of memory limited dom0
xen: avoid early crash of memory limited dom0
arm/xen: Remove helpers which are PV specific
xen/x86: Don't try to set PCE bit in CR4
xen/PMU: PMU emulation code
xen/PMU: Intercept PMU-related MSR and APIC accesses
xen/PMU: Describe vendor-specific PMU registers
xen/PMU: Initialization code for Xen PMU
xen/PMU: Sysfs interface for setting Xen PMU mode
xen: xensyms support
xen: remove no longer needed p2m.h
xen: allow more than 512 GB of RAM for 64 bit pv-domains
xen: move p2m list if conflicting with e820 map
xen: add explicit memblock_reserve() calls for special pages
mm: provide early_memremap_ro to establish read-only mapping
xen: check for initrd conflicting with e820 map
xen: check pre-allocated page tables for conflict with memory map
xen: check for kernel memory conflicting with memory layout
...
Pull m68k/colfire fixes from Greg Ungerer:
"Only a couple of patches this time. One migrating the clock driver
code to the new set-state interface. The other cleaning up to use the
PFN_DOWN macro"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k/coldfire: use PFN_DOWN macro
m68k/coldfire/pit: Migrate to new 'set-state' interface
Pull crypto fix from Herbert Xu:
"This fixes a memory corruption bug in ghash-clmulni-intel due to
insufficient memory allocation"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ghash-clmulni: specify context size for ghash async algorithm
The privcmd code is mixing the usage of GFN and MFN within the same
functions which make the code difficult to understand when you only work
with auto-translated guests.
The privcmd driver is only dealing with GFN so replace all the mention
of MFN into GFN.
The ioctl structure used to map foreign change has been left unchanged
given that the userspace is using it. Nonetheless, add a comment to
explain the expected value within the "mfn" field.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.
For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.
For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.
Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.
Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
After the commit introducing convertion between DMA and guest addresses,
all the callers of pfn_to_mfn are expecting to get a GFN (Guest Frame
Number). On ARM, all the guests are auto-translated so the GFN is equal
to the Linux PFN (Pseudo-physical Frame Number).
The current implementation may return an MFN if the caller is passing a
PFN associated to a mapped foreign grant. In pratice, I haven't seen
the problem on running guest but we should fix it for the sake of
correctness.
Correct the implementation by always returning the pfn passed in parameter.
A follow-up patch will take care to rename pfn_to_mfn to a suitable
name.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The swiotlb is required when programming a DMA address on ARM when a
device is not protected by an IOMMU.
In this case, the DMA address should always be equal to the machine address.
For DOM0 memory, Xen ensure it by have an identity mapping between the
guest address and host address. However, when mapping a foreign grant
reference, the 1:1 model doesn't work.
For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
(Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
of view given that all ARM guest are auto-translated.
Even though the name pfn_to_mfn is misleading, we need to ensure that
those caller get a GFN and not by mistake a MFN. In pratical, I haven't
seen error related to this but we should fix it for the sake of
correctness.
In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
patch, we have to introduce new helpers to return the DMA from a PFN and
the invert.
On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.
The helpers will be used in swiotlb and xen_biovec_phys_mergeable.
This is necessary in the latter because we have to ensure that the
biovec code will not try to merge a biovec using foreign page and
another using Linux memory.
Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
given that the only usage was in swiotlb.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
No need to use CONFIG_SMP around update_cr16_clocksource(). It checks for
num_online_cpus() beeing greater than 1, which is always 1 in UP builds.
Signed-off-by: Helge Deller <deller@gmx.de>
Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.
This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.
An example E820 map triggering the problem looks like this:
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.
To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:
(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080229a93 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.2-pre x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81d120cb>]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax: ffffffff81db2000 rbx: 000000004d000000 rcx: 0000000000000000
(XEN) rdx: 000000004d000000 rsi: 0000000000063000 rdi: 000000004d063000
(XEN) rbp: ffffffff81c03d78 rsp: ffffffff81c03d28 r8: 0000000000023000
(XEN) r9: 00000001040ff000 r10: 0000000000007ff0 r11: 0000000000000000
(XEN) r12: 0000000000063000 r13: 000000000004d000 r14: 0000000000000063
(XEN) r15: 0000000000000063 cr0: 0000000080050033 cr4: 00000000000006f0
(XEN) cr3: 0000000105c0f000 cr2: ffffc90000268000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81c03d28:
(XEN) 0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
(XEN) 0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
(XEN) 0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
(XEN) ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
(XEN) 00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
(XEN) ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
(XEN) ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
(XEN) 0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
(XEN) 00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
(XEN) ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
(XEN) ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
(XEN) 0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
(XEN) ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
(XEN) 0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed. This fixes the 64-bit LWS CAS
operation on 64-bit kernels.
I can now enable 64-bit atomic support in GCC.
Cc: <stable@vger.kernel.org>
Signed-off-by: John David Anglin <dave.anglin>
Signed-off-by: Helge Deller <deller@gmx.de>
When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.
So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).
This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.
This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware. But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.
For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash
Signed-off-by: Helge Deller <deller@gmx.de>
Craig Estey noticed that we didn't checked for in_atomic() in our page fault
handler like other architectures. This commit adds this check by using
faulthandler_disabled() which includes a check for pagefault_disabled() and
in_atomic().
Reported-by: Craig Estey <cae370@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 3cc2dac5be ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with parisc:allmodconfig. Map the missing
functions to ioremap_nocache.
Fixes: 3cc2dac5be ("drivers/video/fbdev/atyfb:
Replace MTRR UC hole with strong UC")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Max10 is a FPGA device. This patch adds defconfig based on Max10 hardware
reference design. Design is intended to run on Max10 development kit.
Signed-off-by: Chee Nouk Phoon <cnphoon@altera.com>
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Max10 is a FPGA device. This patch adds Nios2 support for Max10.
This device tree is based on Max10 hardware reference design.
Signed-off-by: Chee Nouk Phoon <cnphoon@altera.com>
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Commit f32393c943 ("powerpc/pseries: Correct cpu affinity for
dlpar added cpus") moved dlpar_acquire_drc() call to before
dlpar_configure_connector() call in dlpar_cpu_probe(), but missed
to release the DRC if dlpar_configure_connector() failed.
During CPU hotplug, if configure-connector fails for any reason,
then this will result in subsequent CPU hotplug attempts to fail.
Release the acquired DRC if dlpar_configure_connector() call fails
so that the DRC is left in right isolation and allocation state
for the subsequent hotplug operation to succeed.
Fixes: f32393c943 ("powerpc/pseries: Correct cpu affinity for dlpar added cpus")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Removed some statistic counters to improve the performance of the handler.
Signed-off-by: Bernd Weiberg <bernd.weiberg@siemens.com>
Signed-off-by: Ley Foon Tan <lftan@altera.com>
Migrate nios2 driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now.
This also enables us to implement callbacks for new states of clockevent devices, for example: ONESHOT_STOPPED.
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Dmitry Torokhov <dtor@chromium.org>
Cc: nios2-dev@lists.rocketboards.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ley Foon Tan <lftan@altera.com>
While working on the 32-bit ARM port of UEFI, I noticed a strange
corruption in the kernel log. The following snprintf() statement
(in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
was producing the following output in the log:
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | | | | |UC]
|RUN| | | | | | | |UC]
As it turns out, this is caused by incorrect code being emitted for
the string() function in lib/vsprintf.c. The following code
if (!(spec.flags & LEFT)) {
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
}
for (i = 0; i < len; ++i) {
if (buf < end)
*buf = *s;
++buf; ++s;
}
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
when called with len == 0, triggers an issue in the GCC SRA optimization
pass (Scalar Replacement of Aggregates), which handles promotion of signed
struct members incorrectly. This is a known but as yet unresolved issue.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
case, it is causing the second while loop to be executed erroneously a
single time, causing the additional space characters to be printed.
So disable the optimization by passing -fno-ipa-sra.
Cc: <stable@vger.kernel.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The 32-bit TCE table initialization relies on the DMA window having a
size equal to a power of 2 (and checks for it explicitly). But
crashkernel= has no constraint that requires a power-of-2 be specified.
This causes the kdump kernel to fail to boot as none of the PCI devices
(including the disk controller) are successfully initialized.
After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.
Fixes: aca6913f55 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.2
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When attempting to kdump with the 4.2 kernel, we see for each PCI
device:
pci 0003:01 : [PE# 000] Assign DMA32 space
pci 0003:01 : [PE# 000] Setting up 32-bit TCE table at 0..80000000
pci 0003:01 : [PE# 000] Failed to create 32-bit TCE table, err -22
PCI: Domain 0004 has 8 available 32-bit DMA segments
PCI: 4 PE# for a total weight of 70
pci 0004:01 : [PE# 002] Assign DMA32 space
pci 0004:01 : [PE# 002] Setting up 32-bit TCE table at 0..80000000
pci 0004:01 : [PE# 002] Failed to create 32-bit TCE table, err -22
pci 0004:0d : [PE# 005] Assign DMA32 space
pci 0004:0d : [PE# 005] Setting up 32-bit TCE table at 0..80000000
pci 0004:0d : [PE# 005] Failed to create 32-bit TCE table, err -22
pci 0004:0e : [PE# 006] Assign DMA32 space
pci 0004:0e : [PE# 006] Setting up 32-bit TCE table at 0..80000000
pci 0004:0e : [PE# 006] Failed to create 32-bit TCE table, err -22
pci 0004:10 : [PE# 008] Assign DMA32 space
pci 0004:10 : [PE# 008] Setting up 32-bit TCE table at 0..80000000
pci 0004:10 : [PE# 008] Failed to create 32-bit TCE table, err -22
and eventually the kdump kernel fails to boot as none of the PCI devices
(including the disk controller) are successfully initialized.
The EINVAL response is because the DMA window (the 2GB base window) is
larger than the kdump kernel's reserved memory (crashkernel=, in this
case specified to be 1024M). The check in question,
if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust
the caller to pass in a smaller window size if our maximum memory value
is smaller than the DMA window.
After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.
The problem was seen on a Firestone machine originally.
Fixes: aca6913f55 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Cc: stable@vger.kernel.org # 4.2
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Coding style pedantry, use u64, change the indentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Compiler warning:
CC [M] arch/x86/kvm/emulate.o
arch/x86/kvm/emulate.c: In function "__do_insn_fetch_bytes":
arch/x86/kvm/emulate.c:814:9: warning: "linear" may be used uninitialized in this function [-Wmaybe-uninitialized]
GCC is smart enough to realize that the inlined __linearize may return before
setting the value of linear, but not smart enough to realize the same
X86EMU_CONTINUE blocks actual use of the value. However, the value of
'linear' can only be set to one value, so hoisting the one line of code
upwards makes GCC happy with the code.
Reported-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Tested-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The process_smi_save_seg_64() function called only in the
process_smi_save_state_64() if the CONFIG_X86_64 is set. This
patch adds #ifdef CONFIG_X86_64 around process_smi_save_seg_64()
to prevent following warning message:
arch/x86/kvm/x86.c:5946:13: warning: ‘process_smi_save_seg_64’ defined but not used [-Wunused-function]
static void process_smi_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
^
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This does not show up on all compiler versions, so it sneaked into the
first 4.3 pull request. The fix is to mimic the logic of the "print
sptes" loop in the "fill array" loop. Then leaf and root can be
both initialized unconditionally.
Note that "leaf" now points to the first unused element of the array,
not the last filled element.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull vfs updates from Al Viro:
"In this one:
- d_move fixes (Eric Biederman)
- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)
- switch of sb_writers to percpu rwsem (Oleg Nesterov)
- superblock scalability (Josef Bacik and Dave Chinner)
- swapon(2) race fix (Hugh Dickins)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6ifEAAoJEAhfPr2O5OEVn5kP/i2jM1tWcmV/ZEBKGAN0jpRk
5Y/Q+rnXvOpIJSQC3dEkweoBymVMclSgSB/wFSWCZtp5MaB8KrH4/2uc3UvolF91
7bqXt+fCUacMbDQyaabMCR83mz9tdOJLd5sf0ABqBgXGfwh5uXmBPaYBzmcYvKcW
4D89MFUpaFDPARTs9rdpVyr0aPRU4GcN0R3snRO9Ly+cQnyV/RxPf9NqCgnI+yPq
+NvA9ScUBcBt62piSIGR4egcAR8boxYC+0r57340S21/JVMvsHQ3ok9b1aT8/rtd
Yl24FkcKrRV0ShN5S1RmW5DLH/HRGabuMjkiEz9xq52FGD2sQQda0At58dWivsa4
XYdxS9UUfb9Z+qyeMdmCl1MUFRrV2G4H6VItP+GKyT3UZLEDcLl6hBg3SkyWxWB4
CSO5WuRThiIB86OVcIaREftzqDy5HdvH3ZKRD7QrW0DItGVjQwV5j6gvwqO9OEXs
99BnSohyKwUBonumE2ZtFGGhIwIomllrMSqg991bPH9+13bg/rPxUqntkPrVap/9
cV3qKO8ZFrz5UInBnR1U83l60ZK7rV4G6AVMSMKpM9XVK9TDKryAUN9Mhj5XWRH8
hbma89TQVdhdrITtt27uzj8F622cvZvxd1BqDBR8DjKVvtv/E2GPzJrAj7GHe3/o
NgzP5fF6X2Si32GNb7J8
=cIed
-----END PGP SIGNATURE-----
Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25
- new HDMI capture driver: tc358743
- new driver for NetUP DVB new boards (netup_unidvb)
- IR support for DVBSky cards (smipcie-ir)
- Coda driver has gain macroblock tiling support
- Renesas R-Car gains JPEG codec driver
- new DVB platform driver for STi boards: c8sectpfe
- added documentation for the media core kABI to device-drivers DocBook
- lots of driver fixups, cleanups and improvements
* tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits)
[media] c8sectpfe: Remove select on undefined LIBELF_32
[media] i2c: fix platform_no_drv_owner.cocci warnings
[media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr()
[media] tc358743: only queue subdev notifications if devnode is set
[media] tc358743: add missing Kconfig dependency/select
[media] c8sectpfe: Use %pad to print 'dma_addr_t'
[media] DocBook media: Fix typo "the the" in xml files
[media] tc358743: make reset gpio optional
[media] tc358743: set direction of reset gpio using devm_gpiod_get
[media] dvbdev: document most of the functions/data structs
[media] dvb_frontend.h: document the struct dvb_frontend
[media] dvb-frontend.h: document struct dtv_frontend_properties
[media] dvb-frontend.h: document struct dvb_frontend_ops
[media] dvb: Use DVBFE_ALGO_HW where applicable
[media] dvb_frontend.h: document struct analog_demod_ops
[media] dvb_frontend.h: Document struct dvb_tuner_ops
[media] Docbook: Document struct analog_parameters
[media] dvb_frontend.h: get rid of dvbfe_modcod
[media] add documentation for struct dvb_tuner_info
[media] dvb_frontend: document dvb_frontend_tune_settings
...
Merge patch-bomb from Andrew Morton:
- a few misc things
- Andy's "ambient capabilities"
- fs/nofity updates
- the ocfs2 queue
- kernel/watchdog.c updates and feature work.
- some of MM. Includes Andrea's userfaultfd feature.
[ Hadn't noticed that userfaultfd was 'default y' when applying the
patches, so that got fixed in this merge instead. We do _not_ mark
new features that nobody uses yet 'default y' - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm/hugetlb.c: make vma_has_reserves() return bool
mm/madvise.c: make madvise_behaviour_valid() return bool
mm/memory.c: make tlb_next_batch() return bool
mm/dmapool.c: change is_page_busy() return from int to bool
mm: remove struct node_active_region
mremap: simplify the "overlap" check in mremap_to()
mremap: don't do uneccesary checks if new_len == old_len
mremap: don't do mm_populate(new_addr) on failure
mm: move ->mremap() from file_operations to vm_operations_struct
mremap: don't leak new_vma if f_op->mremap() fails
mm/hugetlb.c: make vma_shareable() return bool
mm: make GUP handle pfn mapping unless FOLL_GET is requested
mm: fix status code which move_pages() returns for zero page
mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
genalloc: add support of multiple gen_pools per device
genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
mm/memblock: WARN_ON when nid differs from overlap region
Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
mm: defer flush of writable TLB entries
mm: send one IPI per CPU to TLB flush all entries after unmapping pages
...
rtc can either be supplied from internal 32k clock or external crystal
generated 32k clock. Internal clock is SOC specific and the external
clock is board dependent. Adding the corresponding nodes.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
With the SA1100 and PXA RTC drivers be mutually exclusive and no
longer sharing hardware, PXA27x/PXA3xx platforms must use the PXA RTC
driver as the SA1100 platform device is no longer registered.
This change should be almost transparent to userspace. Former users of
pxa-rtc should be aware that 2 RTCs will be available on their kernels,
rtc0 being sa1100-rtc and rtc1 being pxa-rtc. Any userspace relying on
the fact that rtc0 was pxa-rtc should be fixed.
As a consequence:
- the first reboot after the switch will have the wrong time,
- on dual boot platform where the other OS programs some logic into the
sa1100 rtc IP, a lack of fix in userspace, ie. a kernel changing
sa1100-rtc thinking it is pxa-rtc could have dire consequence, such
as wiping the other OS data partition.
(Thanks to Robert Jarmik for help on the above commit text.)
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Rapoport <mike@compulab.co.il>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Now that register definitions have been moved to the driver, regs-rtc.h is
no longer used and can be removed.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Now that register definitions have been moved to the driver, we can remove
them from machine specific code.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The drivers for the SA1100 and PXA RTCs are now mutually exclusive, so
add the memory resource for the sa1100-rtc device. Since the memory
resource is already present in the pxa_rtc_resources, that makes
sa1100_rtc_resources and pxa_rtc_resources equivalent, so use
pxa_rtc_resources for both devices and remove the duplicate
sa1100_rtc_resources.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Currently, the rtc-sa1100 and rtc-pxa drivers co-exist as rtc-pxa has a
superset of functionality. Having 2 drivers sharing the same memory
resource is not allowed by the driver model if resources are properly
declared. This problem was avoided by not adding memory resources to the
SA1100 RTC driver, but that prevents clean-up of the SA1100 driver.
This commit converts the PXA RTC to use the exported SA1100 RTC
functions. Now the sa1100-rtc and pxa-rtc devices are mutually
exclusive, so we must remove the sa1100-rtc from pxa27x and pxa3xx.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: rtc-linux@googlegroups.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
vm86 exposes an interesting attack surface against the entry
code. Since vm86 is mostly useless anyway if mmap_min_addr != 0,
just turn it off in that case.
There are some reports that vbetool can work despite setting
mmap_min_addr to zero. This shouldn't break that use case,
as CAP_SYS_RAWIO already overrides mmap_min_addr.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change modifies gen_pool_get() and devm_gen_pool_create() client
interfaces adding one more argument "name" of a gen_pool object.
Due to implementation gen_pool_get() is capable to retrieve only one
gen_pool associated with a device even if multiple gen_pools are created,
fortunately right at the moment it is sufficient for the clients, hence
provide NULL as a valid argument on both producer devm_gen_pool_create()
and consumer gen_pool_get() sides.
Because only one created gen_pool per device is addressable, explicitly
add a restriction to devm_gen_pool_create() to create only one gen_pool
per device, this implies two possible error codes returned by the
function, account it on client side (only misc/sram). This completes
client side changes related to genalloc updates.
[akpm@linux-foundation.org: gen_pool_get() cleanup]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs. There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.
On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high. This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped. When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.
Architectures wishing to do this must give the following guarantee.
If a clean page is unmapped and not immediately flushed, the
architecture must guarantee that a write to that linear address
from a CPU with a cached TLB entry will trap a page fault.
This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting. The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry. An additional
architecture helper called flush_tlb_local is required. It's a trivial
wrapper with some accounting in the x86 case.
The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure. The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite. The test case uses NR_CPU
readers of mapped files that consume 10*RAM.
Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 581.00 611.43
System 5804.93 4111.76
Elapsed 161.03 122.12
This is showing that the readers completed 24.40% faster with 29% less
system CPU time. From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.
The impact is lower on a single socket machine.
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%)
4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 58.09 57.64
System 111.82 76.56
Elapsed 27.29 22.55
It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.
The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages. It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes. Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.
[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When unmapping pages it is necessary to flush the TLB. If that page was
accessed by another CPU then an IPI is used to flush the remote CPU. That
is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
second.
There already is a window between when a page is unmapped and when it is
TLB flushed. This series increases the window so multiple pages can be
flushed using a single IPI. This should be safe or the kernel is hosed
already.
Patch 1 simply made the rest of the series easier to write as ftrace
could identify all the senders of TLB flush IPIS.
Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
to flush the entire TLB.
Patch 3 tracks when there potentially are writable TLB entries that
need to be batched differently
Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes
The performance impact is documented in the changelogs but in the optimistic
case on a 4-socket machine the full series reduces interrupts from 900K
interrupts/second to 60K interrupts/second.
This patch (of 4):
It is easy to trace when an IPI is received to flush a TLB but harder to
detect what event sent it. This patch makes it easy to identify the
source of IPIs being transmitted for TLB flushes on x86.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename watchdog_suspend() to lockup_detector_suspend() and
watchdog_resume() to lockup_detector_resume() to avoid confusion with the
watchdog subsystem and to be consistent with the existing name
lockup_detector_init().
Also provide comment blocks to explain the watchdog_running and
watchdog_suspended variables and their relationship.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
these functions are no longer needed. If a subsystem has a need to
deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.
[akpm@linux-foundation.org: fix build with CONFIG_LOCKUP_DETECTOR=m]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.
The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With lockdep support implemented on CRISv32, we get the following splat.
switch_mm() can be called both from the scheduler() (with interrupts
disabled) and from flush_old_exec (via activate_mm()), with interrupts
enabled. Fix it by disabling interrupts in activate_mm(), similar to
powerpc and hexagon.
t======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.19.0-08802-g20bc9f1-dirty #323 Not tainted
------------------------------------------------------
init/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(mmu_context_lock){+.+...}, at: [<c0009290>] switch_mm+0x22/0xc6
and this task is already holding:
(&rq->lock){-.-.-.}, at: [<c01a0756>] __schedule+0x5e/0x648
which would create a new lock dependency:
(&rq->lock){-.-.-.} -> (mmu_context_lock){+.+...}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.-.}
... which became HARDIRQ-irq-safe at:
[<c002b03c>] scheduler_tick+0x28/0x5e
[<c0007c6c>] timer_interrupt+0x4e/0x6a
[<c0043ac4>] handle_irq_event_percpu+0x54/0x13c
[<c004343c>] generic_handle_irq+0x2a/0x36
to a HARDIRQ-irq-unsafe lock:
(mmu_context_lock){+.+...}
... which became HARDIRQ-irq-unsafe at:
... [<c0039e60>] __lock_acquire+0x8f8/0x1d9c
[<c0009290>] switch_mm+0x22/0xc6
[<c009c260>] flush_old_exec+0x500/0x5d4
[<c00da4c6>] load_elf_phdrs+0x7a/0x84
[<c00dbdb0>] load_elf_binary+0x21c/0x13b4
[<c009cdb6>] do_execve+0x22/0x2c
[<c001dcf2>] ____call_usermodehelper+0x0/0x154
[<c000581e>] ret_from_kernel_thread+0xe/0x14
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(mmu_context_lock);
local_irq_disable();
lock(&rq->lock);
lock(mmu_context_lock);
<Interrupt>
lock(&rq->lock);
*** DEADLOCK ***
1 lock held by init/1:
#0: (&rq->lock){-.-.-.}, at: [<c01a0756>] __schedule+0x5e/0x648
Call Trace:
[<c019fe9e>] printk+0x0/0x4e
[<c00368f8>] print_shortest_lock_dependencies+0x0/0x15c
[<c0048628>] print_stack_trace+0x0/0x88
[<c0038912>] __lock_is_held+0x3e/0x5e
[<c003b894>] lock_acquire+0x8a/0xcc
[<c01a50c4>] _raw_spin_lock+0x44/0x7a
[<c0009290>] switch_mm+0x22/0xc6
[<c01a06f8>] __schedule+0x0/0x648
[<c01a0d76>] schedule+0x36/0x7c
[<c0037d04>] trace_hardirqs_on+0x0/0x1e
[<c0004e18>] do_work_pending+0x30/0xd4
[<c000591a>] _work_pending+0xe/0x12
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Now that we have stack tracing and irq flags tracing support,
we can also enable lockdep support
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>