Merge yet more updates from Andrew Morton:
- a pile of minor fs fixes and cleanups
- kexec updates
- random misc fixes in various places: vmcore, rbtree, eventfd, ipc, seccomp.
- a series of python-based kgdb helper scripts
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO
samples/seccomp: improve label helper
ipc,sem: use current->state helpers
scripts/gdb: disable pagination while printing from breakpoint handler
scripts/gdb: define maintainer
scripts/gdb: convert CpuList to generator function
scripts/gdb: convert ModuleList to generator function
scripts/gdb: use a generator instead of iterator for task list
scripts/gdb: ignore byte-compiled python files
scripts/gdb: port to python3 / gdb7.7
scripts/gdb: add basic documentation
scripts/gdb: add lx-lsmod command
scripts/gdb: add class to iterate over CPU masks
scripts/gdb: add lx_current convenience function
scripts/gdb: add internal helper and convenience function for per-cpu lookup
scripts/gdb: add get_gdbserver_type helper
scripts/gdb: add internal helper and convenience function to retrieve thread_info
scripts/gdb: add is_target_arch helper
scripts/gdb: add helper and convenience function to look up tasks
scripts/gdb: add task iteration class
...
Signed-off-by: John de la Garza <john@jjdev.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of
all the possible kexec IND_ kimage_entry indirection flags. Having this
macro allows for simplified code in the prosessing of the kexec
kimage_entry items. Also, remove the local powerpc definition and use the
generic one.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define new kexec preprocessor macros IND_*_BIT that define the bit
position of the kimage entry flags. Change the existing IND_* flag macros
to be defined as bit shifts of the corresponding IND_*_BIT macros. Also
wrap all C language code in kexec.h with #if !defined(__ASSEMBLY__) so
assembly files can include kexec.h to get the IND_* and IND_*_BIT macros.
Some CPU instruction sets have tests for bit position which are convenient
in implementing routines that operate on the kimage entry list. The
addition of these bit position macros in a common location will avoid
duplicate definitions and the chance that changes to the IND_* flags will
not be propagated to assembly files.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the unneded declaration for a kexec_load() routine.
Fixes errors like these when running 'make headers_check':
include/uapi/linux/kexec.h: userspace cannot reference function or variable defined in the kernel
Paul said:
: The kexec_load declaration isn't very useful for userspace, see the patch
: I submitted in http://lkml.kernel.org/r/1389791824.17407.9.camel@x220 .
: And After my attempt the export of that declaration has also been
: discussed in
: http://lkml.kernel.org/r/115373b6ac68ee7a305975896e1c4971e8e51d4c.1408731991.git.geoff@infradead.org
:
: In that last discussion no one has been able to point to an actual user of
: it. So, as far as I can tell, no one actually uses it. Which makes
: sense, because including this header by itself doesn't give one access to
: a useful definition of kexec_load. So why bother with the declaration?
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Paul Bolle <pebolle@tiscali.nl>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct kimage has a member destination which is used to store the real
destination address of each page when load segment from user space buffer
to kernel. But we never retrieve the value stored in kimage->destination,
so this member variable in kimage and its assignment operation are
redundent code.
I guess for_each_kimage_entry just does the work that kimage->destination
is expected to do.
So in this patch just make a cleanup to remove it.
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parisc update from Helge Deller:
"The major change in here is the removal of the old HP-UX compat code
which should have made it possible to load and execute 32-bit HP-UX
binaries on PA-RISC Linux. Since it was never functional and since
nobody cares about old 32-bit HPUX binaries any longer, it's now time
to free up 3200 lines of kernel code (CONFIG_HPUX and
CONFIG_BINFMT_SOM).
Other than that we wire up the execveat() syscall, fix sparse errors
and have some whitespace cleanups"
* 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
fs/binfmt_som: Drop kernel support for HP-UX SOM binaries
parisc: Remove unused function
parisc: macro whitespace fixes
parisc/uaccess: fix sparse errors
parisc: hpux - Remove HPUX syscall numbers
parisc: hpux - Remove hpux gateway page
parisc: hpux - Delete files in hpux subdirectory
parisc: hpux - Do not compile hpux subdirectory
parisc: hpux - Drop support for HP-UX binaries
parisc: Add error checks when building up signal trampoline handler
parisc: Wire up execveat syscall
Till now suspend-to-idle has not been able to save much more energy
than runtime PM because of timer interrupts that periodically bring
CPUs out of idle while they are waiting for a wakeup interrupt. Of
course, the timer interrupts are not wakeup ones, so the handling of
them can be deferred until a real wakeup interrupt happens, but at
the same time we don't want to mass-expire timers at that point.
The solution is to suspend the entire timekeeping when the last CPU
is entering an idle state and resume it when the first CPU goes out
of idle. That has to be done with care, though, so as to avoid
accessing suspended clocksources etc. end we need extra support
from idle drivers for that.
This series of commits adds support for quiescing timers during
suspend-to-idle and adds the requisite callbacks to intel_idle
and the ACPI cpuidle driver.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJU4PNaAAoJEILEb/54YlRxgjsP/0UbDGbltVyM8VFhsobqhOni
thKJTJsqWqYgsPfTufbOGyvP6zskbsarDlzCXoKXuHaynIqcxY8xfZvMdcQr1j0S
nhKdqv4R6qlP3w2cFxXVZwhw21X3YO1zIxpi5Do1HdVuWoOvxq8Dk4cU8MrgOJC0
6ThC9Q7klheV4tY6Narlmmf6sX5O+S/EaqnupESSG4cqxNmlPw5AguLviBaUNVAY
RSjUX8LAce05bOIGEpaFY+vUws+jlU7/T/GEajquEsGF9zalh2CsWso5nQvilxrJ
22MVqXUyHaXmTC+U7nV78qRkavR6zyr3v/JBDse56qRI1mFlmyvGh8mE5ukmpqJE
Cg5rRC68o71xlBSVGhKW3Os2ks2Nenj2NLltrTyuh43OBJ691TaLsZnKh5nYt/MW
MZdqRRjIDTMF+/P1u4wY8S63labrrmp7w4T720CgaZCLJ/9VfZQuqKXTTm2R5/II
eDhFvdYXoP2748uUOn5sOr5/o0xhnMdaxykZZxE3IkSpOpIV1Mo2HWTIyDYXlILP
0OuJUUZFZtFOjWGCPn3YgoFT94C3nlO1vkXw//44okTUiUaaOZz+VWDF4fxdVeLR
8NGTe+/QzEq+2lbs+ZWRSM1hPukOntFcwCgWXFiqh9x2F00LAw9JpkiKBujxTjUV
m2WstYaML3W7gBMyhxg0
=55Jb
-----END PGP SIGNATURE-----
Merge tag 'suspend-to-idle-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull suspend-to-idle updates from Rafael Wysocki:
"Suspend-to-idle timer quiescing support for v3.20-rc1
Until now suspend-to-idle has not been able to save much more energy
than runtime PM because of timer interrupts that periodically bring
CPUs out of idle while they are waiting for a wakeup interrupt. Of
course, the timer interrupts are not wakeup ones, so the handling of
them can be deferred until a real wakeup interrupt happens, but at the
same time we don't want to mass-expire timers at that point.
The solution is to suspend the entire timekeeping when the last CPU is
entering an idle state and resume it when the first CPU goes out of
idle. That has to be done with care, though, so as to avoid accessing
suspended clocksources etc. end we need extra support from idle
drivers for that.
This series of commits adds support for quiescing timers during
suspend-to-idle and adds the requisite callbacks to intel_idle and the
ACPI cpuidle driver"
* tag 'suspend-to-idle-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / idle: Implement ->enter_freeze callback routine
intel_idle: Add ->enter_freeze callbacks
PM / sleep: Make it possible to quiesce timers during suspend-to-idle
timekeeping: Make it safe to use the fast timekeeper while suspended
timekeeping: Pass readout base to update_fast_timekeeper()
PM / sleep: Re-implement suspend-to-idle handling
Just like in case of other watchdog drivers, use the new kernel core
API to provide restart support.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
These are changes for drivers that are intimately tied to some SoC
and for some reason could not get merged through the respective
subsystem maintainer tree.
This time around, much of this is for at91, with the bulk of it being syscon
and udc drivers.
Also, there's:
- coupled cpuidle support for Samsung Exynos4210
- Renesas 73A0 common-clk work
- of/platform changes to tear down DMA mappings on device destruction
- a few updates to the TI Keystone knav code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU4upSAAoJEIwa5zzehBx3HkUP/Rc4B1yZChNIFNfVq4dbei6w
dT9WdFmxOIj2JLeXEypFBiNf1nSHmsxrZe9/IDACz2fYQOnaZZ6/786utUJP/PtC
2GDJy9cjL2Xh03We3nQp5z6J33XvpEni1t82cOpCl8wLBOQNnkjEks8UvLgi1LHW
CNLcMm8JtDQ2aB/gRTjzetp9liZluESY5+Mig+loE2F/rzbMbNQDcWDDgUPyIQIS
1onL+Bad3BnGFdo/+qnkurGc81pxoKiQJty06VWFftzvIwxXhsNjrqls2+KzstAx
0lLvW1tqaDhXvUBImRM8GgfbldZslsgoFVmgESS9MpPMBNENYrkAiQNvJUnM7kd9
qHDQNq+zRNsz/k4fVvp/YUp7xEiAo4rLcFmp/dBr535jS2LNyiZnB94q+kXsin/m
tiyEMx+RWxEHTEHN9WdKE61Ty1RbzOa5UTLSzOKFAkF+m2nvuQsJvb97n19coAq9
SSsj/wJgesfqrDEegphCDh1fyVxUzlAjjhTAyvPS155WvPzkbxZxuBbSqRuriRKA
2aCfVne2ELimHAr3LEPgPW2kFBcONHckOGe6MvrTX4zPHU8bb9WIeg+iGdQChnr3
nclT9jq+ZnQro5XTgUtPtadq100oEXlJbqpAzhd+cJbvgzSNbcWfcgE6kOWqd9uK
oeWQWFLCdXLmXf9zCwmk
=T7R2
-----END PGP SIGNATURE-----
Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Olof Johansson:
"These are changes for drivers that are intimately tied to some SoC and
for some reason could not get merged through the respective subsystem
maintainer tree.
This time around, much of this is for at91, with the bulk of it being
syscon and udc drivers.
Also, there's:
- coupled cpuidle support for Samsung Exynos4210
- Renesas 73A0 common-clk work
- of/platform changes to tear down DMA mappings on device destruction
- a few updates to the TI Keystone knav code"
* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (26 commits)
cpuidle: exynos: add coupled cpuidle support for exynos4210
ARM: EXYNOS: apply S5P_CENTRAL_SEQ_OPTION fix only when necessary
soc: ti: knav_qmss_queue: change knav_range_setup_acc_irq to static
soc: ti: knav_qmss_queue: makefile tweak to build as dynamic module
pcmcia: at91_cf: depend on !ARCH_MULTIPLATFORM
soc: ti: knav_qmss_queue: export API calls for use by user driver
of/platform: teardown DMA mappings on device destruction
usb: gadget: at91_udc: Allocate udc instance
usb: gadget: at91_udc: Update DT binding documentation
usb: gadget: at91_udc: Rework for multi-platform kernel support
usb: gadget: at91_udc: Simplify probe and remove functions
usb: gadget: at91_udc: Remove non-DT handling code
usb: gadget: at91_udc: Document DT clocks and clock-names property
usb: gadget: at91_udc: Drop uclk clock
usb: gadget: at91_udc: Fix clock names
mfd: syscon: Add Atmel SMC binding doc
mfd: syscon: Add atmel-smc registers definition
mfd: syscon: Add Atmel Matrix bus DT binding documentation
mfd: syscon: Add atmel-matrix registers definition
clk: shmobile: fix sparse NULL pointer warning
...
DT changes continue to be the bulk of our merge window contents.
We continue to have a large set of changes across the board as new platforms
and drivers are added.
Some of the new platforms are:
- Alphascale ASM9260
- Marvell Armada 388
- CSR Atlas7
- TI Davinci DM816x
- Hisilicon HiP01
- ST STiH418
There have also been some sweeping changes, including relicensing of DTS
contents from GPL to GPLv2+/X11 so that the same files can be reused in
other non-GPL projects more easily. There's also been changes to the
DT Makefile to make it a little less conflict-ridden and churny down
the road.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU4u0bAAoJEIwa5zzehBx3XFQP+wbVDp39ay3SRanFWeXqhfTe
6jRsYrOcq6BN/b1NugjD+yKIYp2MQhwlXbMmj/1vnmJ3XSY25ZMLlgs0/vsNk7W2
5e0xySwdhd1DjsajhZyN+5gUgqcTgOof/V+CbEUkijDDJ9v/WJbGZrpCHDz+UVTh
dG9p1vrKoxDELAVbnp9muKZPlaQkAM60zJcHNJw9bJB5M0RCx4XFwPZc1cDLIsIZ
lK/uYpKsgvgrGw5QuCtEK1/NkqLkBqgBfVg6xq0VB6OCYetqpxqs7kSZjnncIhQc
PvxShsIJzb/dgfk7xBVb1+4Jfe5L/4poFwS69QuBlr/wiwc7wrhv37edgkyDlclS
aj5xfOIhQdDaTkknFVs4QEkGAFg/lnTZnmiNiQdlsmDHqbWdTEELKShdVeMO7Zsg
6bPdDipA2jsQ86UWNwucis8QulzVTuyNuU+Mlrxp73b76+hKXLkbYcZ51FJ/xMD8
wLpCGqtc9Quirdb7Wy7XiVfesv3lKfDmzZB/6ZJ6zfadDvsqJPxAjNTA8VYZ9YeT
EyW4K6CMOa5v+sLmIQUsAjKCYaul3PVDCi4voQjpS1ZtPLw+WN3zqX5XZZDT9Ll2
D1ycmInp/40KsQgjV36u1NlIKMM+oaUJaSzaSPGdgj3Zcw0YZi8O+h0m6iHrlzUB
uGFufsLKmcOFY/sLwprt
=XEw1
-----END PGP SIGNATURE-----
Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC DT updates from Olof Johansson:
"DT changes continue to be the bulk of our merge window contents.
We continue to have a large set of changes across the board as new
platforms and drivers are added.
Some of the new platforms are:
- Alphascale ASM9260
- Marvell Armada 388
- CSR Atlas7
- TI Davinci DM816x
- Hisilicon HiP01
- ST STiH418
There have also been some sweeping changes, including relicensing of
DTS contents from GPL to GPLv2+/X11 so that the same files can be
reused in other non-GPL projects more easily. There's also been
changes to the DT Makefile to make it a little less conflict-ridden
and churny down the road"
* tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (330 commits)
ARM: dts: Add PPMU node for exynos4412-trats2
ARM: dts: Add PPMU node for exynos3250-monk and exynos3250-rinato
ARM: dts: Add PPMU dt node for exynos4 and exynos4210
ARM: dts: Add PPMU dt node for exynos3250
ARM: dts: add mipi dsi device node for exynos4415
ARM: dts: add fimd device node for exynos4415
ARM: dts: Add syscon phandle to the video-phy node for Exynos4
ARM: dts: Add sound nodes for exynos4412-trats2
ARM: dts: Fix CLK_MOUT_CAMn parent clocks assignment for exynos4412-trats2
ARM: dts: Fix CLK_UART_ISP_SCLK clock assignment in exynos4x12.dtsi
ARM: dts: Add max77693 charger node for exynos4412-trats2
ARM: dts: Switch max77686 regulators to GPIO control for exynos4412-trats2
ARM: dts: Add suspend configuration for max77686 regulators for exynos4412-trats2
ARM: dts: Add Maxim 77693 fuel gauge node for exynos4412-trats2
ARM: dts: am57xx-beagle-x15: Fix USB2 mode
ARM: dts: am57xx-beagle-x15: Add extcon nodes for USB
ARM: dts: dra72-evm: Add extcon nodes for USB
ARM: dts: dra7-evm: Add extcon nodes for USB
ARM: dts: rockchip: move the hdmi ddc-i2c-bus property to the actual boards
ARM: dts: rockchip: enable vops and hdmi output on rk3288-firefly and -evb
...
New and updated SoC support. Also included are some cleanups where the
platform maintainers hadn't separated cleanups from new developent in
separate branches.
Some of the larger things worth pointing out:
- A large set of changes from Alexandre Belloni and Nicolas Ferre
preparing at91 platforms for multiplatform and cleaning up quite a
bit in the process.
- Removal of CSR's "Marco" SoC platform that never made it out to the
market. We love seeing these since it means the vendor published
support before product was out, which is exactly what we want!
New platforms this release are:
- Conexant Digicolor (CX92755 SoC)
- Hisilicon HiP01 SoC
- CSR/sirf Atlas7 SoC
- ST STiH418 SoC
- Common code changes for Nvidia Tegra132 (64-bit SoC)
We're seeing more and more platforms having a harder time labelling
changes as cleanups vs new development -- which is a good sign that
we've come quite far on the cleanup effort. So over time we might start
combining the cleanup and new-development branches more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU4uiiAAoJEIwa5zzehBx3LtoQAIP4eInJAumhB67MexzWGIBx
eOsloBRMEBrjBQdSYsdsypN6T61WjDu1aieCxEGzIqitcMa59AIyyzglmlXy3UmV
XQuSnIBag2fsOqrvqd+c6ewzAMxm2/Nbi3+zjzApkf27NDlBLhEjxuK6pAAf4Yw9
gyWqB9g0d4V06XdqRInRvyyVfMu6fdApHLnadtjcMdiorQGd1bcOE1sQYygy6N6e
d6vGvyKSv4ygyDG9//njzm6C5OnmHliimMToeuDC2Scel69RM97EnMXys988CqUH
0Ru7XANEujtHXSOBYOyCv1kk4V5NguGzlfepe23oidOew8MjUdyRvKrwUiMt3AnT
SVqcZ9UU5wjJC6j+iADh+E7zww2H0rA6vFRzXy297dDuLg2C2ONFljBj/tIKGc71
++gLc6LRn7UmSyK98JMzkxDhmnnPn8w2O0M5GdabAqzZSfHlL1juW9ljp9Al5P6y
apLRzqMGjEoyC4huXvB3XVfrxGfepe5pco6wVlwmF3ilwf7iHnfuHONC1aw2mPRO
aOKiS+0gHWL3rNZtZQtyW7Ws0I2HJFip2CWIloBK1/2ntEoh51PH7jGw8iu/6jTk
//DCXqPBNXcLqonB9CHJZ/EWt0wup0BcHyLjlWX7iEjsdP/QJXrDgnrV3qdHibbh
AJASjs0YVDcdvRsRStlg
=szd9
-----END PGP SIGNATURE-----
Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform changes from Olof Johansson:
"New and updated SoC support. Also included are some cleanups where
the platform maintainers hadn't separated cleanups from new developent
in separate branches.
Some of the larger things worth pointing out:
- A large set of changes from Alexandre Belloni and Nicolas Ferre
preparing at91 platforms for multiplatform and cleaning up quite a
bit in the process.
- Removal of CSR's "Marco" SoC platform that never made it out to the
market. We love seeing these since it means the vendor published
support before product was out, which is exactly what we want!
New platforms this release are:
- Conexant Digicolor (CX92755 SoC)
- Hisilicon HiP01 SoC
- CSR/sirf Atlas7 SoC
- ST STiH418 SoC
- Common code changes for Nvidia Tegra132 (64-bit SoC)
We're seeing more and more platforms having a harder time labelling
changes as cleanups vs new development -- which is a good sign that
we've come quite far on the cleanup effort. So over time we might
start combining the cleanup and new-development branches more"
* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (124 commits)
ARM: at91/trivial: unify functions and machine names
ARM: at91: remove at91_dt_initialize and machine init_early()
ARM: at91: change board files into SoC files
ARM: at91: remove at91_boot_soc
ARM: at91: move alternative initial mapping to board-dt-sama5.c
ARM: at91: merge all SOC_AT91SAM9xxx
ARM: at91: at91rm9200: set idle and restart from rm9200_dt_device_init()
ARM: digicolor: select syscon and timer
ARM: zynq: Simplify SLCR initialization
ARM: zynq: PM: Fixed simple typo.
ARM: zynq: Setup default gpio number for Xilinx Zynq
ARM: digicolor: add low level debug support
ARM: initial support for Conexant Digicolor CX92755 SoC
ARM: OMAP2+: Add dm816x hwmod support
ARM: OMAP2+: Add clock domain support for dm816x
ARM: OMAP2+: Add board-generic.c entry for ti81xx
ARM: at91: pm: remove warning to remove SOC_AT91SAM9263 usage
ARM: at91: remove unused mach/system_rev.h
ARM: at91: stop using HAVE_AT91_DBGUx
ARM: at91: fix ordering of SRAM and PM initialization
...
Provide a file creation function that also takes an initial size so that the
caller doesn't have to set i_size, thus meaning that we don't have to call
deal with ->d_inode in the callers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge fifth set of updates from Andrew Morton:
- A few things which were awaiting merges from linux-next:
- rtc
- ocfs2
- misc others
- Willy's "dax" feature: direct fs access to memory (mainly NV-DIMMs)
which isn't backed by pageframes.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (37 commits)
rtc: add driver for DS1685 family of real time clocks
MAINTAINERS: add entry for Maxim PMICs on Samsung boards
lib/Kconfig: use bool instead of boolean
powerpc: drop _PAGE_FILE and pte_file()-related helpers
ocfs2: set append dio as a ro compat feature
ocfs2: wait for orphan recovery first once append O_DIRECT write crash
ocfs2: complete the rest request through buffer io
ocfs2: do not fallback to buffer I/O write if appending
ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks
ocfs2: implement ocfs2_direct_IO_write
ocfs2: add orphan recovery types in ocfs2_recover_orphans
ocfs2: add functions to add and remove inode in orphan dir
ocfs2: prepare some interfaces used in append direct io
MAINTAINERS: fix spelling mistake & remove trailing WS
dax: does not work correctly with virtual aliasing caches
brd: rename XIP to DAX
ext4: add DAX functionality
dax: add dax_zero_page_range
ext2: get rid of most mentions of XIP in ext2
ext2: remove ext2_aops_xip
...
The parisc arch has been the only user of HP-UX SOM binaries.
Support for HP-UX executables was never finished and since we now drop support
for the HP-UX compat layer anyway, it does not makes sense to keep the
BINFMT_SOM support.
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
This was introduced in commit ed9ecb0415,
but only defined if !VIRTIO_NET_NO_LEGACY. We should always define
it: easier for users to have conditional legacy code.
Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds a driver for the Dallas/Maxim DS1685-family of RTC chips. It
supports the DS1685/DS1687, DS1688/DS1691, DS1689/DS1693, DS17285/DS17287,
DS17485/DS17487, and DS17885/DS17887 RTC chips. These chips are commonly
found in SGI O2 and SGI Octane systems. It was originally derived from a
driver patch submitted by Matthias Fuchs many years ago for use in
EPPC-405-UC modules, which also used these RTCs. In addition to the
time-keeping functions, this RTC also handles the shutdown mechanism of
the O2 and Octane and acts as a partial NVRAM for the boot PROMS in these
systems.
Verified on both an SGI O2 and an SGI Octane.
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new function allows us to support hole-punch for DAX files by zeroing
a partial page, as opposed to the dax_truncate_page() function which can
only truncate to the end of the page. Reimplement dax_truncate_page() to
call dax_zero_page_range().
[ross.zwisler@linux.intel.com: ported to 3.13-rc2]
[akpm@linux-foundation.org: fix typos in comments]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fewer Kconfig options we have the better. Use the generic
CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All callers of get_xip_mem() are now gone. Remove checks for it,
initialisers of it, documentation of it and the only implementation of it.
Also remove mm/filemap_xip.c as it is now empty. Also remove
documentation of the long-gone get_xip_page().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It takes a get_block parameter just like nobh_truncate_page() and
block_truncate_page()
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of calling aops->get_xip_mem from the fault handler, the
filesystem passes a get_block_t that is used to find the appropriate
blocks.
This requires that all architectures implement copy_user_page(). At the
time of writing, mips and arm do not. Patches exist and are in progress.
[akpm@linux-foundation.org: remap_file_pages went away]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is practically generic code; other filesystems will want to call it
from other places, but there's nothing ext2-specific about it.
Make it a little more generic by allowing it to take a count of the number
of bytes to zero rather than fixing it to a single page. Thanks to Dave
Hansen for suggesting that I need to call cond_resched() if zeroing more
than one page.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the generic AIO infrastructure instead of custom read and write
methods. In addition to giving us support for AIO, this adds the missing
locking between read() and truncate().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use an inode flag to tag inodes which should avoid using the page cache.
Convert ext2 to use it instead of mapping_is_xip(). Prevent I/Os to files
tagged with the DAX flag from falling back to buffered I/O.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently COW of an XIP file is done by first bringing in a read-only
mapping, then retrying the fault and copying the page. It is much more
efficient to tell the fault handler that a COW is being attempted (by
passing in the pre-allocated page in the vm_fault structure), and allow
the handler to perform the COW operation itself.
The handler cannot insert the page itself if there is already a read-only
mapping at that address, so allow the handler to return VM_FAULT_LOCKED
and set the fault_page to be NULL. This indicates to the MM code that the
i_mmap_lock is held instead of the page lock.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm updates from Dave Airlie:
"This is the main drm pull, it has a shared branch with some alsa
crossover but everything should be acked by relevant people.
New drivers:
- ATMEL HLCDC driver
- designware HDMI core support (used in multiple SoCs).
core:
- lots more atomic modesetting work, properties and atomic ioctl
(hidden under option)
- bridge rework allows support for Samsung exynos chromebooks to
work finally.
- some more panels supported
i915:
- atomic plane update support
- DSI uses shared DSI infrastructure
- Skylake basic support is all merged now
- component framework used for i915/snd-hda interactions
- write-combine cpu memory mappings
- engine init code refactored
- full ppgtt enabled where execlists are enabled.
- cherryview rps/gpu turbo and pipe CRC support.
radeon:
- indirect draw support for evergreen/cayman
- SMC and manual fan control for SI/CI
- Displayport audio support
amdkfd:
- SDMA usermode queue support
- replace suballocator usage with more suitable one
- rework for allowing interfacing to more than radeon
nouveau:
- major renaming in prep for later splitting work
- merge arm platform driver into nouveau
- GK20A reclocking support
msm:
- conversion to atomic modesetting
- YUV support for mdp4/5
- eDP support
- hw cursor for mdp5
tegra:
- conversion to atomic modesetting
- better suspend/resume support for child devices
rcar-du:
- interlaced support
imx:
- move to using dw_hdmi shared support
- mode_fixup support
sti:
- DVO support
- HDMI infoframe support
exynos:
- refactoring and cleanup, removed lots of internal unnecessary
abstraction
- exynos7 DECON display controller support
Along with the usual bunch of fixes, cleanups etc"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (724 commits)
drm/radeon: fix voltage setup on hawaii
drm/radeon/dp: Set EDP_CONFIGURATION_SET for bridge chips if necessary
drm/radeon: only enable kv/kb dpm interrupts once v3
drm/radeon: workaround for CP HW bug on CIK
drm/radeon: Don't try to enable write-combining without PAT
drm/radeon: use 0-255 rather than 0-100 for pwm fan range
drm/i915: Clamp efficient frequency to valid range
drm/i915: Really ignore long HPD pulses on eDP
drm/exynos: Add DECON driver
drm/i915: Correct the base value while updating LP_OUTPUT_HOLD in MIPI_PORT_CTRL
drm/i915: Insert a command barrier on BLT/BSD cache flushes
drm/i915: Drop vblank wait from intel_dp_link_down
drm/exynos: fix NULL pointer reference
drm/exynos: remove exynos_plane_dpms
drm/exynos: remove mode property of exynos crtc
drm/exynos: Remove exynos_plane_dpms() call with no effect
drm/i915: Squelch overzealous uncore reset WARN_ON
drm/i915: Take runtime pm reference on hangcheck_info
drm/i915: Correct the IOSF Dev_FN field for IOSF transfers
drm/exynos: fix DMA_ATTR_NO_KERNEL_MAPPING usage
...
Pull irqchip updates from Ingo Molnar:
"Various irqchip driver updates, plus a genirq core update that allows
the initial spreading of irqs amonst CPUs without having to do it from
user-space"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix null pointer reference in irq_set_affinity_hint()
irqchip: gic: Allow interrupt level to be set for PPIs
irqchip: mips-gic: Handle pending interrupts once in __gic_irq_dispatch()
irqchip: Conexant CX92755 interrupts controller driver
irqchip: Devicetree: document Conexant Digicolor irq binding
irqchip: omap-intc: Remove unused legacy interface for omap2
irqchip: omap-intc: Fix support for dm814 and dm816
irqchip: mtk-sysirq: Get irq number from register resource size
irqchip: renesas-intc-irqpin: r8a7779 IRLM setup support
genirq: Set initial affinity in irq_set_affinity_hint()
Pull x86 perf updates from Ingo Molnar:
"This series tightens up RDPMC permissions: currently even highly
sandboxed x86 execution environments (such as seccomp) have permission
to execute RDPMC, which may leak various perf events / PMU state such
as timing information and other CPU execution details.
This 'all is allowed' RDPMC mode is still preserved as the
(non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is
that RDPMC access is only allowed if a perf event is mmap-ed (which is
needed to correctly interpret RDPMC counter values in any case).
As a side effect of these changes CR4 handling is cleaned up in the
x86 code and a shadow copy of the CR4 value is added.
The extra CR4 manipulation adds ~ <50ns to the context switch cost
between rdpmc-capable and rdpmc-non-capable mms"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
perf/x86: Only allow rdpmc if a perf_event is mapped
perf: Pass the event to arch_perf_update_userpage()
perf: Add pmu callbacks to track event mapping and unmapping
x86: Add a comment clarifying LDT context switching
x86: Store a per-cpu shadow copy of CR4
x86: Clean up cr4 manipulation
This reverts commit 9bd0f45b70.
Linus rightly pointed out that I failed to initialize the counters
when adding them, so they don't work as expected. Just revert this
patch for now.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Newer Blackfin boards use pinctrl API to manage pins and the legacy
peripherial lists are not useful on them. Let's move pin lists into
platform data so older boards can still use them and newer boards can use
the modern API.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The platform data definition of the rotary driver should be generic for all
architectures.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Here's the big tty/serial driver update for 3.20-rc1. Nothing huge
here, just lots of driver updates and some core tty layer fixes as well.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgtgkACgkQMUfUDdst+ykXbACg14oFAmeYjO9RsdIHPXBvKseO
47QAn0foy91bpNQ5UFOxWS5L6Fzj2ZND
=syx2
-----END PGP SIGNATURE-----
Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver patches from Greg KH:
"Here's the big tty/serial driver update for 3.20-rc1. Nothing huge
here, just lots of driver updates and some core tty layer fixes as
well. All have been in linux-next with no reported issues"
* tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits)
serial: 8250: Fix UART_BUG_TXEN workaround
serial: driver for ETRAX FS UART
tty: remove unused variable sprop
serial: of-serial: fetch line number from DT
serial: samsung: earlycon support depends on CONFIG_SERIAL_SAMSUNG_CONSOLE
tty/serial: serial8250_set_divisor() can be static
tty/serial: Add Spreadtrum sc9836-uart driver support
Documentation: DT: Add bindings for Spreadtrum SoC Platform
serial: samsung: remove redundant interrupt enabling
tty: Remove external interface for tty_set_termios()
serial: omap: Fix RTS handling
serial: 8250_omap: Use UPSTAT_AUTORTS for RTS handling
serial: core: Rework hw-assisted flow control support
tty/serial: 8250_early: Add support for PXA UARTs
tty/serial: of_serial: add support for PXA/MMP uarts
tty/serial: of_serial: add DT alias ID handling
serial: 8250: Prevent concurrent updates to shadow registers
serial: 8250: Use canary to restart console after suspend
serial: 8250: Refactor XR17V35X divisor calculation
serial: 8250: Refactor divisor programming
...
Here's the big staging driver tree update for 3.20-rc1. Lots of little
things in here, adding up to lots of overall cleanups. The IIO driver
updates are also in here as they cross the staging tree boundry a lot.
I2O has moved into staging as well, as a plan to drop it from the tree
eventually as that's a dead subsystem.
All of this has been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgtVQACgkQMUfUDdst+yk4mACgshYZ1fOQDoPR+BXd+QD1HXfh
GosAoICXkSjDQjwVo13W6QHIVMsUezY+
=4jHr
-----END PGP SIGNATURE-----
Merge tag 'staging-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging drivers patches from Greg KH:
"Here's the big staging driver tree update for 3.20-rc1.
Lots of little things in here, adding up to lots of overall cleanups.
The IIO driver updates are also in here as they cross the staging tree
boundry a lot. I2O has moved into staging as well, as a plan to drop
it from the tree eventually as that's a dead subsystem.
All of this has been in linux-next with no reported issues for a
while"
* tag 'staging-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (740 commits)
staging: lustre: lustre: libcfs: define symbols as static
staging: rtl8712: Do coding style cleanup
staging: lustre: make obd_updatemax_lock static
staging: rtl8188eu: core: switch with redundant cases
staging: rtl8188eu: odm: conditional setting with no effect
staging: rtl8188eu: odm: condition with no effect
staging: ft1000: fix braces warning
staging: sm7xxfb: fix remaining CamelCase
staging: sm7xxfb: fix CamelCase
staging: rtl8723au: multiple condition with no effect - if identical to else
staging: sm7xxfb: make smtc_scr_info static
staging/lustre/mdc: Initialize req in mdc_enqueue for !it case
staging/lustre/clio: Do not allow group locks with gid 0
staging/lustre/llite: don't add to page cache upon failure
staging/lustre/llite: Add exception entry check after radix_tree
staging/lustre/libcfs: protect kkuc_groups from write access
staging/lustre/fld: refer to MDT0 for fld lookup in some cases
staging/lustre/llite: Solve a race to access lli_has_smd in read case
staging/lustre/ptlrpc: hold rq_lock when modify rq_flags
staging/lustre/lnet: portal spreading rotor should be unsigned
...
Really tiny set of patches for this kernel. Nothing major, all
described in the shortlog and have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgtIAACgkQMUfUDdst+ymjSwCfWspNT71lmsVwasCTPQopgXov
TqAAoKR4I5ZebMks/nW6ClxUFYwVSL02
=leVc
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
"Really tiny set of patches for this kernel. Nothing major, all
described in the shortlog and have been in linux-next for a while"
* tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: fix warning when creating a sysfs group without attributes
firmware_loader: handle timeout via wait_for_completion_interruptible_timeout()
firmware_loader: abort request if wait_for_completion is interrupted
firmware: Correct function name in comment
device: Change dev_<level> logging functions to return void
device: Fix dev_dbg_once macro
Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog. Nothing
major or unusual, except maybe the binder selinux stuff, which was all
acked by the proper selinux people and they thought it best to come
through this tree.
All of this has been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb
SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf
=lfmM
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc patches from Greg KH:
"Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog.
Nothing major or unusual, except maybe the binder selinux stuff, which
was all acked by the proper selinux people and they thought it best to
come through this tree.
All of this has been in linux-next with no reported issues for a while"
* tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
coresight: fix function etm_writel_cp14() parameter order
coresight-etm: remove check for unknown Kconfig macro
coresight: fixing CPU hwid lookup in device tree
coresight: remove the unnecessary function coresight_is_bit_set()
coresight: fix the debug AMBA bus name
coresight: remove the extra spaces
coresight: fix the link between orphan connection and newly added device
coresight: remove the unnecessary replicator property
coresight: fix the replicator subtype value
pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl'
mcb: Fix error path of mcb_pci_probe
virtio/console: verify device has config space
ti-st: clean up data types (fix harmless memory corruption)
mei: me: release hw from reset only during the reset flow
mei: mask interrupt set bit on clean reset bit
extcon: max77693: Constify struct regmap_config
extcon: adc-jack: Release IIO channel on driver remove
extcon: Remove duplicated include from extcon-class.c
Drivers: hv: vmbus: hv_process_timer_expiration() can be static
Drivers: hv: vmbus: serialize Offer and Rescind offer
...
The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.
However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion. A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.
The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping. That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs. It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.
Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit. Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.
To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Here's the big pull request for the USB driver tree for 3.20-rc1.
Nothing major happening here, just lots of gadget driver updates, new
device ids, and a bunch of cleanups.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgtrcACgkQMUfUDdst+yn0tACgygJPNvu1l3ukNJCCpWuOErIj
3KsAnjiEXv90DLYJiVLJ4EbLPw0V9wAv
=DrJx
-----END PGP SIGNATURE-----
Merge tag 'usb-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB patches from Greg KH:
"Here's the big pull request for the USB driver tree for 3.20-rc1.
Nothing major happening here, just lots of gadget driver updates, new
device ids, and a bunch of cleanups.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (299 commits)
usb: musb: fix device hotplug behind hub
usb: dwc2: Fix a bug in reading the endpoint directions from reg.
staging: emxx_udc: fix the build error
usb: Retry port status check on resume to work around RH bugs
Revert "usb: Reset USB-3 devices on USB-3 link bounce"
uhci-hub: use HUB_CHAR_*
usb: kconfig: replace PPC_OF with PPC
ehci-pci: disable for Intel MID platforms (update)
usb: gadget: Kconfig: use bool instead of boolean
usb: musb: blackfin: remove incorrect __exit_p()
USB: fix use-after-free bug in usb_hcd_unlink_urb()
ehci-pci: disable for Intel MID platforms
usb: host: pci_quirks: joing string literals
USB: add flag for HCDs that can't receive wakeup requests (isp1760-hcd)
USB: usbfs: allow URBs to be reaped after disconnection
cdc-acm: kill unnecessary messages
cdc-acm: add sanity checks
usb: phy: phy-generic: Fix USB PHY gpio reset
usb: dwc2: fix USB core dependencies
usb: renesas_usbhs: fix NULL pointer dereference in dma_release_channel()
...
Pull UBI and UBIFS updates from Richard Weinberger:
- cleanups and bug fixes all over UBI and UBIFS
- block-mq support for UBI Block
- UBI volumes can now be renamed while they are in use
- security.* XATTR support for UBIFS
- a maintainer update
* 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs:
UBI: block: Fix checking for NULL instead of IS_ERR()
UBI: block: Continue creating ubiblocks after an initialization error
UBIFS: return -EINVAL if log head is empty
UBI: Block: Explain usage of blk_rq_map_sg()
UBI: fix soft lockup in ubi_check_volume()
UBI: Fastmap: Care about the protection queue
UBIFS: add a couple of extra asserts
UBI: do propagate positive error codes up
UBI: clean-up printing helpers
UBI: extend UBI layer debug/messaging capabilities - cosmetics
UBIFS: add ubifs_err() to print error reason
UBIFS: Add security.* XATTR support for the UBIFS
UBIFS: Add xattr support for symlinks
UBI: Block: Add blk-mq support
UBI: Add initial support for scatter gather
UBI: rename_volumes: Use UBI_METAONLY
UBI: Implement UBI_METAONLY
Add myself as UBI co-maintainer
This field is unused and uninitialized since commit 9a11b49a80
("[PATCH] lockdep: better lock debugging")
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series tightens the rules for ACCESS_ONCE to only work
on scalar types. It also contains the necessary fixups as
indicated by build bots of linux-next.
Now everything is in place to prevent new non-scalar users
of ACCESS_ONCE and we can continue to convert code to
READ_ONCE/WRITE_ONCE.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJU2H5MAAoJEBF7vIC1phx8Jm4QALPqKOMDSUBCrqJFWJeujtv2
ILxJKsnjrAlt3dxnlVI3q6e5wi896hSce75PcvZ/vs/K3GdgMxOjrakBJGTJ2Qjg
5njW9aGJDDr/SYFX33MLWfqy222TLtpxgSz379UgXjEzB0ymMWbJJ3FnGjVqQJdp
RXDutpncRySc/rGHh9UPREIRR5GvimONsWE2zxgXjUzB8vIr2fCGvHTXfIb6RKbQ
yaFoihzn0m+eisc5Gy4tQ1qhhnaYyWEGrINjHTjMFTQOWTlH80BZAyQeLdbyj2K5
qloBPS/VhBTr/5TxV5onM+nVhu0LiblVNrdMHVeb7jyST4LeFOCaWK98lB3axSB5
v/2D1YKNb3g1U1x3In/oNGQvs36zGiO1uEdMF1l8ZFXgCvHmATSFSTWBtqUhb5Ew
JA3YyqMTG6dpRTMSnmu3/frr4wDqnxlB/ktQC1pf3tDp87mr1ZYEy/dQld+tltjh
9Z5GSdrw0nf91wNI3DJf+26ZDdz5B+EpDnPnOKG8anI1lc/mQneI21/K/xUteFXw
UZ1XGPLV2vbv9/a13u44SdjenHvQs1egsGeebMxVPoj6WmDLVmcIqinyS6NawYzn
IlDGy/b3bSnXWMBP0ZVBX94KWLxqDDc4a/ayxsmxsP1tPZ+jDXjVDa7E3zskcHxG
Uj5ULCPyU087t8Sl76mv
=Dj70
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux
Pull ACCESS_ONCE() rule tightening from Christian Borntraeger:
"Tighten rules for ACCESS_ONCE
This series tightens the rules for ACCESS_ONCE to only work on scalar
types. It also contains the necessary fixups as indicated by build
bots of linux-next. Now everything is in place to prevent new
non-scalar users of ACCESS_ONCE and we can continue to convert code to
READ_ONCE/WRITE_ONCE"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
kernel: Fix sparse warning for ACCESS_ONCE
next: sh: Fix compile error
kernel: tighten rules for ACCESS ONCE
mm/gup: Replace ACCESS_ONCE with READ_ONCE
x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE
x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE
ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE
ppc/kvm: Replace ACCESS_ONCE with READ_ONCE
Pull crypto update from Herbert Xu:
"Here is the crypto update for 3.20:
- Added 192/256-bit key support to aesni GCM.
- Added MIPS OCTEON MD5 support.
- Fixed hwrng starvation and race conditions.
- Added note that memzero_explicit is not a subsitute for memset.
- Added user-space interface for crypto_rng.
- Misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: tcrypt - do not allocate iv on stack for aead speed tests
crypto: testmgr - limit IV copy length in aead tests
crypto: tcrypt - fix buflen reminder calculation
crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed
crypto: caam - fix resource clean-up on error path for caam_jr_init
crypto: caam - pair irq map and dispose in the same function
crypto: ccp - terminate ccp_support array with empty element
crypto: caam - remove unused local variable
crypto: caam - remove dead code
crypto: caam - don't emit ICV check failures to dmesg
hwrng: virtio - drop extra empty line
crypto: replace scatterwalk_sg_next with sg_next
crypto: atmel - Free memory in error path
crypto: doc - remove colons in comments
crypto: seqiv - Ensure that IV size is at least 8 bytes
crypto: cts - Weed out non-CBC algorithms
MAINTAINERS: add linux-crypto to hw random
crypto: cts - Remove bogus use of seqiv
crypto: qat - don't need qat_auth_state struct
crypto: algif_rng - fix sparse non static symbol warning
...
This feature let us to detect accesses out of bounds of global variables.
This will work as for globals in kernel image, so for globals in modules.
Currently this won't work for symbols in user-specified sections (e.g.
__init, __read_mostly, ...)
The idea of this is simple. Compiler increases each global variable by
redzone size and add constructors invoking __asan_register_globals()
function. Information about global variable (address, size, size with
redzone ...) passed to __asan_register_globals() so we could poison
variable's redzone.
This patch also forces module_alloc() to return 8*PAGE_SIZE aligned
address making shadow memory handling (
kasan_module_alloc()/kasan_module_free() ) more simple. Such alignment
guarantees that each shadow page backing modules address space correspond
to only one module_alloc() allocation.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MODULE_DEVICE_TABLE() macro used to create aliases to device tables.
Normally alias should have the same type as aliased symbol.
Device tables are arrays, so they have 'struct type##_device_id[x]'
types. Alias created by MODULE_DEVICE_TABLE() will have non-array type -
'struct type##_device_id'.
This inconsistency confuses compiler, it could make a wrong assumption
about variable's size which leads KASan to produce a false positive report
about out of bounds access.
For every global variable compiler calls __asan_register_globals() passing
information about global variable (address, size, size with redzone, name
...) __asan_register_globals() poison symbols redzone to detect possible
out of bounds accesses.
When symbol has an alias __asan_register_globals() will be called as for
symbol so for alias. Compiler determines size of variable by size of
variable's type. Alias and symbol have the same address, so if alias have
the wrong size part of memory that actually belongs to the symbol could be
poisoned as redzone of alias symbol.
By fixing type of alias symbol we will fix size of it, so
__asan_register_globals() will not poison valid memory.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KASan uses constructors for initializing redzones for global variables.
Globals instrumentation in GCC 4.9.2 produces constructors with priority
(.init_array.00099)
Currently kernel ignores such constructors. Only constructors with
default priority supported (.init_array)
This patch adds support for constructors with priorities. For kernel
image we put pointers to constructors between __ctors_start/__ctors_end
and do_ctors() will call them on start up. For modules we merge
.init_array.* sections into resulting .init_array. Module code properly
handles constructors in .init_array section.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For instrumenting global variables KASan will shadow memory backing memory
for modules. So on module loading we will need to allocate memory for
shadow and map it at address in shadow that corresponds to the address
allocated in module_alloc().
__vmalloc_node_range() could be used for this purpose, except it puts a
guard hole after allocated area. Guard hole in shadow memory should be a
problem because at some future point we might need to have a shadow memory
at address occupied by guard hole. So we could fail to allocate shadow
for module_alloc().
Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into
__vmalloc_node_range(). Add new parameter 'vm_flags' to
__vmalloc_node_range() function.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For instrumenting global variables KASan will shadow memory backing memory
for modules. So on module loading we will need to allocate memory for
shadow and map it at address in shadow that corresponds to the address
allocated in module_alloc().
__vmalloc_node_range() could be used for this purpose, except it puts a
guard hole after allocated area. Guard hole in shadow memory should be a
problem because at some future point we might need to have a shadow memory
at address occupied by guard hole. So we could fail to allocate shadow
for module_alloc().
Add a new vm_struct flag 'VM_NO_GUARD' indicating that vm area doesn't
have a guard hole.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stack instrumentation allows to detect out of bounds memory accesses for
variables allocated on stack. Compiler adds redzones around every
variable on stack and poisons redzones in function's prologue.
Such approach significantly increases stack usage, so all in-kernel stacks
size were doubled.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patch kasan will be able to catch bugs in memory allocated by
slub. Initially all objects in newly allocated slab page, marked as
redzone. Later, when allocation of slub object happens, requested by
caller number of bytes marked as accessible, and the rest of the object
(including slub's metadata) marked as redzone (inaccessible).
We also mark object as accessible if ksize was called for this object.
There is some places in kernel where ksize function is called to inquire
size of really allocated area. Such callers could validly access whole
allocated memory, so it should be marked as accessible.
Code in slub.c and slab_common.c files could validly access to object's
metadata, so instrumentation for this files are disabled.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Dmitry Chernenkov <dmitryc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove static and add function declarations to linux/slub_def.h so it
could be used by kernel address sanitizer.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel Address sanitizer (KASan) is a dynamic memory error detector. It
provides fast and comprehensive solution for finding use-after-free and
out-of-bounds bugs.
KASAN uses compile-time instrumentation for checking every memory access,
therefore GCC > v4.9.2 required. v4.9.2 almost works, but has issues with
putting symbol aliases into the wrong section, which breaks kasan
instrumentation of globals.
This patch only adds infrastructure for kernel address sanitizer. It's
not available for use yet. The idea and some code was borrowed from [1].
Basic idea:
The main idea of KASAN is to use shadow memory to record whether each byte
of memory is safe to access or not, and use compiler's instrumentation to
check the shadow memory on each memory access.
Address sanitizer uses 1/8 of the memory addressable in kernel for shadow
memory and uses direct mapping with a scale and offset to translate a
memory address to its corresponding shadow address.
Here is function to translate address to corresponding shadow address:
unsigned long kasan_mem_to_shadow(unsigned long addr)
{
return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
where KASAN_SHADOW_SCALE_SHIFT = 3.
So for every 8 bytes there is one corresponding byte of shadow memory.
The following encoding used for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are valid for access; k (1 <= k <= 7)
means that the first k bytes are valid for access, and other (8 - k) bytes
are not; Any negative value indicates that the entire 8-bytes are
inaccessible. Different negative values used to distinguish between
different kinds of inaccessible memory (redzones, freed memory) (see
mm/kasan/kasan.h).
To be able to detect accesses to bad memory we need a special compiler.
Such compiler inserts a specific function calls (__asan_load*(addr),
__asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16.
These functions check whether memory region is valid to access or not by
checking corresponding shadow memory. If access is not valid an error
printed.
Historical background of the address sanitizer from Dmitry Vyukov:
"We've developed the set of tools, AddressSanitizer (Asan),
ThreadSanitizer and MemorySanitizer, for user space. We actively use
them for testing inside of Google (continuous testing, fuzzing,
running prod services). To date the tools have found more than 10'000
scary bugs in Chromium, Google internal codebase and various
open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and
lots of others): [2] [3] [4].
The tools are part of both gcc and clang compilers.
We have not yet done massive testing under the Kernel AddressSanitizer
(it's kind of chicken and egg problem, you need it to be upstream to
start applying it extensively). To date it has found about 50 bugs.
Bugs that we've found in upstream kernel are listed in [5].
We've also found ~20 bugs in out internal version of the kernel. Also
people from Samsung and Oracle have found some.
[...]
As others noted, the main feature of AddressSanitizer is its
performance due to inline compiler instrumentation and simple linear
shadow memory. User-space Asan has ~2x slowdown on computational
programs and ~2x memory consumption increase. Taking into account that
kernel usually consumes only small fraction of CPU and memory when
running real user-space programs, I would expect that kernel Asan will
have ~10-30% slowdown and similar memory consumption increase (when we
finish all tuning).
I agree that Asan can well replace kmemcheck. We have plans to start
working on Kernel MemorySanitizer that finds uses of unitialized
memory. Asan+Msan will provide feature-parity with kmemcheck. As
others noted, Asan will unlikely replace debug slab and pagealloc that
can be enabled at runtime. Asan uses compiler instrumentation, so even
if it is disabled, it still incurs visible overheads.
Asan technology is easily portable to other architectures. Compiler
instrumentation is fully portable. Runtime has some arch-dependent
parts like shadow mapping and atomic operation interception. They are
relatively easy to port."
Comparison with other debugging features:
========================================
KMEMCHECK:
- KASan can do almost everything that kmemcheck can. KASan uses
compile-time instrumentation, which makes it significantly faster than
kmemcheck. The only advantage of kmemcheck over KASan is detection of
uninitialized memory reads.
Some brief performance testing showed that kasan could be
x500-x600 times faster than kmemcheck:
$ netperf -l 30
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
no debug: 87380 16384 16384 30.00 41624.72
kasan inline: 87380 16384 16384 30.00 12870.54
kasan outline: 87380 16384 16384 30.00 10586.39
kmemcheck: 87380 16384 16384 30.03 20.23
- Also kmemcheck couldn't work on several CPUs. It always sets
number of CPUs to 1. KASan doesn't have such limitation.
DEBUG_PAGEALLOC:
- KASan is slower than DEBUG_PAGEALLOC, but KASan works on sub-page
granularity level, so it able to find more bugs.
SLUB_DEBUG (poisoning, redzones):
- SLUB_DEBUG has lower overhead than KASan.
- SLUB_DEBUG in most cases are not able to detect bad reads,
KASan able to detect both reads and writes.
- In some cases (e.g. redzone overwritten) SLUB_DEBUG detect
bugs only on allocation/freeing of object. KASan catch
bugs right before it will happen, so we always know exact
place of first bad read/write.
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
[2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs
[3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs
[4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs
[5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies
Based on work by Andrey Konovalov.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all bitmap formatting usages have been converted to
'%*pb[l]', the separate formatting functions are unnecessary. The
following functions are removed.
* bitmap_scn[list]printf()
* cpumask_scnprintf(), cpulist_scnprintf()
* [__]nodemask_scnprintf(), [__]nodelist_scnprintf()
* seq_bitmap[_list](), seq_cpumask[_list](), seq_nodemask[_list]()
* seq_buf_bitmask()
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printf family of functions can now format bitmaps using '%*pb[l]' and
all cpumask and nodemask formatting will be converted to use it. To
ease printing these masks with '%*pb[l]' which require two params -
the number of bits and the actual bitmap, this patch implement
cpumask_pr_args() and nodemask_pr_args() which can be used to provide
arguments for '%*pb[l]'
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bitmap implements two variants of scnprintf functions to format a bitmap
into a string and cpumask and nodemask wrap them to provide equivalent
interfaces. The scnprintf family of functions require a string buffer as
an output target which complicates code paths which just want to print out
the mask through printk for informational or debug purposes as they have
to worry about how large the buffer should be and whether it's too large
to allocate on stack.
Neither cpumask or nodemask provides a guildeline on how large the target
buffer should be forcing users come up with their own solutions - some
allocate an arbitrarily sized buffer which is small enough to allocate on
stack but may be too short in corner cases, other come up with a custom
upper limit calculation considering the output format, some allocate the
buffer dynamically while one resorted to using lock to synchronize access
to a static buffer.
This is an artificial problem which is being solved repeatedly for no
benefit. In a lot of cases, the output area already exists and can be
targeted directly making the intermediate buffer unnecessary. This
patchset teaches printf family of functions how to format bitmaps and
replace the dedicated formatting functions with it.
Pointer formatting is extended to cover bitmap formatting. It uses the
field width for the number of bits instead of precision. The format used
is '%*pb[l]', with the optional trailing 'l' specifying list format
instead of hex masks. For more details, please see 0002.
This patch (of 31):
Currently, the formatting and parsing functions in cpumask.h use
nr_cpumask_bits like other cpumask functions; however, nr_cpumask_bits
is either NR_CPUS or nr_cpu_ids depending on CONFIG_CPUMASK_OFFSTACK.
This leads to inconsistent behaviors.
With CONFIG_NR_CPUS=512 and !CONFIG_CPUMASK_OFFSTACK
# cat /sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
# cat /proc/self/status | grep Cpus_allowed:
Cpus_allowed: f
With CONFIG_NR_CPUS=1024 and CONFIG_CPUMASK_OFFSTACK (fedora default)
# cat /sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0
# cat /proc/self/status | grep Cpus_allowed:
Cpus_allowed: f
Note that /proc/self/status is always using nr_cpu_ids regardless of
config. This is because seq cpumask formattings functions always use
nr_cpu_ids.
Given that the same output fields may switch between the two forms,
converging on nr_cpu_ids always isn't too likely to surprise userland.
This patch updates the formatting and parsing functions in cpumask.h
to always use nr_cpu_ids. There's no point in dealing with CPUs which
aren't even possible on the machine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
making a separate copy of its name. It's currently only used for sysfs
attributes whose filenames are required to stay accessible and unchanged.
There are rare exceptions where these names are allocated and formatted
dynamically but for the vast majority of cases they're consts in the
rodata section.
Now that kernfs is converted to use kstrdup_const() and kfree_const(),
there's little point in keeping KERNFS_STATIC_NAME around. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kstrdup() is often used to duplicate strings where neither source neither
destination will be ever modified. In such case we can just reuse the
source instead of duplicating it. The problem is that we must be sure
that the source is non-modifiable and its life-time is long enough.
I suspect the good candidates for such strings are strings located in
kernel .rodata section, they cannot be modifed because the section is
read-only and their life-time is equal to kernel life-time.
This small patchset proposes alternative version of kstrdup -
kstrdup_const, which returns source string if it is located in .rodata
otherwise it fallbacks to kstrdup. To verify if the source is in
.rodata function checks if the address is between sentinels
__start_rodata, __end_rodata. I guess it should work with all
architectures.
The main patch is accompanied by four patches constifying kstrdup for
cases where situtation described above happens frequently.
I have tested the patchset on mobile platform (exynos4210-trats) and it
saves 3272 string allocations. Since minimal allocation is 32 or 64
bytes depending on Kconfig options the patchset saves respectively about
100KB or 200KB of memory.
Stats from tested platform show that the main offender is sysfs:
By caller:
2260 __kernfs_new_node
631 clk_register+0xc8/0x1b8
318 clk_register+0x34/0x1b8
51 kmem_cache_create
12 alloc_vfsmnt
By string (with count >= 5):
883 power
876 subsystem
135 parameters
132 device
61 iommu_group
...
This patch (of 5):
Add an alternative version of kstrdup which returns pointer to constant
char array. The function checks if input string is in persistent and
read-only memory section, if yes it returns the input string, otherwise it
fallbacks to kstrdup.
kstrdup_const is accompanied by kfree_const performing conditional memory
deallocation of the string.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
If off >= lim (which requires shift >= nbits), k is initialized with a
large positive value, but since I've let k continue to be signed, the loop
will never run and dst will be zeroed as expected. Inside the loop, k is
guaranteed to be non-negative, so the fact that it is promoted to unsigned
in the various expressions it appears in is harmless.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I've previously changed the nbits parameter of most bitmap_* functions to
unsigned; now it is bitmap_shift_{left,right}'s turn. This alone saves
some .text, but while at it I found that there were a few other things one
could do. The end result of these seven patches is
$ scripts/bloat-o-meter /tmp/bitmap.o.{old,new}
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-328 (-328)
function old new delta
__bitmap_shift_right 384 226 -158
__bitmap_shift_left 306 136 -170
and less importantly also a smaller stack footprint
$ stack-o-meter.pl master bitmap
file function old new delta
lib/bitmap.o __bitmap_shift_right 24 8 -16
lib/bitmap.o __bitmap_shift_left 24 0 -24
For each pair of 0 <= shift <= nbits <= 256 I've tested the end result
with a few randomly filled src buffers (including garbage beyond nbits),
in each case verifying that the shift {left,right}-most bits of dst are
zero and the remaining nbits-shift bits correspond to src, so I'm fairly
confident I didn't screw up. That hasn't stopped me from being wrong
before, though.
This patch (of 7):
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
The expressions involving "lim - 1" are still ok, since if lim is 0 the
loop is never executed.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On little-endian, there's no reason to have an extra, presumably less
efficient, way of copying a bitmap.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the prototype of bitmap_copy_le the same as bitmap_copy's. All other
bitmap_* functions take unsigned long* parameters; there's no reason this
should be special.
The only current user is the static inline uwb_mas_bm_copy_le, which
already does the void* laundering, so the end users can pass their u8 or
__le32 buffers without a cast.
Furthermore, this allows us to simply let bitmap_copy_le be an alias for
bitmap_copy on little-endian; see next patch.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 411a99adff (nfs: clear_request_commit while holding i_lock)
assumes that the nfs_commit_info always points to the inode->i_lock.
For historical reasons, that is not the case for O_DIRECT writes.
Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: 411a99adff ("nfs: clear_request_commit while holding i_lock")
Cc: stable@vger.kernel.org # 3.17.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In preparation for adding support for quiescing timers in the final
stage of suspend-to-idle transitions, rework the freeze_enter()
function making the system wait on a wakeup event, the freeze_wake()
function terminating the suspend-to-idle loop and the mechanism by
which deep idle states are entered during suspend-to-idle.
First of all, introduce a simple state machine for suspend-to-idle
and make the code in question use it.
Second, prevent freeze_enter() from losing wakeup events due to race
conditions and ensure that the number of online CPUs won't change
while it is being executed. In addition to that, make it force
all of the CPUs re-enter the idle loop in case they are in idle
states already (so they can enter deeper idle states if possible).
Next, drop cpuidle_use_deepest_state() and replace use_deepest_state
checks in cpuidle_select() and cpuidle_reflect() with a single
suspend-to-idle state check in cpuidle_idle_call().
Finally, introduce cpuidle_enter_freeze() that will simply find the
deepest idle state available to the given CPU and enter it using
cpuidle_enter().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Pull LED subsystem update from Bryan Wu:
"The big change of LED subsystem is introducing a new LED class for
Flash type LEDs which will be used for V4L2 subsystem.
Also we got some cleanup and fixes"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: leds-gpio: Pass on error codes unmodified
DT: leds: Add led-sources property
leds: Add LED Flash class extension to the LED subsystem
leds: leds-mc13783: Use of_get_child_by_name() instead of refcount hack
leds: Use setup_timer
leds: Don't allow brightness values greater than max_brightness
DT: leds: Add flash LED devices related properties
Common: Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other architectures).
This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
or TCP_RR netperf tests). This also has to be enabled manually for now,
but the plan is to auto-tune this in the future.
ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
tracking
s390: several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS: Bugfixes.
x86: Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested virtualization
improvements (nested APICv---a nice optimization), usual round of emulation
fixes. There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
ARM has other conflicts where functions are added in the same place
by 3.19-rc and 3.20 patches. These are not large though, and entirely
within KVM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
=7gdx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.
Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency up to 50% on some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.
ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking
s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS:
Bugfixes.
x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.
There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
Powerpc:
Nothing yet.
The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...
In particular, the virtio header always has the u16 num_buffers field.
We define a new 'struct virtio_net_hdr_v1' for this (rather than
simply calling it 'struct virtio_net_hdr', to avoid nasty type errors
if some parts of a project define VIRTIO_NET_NO_LEGACY and some don't.
Transitional devices (which can't define VIRTIO_NET_NO_LEGACY) will
have to keep using struct virtio_net_hdr_mrg_rxbuf, which has the same
byte layout as struct virtio_net_hdr_v1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull f2fs updates from Jaegeuk Kim:
"Major changes are to:
- add f2fs_io_tracer and F2FS_IOC_GETVERSION
- fix wrong acl assignment from parent
- fix accessing wrong data blocks
- fix wrong condition check for f2fs_sync_fs
- align start block address for direct_io
- add and refactor the readahead flows of FS metadata
- refactor atomic and volatile write policies
But most of patches are for clean-ups and minor bug fixes. Some of
them refactor old code too"
* tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
f2fs: use spinlock for segmap_lock instead of rwlock
f2fs: fix accessing wrong indexed data blocks
f2fs: avoid variable length array
f2fs: fix sparse warnings
f2fs: allocate data blocks in advance for f2fs_direct_IO
f2fs: introduce macros to convert bytes and blocks in f2fs
f2fs: call set_buffer_new for get_block
f2fs: check node page contents all the time
f2fs: avoid data offset overflow when lseeking huge file
f2fs: fix to use highmem for pages of newly created directory
f2fs: introduce a batched trim
f2fs: merge {invalidate,release}page for meta/node/data pages
f2fs: show the number of writeback pages in stat
f2fs: keep PagePrivate during releasepage
f2fs: should fail mount when trying to recover data on read-only dev
f2fs: split UMOUNT and FASTBOOT flags
f2fs: avoid write_checkpoint if f2fs is mounted readonly
f2fs: support norecovery mount option
f2fs: fix not to drop mount options when retrying fill_super
f2fs: merge flags in struct f2fs_sb_info
...
Summary:
- Add code cleanups and bug fixups.
- Add a new display controller dirver, DECON which is a new display
controller of Exynos7 SoC. This device is much different from
FIMD of Exynos4 and Exynos4 SoC series.
* 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: Add DECON driver
drm/exynos: fix NULL pointer reference
drm/exynos: remove exynos_plane_dpms
drm/exynos: remove mode property of exynos crtc
drm/exynos: Remove exynos_plane_dpms() call with no effect
drm/exynos: fix DMA_ATTR_NO_KERNEL_MAPPING usage
drm/exynos: hdmi: replace fb size with mode size from win commit
drm/exynos: fix no hdmi output
drm/exynos: use driver internal struct
drm/exynos: fix wrong pipe calculation for crtc
drm/exynos: remove to use unnecessary MODULE_xxx macro
drm/exynos: remove DRM_EXYNOS_DMABUF config
drm/exynos: IOMMU support should not be selectable by user
drm/exynos: add support for 'hdmi' clock
Merge third set of updates from Andrew Morton:
- the rest of MM
[ This includes getting rid of the numa hinting bits, in favor of
just generic protnone logic. Yay. - Linus ]
- core kernel
- procfs
- some of lib/ (lots of lib/ material this time)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits)
lib/lcm.c: replace include
lib/percpu_ida.c: remove redundant includes
lib/strncpy_from_user.c: replace module.h include
lib/stmp_device.c: replace module.h include
lib/sort.c: move include inside #if 0
lib/show_mem.c: remove redundant include
lib/radix-tree.c: change to simpler include
lib/plist.c: remove redundant include
lib/nlattr.c: remove redundant include
lib/kobject_uevent.c: remove redundant include
lib/llist.c: remove redundant include
lib/md5.c: simplify include
lib/list_sort.c: rearrange includes
lib/genalloc.c: remove redundant include
lib/idr.c: remove redundant include
lib/halfmd4.c: simplify includes
lib/dynamic_queue_limits.c: simplify includes
lib/sort.c: use simpler includes
lib/interval_tree.c: simplify includes
hexdump: make it return number of bytes placed in buffer
...
We only need EXPORT_SYMBOL, so compiler.h and export.h suffice. This
means linux/types.h is no longer implicitly included, so add an include of
uapi/linux/types.h to linux/cryptohash.h for __u32. Other users of
cryptohash.h cannot be affected, since they must already have been
including uapi/linux/types.h in order for gcc not to complain about
unknown types.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes hexdump return the number of bytes placed in the buffer
excluding trailing NUL. In the case of overflow it returns the desired
amount of bytes to produce the entire dump. Thus, it mimics snprintf().
This will be useful for users that would like to repeat with a bigger
buffer.
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all in-tree users of strnicmp have been converted to
strncasecmp, the wrapper can be removed.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: David Howells <dhowells@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Also, rename bits to nbits. Both changes for consistency with other
bitmap_* functions.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the return value and the ord and nbits parameters of
bitmap_ord_to_pos unsigned.
Also, simplify the implementation and as a side effect make the result
fully defined, returning nbits for ord >= weight, in analogy with what
find_{first,next}_bit does. This is a better sentinel than the former
("unofficial") 0. No current users are affected by this change.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the sz and nbits parameters of bitmap_fold to unsigned int for
consistency with other bitmap_* functions, and to save another few bytes
in the generated code.
[akpm@linux-foundation.org: fix kerneldoc]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the nbits parameter of bitmap_onto to unsigned int for consistency
with other bitmap_* functions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the various bitmap_* functions now take an unsigned int as nbits
parameter, it makes sense to also update the various wrappers, even though
they're marked as obsolete.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the various bitmap_* functions now take an unsigned int as nbits
parameter, it makes sense to also update the various wrappers.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For consistency with the other bitmap_* functions, also make the nbits
parameter of bitmap_zero, bitmap_fill and bitmap_copy unsigned.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
string_get_size() was documented to return an error, but in fact always
returned 0. Since the output always fits in 9 bytes, just document that
and let callers do what they do now: pass a small stack buffer and ignore
the return value.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so
this only helps people who only test-compile using 3.3 (compiler-gcc3.h
barks at anything older than that). Besides, there are almost no
occurrences of __FUNCTION__ left in the tree.
[akpm@linux-foundation.org: convert remaining __FUNCTION__ references]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the hypervisor pauses a virtualised kernel the kernel will observe a
jump in timebase, this can cause spurious messages from the softlockup
detector.
Whilst these messages are harmless, they are accompanied with a stack
trace which causes undue concern and more problematically the stack trace
in the guest has nothing to do with the observed problem and can only be
misleading.
Futhermore, on POWER8 this is completely avoidable with the introduction
of the Virtual Time Base (VTB) register.
This patch (of 2):
This permits the use of arch specific clocks for which virtualised kernels
can use their notion of 'running' time, not the elpased wall time which
will include host execution time.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Jones <drjones@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ben Zhang <benzh@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Everybody uses unsigned long for pgoff_t, and no one ever overrode the
definition of pgoff_t. Keep it that way, and remove the option of
overriding it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peak resident size of a process can be reset back to the process's
current rss value by writing "5" to /proc/pid/clear_refs. The driving
use-case for this would be getting the peak RSS value, which can be
retrieved from the VmHWM field in /proc/pid/status, per benchmark
iteration or test scenario.
[akpm@linux-foundation.org: clarify behaviour in documentation]
Signed-off-by: Petr Cermak <petrcermak@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Primiano Tucci <primiano@chromium.org>
Cc: Petr Cermak <petrcermak@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the underlay of zpool: zsmalloc/zbud, do not know who creates
them. There is not a method to let zsmalloc/zbud find which caller they
belong to.
Now we want to add statistics collection in zsmalloc. We need to name the
debugfs dir for each pool created. The way suggested by Minchan Kim is to
use a name passed by caller(such as zram) to create the zsmalloc pool.
/sys/kernel/debug/zsmalloc/zram0
This patch adds an argument `name' to zs_create_pool() and other related
functions.
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm->nr_pmds doesn't make sense on !MMU configurations
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
wrapper functions.
Although this patch moves static keys decrement from __mem_cgroup_free to
mem_cgroup_css_free, it does not introduce any functional changes, because
the keys are incremented on setting the limit (tcp or kmem), which can
only happen after successful mem_cgroup_css_online.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, the only reason to keep kmemcg_id till css free is list_lru, which
uses it to distribute elements between per-memcg lists. However, it can
be easily sorted out - we only need to change kmemcg_id of an offline
cgroup to its parent's id, making further list_lru_add()'s add elements to
the parent's list, and then move all elements from the offline cgroup's
list to the one of its parent. It will work, because a racing
list_lru_del() does not need to know the list it is deleting the element
from. It can decrement the wrong nr_items counter though, but the ongoing
reparenting will fix it. After list_lru reparenting is done we are free
to release kmemcg_id saving a valuable slot in a per-memcg array for new
cgroups.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the isolate callback passed to the list_lru_walk family of
functions is supposed to just delete an item from the list upon returning
LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by
__list_lru_walk_one after the callback returns. Since the callback is
allowed to drop the lock after removing an item (it has to return
LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
of elements on the list even if we check them under the lock. This makes
it difficult to move items from one list_lru_one to another, which is
required for per-memcg list_lru reparenting - we can't just splice the
lists, we have to move entries one by one.
This patch therefore introduces helpers that must be used by callback
functions to isolate items instead of raw list_del/list_move. These are
list_lru_isolate and list_lru_isolate_move. They not only remove the
entry from the list, but also fix the nr_items counter, making sure
nr_items always reflects the actual number of elements on the list if
checked under the appropriate lock.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to look up a kmem_cache in ->memcg_params.memcg_caches arrays only
on allocations, so there is no need to have the array entries set until
css free - we can clear them on css offline. This will allow us to reuse
array entries more efficiently and avoid costly array relocations.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes, we need to iterate over all memcg copies of a particular root
kmem cache. Currently, we use memcg_cache_params->memcg_caches array for
that, because it contains all existing memcg caches.
However, it's a bad practice to keep all caches, including those that
belong to offline cgroups, in this array, because it will be growing
beyond any bounds then. I'm going to wipe away dead caches from it to
save space. To still be able to perform iterations over all memcg caches
of the same kind, let us link them into a list.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, kmem_cache stores a pointer to struct memcg_cache_params
instead of embedding it. The rationale is to save memory when kmem
accounting is disabled. However, the memcg_cache_params has shrivelled
drastically since it was first introduced:
* Initially:
struct memcg_cache_params {
bool is_root_cache;
union {
struct kmem_cache *memcg_caches[0];
struct {
struct mem_cgroup *memcg;
struct list_head list;
struct kmem_cache *root_cache;
bool dead;
atomic_t nr_pages;
struct work_struct destroy;
};
};
};
* Now:
struct memcg_cache_params {
bool is_root_cache;
union {
struct {
struct rcu_head rcu_head;
struct kmem_cache *memcg_caches[0];
};
struct {
struct mem_cgroup *memcg;
struct kmem_cache *root_cache;
};
};
};
So the memory saving does not seem to be a clear win anymore.
OTOH, keeping a pointer to memcg_cache_params struct instead of embedding
it results in touching one more cache line on kmem alloc/free hot paths.
Besides, it makes linking kmem caches in a list chained by a field of
struct memcg_cache_params really painful due to a level of indirection,
while I want to make them linked in the following patch. That said, let
us embed it.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several FS shrinkers, including super_block::s_shrink, that
keep reclaimable objects in the list_lru structure. Hence to turn them
to memcg-aware shrinkers, it is enough to make list_lru per-memcg.
This patch does the trick. It adds an array of lru lists to the
list_lru_node structure (per-node part of the list_lru), one for each
kmem-active memcg, and dispatches every item addition or removal to the
list corresponding to the memcg which the item is accounted to. So now
the list_lru structure is not just per node, but per node and per memcg.
Not all list_lrus need this feature, so this patch also adds a new
method, list_lru_init_memcg, which initializes a list_lru as memcg
aware. Otherwise (i.e. if initialized with old list_lru_init), the
list_lru won't have per memcg lists.
Just like per memcg caches arrays, the arrays of per-memcg lists are
indexed by memcg_cache_id, so we must grow them whenever
memcg_nr_cache_ids is increased. So we introduce a callback,
memcg_update_all_list_lrus, invoked by memcg_alloc_cache_id if the id
space is full.
The locking is implemented in a manner similar to lruvecs, i.e. we have
one lock per node that protects all lists (both global and per cgroup) on
the node.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To make list_lru memcg aware, we need all list_lrus to be kept on a list
protected by a mutex, so that we could sleep while walking over the
list.
Therefore after this change list_lru_destroy may sleep. Fortunately,
there is only one user that calls it from an atomic context - it's
put_super - and we can easily fix it by calling list_lru_destroy before
put_super in destroy_locked_super - anyway we don't longer need lrus by
that time.
Another point that should be noted is that list_lru_destroy is allowed
to be called on an uninitialized zeroed-out object, in which case it is
a no-op. Before this patch this was guaranteed by kfree, but now we
need an explicit check there.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The active_nodes mask allows us to skip empty nodes when walking over
list_lru items from all nodes in list_lru_count/walk. However, these
functions are never called from hot paths, so it doesn't seem we need
such kind of optimization there. OTOH, removing the mask will make it
easier to make list_lru per-memcg.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>