Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.
While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.
For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.
To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().
NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM64 EFI update from Peter Anvin:
"By agreement with the ARM64 EFI maintainers, we have agreed to make
-tip the upstream for all EFI patches. That is why this patchset
comes from me :)
This patchset enables EFI stub support for ARM64, like we already have
on x86"
* 'arm64-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64: efi: only attempt efi map setup if booting via EFI
efi/arm64: ignore dtb= when UEFI SecureBoot is enabled
doc: arm64: add description of EFI stub support
arm64: efi: add EFI stub
doc: arm: add UEFI support documentation
arm64: add EFI runtime services
efi: Add shared FDT related functions for ARM/ARM64
arm64: Add function to create identity mappings
efi: add helper function to get UEFI params from FDT
doc: efi-stub.txt updates for ARM
lib: add fdt_empty_tree.c
Some drivers have different limits on what size a request should
optimally be, depending on the offset of the request. Similar to
dividing a device into chunks. Add a setting that allows the driver
to inform the block layer of such a chunk size. The block layer will
then prevent merging across the chunks.
This is needed to optimally support NVMe with a non-zero stripe size.
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull x86 EFI updates from Peter Anvin:
"A collection of EFI changes. The perhaps most important one is to
fully save and restore the FPU state around each invocation of EFI
runtime, and to not choke on non-ASCII characters in the boot stub"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Add compatibility code for compat tasks
efivars: Refactor sanity checking code into separate function
efivars: Stop passing a struct argument to efivar_validate()
efivars: Check size of user object
efivars: Use local variables instead of a pointer dereference
x86/efi: Save and restore FPU context around efi_calls (i386)
x86/efi: Save and restore FPU context around efi_calls (x86_64)
x86/efi: Implement a __efi_call_virt macro
x86, fpu: Extend the use of static_cpu_has_safe
x86/efi: Delete most of the efi_call* macros
efi: x86: Handle arbitrary Unicode characters
efi: Add get_dram_base() helper function
efi: Add shared printk wrapper for consistent prefixing
efi: create memory map iteration helper
efi: efi-stub-helper cleanup
Pull x86 cdso updates from Peter Anvin:
"Vdso cleanups and improvements largely from Andy Lutomirski. This
makes the vdso a lot less ''special''"
* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso, build: Make LE access macros clearer, host-safe
x86/vdso, build: Fix cross-compilation from big-endian architectures
x86/vdso, build: When vdso2c fails, unlink the output
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
x86, mm: Improve _install_special_mapping and fix x86 vdso naming
mm, fs: Add vm_ops->name as an alternative to arch_vma_name
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
x86, vdso: Move the 32-bit vdso special pages after the text
x86, vdso: Reimplement vdso.so preparation in build-time C
x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
x86, vdso: Clean up 32-bit vs 64-bit vdso params
x86, mm: Ensure correct alignment of the fixmap
Add common code to generate -ENOTSUPP at event creation time if an
architecture attempts to create a sampled event and
PERF_PMU_NO_INTERRUPT is set.
This adds a new pmu->capabilities flag. Initially we only support
PERF_PMU_NO_INTERRUPT (to indicate a PMU has no support for generating
hardware interrupts) but there are other capabilities that can be
added later.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Will Deacon <will.deacon@arm.com>
[peterz: rename to PERF_PMU_CAP_* and moved the pmu::capabilities word into a hole]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1405161708060.11099@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is better not to think about compute capacity as being equivalent
to "CPU power". The upcoming "power aware" scheduler work may create
confusion with the notion of energy consumption if "power" is used too
liberally.
Let's rename the following feature flags since they do relate to capacity:
SD_SHARE_CPUPOWER -> SD_SHARE_CPUCAPACITY
ARCH_POWER -> ARCH_CAPACITY
NONTASK_POWER -> NONTASK_CAPACITY
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Andy Fleming <afleming@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/n/tip-e93lpnxb87owfievqatey6b5@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is better not to think about compute capacity as being equivalent
to "CPU power". The upcoming "power aware" scheduler work may create
confusion with the notion of energy consumption if "power" is used too
liberally.
This contains the architecture visible changes. Incidentally, only ARM
takes advantage of the available pow^H^H^Hcapacity scaling hooks and
therefore those changes outside kernel/sched/ are confined to one ARM
specific file. The default arch_scale_smt_power() hook is not overridden
by anyone.
Replacements are as follows:
arch_scale_freq_power --> arch_scale_freq_capacity
arch_scale_smt_power --> arch_scale_smt_capacity
SCHED_POWER_SCALE --> SCHED_CAPACITY_SCALE
SCHED_POWER_SHIFT --> SCHED_CAPACITY_SHIFT
The local usage of "power" in arch/arm/kernel/topology.c is also changed
to "capacity" as appropriate.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-48zba9qbznvglwelgq2cfygh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is better not to think about compute capacity as being equivalent
to "CPU power". The upcoming "power aware" scheduler work may create
confusion with the notion of energy consumption if "power" is used too
liberally.
Since struct sched_group_power is really about compute capacity of sched
groups, let's rename it to struct sched_group_capacity. Similarly sgp
becomes sgc. Related variables and functions dealing with groups are also
adjusted accordingly.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-5yeix833vvgf2uyj5o36hpu9@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
yield_to() is supposed to return -ESRCH if there is no task to
yield to, but because the type is bool that is the same as returning
true.
The only place I see which cares is kvm_vcpu_on_spin().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20140523102042.GA7267@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Optimistic spinning is only used by the xadd variant
of rw-semaphores. Make sure that we use the old version
of the __RWSEM_INITIALIZER macro for systems that rely
on the spinlock one, otherwise warnings can be triggered,
such as the following reported on an arm box:
ipc/ipcns_notifier.c:22:8: warning: excess elements in struct initializer [enabled by default]
ipc/ipcns_notifier.c:22:8: warning: (near initialization for 'ipcns_chain.rwsem') [enabled by default]
ipc/ipcns_notifier.c:22:8: warning: excess elements in struct initializer [enabled by default]
ipc/ipcns_notifier.c:22:8: warning: (near initialization for 'ipcns_chain.rwsem') [enabled by default]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1400545677.6399.10.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have reached the point where our mutexes are quite fine tuned
for a number of situations. This includes the use of heuristics
and optimistic spinning, based on MCS locking techniques.
Exclusive ownership of read-write semaphores are, conceptually,
just about the same as mutexes, making them close cousins. To
this end we need to make them both perform similarly, and
right now, rwsems are simply not up to it. This was discovered
by both reverting commit 4fc3f1d6 (mm/rmap, migration: Make
rmap_walk_anon() and try_to_unmap_anon() more scalable) and
similarly, converting some other mutexes (ie: i_mmap_mutex) to
rwsems. This creates a situation where users have to choose
between a rwsem and mutex taking into account this important
performance difference. Specifically, biggest difference between
both locks is when we fail to acquire a mutex in the fastpath,
optimistic spinning comes in to play and we can avoid a large
amount of unnecessary sleeping and overhead of moving tasks in
and out of wait queue. Rwsems do not have such logic.
This patch, based on the work from Tim Chen and I, adds support
for write-side optimistic spinning when the lock is contended.
It also includes support for the recently added cancelable MCS
locking for adaptive spinning. Note that is is only applicable
to the xadd method, and the spinlock rwsem variant remains intact.
Allowing optimistic spinning before putting the writer on the wait
queue reduces wait queue contention and provided greater chance
for the rwsem to get acquired. With these changes, rwsem is on par
with mutex. The performance benefits can be seen on a number of
workloads. For instance, on a 8 socket, 80 core 64bit Westmere box,
aim7 shows the following improvements in throughput:
+--------------+---------------------+-----------------+
| Workload | throughput-increase | number of users |
+--------------+---------------------+-----------------+
| alltests | 20% | >1000 |
| custom | 27%, 60% | 10-100, >1000 |
| high_systime | 36%, 30% | >100, >1000 |
| shared | 58%, 29% | 10-100, >1000 |
+--------------+---------------------+-----------------+
There was also improvement on smaller systems, such as a quad-core
x86-64 laptop running a 30Gb PostgreSQL (pgbench) workload for up
to +60% in throughput for over 50 clients. Additionally, benefits
were also noticed in exim (mail server) workloads. Furthermore, no
performance regression have been seen at all.
Based-on-work-from: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
[peterz: rej fixup due to comment patches, sched/rt.h header]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Scott J Norton" <scott.norton@hp.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1399055055.6275.15.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
capi_info2str() is apparently meant to be of general utility. It is
actually only used in capidrv.c. So move it from capiutil.c to
capidrv.c and (obviously) stop exporting it.
And, since we're touching this, merge the two versions of this
function.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.
This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
that a device is capable of computing the UDP checksum in the
encapsulating header of a UDP tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a GSO packet segment we may need to set more than
one checksum in the packet (for instance a TCP checksum and
UDP checksum for VXLAN encapsulation). To be efficient, we want
to do checksum calculation for any part of the packet at most once.
This patch adds csum_start offset to skb_gso_cb. This tracks the
starting offset for skb->csum which is initially set in skb_segment.
When a protocol needs to compute a transport checksum it calls
gso_make_checksum which computes the checksum value from the start
of transport header to csum_start and then adds in skb->csum to get
the full checksum. skb->csum and csum_start are then updated to reflect
the checksum of the resultant packet starting from the transport header.
This patch also adds a flag to skbuff, encap_hdr_csum, which is set
in *gso_segment fucntions to indicate that a tunnel protocol needs
checksum calculation
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc updates from Andrew Morton:
- a few fixes for 3.16. Cc'ed to stable so they'll get there somehow.
- various misc fixes and cleanups
- most of the ocfs2 queue. Review is slow...
- most of MM. The MM queue is pretty huge this time, but not much in
the way of feature work.
- some tweaks under kernel/
- printk maintenance work
- updates to lib/
- checkpatch updates
- tweaks to init/
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits)
fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
init/main.c: remove an ifdef
kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
init/main.c: add initcall_blacklist kernel parameter
init/main.c: don't use pr_debug()
fs/binfmt_flat.c: make old_reloc() static
fs/binfmt_elf.c: fix bool assignements
fs/efs: convert printk(KERN_DEBUG to pr_debug
fs/efs: add pr_fmt / use __func__
fs/efs: convert printk to pr_foo()
scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
checkpatch: check stable email address
checkpatch: warn on unnecessary void function return statements
checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar);
checkpatch: add warning for kmalloc/kzalloc with multiply
checkpatch: warn on #defines ending in semicolon
checkpatch: make --strict a default for files in drivers/net and net/
checkpatch: always warn on missing blank line after variable declaration block
checkpatch: fix wildcard DT compatible string checking
...
1. Remove CLONE_KERNEL, it has no users and it is dangerous.
The (old) comment says "List of flags we want to share for kernel
threads" but this is not true, we do not want to share ->sighand by
default. This flag can only be used if the caller is sure that both
parent/child will never play with signals (say, allow_signal/etc).
2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.
In this case CLONE_SIGHAND does not really hurt, and it looks like
optimization because copy_sighand() can avoid kmem_cache_alloc().
But in fact this only adds the minor pessimization. kernel_init()
is going to exec the init process, and de_thread() will need to
unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
but it needs to do more work and take tasklist_lock and siglock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... instead of naked numbers.
Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.
Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pr_debug() and related debug print macros all differ from the normal
pr_XXX() macros, in that the normal ones print unconditionally, while
the debug macros are compiled out unless DEBUG is defined or
CONFIG_DYNAMIC_DEBUG is set. This isn't obvious, and the only way to
find this out is either to review the actual printk.h code or to read
CodingStyle, and the message there doesn't highlight the fact.
Change Documentation/CodingStyle to clearly indicate that pr_debug() and
related debug printing macros behave differently than all other pr_XXX()
macros, and attempt to clarify when and where the different debug
printing methods might be used.
Add short comment to printk.h above the pr_XXX() macros indicating that
while these macros print unconditionally, pr_debug() does not.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Joe Perches <joe@perches.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two of the three prink_deferred uses are really printk_once style
uses, so add a printk_deferred_once macro to simplify those call
sites.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.
This only changes the function name. No logic changes.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody seems uses it for a long time. Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Usually, BUG_ON and friends aren't even evaluated in sparse, but recently
compiletime_assert_atomic_type() was added, and that now results in a
sparse warning every time it is used.
The reason turns out to be the temporary variable, after it sparse no
longer considers the value to be a constant, and results in a warning and
an error. The error is the more annoying part of this as it suppresses
any further warnings in the same file, hiding other problems.
Unfortunately the condition cannot be simply expanded out to avoid the
temporary variable since it breaks compiletime_assert on old versions of
GCC such as GCC 4.2.4 which the latest metag compiler is based on.
Therefore #ifndef __CHECKER__ out the __compiletime_error_fallback which
uses the potentially negative size array to trigger a conditional compiler
error, so that sparse doesn't see it.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Luciano Coelho <luciano.coelho@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zbud_alloc is only called by zswap_frontswap_store with unsigned int len.
Change function parameter + update >= 0 check.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already have a function named hugepages_supported(), and the similar
name hugepage_migration_support() is a bit unconfortable, so let's rename
it hugepage_migration_supported().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Transform action part of ttu_flags into individiual bits. These flags
aren't part of any uses-space visible api or even trace events.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we use (zone->managed_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1)
/ KSWAPD_ZONE_BALANCE_GAP_RATIO to avoid a zero gap value. It's better to
use DIV_ROUND_UP macro for neater code and clear meaning.
Besides, the gap value is calculated against the per-zone "managed pages",
not "present pages". This patch also corrects the comment and do some
rephrasing.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mmdebug.h is included twice.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after. Once the page is
visible the atomic operations are necessary which is noticable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticable with fast storage. The objective of the patch is to initialse
the accessed information with non-atomic operations before the page is
visible.
The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page. This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
called before the page is visible and can be done non-atomically.
The primary APIs of concern in this care are the following and are used
by most filesystems.
find_get_page
find_lock_page
find_or_create_page
grab_cache_page_nowait
grab_cache_page_write_begin
All of them are very similar in detail to the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not. Then
old API is preserved but is basically a thin wrapper around this core
function.
Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job. There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced where as previously it might
have been repromoted. This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change. It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.
The test case used to evaulate this is a simple dd of a large file done
multiple times with the file deleted on each iterations. The size of the
file is 1/10th physical memory to avoid dirty page balancing. In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO. The sync results are expected to be
more stable. The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.
The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts. Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison. As async results were variable
do to writback timings, I'm only reporting the maximum figures. The sync
results were stable enough to make the mean and stddev uninteresting.
The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.
async dd
3.15.0-rc3 3.15.0-rc3
vanilla accessed-v2
ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%)
tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%)
btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%)
ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%)
xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%)
The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.
samples percentage
ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem_getpage_gfp uses an atomic operation to set the SwapBacked field
before it's even added to the LRU or visible. This is unnecessary as what
could it possible race against? Use an unlocked variant.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cold is a bool, make it one. Make the likely case the "if" part of the
block instead of the else as according to the optimisation manual this is
preferred.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
X86 prefers the use of unsigned types for iterators and there is a
tendency to mix whether a signed or unsigned type if used for page order.
This converts a number of sites in mm/page_alloc.c to use unsigned int for
order where possible.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The test_bit operations in get/set pageblock flags are expensive. This
patch reads the bitmap on a word basis and use shifts and masks to isolate
the bits of interest. Similarly masks are used to set a local copy of the
bitmap and then use cmpxchg to update the bitmap if there have been no
other changes made in parallel.
In a test running dd onto tmpfs the overhead of the pageblock-related
functions went from 1.27% in profiles to 0.5%.
In addition to the performance benefits, this patch closes races that are
possible between:
a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
reads part of the bits before and other part of the bits after
set_pageblock_migratetype() has updated them.
b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
read-modify-update set bit operation in set_pageblock_skip() will cause
lost updates to some bits changed in the set_pageblock_migratetype().
Joonsoo Kim first reported the case a) via code inspection. Vlastimil
Babka's testing with a debug patch showed that either a) or b) occurs
roughly once per mmtests' stress-highalloc benchmark (although not
necessarily in the same pageblock). Furthermore during development of
unrelated compaction patches, it was observed that frequent calls to
{start,undo}_isolate_page_range() the race occurs several thousands of
times and has resulted in NULL pointer dereferences in move_freepages()
and free_one_page() in places where free_list[migratetype] is
manipulated by e.g. list_move(). Further debugging confirmed that
migratetype had invalid value of 6, causing out of bounds access to the
free_list array.
That confirmed that the race exist, although it may be extremely rare,
and currently only fatal where page isolation is performed due to
memory hot remove. Races on pageblocks being updated by
set_pageblock_migratetype(), where both old and new migratetype are
lower MIGRATE_RESERVE, currently cannot result in an invalid value
being observed, although theoretically they may still lead to
unexpected creation or destruction of MIGRATE_RESERVE pageblocks.
Furthermore, things could get suddenly worse when memory isolation is
used more, or when new migratetypes are added.
After this patch, the race has no longer been observed in testing.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If cpusets are not in use then we still check a global variable on every
page allocation. Use jump labels to avoid the overhead.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch exposes the jump_label reference count in preparation for the
next patch. cpusets cares about both the jump_label being enabled and how
many users of the cpusets there currently are.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current names are rather inconsistent. Let's try to improve them.
Brief change log:
** old name ** ** new name **
kmem_cache_create_memcg memcg_create_kmem_cache
memcg_kmem_create_cache memcg_regsiter_cache
memcg_kmem_destroy_cache memcg_unregister_cache
kmem_cache_destroy_memcg_children memcg_cleanup_cache_params
mem_cgroup_destroy_all_caches memcg_unregister_all_caches
create_work memcg_register_cache_work
memcg_create_cache_work_func memcg_register_cache_func
memcg_create_cache_enqueue memcg_schedule_register_cache
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally get_swap_page() started iterating through the singly-linked
list of swap_info_structs using swap_list.next or highest_priority_index,
which both were intended to point to the highest priority active swap
target that was not full. The first patch in this series changed the
singly-linked list to a doubly-linked list, and removed the logic to start
at the highest priority non-full entry; it starts scanning at the highest
priority entry each time, even if the entry is full.
Replace the manually ordered swap_list_head with a plist, swap_active_head.
Add a new plist, swap_avail_head. The original swap_active_head plist
contains all active swap_info_structs, as before, while the new
swap_avail_head plist contains only swap_info_structs that are active and
available, i.e. not full. Add a new spinlock, swap_avail_lock, to protect
the swap_avail_head list.
Mel Gorman suggested using plists since they internally handle ordering
the list entries based on priority, which is exactly what swap was doing
manually. All the ordering code is now removed, and swap_info_struct
entries and simply added to their corresponding plist and automatically
ordered correctly.
Using a new plist for available swap_info_structs simplifies and
optimizes get_swap_page(), which no longer has to iterate over full
swap_info_structs. Using a new spinlock for swap_avail_head plist
allows each swap_info_struct to add or remove themselves from the
plist when they become full or not-full; previously they could not
do so because the swap_info_struct->lock is held when they change
from full<->not-full, and the swap_lock protecting the main
swap_active_head must be ordered before any swap_info_struct->lock.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Weijie Yang <weijieut@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add plist_requeue(), which moves the specified plist_node after all other
same-priority plist_nodes in the list. This is essentially an optimized
plist_del() followed by plist_add().
This is needed by swap, which (with the next patch in this set) uses a
plist of available swap devices. When a swap device (either a swap
partition or swap file) are added to the system with swapon(), the device
is added to a plist, ordered by the swap device's priority. When swap
needs to allocate a page from one of the swap devices, it takes the page
from the first swap device on the plist, which is the highest priority
swap device. The swap device is left in the plist until all its pages are
used, and then removed from the plist when it becomes full.
However, as described in man 2 swapon, swap must allocate pages from swap
devices with the same priority in round-robin order; to do this, on each
swap page allocation, swap uses a page from the first swap device in the
plist, and then calls plist_requeue() to move that swap device entry to
after any other same-priority swap devices. The next swap page allocation
will again use a page from the first swap device in the plist and requeue
it, and so on, resulting in round-robin usage of equal-priority swap
devices.
Also add plist_test_requeue() test function, for use by plist_test() to
test plist_requeue() function.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Weijie Yang <weijieut@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add PLIST_HEAD() to plist.h, equivalent to LIST_HEAD() from list.h, to
define and initialize a struct plist_head.
Add plist_for_each_continue() and plist_for_each_entry_continue(),
equivalent to list_for_each_continue() and list_for_each_entry_continue(),
to iterate over a plist continuing after the current position.
Add plist_prev() and plist_next(), equivalent to (struct list_head*)->prev
and ->next, implemented by list_prev_entry() and list_next_entry(), to
access the prev/next struct plist_node entry. These are needed because
unlike struct list_head, direct access of the prev/next struct plist_node
isn't possible; the list must be navigated via the contained struct
list_head. e.g. instead of accessing the prev by list_prev_entry(node,
node_list) it can be accessed by plist_prev(node).
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Weijie Yang <weijieut@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The logic controlling the singly-linked list of swap_info_struct entries
for all active, i.e. swapon'ed, swap targets is rather complex, because:
- it stores the entries in priority order
- there is a pointer to the highest priority entry
- there is a pointer to the highest priority not-full entry
- there is a highest_priority_index variable set outside the swap_lock
- swap entries of equal priority should be used equally
this complexity leads to bugs such as: https://lkml.org/lkml/2014/2/13/181
where different priority swap targets are incorrectly used equally.
That bug probably could be solved with the existing singly-linked lists,
but I think it would only add more complexity to the already difficult to
understand get_swap_page() swap_list iteration logic.
The first patch changes from a singly-linked list to a doubly-linked list
using list_heads; the highest_priority_index and related code are removed
and get_swap_page() starts each iteration at the highest priority
swap_info entry, even if it's full. While this does introduce unnecessary
list iteration (i.e. Schlemiel the painter's algorithm) in the case where
one or more of the highest priority entries are full, the iteration and
manipulation code is much simpler and behaves correctly re: the above bug;
and the fourth patch removes the unnecessary iteration.
The second patch adds some minor plist helper functions; nothing new
really, just functions to match existing regular list functions. These
are used by the next two patches.
The third patch adds plist_requeue(), which is used by get_swap_page() in
the next patch - it performs the requeueing of same-priority entries
(which moves the entry to the end of its priority in the plist), so that
all equal-priority swap_info_structs get used equally.
The fourth patch converts the main list into a plist, and adds a new plist
that contains only swap_info entries that are both active and not full.
As Mel suggested using plists allows removing all the ordering code from
swap - plists handle ordering automatically. The list naming is also
clarified now that there are two lists, with the original list changed
from swap_list_head to swap_active_head and the new list named
swap_avail_head. A new spinlock is also added for the new list, so
swap_info entries can be added or removed from the new list immediately as
they become full or not full.
This patch (of 4):
Replace the singly-linked list tracking active, i.e. swapon'ed,
swap_info_struct entries with a doubly-linked list using struct
list_heads. Simplify the logic iterating and manipulating the list of
entries, especially get_swap_page(), by using standard list_head
functions, and removing the highest priority iteration logic.
The change fixes the bug:
https://lkml.org/lkml/2014/2/13/181
in which different priority swap entries after the highest priority entry
are incorrectly used equally in pairs. The swap behavior is now as
advertised, i.e. different priority swap entries are used in order, and
equal priority swap targets are used concurrently.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Weijie Yang <weijieut@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're going to want to manipulate the migration mode for compaction in the
page allocator, and currently compact_control's sync field is only a bool.
Currently, we only do MIGRATE_ASYNC or MIGRATE_SYNC_LIGHT compaction
depending on the value of this bool. Convert the bool to enum
migrate_mode and pass the migration mode in directly. Later, we'll want
to avoid MIGRATE_SYNC_LIGHT for thp allocations in the pagefault patch to
avoid unnecessary latency.
This also alters compaction triggered from sysfs, either for the entire
system or for a node, to force MIGRATE_SYNC.
[akpm@linux-foundation.org: fix build]
[iamjoonsoo.kim@lge.com: use MIGRATE_SYNC in alloc_contig_range()]
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Each zone has a cached migration scanner pfn for memory compaction so that
subsequent calls to memory compaction can start where the previous call
left off.
Currently, the compaction migration scanner only updates the per-zone
cached pfn when pageblocks were not skipped for async compaction. This
creates a dependency on calling sync compaction to avoid having subsequent
calls to async compaction from scanning an enormous amount of non-MOVABLE
pageblocks each time it is called. On large machines, this could be
potentially very expensive.
This patch adds a per-zone cached migration scanner pfn only for async
compaction. It is updated everytime a pageblock has been scanned in its
entirety and when no pages from it were successfully isolated. The cached
migration scanner pfn for sync compaction is updated only when called for
sync compaction.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory migration uses a callback defined by the caller to determine how to
allocate destination pages. When migration fails for a source page,
however, it frees the destination page back to the system.
This patch adds a memory migration callback defined by the caller to
determine how to free destination pages. If a caller, such as memory
compaction, builds its own freelist for migration targets, this can reuse
already freed memory instead of scanning additional memory.
If the caller provides a function to handle freeing of destination pages,
it is called when page migration fails. If the caller passes NULL then
freeing back to the system will be handled as usual. This patch
introduces no functional change.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of calling back to memcontrol.c from kmem_cache_create_memcg in
order to just create the name of a per memcg cache, let's allocate it in
place. We only need to pass the memcg name to kmem_cache_create_memcg for
that - everything else can be done in slab_common.c.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With ELF extended numbering 16-bit bound is not hard limit any more.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM was converted to use rmap_walk() and now nobody uses these functions
outside mm/rmap.c.
Let's covert them back to static.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use BOOTMEM_DEFAULT instead of 0 in the comment.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, in put_compound_page(), we have
======
if (likely(!PageTail(page))) { <------ (1)
if (put_page_testzero(page)) {
/*
¦* By the time all refcounts have been released
¦* split_huge_page cannot run anymore from under us.
¦*/
if (PageHead(page))
__put_compound_page(page);
else
__put_single_page(page);
}
return;
}
/* __split_huge_page_refcount can run under us */
page_head = compound_head(page); <------------ (2)
======
if at (1) , we fail the check, this means page is *likely* a tail page.
Then at (2), as compoud_head(page) is inlined, it is :
======
static inline struct page *compound_head(struct page *page)
{
if (unlikely(PageTail(page))) { <----------- (3)
struct page *head = page->first_page;
smp_rmb();
if (likely(PageTail(page)))
return head;
}
return page;
}
======
here, the (3) unlikely in the case is a negative hint, because it is
*likely* a tail page. So the check (3) in this case is not good, so I
introduce a helper for this case.
So this patch introduces compound_head_by_tail() which deals with a
possible tail page(though it could be spilt by a racy thread), and make
compound_head() a wrapper on it.
This patch has no functional change, and it reduces the object
size slightly:
text data bss dec hex filename
11003 1328 16 12347 303b mm/swap.o.orig
10971 1328 16 12315 301b mm/swap.o.patched
I've ran "perf top -e branch-miss" to observe branch-miss in this case.
As Michael points out, it's a slow path, so only very few times this case
happens. But I grep'ed the code base, and found there still are some
other call sites could be benifited from this helper. And given that it
only bloating up the source by only 5 lines, but with a reduced object
size. I still believe this helper deserves to exsit.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nmask argument to set_mempolicy() is const according to the user-space
header numaif.h, and since the kernel does indeed not modify it, it might
as well be declared const in the kernel.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nmask argument to mbind() is const according to the userspace header
numaif.h, and since the kernel does indeed not modify it, it might as well
be declared const in the kernel.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A block device driver may choose to provide a rw_page operation. These
will be called when the filesystem is attempting to do page sized I/O to
page cache pages (ie not for direct I/O). This does preclude I/Os that
are larger than page size, so this may only be a performance gain for
some devices.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_endio() takes care of updating all the appropriate page flags once
I/O has finished to a page. Switch to using mapping_set_error() instead
of setting AS_EIO directly; this will handle thin-provisioned devices
correctly.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last in-tree caller of block_write_full_page_endio() was removed in
January 2013. It's time to remove the EXPORT_SYMBOL, which leaves
block_write_full_page() as the only caller of
block_write_full_page_endio(), so inline block_write_full_page_endio()
into block_write_full_page().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At present, we have the following mutexes protecting data related to per
memcg kmem caches:
- slab_mutex. This one is held during the whole kmem cache creation
and destruction paths. We also take it when updating per root cache
memcg_caches arrays (see memcg_update_all_caches). As a result, taking
it guarantees there will be no changes to any kmem cache (including per
memcg). Why do we need something else then? The point is it is
private to slab implementation and has some internal dependencies with
other mutexes (get_online_cpus). So we just don't want to rely upon it
and prefer to introduce additional mutexes instead.
- activate_kmem_mutex. Initially it was added to synchronize
initializing kmem limit (memcg_activate_kmem). However, since we can
grow per root cache memcg_caches arrays only on kmem limit
initialization (see memcg_update_all_caches), we also employ it to
protect against memcg_caches arrays relocation (e.g. see
__kmem_cache_destroy_memcg_children).
- We have a convention not to take slab_mutex in memcontrol.c, but we
want to walk over per memcg memcg_slab_caches lists there (e.g. for
destroying all memcg caches on offline). So we have per memcg
slab_caches_mutex's protecting those lists.
The mutexes are taken in the following order:
activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex
Such a syncrhonization scheme has a number of flaws, for instance:
- We can't call kmem_cache_{destroy,shrink} while walking over a
memcg::memcg_slab_caches list due to locking order. As a result, in
mem_cgroup_destroy_all_caches we schedule the
memcg_cache_params::destroy work shrinking and destroying the cache.
- We don't have a mutex to synchronize per memcg caches destruction
between memcg offline (mem_cgroup_destroy_all_caches) and root cache
destruction (__kmem_cache_destroy_memcg_children). Currently we just
don't bother about it.
This patch simplifies it by substituting per memcg slab_caches_mutex's
with the global memcg_slab_mutex. It will be held whenever a new per
memcg cache is created or destroyed, so it protects per root cache
memcg_caches arrays and per memcg memcg_slab_caches lists. The locking
order is following:
activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex
This allows us to call kmem_cache_{create,shrink,destroy} under the
memcg_slab_mutex. As a result, we don't need memcg_cache_params::destroy
work any more - we can simply destroy caches while iterating over a per
memcg slab caches list.
Also using the global mutex simplifies synchronization between concurrent
per memcg caches creation/destruction, e.g. mem_cgroup_destroy_all_caches
vs __kmem_cache_destroy_memcg_children.
The downside of this is that we substitute per-memcg slab_caches_mutex's
with a hummer-like global mutex, but since we already take either the
slab_mutex or the cgroup_mutex along with a memcg::slab_caches_mutex, it
shouldn't hurt concurrency a lot.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we have two pairs of kmemcg-related functions that are called on
slab alloc/free. The first is memcg_{bind,release}_pages that count the
total number of pages allocated on a kmem cache. The second is
memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource
counter. Let's just merge them to keep the code clean.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is a part of preparations for kmemcg re-parenting. It
targets at simplifying kmemcg work-flows and synchronization.
First, it removes async per memcg cache destruction (see patches 1, 2).
Now caches are only destroyed on memcg offline. That means the caches
that are not empty on memcg offline will be leaked. However, they are
already leaked, because memcg_cache_params::nr_pages normally never drops
to 0 so the destruction work is never scheduled except kmem_cache_shrink
is called explicitly. In the future I'm planning reaping such dead caches
on vmpressure or periodically.
Second, it substitutes per memcg slab_caches_mutex's with the global
memcg_slab_mutex, which should be taken during the whole per memcg cache
creation/destruction path before the slab_mutex (see patch 3). This
greatly simplifies synchronization among various per memcg cache
creation/destruction paths.
I'm still not quite sure about the end picture, in particular I don't know
whether we should reap dead memcgs' kmem caches periodically or try to
merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for
more details), but whichever way we choose, this set looks like a
reasonable change to me, because it greatly simplifies kmemcg work-flows
and eases further development.
This patch (of 3):
After a memcg is offlined, we mark its kmem caches that cannot be deleted
right now due to pending objects as dead by setting the
memcg_cache_params::dead flag, so that memcg_release_pages will schedule
cache destruction (memcg_cache_params::destroy) as soon as the last slab
of the cache is freed (memcg_cache_params::nr_pages drops to zero).
I guess the idea was to destroy the caches as soon as possible, i.e.
immediately after freeing the last object. However, it just doesn't work
that way, because kmem caches always preserve some pages for the sake of
performance, so that nr_pages never gets to zero unless the cache is
shrunk explicitly using kmem_cache_shrink. Of course, we could account
the total number of objects on the cache or check if all the slabs
allocated for the cache are empty on kmem_cache_free and schedule
destruction if so, but that would be too costly.
Thus we have a piece of code that works only when we explicitly call
kmem_cache_shrink, but complicates the whole picture a lot. Moreover,
it's racy in fact. For instance, kmem_cache_shrink may free the last slab
and thus schedule cache destruction before it finishes checking that the
cache is empty, which can lead to use-after-free.
So I propose to remove this async cache destruction from
memcg_release_pages, and check if the cache is empty explicitly after
calling kmem_cache_shrink instead. This will simplify things a lot w/o
introducing any functional changes.
And regarding dead memcg caches (i.e. those that are left hanging around
after memcg offline for they have objects), I suppose we should reap them
either periodically or on vmpressure as Glauber suggested initially. I'm
going to implement this later.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only
selected by CONFIG_MEMCG automatically. So we can kill this option in
init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In mm/swap.c, __lru_cache_add() is exported, but actually there are no
users outside this file.
This patch unexports __lru_cache_add(), and makes it static. It also
exports lru_cache_add_file(), as it is use by cifs and fuse, which can
loaded as modules.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kmem_cache_{create,destroy,shrink} need to get a stable value of
cpu/node online mask, because they init/destroy/access per-cpu/node
kmem_cache parts, which can be allocated or destroyed on cpu/mem
hotplug. To protect against cpu hotplug, these functions use
{get,put}_online_cpus. However, they do nothing to synchronize with
memory hotplug - taking the slab_mutex does not eliminate the
possibility of race as described in patch 2.
What we need there is something like get_online_cpus, but for memory.
We already have lock_memory_hotplug, which serves for the purpose, but
it's a bit of a hammer right now, because it's backed by a mutex. As a
result, it imposes some limitations to locking order, which are not
desirable, and can't be used just like get_online_cpus. That's why in
patch 1 I substitute it with get/put_online_mems, which work exactly
like get/put_online_cpus except they block not cpu, but memory hotplug.
[ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it by
myself, because it used an rw semaphore for get/put_online_mems,
making them dead lock prune. ]
This patch (of 2):
{un}lock_memory_hotplug, which is used to synchronize against memory
hotplug, is currently backed by a mutex, which makes it a bit of a
hammer - threads that only want to get a stable value of online nodes
mask won't be able to proceed concurrently. Also, it imposes some
strong locking ordering rules on it, which narrows down the set of its
usage scenarios.
This patch introduces get/put_online_mems, which are the same as
get/put_online_cpus, but for memory hotplug, i.e. executing a code
inside a get/put_online_mems section will guarantee a stable value of
online nodes, present pages, etc.
lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pgdat->reclaim_nodes tracks if a remote node is allowed to be reclaimed
by zone_reclaim due to its distance. As it is expected that
zone_reclaim_mode will be rarely enabled it is unreasonable for all
machines to take a penalty. Fortunately, the zone_reclaim_mode() path
is already slow and it is the path that takes the hit.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When it was introduced, zone_reclaim_mode made sense as NUMA distances
punished and workloads were generally partitioned to fit into a NUMA
node. NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.
Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so lets have a
sensible default for the bulk of users.
This patch (of 2):
zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes. This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes. The NUMA penalties were
sufficiently high to justify reclaiming the memory. On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this. Favour the
common case and disable it by default. Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I recently added a patch to let folks pass a "reason" string dump_page()
which gets dumped out along with the page's data. This essentially
saves the bug-reader a trip in to the source to figure out why we
BUG_ON()'d.
The new VM_BUG_ON_PAGE() passes in NULL for "reason". It seems like we
might as well pass the BUG_ON() condition if we have it. This will
bloat kernels a bit with ~160 new strings, but this is all under a
debugging option anyway.
page:ffffea0008560280 count:1 mapcount:0 mapping:(null) index:0x0
page flags: 0xbfffc0000000001(locked)
page dumped because: VM_BUG_ON_PAGE(PageLocked(page))
------------[ cut here ]------------
kernel BUG at /home/davehans/linux.git/mm/filemap.c:464!
invalid opcode: 0000 [#1] SMP
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #251
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
...
[akpm@linux-foundation.org: include stringify.h]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARN_ON() and WARN_ON_ONCE(), dependent on CONFIG_DEBUG_VM
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, "cma=" kernel parameter is used to specify the size of CMA,
but we can't specify where it is located. We want to locate CMA below
4GB for devices only supporting 32-bit addressing on 64-bit systems
without iommu.
This enables to specify the placement of CMA by extending "cma=" kernel
parameter.
Examples:
1. locate 64MB CMA below 4GB by "cma=64M@0-4G"
2. locate 64MB CMA exact at 512MB by "cma=64M@512M"
Note that the DMA contiguous memory allocator on x86 assumes that
page_address() works for the pages to allocate. So this change requires
to limit end address of contiguous memory area upto max_pfn_mapped to
prevent from locating it on highmem area by the argument of
dma_contiguous_reserve().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces memblock_alloc_range() which allocates memblock from the
specified range of physical address. I would like to use this function
to specify the location of CMA.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The DMA Contiguous Memory Allocator support on x86 is disabled when
swiotlb config option is enabled. So DMA CMA is always disabled on
x86_64 because swiotlb is always enabled. This attempts to support for
DMA CMA with enabling swiotlb config option.
The contiguous memory allocator on x86 is integrated in the function
dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
for dma_alloc_coherent().
x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops
tries to allocate with dma_generic_alloc_coherent() firstly and then
swiotlb_alloc_coherent() is called as a fallback.
The main part of supporting DMA CMA with swiotlb is that changing
x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops
for dma_free_coherent() so that it can distinguish memory allocated by
dma_generic_alloc_coherent() from one allocated by
swiotlb_alloc_coherent() and release it with dma_generic_free_coherent()
which can handle contiguous memory. This change requires making
is_swiotlb_buffer() global function.
This also needs to change .free callback in the dma_map_ops for amd_gart
and sta2x11, because these dma_ops are also using
dma_generic_alloc_coherent().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a CONFIG_DEBUG_VM_VMACACHE option to enable counting the cache
hit rate -- exported in /proc/vmstat.
Any updates to the caching scheme needs this kind of data, thus it can
save some work re-implementing the counting all the time.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently to allocate a page that should be charged to kmemcg (e.g.
threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page
allocated is then to be freed by free_memcg_kmem_pages. Apart from
looking asymmetrical, this also requires intrusion to the general
allocation path. So let's introduce separate functions that will
alloc/free pages charged to kmemcg.
The new functions are called alloc_kmem_pages and free_kmem_pages. They
should be used when the caller actually would like to use kmalloc, but
has to fall back to the page allocator for the allocation is large.
They only differ from alloc_pages and free_pages in that besides
allocating or freeing pages they also charge them to the kmem resource
counter of the current memory cgroup.
[sfr@canb.auug.org.au: export kmalloc_order() to modules]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have only a few places where we actually want to charge kmem so
instead of intruding into the general page allocation path with
__GFP_KMEMCG it's better to explictly charge kmem there. All kmem
charges will be easier to follow that way.
This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG
from memcg caches' allocflags. Instead it makes slab allocation path
call memcg_charge_kmem directly getting memcg to charge from the cache's
memcg params.
This also eliminates any possibility of misaccounting an allocation
going from one memcg's cache to another memcg, because now we always
charge slabs against the memcg the cache belongs to. That's why this
patch removes the big comment to memcg_kmem_get_cache.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86. Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults. This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.
Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.
Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Description by Jan Kara:
"A lot of older filesystems don't properly flush volatile disk caches
on fsync(2) which can lead to loss of fsynced data after power failure.
This patch makes generic_file_fsync() issue proper cache flush to fix the
problem. Sysadmin can use /sys/devices/.../cache_type to tell the system
it should not send the cache flush."
[akpm@linux-foundation.org: nuke ifdef]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.
Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core irq updates from Thomas Gleixner:
"The irq department delivers:
- Another tree wide update to get rid of the horrible create_irq
interface along with its even more horrible variants. That also
gets rid of the last leftovers of the initial sparse irq hackery.
arch/driver specific changes have been either acked or ignored.
- A fix for the spurious interrupt detection logic with threaded
interrupts.
- A new ARM SoC interrupt controller
- The usual pile of fixes and improvements all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding
irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller
genirq: Improve documentation to match current implementation
ARM: iop13xx: fix msi support with sparse IRQ
genirq: Provide !SMP stub for irq_set_affinity_notifier()
irqchip: armada-370-xp: Move the devicetree binding documentation
irqchip: gic: Use mask field in GICC_IAR
genirq: Remove dynamic_irq mess
ia64: Use irq_init_desc
genirq: Replace dynamic_irq_init/cleanup
genirq: Remove irq_reserve_irq[s]
genirq: Replace reserve_irqs in core code
s390: Avoid call to irq_reserve_irqs()
s390: Remove pointless arch_show_interrupts()
s390: pci: Check return value of alloc_irq_desc() proper
sh: intc: Remove pointless irq_reserve_irqs() invocation
x86, irq: Remove pointless irq_reserve_irqs() call
genirq: Make create/destroy_irq() ia64 private
tile: Use SPARSE_IRQ
tile: pci: Use irq_alloc/free_hwirq()
...
Pull timer core updates from Thomas Gleixner:
"This time you get nothing really exciting:
- A huge update to the sh* clocksource drivers
- Support for two more ARM SoCs
- Removal of the deprecated setup_sched_clock() API
- The usual pile of fixlets all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
clocksource: Add Freescale FlexTimer Module (FTM) timer support
ARM: dts: vf610: Add Freescale FlexTimer Module timer node.
clocksource: ftm: Add FlexTimer Module (FTM) Timer devicetree Documentation
clocksource: sh_tmu: Remove unnecessary OOM messages
clocksource: sh_mtu2: Remove unnecessary OOM messages
clocksource: sh_cmt: Remove unnecessary OOM messages
clocksource: em_sti: Remove unnecessary OOM messages
clocksource: dw_apb_timer_of: Do not trace read_sched_clock
clocksource: Fix clocksource_mmio_readX_down
clocksource: Fix type confusion for clocksource_mmio_readX_Y
clocksource: sh_tmu: Fix channel IRQ retrieval in legacy case
clocksource: qcom: Implement read_current_timer for udelay
ntp: Make is_error_status() use its argument
ntp: Convert simple_strtol to kstrtol
timer_stats/doc: Fix /proc/timer_stats documentation
sched_clock: Remove deprecated setup_sched_clock() API
ARM: sun6i: a31: Add support for the High Speed Timers
clocksource: sun5i: Add support for reset controller
clocksource: efm32: use $vendor,$device scheme for compatible string
KConfig: Vexpress: build the ARM_GLOBAL_TIMER with vexpress platform
...
Pull block follow-up bits from Jens Axboe:
"A few minor (but important) fixes for blk-mq for the -rc1 window.
- Hot removal potential oops fix for single queue devices. From me.
- Two merged patches in late May meant that we accidentally lost a
fix for freeing an active queue. Fix that up. From me.
- A change of the blk_mq_tag_to_rq() API, passing in blk_mq_tags, to
make life considerably easier for scsi-mq. From me.
- A schedule-while-atomic fix from Ming Lei, which would hit if the
tag space was exhausted.
- Missing __percpu annotation in one place in blk-mq. Found by the
magic Wu compile bot due to code being moved around by the previous
patch, but it's actually an older issue. From Ming Lei.
- Clearing of tag of a flush request at end_io time. From Ming Lei"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: mq flush: clear flush_rq's tag in flush_end_io()
blk-mq: let blk_mq_tag_to_rq() take blk_mq_tags as the main parameter
blk-mq: fix regression from commit 624dbe4754
blk-mq: handle NULL req return from blk_map_request in single queue mode
blk-mq: fix sparse warning on missed __percpu annotation
blk-mq: fix schedule from atomic context
blk-mq: move blk_mq_get_ctx/blk_mq_put_ctx to mq private header
Pull media updates from Mauro Carvalho Chehab:
"This contains:
- a new frontend/tuner driver set for si2168 and sa2157
- Videobuf 2 core now supports DVB too
- A new gspca sub-driver (dtcs033)
- saa7134 is now converted to use videobuf2
- add support for 4K timings
- several other driver fixes and improvements
PS. This pull request is shorter than usual, partly because I have
some other patches on topic branches that I'll be sending you later
this week"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (286 commits)
[media] au0828-dvb: restore its permission to 644
[media] xc5000: delay tuner sleep to 5 seconds
[media] xc5000: Don't use whitespace before tabs
[media] xc5000: fix CamelCase
[media] xc5000: Don't wrap msleep()
[media] xc5000: get rid of positive error codes
[media] au0828: reset streaming when a new frequency is set
[media] au0828: Improve debug messages for urb_completion
[media] au0828: Cancel stream-restart operation if frontend is disconnected
[media] dib0700: fix RC support on Hauppauge Nova-TD
[media] USB: as102_usb_drv.c: Remove useless return variables
[media] v4l: Fix documentation of V4L2_PIX_FMT_H264_MVC and VP8 pixel formats
[media] m5mols: Replace missing header
[media] staging: lirc: Fix sparse warnings
[media] fix mceusb endpoint type identification/handling
[media] az6027: Added the PID for a new revision of the Elgato EyeTV Sat DVB-S Tuner
[media] DocBook media: fix typo
[media] adv7604: Add missing include to linux/types.h
[media] v4l: Validate fields in the core code for subdev EDID ioctls
[media] v4l: Add support for DV timings ioctls on subdev nodes
...
- Another round of clean-up of FDT related code in architecture code.
This removes knowledge of internal FDT details from most architectures
except powerpc.
- Conversion of kernel's custom FDT parsing code to use libfdt.
- DT based initialization for generic serial earlycon. The introduction
of generic serial earlycon support went in thru tty tree.
- Improve the platform device naming for DT probed devices to ensure
unique naming and use parent names instead of a global index.
- Fix a race condition in of_update_property.
- Unify the various linker section OF match tables and fix several
function prototype errors.
- Update platform_get_irq_byname to work in deferred probe cases.
- 2 binding doc updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTjzgyAAoJEMhvYp4jgsXiFsUH/1PMTGo8CyD62VQD5ZKdAoW+
Fq6vCiRQ8assF5i5ZLcW1DqhjtoRaCKYhVbRKa5lj7cZdjlSpacI/qQPrF5Br2Ii
bTE3Ff/AQwipQaz/Bj7HqJCgGwfWK8xdfgW0abKsyXMWDN86Bov/zzeu8apmws0x
H1XjJRgnc/rzM4m9ny6+lss0iq6YL54SuTYNzHR33+Ywxls69SfHXIhCW0KpZcBl
5U3YUOomt40GfO46sxFA4xApAhypEK4oVq7asyiA2ArTZ/c2Pkc9p5CBqzhDLmlq
yioWTwHIISv0q+yMLCuQrVGIsbUDkQyy7RQ15z6U+/e/iGO/M+j3A5yxMc3qOi4=
=Onff
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next
Pull DeviceTree updates from Rob Herring:
- Another round of clean-up of FDT related code in architecture code.
This removes knowledge of internal FDT details from most
architectures except powerpc.
- Conversion of kernel's custom FDT parsing code to use libfdt.
- DT based initialization for generic serial earlycon. The
introduction of generic serial earlycon support went in through the
tty tree.
- Improve the platform device naming for DT probed devices to ensure
unique naming and use parent names instead of a global index.
- Fix a race condition in of_update_property.
- Unify the various linker section OF match tables and fix several
function prototype errors.
- Update platform_get_irq_byname to work in deferred probe cases.
- 2 binding doc updates
* tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits)
of: handle NULL node in next_child iterators
of/irq: provide more wrappers for !CONFIG_OF
devicetree: bindings: Document micrel vendor prefix
dt: bindings: dwc2: fix required value for the phy-names property
of_pci_irq: kill useless variable in of_irq_parse_pci()
of/irq: do irq resolution in platform_get_irq_byname()
of: Add a testcase for of_find_node_by_path()
of: Make of_find_node_by_path() handle /aliases
of: Create unlocked version of for_each_child_of_node()
lib: add glibc style strchrnul() variant
of: Handle memory@0 node on PPC32 only
pci/of: Remove dead code
of: fix race between search and remove in of_update_property()
of: Use NULL for pointers
of: Stop naming platform_device using dcr address
of: Ensure unique names without sacrificing determinism
tty/serial: pl011: add DT based earlycon support
of/fdt: add FDT serial scanning for earlycon
of/fdt: add FDT address translation support
serial: earlycon: add DT support
...
Pull percpu fix from Tejun Heo:
"It is very late but this is an important percpu-refcount fix from
Sebastian Ott.
The problem is that percpu_ref_*() used __this_cpu_*() instead of
this_cpu_*(). The difference between the two is that the latter is
atomic on the local cpu while the former is not. this_cpu_inc() is
guaranteed to increment the percpu counter on the cpu that the
operation is executed on without any synchronization; however,
__this_cpu_inc() doesn't and if the local cpu invokes the function
from different contexts (e.g. process and irq) of the same CPU, it's
not guaranteed to actually increment as it may be implemented as rmw.
This bug existed from the get-go but it hasn't been noticed earlier
probably because on x86 __this_cpu_inc() is equivalent to
this_cpu_inc() as both get translated into single instruction;
however, s390 uses the generic rmw implementation and gets affected by
the bug. Kudos to Sebastian and Heiko for diagnosing it.
The change is very low risk and fixes a critical issue on the affected
architectures, so I think it's a good candidate for inclusion although
it's very late in the devel cycle. On the other hand, this has been
broken since v3.11, so backporting it through -stable post -rc1 won't
be the end of the world.
I'll ping Christoph whether __this_cpu_*() ops can be better annotated
so that it can trigger lockdep warning when used from multiple
contexts"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu-refcount: fix usage of this_cpu_ops
Pull percpu/for-3.15-fixes into percpu/for-3.16 to receive
0c36b390a5 ("percpu-refcount: fix usage of this_cpu_ops").
The merge doesn't produce any conflict but the automatic merge is
still incorrect because 4fb6e25049 ("percpu-refcount: implement
percpu_ref_tryget()") added another use of __this_cpu_inc() which
should also be converted to this_cpu_ince().
This commit pulls in percpu/for-3.15-fixes and converts the newly
added __this_cpu_inc() to this_cpu_inc().
Signed-off-by: Tejun Heo <tj@kernel.org>
We currently pass in the hardware queue, and get the tags from there.
But from scsi-mq, with a shared tag space, it's a lot more convenient
to pass in the blk_mq_tags instead as the hardware queue isn't always
directly available. So instead of having to re-map to a given
hardware queue from rq->mq_ctx, just pass in the tags structure.
Signed-off-by: Jens Axboe <axboe@fb.com>
The percpu-refcount infrastructure uses the underscore variants of
this_cpu_ops in order to modify percpu reference counters.
(e.g. __this_cpu_inc()).
However the underscore variants do not atomically update the percpu
variable, instead they may be implemented using read-modify-write
semantics (more than one instruction). Therefore it is only safe to
use the underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise it is possible to lose updates.
This problem is something that Sebastian has seen within the aio
subsystem which uses percpu refcounters both in process and softirq
context leading to reference counts that never dropped to zeroes; even
though the number of "get" and "put" calls matched.
Fix this by using the non-underscore this_cpu_ops variant which
provides correct per cpu atomic semantics and fixes the corrupted
reference counts.
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
At this time, majority of changes come from ASoC world while we got a
few new drivers in other places for FireWire and USB. There have been
lots of ASoC core cleanups / refactoring, but very little visible to
external users.
ASoC
- Support for specifying aux CODECs in DT
- Removal of the deprecated mux and enum macros
- More moves towards full componentisation
- Removal of some unused I/O code
- Lots of cleanups, fixes and enhancements to the davinci, Freescale,
Haswell and Realtek drivers
- Several drivers exposed directly in Kconfig for use with simple-card
- GPIO descriptor support for jacks
- More updates and fixes to the Freescale SSI, Intel and rsnd drivers
- New drivers for Cirrus CS42L56, Realtek RT5639, RT5642 and RT5651 and
ST STA350, Analog Devices ADAU1361, ADAU1381, ADAU1761 and ADAU1781,
and Realtek RT5677
HD-audio:
- Clean up Dell headset quirks
- Noise fixes for Dell and Sony laptops
- Thinkpad T440 dock fix
- Realtek codec updates (ALC293,ALC233,ALC3235)
- Tegra HD-audio HDMI support
FireWire-audio:
- FireWire audio stack enhancement (AMDTP, MIDI), support for incoming
isochronous stream and duplex streams with timestamp synchronization
- BeBoB-based devices support
- Fireworks-based device support
USB-audio:
- Behringer BCD2000 USB device support
Misc:
- Clean up of a few old drivers, atmel, fm801, etc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJTjzW4AAoJEGwxgFQ9KSmkUrMP/1z43Kp+F9Y0v0VBH6oR/d4N
l9IyxBno/ABxfWloGFnRLEyzZyj2yG8A7inT0alVXJifHJN4iPOKBb5dPE9LMRvc
qLhJjMwznAirkuE8Wsk+IAoKuyXEI4m+KKEIXt5WJ3UyAo/j1lySZVMChzcTFFk/
oc2C6CciYrQLziaaL/K5zD9v9XdDr9koOaSHK/xjUOCbDlEBJu6T2IvRI/tkqJmy
8oRRhRteXZ9D959+ftntKrFVf10APQ4ZQbsX/pHboduaoozYAJSJGFhQNbh/UZnb
zwwwanNZvLwzn+rRXJJuzHF4jra34CuQFL2awsDP9Wck9E3YLmt4audNQ6LM6J8z
IVZs5IjMIL1ey1T2oRczLnv7EoDp0xdP38GqXnQ88j3zd+Ifi77idNw1ssU1aZ5B
LzEFEytT1UbEUkqom9qtIG+GId9hSmVmHQuLsc6Ayg7md0oBeJnBC05Xt5FATdrp
HseHYfSrNNDBFKyj8+j0TVtHc9Xf4SKziSVWz/PT0gaROzOsR2e46HC2Hvut+OFZ
rLLPXn9up5viQFxOTbO7sdYGCYa/iVH7IwB2oCP6Z5/I8+fhsU7aA4Hl+0wBikin
PDSwuchmRlNpHJ18YDonjzFtWA51wG4IlcNbQY4ywO/jFae06KYxQPTwvmJI0+oV
GXyKtjdBnQg8nnWJlS8J
=nxFA
-----END PGP SIGNATURE-----
Merge tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into next
Pull sound updates from Takashi Iwai:
"At this time, majority of changes come from ASoC world while we got a
few new drivers in other places for FireWire and USB. There have been
lots of ASoC core cleanups / refactoring, but very little visible to
external users.
ASoC:
- Support for specifying aux CODECs in DT
- Removal of the deprecated mux and enum macros
- More moves towards full componentisation
- Removal of some unused I/O code
- Lots of cleanups, fixes and enhancements to the davinci, Freescale,
Haswell and Realtek drivers
- Several drivers exposed directly in Kconfig for use with
simple-card
- GPIO descriptor support for jacks
- More updates and fixes to the Freescale SSI, Intel and rsnd drivers
- New drivers for Cirrus CS42L56, Realtek RT5639, RT5642 and RT5651
and ST STA350, Analog Devices ADAU1361, ADAU1381, ADAU1761 and
ADAU1781, and Realtek RT5677
HD-audio:
- Clean up Dell headset quirks
- Noise fixes for Dell and Sony laptops
- Thinkpad T440 dock fix
- Realtek codec updates (ALC293,ALC233,ALC3235)
- Tegra HD-audio HDMI support
FireWire-audio:
- FireWire audio stack enhancement (AMDTP, MIDI), support for
incoming isochronous stream and duplex streams with timestamp
synchronization
- BeBoB-based devices support
- Fireworks-based device support
USB-audio:
- Behringer BCD2000 USB device support
Misc:
- Clean up of a few old drivers, atmel, fm801, etc"
* tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (480 commits)
ASoC: Fix wrong argument for card remove callbacks
ASoC: free jack GPIOs before the sound card is freed
ALSA: firewire-lib: Remove a comment about restriction of asynchronous operation
ASoC: cache: Fix error code when not using ASoC level cache
ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
ALSA: firewire-lib: Use IEC 61883-6 compliant labels for Raw Audio data
ASoC: add RT5677 CODEC driver
ASoC: intel: The Baytrail/MAX98090 driver depends on I2C
ASoC: rt5640: Add the function "get_clk_info" to RL6231 shared support
ASoC: rt5640: Add the function of the PLL clock calculation to RL6231 shared support
ASoC: rt5640: Add RL6231 class device shared support for RT5640, RT5645 and RT5651
ASoC: cache: Fix possible ZERO_SIZE_PTR pointer dereferencing error.
ASoC: Add helper functions to cast from DAPM context to CODEC/platform
ALSA: bebob: sizeof() vs ARRAY_SIZE() typo
ASoC: wm9713: correct mono out PGA sources
ALSA: synth: emux: soundfont.c: Cleaning up memory leak
ASoC: fsl: Remove dependencies of boards for SND_SOC_EUKREA_TLV320
ASoC: fsl-ssi: Use regmap
ASoC: fsl-ssi: reorder and document fsl_ssi_private
...
Mainly fixes and small improvements. The biggest change seems to be backlight
control support for mx3fb.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTju0rAAoJEPo9qoy8lh71hsEP/03g4O1AmV0j+dDhemaQXZTg
OVTMPhjWF77RRApp/aneX48/+F1qaQyVbdyBWrlVFlRhPOCvkDC/JCU2PhqI9399
wRVOeL01q2IJHzh3S7jvQqF79WejO3VOfK/LbkztVnJMVHw8h5uaGEqHrOu4IpdU
tsLeSrNPsfJd0e1bmTAXOWft+I5hekRmLqbWbGzV6l1TElLo0tJjN3sbUUe7wTQF
gsS3svnH8LAS0njpcuskWRNz9XpMkIJl/CGaB5kRqvswPbeFPgzwHXsH4N7LFNFm
xyw/xgRTOWGhbKAdzylp1Ko15S36iVTuWzwi0KJ4CNd6f1LXlPvznR6a+m722C1K
GuCIbLH6vqNW/EAxmpMYyvMlEYIrgu4fuFh0+V6dEv3zPFY/1XWjx3Un7mhfryXi
zmDeZfMpkDyTMHlzGpuCaC5jp92LqEN+Ys1AWGJFwpUSMQcP3GoyPh4tLnC2C+/t
hf9TCQ4V2C7ongLIImiI2plhgAE173L2ZkIJjgcVYOO/DP+Eq7fI0dTFy6E49Mwk
Zwc7MlYg50/eoRkaZ7gLvaVuzoPr+EoMdKXI/p+5/4rIpomZQ9Ndu2hswvzjd9WY
Hf0fT6NJQ+wLPl3MtACwsdxSnq0ige1sv5awVUAWvP436mFExtbuXnM/B1NnwECI
TcRqEYMINo3u8cbhHUVW
=Gfjc
-----END PGP SIGNATURE-----
Merge tag 'fbdev-main-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into next
Pull main fbdev changes from Tomi Valkeinen:
"Mainly fixes and small improvements. The biggest change seems to be
backlight control support for mx3fb"
* tag 'fbdev-main-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (31 commits)
drivers/video/fbdev/fb-puv3.c: Add header files for function unifb_mmap
video: fbdev: s3fb.c: Fix for possible null pointer dereference
video: fbdev: grvga.c: Fix for possible null pointer dereference
matroxfb: perform a dummy read of M_STATUS
video: of: display_timing: fix default native-mode setting
video: delete unneeded call to platform_get_drvdata
video: mx3fb: Add backlight control support
video: omap: delete support for early fbmem allocation
video: of: display_timing: remove two unsafe error messages
fbdev: fbmem: remove positive test on unsigned values
fbcon: Fix memory leak in con2fb_release_oldinfo()
video: Kconfig: Add a dependency to the Goldfish framebuffer driver
video: exynos: Add a dependency to the menu
video: mx3fb: Use devm_kzalloc
video/nuc900: allow modular build
video: atmel needs FB_BACKLIGHT
video: export fb_prepare_logo
video/mbx: fix building debugfs support
video/omap: fix modular build
video: clarify I2C dependencies
...
- ACPICA update to upstream version 20140424. That includes a
number of fixes and improvements related to things like GPE
handling, table loading, headers, memory mapping and unmapping,
DSDT/SSDT overriding, and the Unload() operator. The acpidump
utility from upstream ACPICA is included too. From Bob Moore,
Lv Zheng, David Box, David Binderman, and Colin Ian King.
- Fixes and cleanups related to ACPI video and backlight interfaces
from Hans de Goede. That includes blacklist entries for some new
machines and using native backlight by default.
- ACPI device enumeration changes to create platform devices
rather than PNP devices for ACPI device objects with _HID by
default. PNP devices will still be created for the ACPI device
object with device IDs corresponding to real PNP devices, so
that change should not break things left and right, and we're
expecting to see more and more ACPI-enumerated platform devices
in the future. From Zhang Rui and Rafael J Wysocki.
- Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing
it to handle system suspend/resume on Asus T100 correctly.
From Heikki Krogerus and Rafael J Wysocki.
- PM core update introducing a mechanism to allow runtime-suspended
devices to stay suspended over system suspend/resume transitions
if certain additional conditions related to coordination within
device hierarchy are met. Related PM documentation update and
ACPI PM domain support for the new feature. From Rafael J Wysocki.
- Fixes and improvements related to the "freeze" sleep state. They
affect several places including cpuidle, PM core, ACPI core, and
the ACPI battery driver. From Rafael J Wysocki and Zhang Rui.
- Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
- Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
Aggregator Device) drivers from Baoquan He, Manuel Schölling,
Tony Camuso, and Toshi Kani.
- System suspend/resume optimization in the ACPI battery driver from
Lan Tianyu.
- OPP (Operating Performance Points) subsystem updates from
Chander Kashyap, Mark Brown, and Nishanth Menon.
- cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
Stratos Karafotis, and Viresh Kumar.
- Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
Viresh Kumar.
- intel_pstate driver fixes and cleanups from Dirk Brandewie,
Doug Smythies, and Stratos Karafotis.
- Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
- Fix for the cpuidle menu governor from Chander Kashyap.
- New ARM clps711x cpuidle driver from Alexander Shiyan.
- Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
Fabian Frederick, Pali Rohár, and Sebastian Capella.
- Intel RAPL (Running Average Power Limit) driver updates from
Jacob Pan.
- PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
- devfreq core updates from Chanwoo Choi and Paul Bolle.
- devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
Bartlomiej Zolnierkiewicz.
- turbostat tool fix from Jean Delvare.
- cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
and Thomas Renninger.
- New ACPI ec_access.c tool for poking at the EC in a safe way
from Thomas Renninger.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg
qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK
d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ
MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO
Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+
tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc
XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN
yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92
3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI
H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf
pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9
qhoFcTPSxoo0XjrGrR6b
=tpdF
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next
Pull ACPI and power management updates from Rafael Wysocki:
"ACPICA is the leader this time (63 commits), followed by cpufreq (28
commits), devfreq (15 commits), system suspend/hibernation (12
commits), ACPI video and ACPI device enumeration (10 commits each).
We have no major new features this time, but there are a few
significant changes of how things work. The most visible one will
probably be that we are now going to create platform devices rather
than PNP devices by default for ACPI device objects with _HID. That
was long overdue and will be really necessary to be able to use the
same drivers for the same hardware blocks on ACPI and DT-based systems
going forward. We're not expecting fallout from this one (as usual),
but it's something to watch nevertheless.
The second change having a chance to be visible is that ACPI video
will now default to using native backlight rather than the ACPI
backlight interface which should generally help systems with broken
Win8 BIOSes. We're hoping that all problems with the native backlight
handling that we had previously have been addressed and we are in a
good enough shape to flip the default, but this change should be easy
enough to revert if need be.
In addition to that, the system suspend core has a new mechanism to
allow runtime-suspended devices to stay suspended throughout system
suspend/resume transitions if some extra conditions are met
(generally, they are related to coordination within device hierarchy).
However, enabling this feature requires cooperation from the bus type
layer and for now it has only been implemented for the ACPI PM domain
(used by ACPI-enumerated platform devices mostly today).
Also, the acpidump utility that was previously shipped as a separate
tool will now be provided by the upstream ACPICA along with the rest
of ACPICA code, which will allow it to be more up to date and better
supported, and we have one new cpuidle driver (ARM clps711x).
The rest is improvements related to certain specific use cases,
cleanups and fixes all over the place.
Specifics:
- ACPICA update to upstream version 20140424. That includes a number
of fixes and improvements related to things like GPE handling,
table loading, headers, memory mapping and unmapping, DSDT/SSDT
overriding, and the Unload() operator. The acpidump utility from
upstream ACPICA is included too. From Bob Moore, Lv Zheng, David
Box, David Binderman, and Colin Ian King.
- Fixes and cleanups related to ACPI video and backlight interfaces
from Hans de Goede. That includes blacklist entries for some new
machines and using native backlight by default.
- ACPI device enumeration changes to create platform devices rather
than PNP devices for ACPI device objects with _HID by default. PNP
devices will still be created for the ACPI device object with
device IDs corresponding to real PNP devices, so that change should
not break things left and right, and we're expecting to see more
and more ACPI-enumerated platform devices in the future. From
Zhang Rui and Rafael J Wysocki.
- Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
to handle system suspend/resume on Asus T100 correctly. From
Heikki Krogerus and Rafael J Wysocki.
- PM core update introducing a mechanism to allow runtime-suspended
devices to stay suspended over system suspend/resume transitions if
certain additional conditions related to coordination within device
hierarchy are met. Related PM documentation update and ACPI PM
domain support for the new feature. From Rafael J Wysocki.
- Fixes and improvements related to the "freeze" sleep state. They
affect several places including cpuidle, PM core, ACPI core, and
the ACPI battery driver. From Rafael J Wysocki and Zhang Rui.
- Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
- Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
Camuso, and Toshi Kani.
- System suspend/resume optimization in the ACPI battery driver from
Lan Tianyu.
- OPP (Operating Performance Points) subsystem updates from Chander
Kashyap, Mark Brown, and Nishanth Menon.
- cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
Stratos Karafotis, and Viresh Kumar.
- Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
Viresh Kumar.
- intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
Smythies, and Stratos Karafotis.
- Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
- Fix for the cpuidle menu governor from Chander Kashyap.
- New ARM clps711x cpuidle driver from Alexander Shiyan.
- Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
Fabian Frederick, Pali Rohár, and Sebastian Capella.
- Intel RAPL (Running Average Power Limit) driver updates from Jacob
Pan.
- PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
- devfreq core updates from Chanwoo Choi and Paul Bolle.
- devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
Bartlomiej Zolnierkiewicz.
- turbostat tool fix from Jean Delvare.
- cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
and Thomas Renninger.
- New ACPI ec_access.c tool for poking at the EC in a safe way from
Thomas Renninger"
* tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
ACPICA: Namespace: Remove _PRP method support.
intel_pstate: Improve initial busy calculation
intel_pstate: add sample time scaling
intel_pstate: Correct rounding in busy calculation
intel_pstate: Remove C0 tracking
PM / hibernate: fixed typo in comment
ACPI: Fix x86 regression related to early mapping size limitation
ACPICA: Tables: Add mechanism to control early table checksum verification.
ACPI / scan: use platform bus type by default for _HID enumeration
ACPI / scan: always register ACPI LPSS scan handler
ACPI / scan: always register memory hotplug scan handler
ACPI / scan: always register container scan handler
ACPI / scan: Change the meaning of missing .attach() in scan handlers
ACPI / scan: introduce platform_id device PNP type flag
ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
ACPI / PNP: use device ID list for PNPACPI device enumeration
ACPI / scan: .match() callback for ACPI scan handlers
ACPI / battery: wakeup the system only when necessary
power_supply: allow power supply devices registered w/o wakeup source
...
Pull HID patches from Jiri Kosina:
- RMI driver for Synaptics touchpads, by Benjamin Tissoires, Andrew
Duggan and Jiri Kosina
- cleanup of hid-sony driver and improved support for Sixaxis and
Dualshock 4, by Frank Praznik
- other usual small fixes and support for new device IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (29 commits)
HID: thingm: thingm_fwinfo[] doesn't need to be global
HID: core: add two new usages for digitizer
HID: hid-sensor-hub: new device id and quirk for STM Sensor hub
HID: usbhid: enable NO_INIT_REPORTS quirk for Semico USB Keykoard
HID: hid-sensor-hub: Set report quirk for Microsoft Surface
HID: debug: add labels for HID Sensor Usages
HID: uhid: Use kmemdup instead of kmalloc + memcpy
HID: rmi: do not handle touchscreens through hid-rmi
HID: quirk for Saitek RAT7 and MMO7 mices' mode button
HID: core: fix validation of report id 0
HID: rmi: fix masks for x and w_x data
HID: rmi: fix wrong struct field name
HID: rmi: do not fetch more than 16 bytes in a query
HID: rmi: check for the existence of some optional queries before reading query 12
HID: i2c-hid: hid report descriptor retrieval changes
HID: add missing hid usages
HID: hid-sony - allow 3rd party INTEC controller to turn off all leds
HID: sony: Add blink support to the Sixaxis and DualShock 4 LEDs
HID: sony: Initialize the controller LEDs with a device ID value
HID: sony: Use the controller Bluetooth MAC address as the unique value in the battery name string
...
Pull trivial tree changes from Jiri Kosina:
"Usual pile of patches from trivial tree that make the world go round"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
staging: go7007: remove reference to CONFIG_KMOD
aic7xxx: Remove obsolete preprocessor define
of: dma: doc fixes
doc: fix incorrect formula to calculate CommitLimit value
doc: Note need of bc in the kernel build from 3.10 onwards
mm: Fix printk typo in dmapool.c
modpost: Fix comment typo "Modules.symvers"
Kconfig.debug: Grammar s/addition/additional/
wimax: Spelling s/than/that/, wording s/destinatary/recipient/
aic7xxx: Spelling s/termnation/termination/
arm64: mm: Remove superfluous "the" in comment
of: Spelling s/anonymouns/anonymous/
dma: imx-sdma: Spelling s/determnine/determine/
ath10k: Improve grammar in comments
ath6kl: Spelling s/determnine/determine/
of: Improve grammar for of_alias_get_id() documentation
drm/exynos: Spelling s/contro/control/
radio-bcm2048.c: fix wrong overflow check
doc: printk-formats: do not mention casts for u64/s64
doc: spelling error changes
...
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration,
GDB support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace interface
and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still,
we have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
bAknaZpPiOrW11Et1htY
=j5Od
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next
Pull KVM updates from Paolo Bonzini:
"At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration, GDB
support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by
Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace
interface and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still, we
have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
KVM: add missing cleanup_srcu_struct
KVM: PPC: Book3S PR: Rework SLB switching code
KVM: PPC: Book3S PR: Use SLB entry 0
KVM: PPC: Book3S HV: Fix machine check delivery to guest
KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
KVM: PPC: Book3S HV: Fix dirty map for hugepages
KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
KVM: PPC: Book3S: Add ONE_REG register names that were missed
KVM: PPC: Add CAP to indicate hcall fixes
KVM: PPC: MPIC: Reset IRQ source private members
KVM: PPC: Graciously fail broken LE hypercalls
PPC: ePAPR: Fix hypercall on LE guest
KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
KVM: PPC: BOOK3S: Always use the saved DAR value
PPC: KVM: Make NX bit available with magic page
KVM: PPC: Disable NX for old magic page using guests
KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
...
There is no need for code other than DM core to use dm_set_device_limits
so remove its EXPORT_SYMBOL_GPL. Also, cleanup a couple whitespace nits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
I would like to use one of the RPC client's congestion algorithm
constants in transport-specific code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The pci-rcar driver is enabled for compile tests, and this has
now shown that the driver cannot build without CONFIG_OF,
following the inclusion of f8f2fe7355 "PCI: rcar: Use new OF
interrupt mapping when possible":
drivers/built-in.o: In function `rcar_pci_map_irq':
:(.text+0x1cc7c): undefined reference to `of_irq_parse_and_map_pci'
pci/host/pcie-rcar.c: In function 'pci_dma_range_parser_init':
pci/host/pcie-rcar.c:875:2: error: implicit declaration of function 'of_n_addr_cells' [-Werror=implicit-function-declaration]
As pointed out by Ben Dooks and Geert Uytterhoeven, this is actually
supposed to build fine, which we can achieve if we make the
declaration of of_irq_parse_and_map_pci conditional on CONFIG_OF
and provide an empty inline function otherwise, as we do for
a lot of other of interfaces.
This lets us build the rcar_pci driver again without CONFIG_OF
for build testing. All platforms using this driver select OF,
so this doesn't change anything for the users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: linux-pci@vger.kernel.org
Cc: linux-sh@vger.kernel.org
[robh: drop wrappers for of_n_addr_cells and of_n_size_cells which are
low-level functions that should not be used for !OF]
Signed-off-by: Rob Herring <robh@kernel.org>
Conflicts:
include/net/inetpeer.h
net/ipv6/output_core.c
Changes in net were fixing bugs in code removed in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
f2fs's cp has one page which consists of struct f2fs_checkpoint and
version bitmap of sit and nat. To support lots of segments, we need more
blocks for sit bitmap. So let's arrange sit bitmap as following:
+-----------------+------------+
| f2fs_checkpoint | sit bitmap |
| + nat bitmap | |
+-----------------+------------+
0 4k N blocks
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: simple code change for readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When large directory feathure is enable, We have one case which could cause
overflow in dir_buckets() as following:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.
Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.
Changes from V1
o modify description of calculation in f2fs.txt suggested by Changman Lee.
Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
'struct blk_mq_ctx' is __percpu, so add the annotation
and fix the sparse warning reported from Fengguang:
[block:for-linus 2/3] block/blk-mq.h:75:16: sparse: incorrect
type in initializer (different address spaces)
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
It's positively immoral to have a global variable called 'io_timeout'.
Keep the module parameter called io_timeout, though.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Pull x86 build cleanups from Ingo Molnar:
"Two small build related cleanups"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Supress realmode.bin is up to date message
compiler-intel.h: Remove duplicate definition
* pm-cpufreq: (28 commits)
cpufreq: handle calls to ->target_index() in separate routine
cpufreq: s5pv210: drop check for CONFIG_PM_VERBOSE
cpufreq: intel_pstate: Remove unused member name of cpudata
cpufreq: Break out early when frequency equals target_freq
cpufreq: Tegra: drop wrapper around tegra_update_cpu_speed()
cpufreq: imx6q: Remove unused include
cpufreq: imx6q: Drop devm_clk/regulator_get usage
cpufreq: powernow-k8: Suppress checkpatch warnings
cpufreq: powernv: make local function static
cpufreq: Enable big.LITTLE cpufreq driver on arm64
cpufreq: nforce2: remove DEFINE_PCI_DEVICE_TABLE macro
intel_pstate: Add CPU IDs for Broadwell processors
cpufreq: Fix build error on some platforms that use cpufreq_for_each_*
PM / OPP: Move cpufreq specific OPP functions out of generic OPP library
PM / OPP: Remove cpufreq wrapper dependency on internal data organization
cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
intel_pstate: Remove sample parameter in intel_pstate_calc_busy
cpufreq: Kconfig: Fix spelling errors
cpufreq: Make linux-pm@vger.kernel.org official mailing list
cpufreq: exynos: Use dev_err/info function instead of pr_err/info
...
* acpi-video:
ACPI / video: Add 4 new models to the use_native_backlight DMI list
ACPI / video: Add use native backlight quirk for the ThinkPad W530
ACPI / video: Unregister the backlight device if a raw one shows up later
backlight: Add backlight device (un)registration notification
nouveau: Don't check acpi_video_backlight_support() before registering backlight
acer-wmi: Add Aspire 5741 to video_vendor_dmi_table
acer-wmi: Switch to acpi_video_unregister_backlight
ACPI / video: Add an acpi_video_unregister_backlight function
ACPI / video: Don't register acpi_video_resume notifier without backlight devices
ACPI / video: change acpi-video brightness_switch_enabled default to 0
* acpi-enumeration:
ACPI / scan: use platform bus type by default for _HID enumeration
ACPI / scan: always register ACPI LPSS scan handler
ACPI / scan: always register memory hotplug scan handler
ACPI / scan: always register container scan handler
ACPI / scan: Change the meaning of missing .attach() in scan handlers
ACPI / scan: introduce platform_id device PNP type flag
ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
ACPI / PNP: use device ID list for PNPACPI device enumeration
ACPI / scan: .match() callback for ACPI scan handlers
* acpi-pm:
ACPI / PM: Export rest of the subsys PM callbacks
ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend
ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state
ACPI / PM: Export acpi_target_system_state() to modules
* acpi-battery:
ACPI / battery: wakeup the system only when necessary
power_supply: allow power supply devices registered w/o wakeup source
ACPI / battery: introduce support for POWER_SUPPLY_PROP_CAPACITY_LEVEL
ACPI / battery: Accelerate battery resume callback
* pm-sleep:
PM / hibernate: fixed typo in comment
PM / sleep: unregister wakeup source when disabling device wakeup
PM / sleep: Introduce command line argument for sleep state enumeration
PM / sleep: Use valid_state() for platform-dependent sleep states only
PM / sleep: Add state field to pm_states[] entries
PM / sleep: Update device PM documentation to cover direct_complete
PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily
PM / hibernate: Fix memory corruption in resumedelay_setup()
PM / hibernate: convert simple_strtoul to kstrtoul
PM / hibernate: Documentation: Fix script for unswapping
PM / hibernate: no kernel_power_off when pm_power_off NULL
PM / hibernate: use unsigned local variables in swsusp_show_speed()
* pm-cpuidle:
PM / suspend: Always use deepest C-state in the "freeze" sleep state
cpuidle / menu: move repeated correction factor check to init
cpuidle / menu: Return (-1) if there are no suitable states
cpuidle: Combine cpuidle_enabled() with cpuidle_select()
ARM: clps711x: Add cpuidle driver
* acpi-scan:
ACPI / scan: do not scan fixed hardware on HW-reduced platform
* acpi-hotplug:
ACPI: add dynamic_debug support
ACPI / notify: Clean up handling of hotplug events
* acpi-pci:
ACPI / PCI: Stub out pci_acpi_crs_quirks() and make it x86 specific
Pull scheduler updates from Ingo Molnar:
"The main scheduling related changes in this cycle were:
- various sched/numa updates, for better performance
- tree wide cleanup of open coded nice levels
- nohz fix related to rq->nr_running use
- cpuidle changes and continued consolidation to improve the
kernel/sched/idle.c high level idle scheduling logic. As part of
this effort I pulled cpuidle driver changes from Rafael as well.
- standardized idle polling amongst architectures
- continued work on preparing better power/energy aware scheduling
- sched/rt updates
- misc fixlets and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
sched/numa: Decay ->wakee_flips instead of zeroing
sched/numa: Update migrate_improves/degrades_locality()
sched/numa: Allow task switch if load imbalance improves
sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
sched: Initialize rq->age_stamp on processor start
sched, nohz: Change rq->nr_running to always use wrappers
sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
sched: Use clamp() and clamp_val() to make sys_nice() more readable
sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
sched/numa: Fix initialization of sched_domain_topology for NUMA
sched: Call select_idle_sibling() when not affine_sd
sched: Simplify return logic in sched_read_attr()
sched: Simplify return logic in sched_copy_attr()
sched: Fix exec_start/task_hot on migrated tasks
arm64: Remove TIF_POLLING_NRFLAG
metag: Remove TIF_POLLING_NRFLAG
sched/idle: Make cpuidle_idle_call() void
sched/idle: Reflow cpuidle_idle_call()
sched/idle: Delay clearing the polling bit
...
Pull perf updates from Ingo Molnar:
"The tooling changes maintained by Jiri Olsa until Arnaldo is on
vacation:
User visible changes:
- Add -F option for specifying output fields (Namhyung Kim)
- Propagate exit status of a command line workload for record command
(Namhyung Kim)
- Use tid for finding thread (Namhyung Kim)
- Clarify the output of perf sched map plus small sched command
fixes (Dongsheng Yang)
- Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
- Factor hists statistics counts processing which in turn also fixes
several bugs in TUI report command (Namhyung Kim)
- Add --percentage option to control absolute/relative percentage
output (Namhyung Kim)
- Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
completion scripts (Ramkumar Ramachandra)
Development/infrastructure changes and fixes:
- Android related fixes for pager and map dso resolving (Michael
Lentine)
- Add libdw DWARF post unwind support for ARM (Jean Pihet)
- Consolidate types.h for ARM and ARM64 (Jean Pihet)
- Fix possible null pointer dereference in session.c (Masanari Iida)
- Cleanup, remove unused variables in map_switch_event() (Dongsheng
Yang)
- Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
- Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
- Cleanups for perf.h header (Jiri Olsa)
- Consolidate types.h and export.h within tools (Borislav Petkov)
- Move u64_swap union to its single user's header, evsel.h (Borislav
Petkov)
- Fix for s390 to properly parse tracepoints plus test code
(Alexander Yarygin)
- Handle EINTR error for readn/writen (Namhyung Kim)
- Add a test case for hists filtering (Namhyung Kim)
- Share map_groups among threads of the same group (Arnaldo Carvalho
de Melo, Jiri Olsa)
- Making some code (cpu node map and report parse callchain callback)
global to be usable by upcomming changes (Don Zickus)
- Fix pmu object compilation error (Jiri Olsa)
Kernel side changes:
- intrusive uprobes fixes from Oleg Nesterov. Since the interface is
admin-only, and the bug only affects user-space ("any probed
jmp/call can kill the application"), we queued these fixes via the
development tree, as a special exception.
- more fuzzer motivated race fixes and related refactoring and
robustization.
- allow PMU drivers to be built as modules. (No actual module yet,
because the x86 Intel uncore module wasn't ready in time for this)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
perf tools: Add automatic remapping of Android libraries
perf tools: Add cat as fallback pager
perf tests: Add a testcase for histogram output sorting
perf tests: Factor out print_hists_*()
perf tools: Introduce reset_output_field()
perf tools: Get rid of obsolete hist_entry__sort_list
perf hists: Reset width of output fields with header length
perf tools: Skip elided sort entries
perf top: Add --fields option to specify output fields
perf report/tui: Fix a bug when --fields/sort is given
perf tools: Add ->sort() member to struct sort_entry
perf report: Add -F option to specify output fields
perf tools: Call perf_hpp__init() before setting up GUI browsers
perf tools: Consolidate management of default sort orders
perf tools: Allow hpp fields to be sort keys
perf ui: Get rid of callback from __hpp__fmt()
perf tools: Consolidate output field handling to hpp format routines
perf tools: Use hpp formats to sort final output
perf tools: Support event grouping in hpp ->sort()
perf tools: Use hpp formats to sort hist entries
...
Pull RCU changes from Ingo Molnar:
"The main RCU changes in this cycle were:
- RCU torture-test changes.
- variable-name renaming cleanup.
- update RCU documentation.
- miscellaneous fixes.
- patch to suppress RCU stall warnings while sysrq requests are being
processed"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
rcu: Provide API to suppress stall warnings while sysrc runs
rcu: Variable name changed in tree_plugin.h and used in tree.c
torture: Remove unused definition
torture: Remove __init from torture_init_begin/end
torture: Check for multiple concurrent torture tests
locktorture: Remove reference to nonexistent Kconfig parameter
rcutorture: Run rcu_torture_writer at normal priority
rcutorture: Note diffs from git commits
rcutorture: Add missing destroy_timer_on_stack()
rcutorture: Explicitly test synchronous grace-period primitives
rcutorture: Add tests for get_state_synchronize_rcu()
rcutorture: Test RCU-sched primitives in TREE_PREEMPT_RCU kernels
torture: Use elapsed time to detect hangs
rcutorture: Check for rcu_torture_fqs creation errors
torture: Better summary diagnostics for build failures
torture: Notice if an all-zero cpumask is passed inside a critical section
rcutorture: Make rcu_torture_reader() use cond_resched()
sched,rcu: Make cond_resched() report RCU quiescent states
percpu: Fix raw_cpu_inc_return()
rcutorture: Export RCU grace-period kthread wait state to rcutorture
...
The bulk of the changes for this release are a few new drivers however
there are a couple of noticable core changes and the usual stream of
cleanups and fixes:
- Move disable of unused regulators later in init so it comes after
deferred probe has iterated making startup smoother.
- Fixes reference counting of the DT nodes for constraints from Charles
Keepax. This has little practical impact since all real users of
the regulator bindings use FDT which doesn't need the reference
counting.
- Lots of cleanups, especially to the Samsung drivers.
- Support for Linear Technologies LTC3589, Texas Instruments TPS658640
and X-Powers AXP20x.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTjaRuAAoJELSic+t+oim96AAP/jNtqLoBLIz6Idwg29Jd3Gj+
X3R+5NfuS0kccvKvIbHmKygwJ6hY9anBsKQ28AH6VUAgcUqrPnWiRyl4iZAyMSQZ
bweQ1QSAIxjAD9J+hMSMU65h3pKh7t6k20dumEUOJZ/ZuPmYj5aks3dy+DKSIkF2
GDG1SVonxheuhUbxEPnXlz5pSnFtERQQMGPdYzk3RxqdeDeVr6Oanlxuo/IKagPP
vm6BFFutEcSk4KpkEQF+iGxIDTzTFGjMe2TzDu1ubxsxcL+5sNh8rkN9rbqQ0fdo
fGMZuo9AXEPXNDwY6ug0Mxoqhg0vrf8NSnEWjM6zQJELKH6H4KMhwAGjstCDkFer
rsuZXeIM1Mk1wrPP6QWLvHVqwS9GUXki82ZgUpDOn61HpRLUu6bKY8eWoq3UM79S
qiRoNSW7eggI0/71IrzumWWE8HEufEJEgGNAqwFPkBywCg+biXjkHlcXdyBMCpqy
kUev0QVJb3FHBnyIdMAVm8hrreejh+frl2Nc+aIEhYZxc3idvVqKl1lvjbObt6PM
FK52n8c/WlQxLZ8+isv0b1+pwT6QHeLzBvBA8mE6+Mn4vLzQKT4nofQU4PSASRSM
UqkaSMhaaJ4WRXlNdhkp721EbRAltrTwzjMS/7lTTZRyN9g9ODOhDnzRT2f3J8F1
JYS22jFTL5ywNR6GFcq0
=JrfM
-----END PGP SIGNATURE-----
Merge tag 'regulator-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into next
Pull regulator updates from Mark Brown:
"The bulk of the changes for this release are a few new drivers however
there are a couple of noticable core changes and the usual stream of
cleanups and fixes:
- move disable of unused regulators later in init so it comes after
deferred probe has iterated making startup smoother.
- fixes to reference counting of the DT nodes for constraints from
Charles Keepax. This has little practical impact since all real
users of the regulator bindings use FDT which doesn't need the
reference counting.
- lots of cleanups, especially to the Samsung drivers.
- support for Linear Technologies LTC3589, Texas Instruments
TPS658640 and X-Powers AXP20x"
* tag 'regulator-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (64 commits)
regulator: pbias: remove unnecessary OOM messages
regulator: max8649: remove unnecessary OOM messages
regulator: core: Fix the init of DT defined fixed regulators
regulator: core: Disable unused regulators after deferred probing is done
regulator: Don't disable unused regulators we don't have permission for
regulator: axp20x: Use regulator_map_voltage_ascend for LDO4
regulator: use of_property_read_{bool|u32}()
regulator: Fix regulator_get_{optional,exclusive}() documentation
regulators: Add definition of regulator_set_voltage_time() for !CONFIG_REGULATOR
regulator: arizona-ldo1: add missing #include
regulator: pfuze100: Support enable/disable for fixed regulator
regulator: ltc3589: Remove ltc3589_list_voltage_fixed function
regulator: ltc3589: Fix module dependency
regulator: tps6586x: Remove unused to_tps6586x_dev() function
regulator: tps65218: Convert to use regulator_set_voltage_time_sel
regulator: tps6586x: Add support for the TPS658640
regulator: tps6586x: Prepare supporting fixed regulators
regulator: pfuze100: Don't allocate an invalid gpio
regulator: pfuze100: Support SWB enable/disable
regulator: fixed: use of_property_read_{bool|u32}()
...
For this release SPI has been exceptionally quiet, all the work has been
on improving drivers (including taking advantage of some of the recent
framework updates):
- DMA support for the rspi driver providing a nice performance boost.
- Performance improvement for the SIRF controller in PIO mode.
- New support for the Cadence SPI IP and for pxa2xx on BayTrail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTjaf+AAoJELSic+t+oim9I4UP/2iWxKkMaSc08DCkY1m+t/TZ
dWw3wLAnQFma1zGDbPRmOdNkl011EZMYNC9NnE6ri9bAV3xgemXWwEx4+X1buAe9
0JVk8292TBmcJVBoIRNX3VQYxcA2QWzhNCVR7g7nr59MDkY2Z52AjJ73lmQz6Kd3
f0vf4kpkG6dGphso3XaDcKy/rORYC/QJ2FGvcz/Qz7dVvFvv5FQfGPKhB7fi4/u9
EXgUArxTrF5s3c0KXAaaDwWd1AuK2JzMj5iqiijzQrsc+9jqRdEZrBFWSKWNpJk6
QpNRqzxtabDdOikDmZH2h1TGik9HRaiJcD84xeUBWICZVCFDm1Va5yp5pltRFoWN
+NC//lzKABZb75dqhY0/i2QvVzhLFmRgakeLRoiYeUSUNSdMVBLY6BCrMYcAbR04
RCigpGL3mDoMlW2TYPheqT9nzL2UMMIopt/1u2+Nax6iCJ0Th9BX3lUCl1fzkyHv
AMdIy07j4Gkdk59TM0/I0zAc41BWhcUMtCKwG2Ya2DRhvSnM4GK9zoap2sv097Zb
IE7MbHH/Ar3/zc938yBqnVTE+GDiwD77t1sVZWwg6nLHKPQR6LGNV4oj4+hvuq3Y
ylRuFwOtNrZ3uqesOS9Sg9W4yVQDoECtzuEo+eJDmJEog7qc9z4+izBpyoeP4YAb
tItBR0ZV8AE/dDsY/wcY
=ZdnG
-----END PGP SIGNATURE-----
Merge tag 'spi-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into next
Pull spi updates from Mark Brown:
"For this release SPI has been exceptionally quiet, all the work has
been on improving drivers (including taking advantage of some of the
recent framework updates):
- DMA support for the rspi driver providing a nice performance boost
- performance improvement for the SIRF controller in PIO mode
- new support for the Cadence SPI IP and for pxa2xx on BayTrail"
* tag 'spi-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (59 commits)
spi: rspi: Extract rspi_common_transfer()
spi: rspi: Add DMA support for RSPI on RZ/A1H
spi: rspi: Add DMA support for QSPI on R-Car Gen2
spi: rspi: Absorb rspi_rz_transfer_out_in() into rspi_rz_transfer_one()
spi: rspi: Merge rspi_*_dma() into rspi_dma_transfer()
spi: rspi: Pass sg_tables instead of spi_tranfer to rspi_*_dma()
spi: rspi: Move RSPI-specific setup out of DMA routines
spi: rspi: Use SPI core DMA mapping framework
spi: rspi: SPI DMA core needs both RX and TX DMA to function
spi: rspi: Remove unneeded resource test in DMA setup
spi: rspi: Extract rspi_request_dma_chan()
spi: rspi: Don't consider DMA configuration failures fatal
spi: rspi: Extract rspi_pio_transfer()
spi: rspi: Use core SPI_MASTER_MUST_[RT]X handling
spi: rspi: Remove unused 16-bit DMA support
spi: rspi: Do not call rspi_receive_init() for TX-only
spi: rspi: Extract rspi_wait_for_{tx_empty,rx_full}()
spi/pxa2xx: fix runtime PM enabling order
spi/fsl-espi: fix rx_buf in fsl_espi_cmd_trans()/fsl_espi_rw_trans()
spi: core: Ignore unsupported spi-[tr]x-bus-width property values
...
Another fairly quiet release, a few bug fixes and a couple of new
features:
- Support for I2C devices connected to SMBus rather than full I2C
controllers contributed by Boris Brezillon. If the controller is
only capable of SMBus operation the framework will transparently
fall back to that.
- Suport for little endian values, contributed by Xiubo Li.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTjaBoAAoJELSic+t+oim9lz4P/jcFjELCezVmIZVmnz67SJeq
yAJebbSaY2i5foXxxpBjUhUucTN3+7cPanBfzq5S6Go7pwVhzo+BhgK1QBteSg3d
3Sl6LW6DJIpIJDBnZ0JWpg/Q4qu8s0bh/IlFpTXhlLJI08LOrc2/qB/uOZM51cBk
loXKymmMK27JxzUbrHTVDX+GLgwyHOI2f9jG2kJuv9LON4BrqjPzBQZisFxIpxq4
rf/obn2xTWW5eXGavF9geVf9kcHr8Xvvcz+HSVr7VdzuAznp7kovDfMtsdJW5rJG
0q2w60k3qcjdffjdAe4v5EP8qKHKykeJq+JsyLe7uqL2xhxWyZS0E66Gpe1nl/8O
jR0zNzjL6BfwZbNgNA+YksaavnpMQLSHinIzEgur3C25Iq2kXKsCgys+l09qcbws
dNCHu9i30Wew3B1EU6eh1M2F3WKrIAqELbByjbTeZXSr7OxYuKzF96fmg6ta5gjk
4kf6jsWJPYHqxQZXV3VxvKy6y5Au/XapVgJ1HxmAX6D2jC51xQQeNN8ViYtmgCKA
jPup3JRtGT322S7Rlwl0Fg36Whbz/H9HTbJFqclDna/z2dsSfJBVef8QFEazVSFa
BiBlGFgvf3XtTUWsUJJyY8PveqbrgbS8/dkGyMpxiFdKp5Z/ki3Lh5+QPdgYmrgk
h73oV5jRCEtQnOfKW9Jl
=XMnS
-----END PGP SIGNATURE-----
Merge tag 'regmap-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into next
Pull regmap updates from Mark Brown:
"Another fairly quiet release, a few bug fixes and a couple of new
features:
- support for I2C devices connected to SMBus rather than full I2C
controllers contributed by Boris Brezillon. If the controller is
only capable of SMBus operation the framework will transparently
fall back to that
- suport for little endian values, contributed by Xiubo Li"
* tag 'regmap-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: mmio: Fix regmap_mmio_write for uneven counts
regmap: irq: Fix possible ZERO_SIZE_PTR pointer dereferencing error.
regmap: Add missing initialization of this_page
regmap: Fix possible ZERO_SIZE_PTR pointer dereferencing error.
regmap: i2c: fallback to SMBus if the adapter does not support standard I2C
regmap: add reg_read/reg_write callbacks to regmap_bus struct
regmap: rbtree: improve 64bits memory alignment
regmap: mmio: Fix the bug of 'offset' value parsing.
regmap: implement LE formatting/parsing for 16/32-bit values.
The function dm_accept_partial_bio allows the target to specify how many
sectors of the current bio it will process. If the target only wants to
accept part of the bio, it calls dm_accept_partial_bio and the DM core
sends the rest of the data in next bio.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull first set of s390 updates from Martin Schwidefsky:
"The biggest change in this patchset is conversion from the bootmem
bitmaps to the memblock code. This conversion requires two common
code patches to introduce the 'physmem' memblock list.
We experimented with ticket spinlocks but in the end decided against
them as they perform poorly on virtualized systems. But the spinlock
cleanup and some small improvements are included.
The uaccess code got another optimization, the get_user/put_user calls
are now inline again for kernel compiles targeted at z10 or newer
machines. This makes the text segment shorter and the code gets a
little bit faster.
And as always some bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits)
s390/lowcore: replace lowcore irb array with a per-cpu variable
s390/lowcore: reserve 96 bytes for IRB in lowcore
s390/facilities: remove extract-cpu-time facility check
s390: require mvcos facility for z10 and newer machines
s390/boot: fix boot of compressed kernel built with gcc 4.9
s390/cio: remove weird assignment during argument evaluation
s390/time: cast tv_nsec to u64 prior to shift in update_vsyscall
s390/oprofile: make return of 0 explicit
s390/spinlock: refactor arch_spin_lock_wait[_flags]
s390/rwlock: add missing local_irq_restore calls
s390/spinlock,rwlock: always to a load-and-test first
s390/cio: fix multiple structure definitions
s390/spinlock: fix system hang with spin_retry <= 0
s390/appldata: add slab.h for kzalloc/kfree
s390/uaccess: provide inline variants of get_user/put_user
s390/pci: add some new arch specific pci attributes
s390/pci: use pdev->dev.groups for attribute creation
s390/pci: use macro for attribute creation
s390/pci: improve state check when processing hotplug events
s390: split TIF bits into CIF, PIF and TIF bits
...
Here is the big USB driver pull request for 3.16-rc1.
Nothing huge here, but lots of little things in the USB core, and in
lots of drivers. Hopefully the USB power management will be work better
now that it has been reworked to do per-port power control dynamically.
There's also a raft of gadget driver updates and fixes, CONFIG_USB_DEBUG
is finally gone now that everything has been converted over to the
dynamic debug inteface, the last hold-out drivers were cleaned up and
the config option removed. There were also other minor things all
through the drivers/usb/ tree, the shortlog shows this pretty well.
All have been in linux-next, including the very last patch, which came
from linux-next to fix a build issue on some platforms.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlONYEMACgkQMUfUDdst+ynxvgCggMQBhN5icth8Y5hFglNNaISN
c4AAoMHR2kb62U1plylLbPnboQTjfcl0
=fG6y
-----END PGP SIGNATURE-----
Merge tag 'usb-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb into next
Pull USB driver updates from Greg KH:
"Here is the big USB driver pull request for 3.16-rc1.
Nothing huge here, but lots of little things in the USB core, and in
lots of drivers. Hopefully the USB power management will be work
better now that it has been reworked to do per-port power control
dynamically. There's also a raft of gadget driver updates and fixes,
CONFIG_USB_DEBUG is finally gone now that everything has been
converted over to the dynamic debug inteface, the last hold-out
drivers were cleaned up and the config option removed. There were
also other minor things all through the drivers/usb/ tree, the
shortlog shows this pretty well.
All have been in linux-next, including the very last patch, which came
from linux-next to fix a build issue on some platforms"
* tag 'usb-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (314 commits)
usb: hub_handle_remote_wakeup() only exists for CONFIG_PM=y
USB: orinoco_usb: remove CONFIG_USB_DEBUG support
USB: media: lirc: igorplugusb: remove CONFIG_USB_DEBUG support
USB: media: streamzap: remove CONFIG_USB_DEBUG
USB: media: redrat3: remove CONFIG_USB_DEBUG usage
USB: media: redrat3: remove unneeded tracing macro
usb: qcserial: add additional Sierra Wireless QMI devices
usb: host: max3421-hcd: Use module_spi_driver
usb: host: max3421-hcd: Allow platform-data to specify Vbus polarity
usb: host: max3421-hcd: fix "spi_rd8" uses dynamic stack allocation warning
usb: host: max3421-hcd: Fix missing unlock in max3421_urb_enqueue()
usb: qcserial: add Netgear AirCard 341U
Documentation: dt-bindings: update xhci-platform DT binding for R-Car H2 and M2
usb: host: xhci-plat: add xhci_plat_start()
usb: host: max3421-hcd: Fix potential NULL urb dereference
Revert "usb: gadget: net2280: Add support for PLX USB338X"
USB: usbip: remove CONFIG_USB_DEBUG reference
USB: remove CONFIG_USB_DEBUG from defconfig files
usb: resume child device when port is powered on
usb: hub_handle_remote_wakeup() depends on CONFIG_PM_RUNTIME=y
...
Here is the big tty / serial driver pull request for 3.16-rc1.
A variety of different serial driver fixes and updates and additions,
nothing huge, and no real major core tty changes at all.
All have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlONXgoACgkQMUfUDdst+ymdSwCgwL0xmWjFYr/UbJ4LslOZ29Q4
BFQAoKyYe9LsfEyodBPabxJjKUtj1htz
=ZGSN
-----END PGP SIGNATURE-----
Merge tag 'tty-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty into next
Pull tty/serial driver updates from Greg KH:
"Here is the big tty / serial driver pull request for 3.16-rc1.
A variety of different serial driver fixes and updates and additions,
nothing huge, and no real major core tty changes at all.
All have been in linux-next for a while"
* tag 'tty-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (84 commits)
Revert "serial: imx: remove the DMA wait queue"
serial: kgdb_nmi: Improve console integration with KDB I/O
serial: kgdb_nmi: Switch from tasklets to real timers
serial: kgdb_nmi: Use container_of() to locate private data
serial: cpm_uart: No LF conversion in put_poll_char()
serial: sirf: Fix compilation failure
console: Remove superfluous readonly check
console: Use explicit pointer type for vc_uni_pagedir* fields
vgacon: Fix & cleanup refcounting
ARM: tty: Move HVC DCC assembly to arch/arm
tty/hvc/hvc_console: Fix wakeup of HVC thread on hvc_kick()
drivers/tty/n_hdlc.c: replace kmalloc/memset by kzalloc
vt: emulate 8- and 24-bit colour codes.
printk/of_serial: fix serial console cessation part way through boot.
serial: 8250_dma: check the result of TX buffer mapping
serial: uart: add hw flow control support configuration
tty/serial: at91: add interrupts for modem control lines
tty/serial: at91: use mctrl_gpio helpers
tty/serial: Add GPIOLIB helpers for controlling modem lines
ARM: at91: gpio: implement get_direction
...
Here is the big staging driver pull request for 3.16-rc1.
Lots of stuff here, tons of cleanup patches, a few new drivers, and some
removed as well, but I think we are still adding a few thousand more
lines than we remove, due to the new drivers being bigger than the ones
deleted.
One notible bit of work did stand out, Jes Sorensen has gone on a tear,
fixing up a wireless driver to be "more sane" than it originally was
from the vendor, with over 500 patches merged here. Good stuff, and a
number of users laptops are better off for it.
All of this has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlONXKQACgkQMUfUDdst+ynIwgCgq5pPIn+2aewaFK8rrN18xqls
F3YAoNDYeqMpQepvRe50HcjRrgDvsV2n
=VenO
-----END PGP SIGNATURE-----
Merge tag 'staging-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging into next
Pull staging driver updates from Greg KH:
"Here is the big staging driver pull request for 3.16-rc1.
Lots of stuff here, tons of cleanup patches, a few new drivers, and
some removed as well, but I think we are still adding a few thousand
more lines than we remove, due to the new drivers being bigger than
the ones deleted.
One notible bit of work did stand out, Jes Sorensen has gone on a
tear, fixing up a wireless driver to be "more sane" than it originally
was from the vendor, with over 500 patches merged here. Good stuff,
and a number of users laptops are better off for it.
All of this has been in linux-next for a while"
* tag 'staging-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1703 commits)
staging: skein: fix sparse warning for static declarations
staging/mt29f_spinand: coding style fixes
staging: silicom: fix sparse warning for static variable
staging: lustre: Fix coding style
staging: android: binder.c: Use more appropriate functions for euid retrieval
staging: lustre: fix integer as NULL pointer warnings
Revert "staging: dgap: remove unneeded kfree() in dgap_tty_register_ports()"
Staging: rtl8192u: r8192U_wx.c Fixed a misplaced brace
staging: ion: shrink highmem pages on kswapd
staging: ion: use compound pages on high order pages for system heap
staging: ion: remove struct ion_page_pool_item
staging: ion: simplify ion_page_pool_total()
staging: ion: tidy up a bit
staging: rtl8723au: Remove redundant casting in usb_ops_linux.c
staging: rtl8723au: Remove redundant casting in rtl8723a_hal_init.c
staging: rtl8723au: Remove redundant casting in rtw_xmit.c
staging: rtl8723au: Remove redundant casting in rtw_wlan_util.c
staging: rtl8723au: Remove redundant casting in rtw_sta_mgt.c
staging: rtl8723au: Remove redundant casting in rtw_recv.c
staging: rtl8723au: Remove redundant casting in rtw_mlme.c
...
There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the "big" pull request for 3.16-rc1.
Not a lot of changes here, some kernfs work, a revert of a very old
driver core change that ended up cauing some memory leaks on driver
probe error paths, and other minor things.
As was pointed out earlier today, one commit here,
26fc9cd200 (kernfs: move the last
knowledge of sysfs out from kernfs) is also needed in your 3.15-final
branch as well. If you could cherry-pick it there, it would be most
appreciated by Andy Lutomirski to prevent a regression there.
All of these have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlONV9YACgkQMUfUDdst+yn0sQCfWWYg1oVXyu6f0uJjYbVBFkpD
UHgAoJxxfwTZJq/xYrnk6+RqUowIsUlh
=ojAS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into next
Pull driver core / kernfs changes from Greg KH:
"Here is the "big" pull request for 3.16-rc1.
Not a lot of changes here, some kernfs work, a revert of a very old
driver core change that ended up cauing some memory leaks on driver
probe error paths, and other minor things.
As was pointed out earlier today, one commit here, 26fc9cd200
("kernfs: move the last knowledge of sysfs out from kernfs") is also
needed in your 3.15-final branch as well. If you could cherry-pick it
there, it would be most appreciated by Andy Lutomirski to prevent a
regression there.
All of these have been in linux-next for a while"
* tag 'driver-core-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
crypto/nx/nx-842: dev_set_drvdata can no longer fail
kernfs: move the last knowledge of sysfs out from kernfs
sysfs: fix attribute_group bin file path on removal
sysfs.h: don't return a void-valued expression in sysfs_remove_file
init.h: Update initcall_sync variants to fix build errors
driver core: Inline dev_set/get_drvdata
driver core: dev_get_drvdata: Don't check for NULL dev
driver core: dev_set_drvdata returns void
driver core: dev_set_drvdata can no longer fail
driver core: Move driver_data back to struct device
lib/devres.c: fix checkpatch warnings
lib/devres.c: use dev in devm_request_and_ioremap
kobject: Make support for uevent_helper optional.
kernfs: make kernfs_notify() trigger inotify events too
kernfs: implement kernfs_root->supers list
Here is the big char / misc driver updates for 3.16-rc1.
Lots of different driver updates for a variety of different drivers and
minor driver subsystems.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlONWI8ACgkQMUfUDdst+ykvQACdGxTChdEU7edElDAXeelVmu8v
D1UAoLDvqwUsN7t5v+WG2wkOvhf5MEA7
=tVMP
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc into next
Pull char/misc driver patches from Greg KH:
"Here is the big char / misc driver update for 3.16-rc1.
Lots of different driver updates for a variety of different drivers
and minor driver subsystems.
All have been in linux-next with no reported issues"
* tag 'char-misc-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (79 commits)
hv: use correct order when freeing monitor_pages
spmi: of: fixup generic SPMI devicetree binding example
applicom: dereferencing NULL on error path
misc: genwqe: fix uninitialized return value in genwqe_free_sync_sgl()
miscdevice.h: Simple syntax fix to make pointers consistent.
MAINTAINERS: Add miscdevice.h to file list for char/misc drivers.
mcb: Add support for shared PCI IRQs
drivers: Remove duplicate conditionally included subdirs
misc: atmel_pwm: only build for supported platforms
mei: me: move probe quirk to cfg structure
mei: add per device configuration
mei: me: read H_CSR after asserting reset
mei: me: drop harmful wait optimization
mei: me: fix hw ready reset flow
mei: fix memory leak of mei_clients array
uio: fix vma io range check in mmap
drivers: uio_dmem_genirq: Fix memory leak in uio_dmem_genirq_probe()
w1: do not unlock unheld list_mutex in __w1_remove_master_device()
w1: optional bundling of netlink kernel replies
connector: allow multiple messages to be sent in one packet
...
On Feb 17, 2014, two new usages are approved to HID usage Table 18 -
Digitizer Page:
5A Secondary Barrel Switch MC 16.4
5B Transducer Serial Number SV 16.3.1
This patch adds relevant definitions to hid/input. It also removes
outdated comments in hid.h.
Signed-off-by: Ping Cheng <pingc@wacom.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
A few more updates from the last week of development, nothing too
exciting. Highlights include:
- GPIO descriptor support for jacks
- More updates and fixes to the Freescale SSI, Intel and rsnd drivers.
- New drivers for Analog Devices ADAU1361, ADAU1381, ADAU1761 and
ADAU1781, and Realtek RT5677.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTjZkHAAoJELSic+t+oim9pl0P/RTeNipwuUbFUWSYMcARc2wE
Zpw+imtqjgE91ix5QR4OcQtL6+JtoOF7wfXA8dUmXRLrBqgSGDT/OJh771Jq4J7X
LNTd7And4INeELeV7yajsDnHKD8P0V5Dmn+94HrjBgDuseELFn1izdGjM+0GpfKo
i/C+595PtLSx3ekdYmsrXWyxgkUOrArhimiYKQzOtjncsGfXJCyhzvusXZQb3tCl
Fa/HMTgMH8DlRInruN87s5CFr3O4seDB7LYOmjUIsPj4lorEFbqaGOUtPOCDJN1h
tqMVuSIfbAEHX/ny9chsEYtMzOMPWk8s9TWn3LNfHkG8embkzaCaJqg8b8WIk/0U
scqcKptarASYAGpSMu8IoSruC4+7GKtpo8zpGGJBclAMgmpXBSa381QnsiUGGocq
BoBIjNSXzft3eRB0wjhGPLmJ5qMdHSEKuaDaACf2cot63ePF764cdg9EBItJLWEt
G8nHQErUm4UfJapLvrDC+ItSNMLeaSJvCXrM1DPdfObqryKzJDbST/4B7m+w1ZGe
2Cnc536hkYdZ6kFdDgtIyK5yTeAgoxHzYNMndejt4bjkK7bnCIxa49tZV9d8LsTG
g03cCEkkE4nCHWsSL891YBKfLLujTZoIY+f5M396FhLHBXL5krCtPRWE42I/xo58
wXH46Sd7HqDXxom5jpP+
=iPCP
-----END PGP SIGNATURE-----
Merge tag 'asoc-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Final updates for v3.16
A few more updates from the last week of development, nothing too
exciting. Highlights include:
- GPIO descriptor support for jacks
- More updates and fixes to the Freescale SSI, Intel and rsnd drivers.
- New drivers for Analog Devices ADAU1361, ADAU1381, ADAU1761 and
ADAU1781, and Realtek RT5677.
Formats the palmas header file. Convert all
the offset values to hexadecimal.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
abx500_dump_all_banks() has no callers in the kernel, so it's probably
safe to remove it.
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Update the addresses and names to match current silicon.
The WM8997 regmap tables have been adjusted to match the new
names.
Missing registers have been added to WM5110 default value table.
Signed-off-by: Richard Fitzgerald <rf@opensource.wolfsonmicro.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This just updates include/linux/mfd/cros_ec_commands.h to match the
latest EC version (which is the One True Source for such things). See
<https://chromium.googlesource.com/chromiumos/platform/ec>
[dianders: took today's ToT version from the Chromium OS EC; deleted
references to cros_ec_dev and cros_ec_lpc since those aren't upstream
yet]
Signed-off-by: Bill Richardson <wfrichar@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-by: Simon Glass <sjg@chromium.org>
Tested-by: Andrew Bresticker <abrestic@chromium.org>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This adds a driver for the Atmel Microcontroller found on the
iPAQ h3xxx series. This device handles some keys, the
touchscreen, and the battery monitoring.
This is a port of a driver from handhelds.org 2.6.21 kernel,
written by Alessandro Gardich based on Andrew Christians
original HAL-driver. It has been heavily cleaned and
converted to mfd-core by Dmitry Artamonow and rewritten
again for the v3.x series kernels by Linus Walleij,
bringing back some of the functionality lost from Andrew's
original driver.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Alessandro Gardich <gremlin@gremlin.it>
Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
commit df73de9b0d ("mfd: syscon: Return -ENOSYS if CONFIG_MFD_SYSCON
is not enabled") introduced fallbacks for APIs, but missed out on adding
the header file. This would work only if linux/err.h is also included
in the source code from where this file is included. It would be better
to include linux/err.h in file to remove possible build errors.
Without this patch, we get following and similar build errors if this
header file is included in some source file and CONFIG_MFD_SYSCON is
not enabled.
include/linux/mfd/syscon.h: In function ‘syscon_node_to_regmap’:
include/linux/mfd/syscon.h:30:2: error: implicit declaration of function ‘ERR_PTR’ [-Werror=implicit-function-declaration]
return ERR_PTR(-ENOSYS);
^
include/linux/mfd/syscon.h:30:18: error: ‘ENOSYS’ undeclared (first use in this function)
return ERR_PTR(-ENOSYS);
^
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This patch introduces the preliminary support for PMICs X-Powers AXP202
and AXP209. The AXP209 and AXP202 are the PMUs (Power Management Unit)
used by A10, A13 and A20 SoCs and developed by X-Powers, a sister company
of Allwinner.
The core enables support for two subsystems:
- PEK (Power Enable Key)
- Regulators
Signed-off-by: Carlo Caione <carlo@caione.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The pm8xxx read/write wrappers are no longer necessary now that
all the sub-device drivers are using the regmap API. Remove it.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The device type was stored in sec_pmic_dev state container twice:
- unsigned long type (initialized from of_device_id or i2c_device_id)
- int device_type (initialized as above or from board files when there
is no DTS)
The 'type' field was never used outside of probe so it can be safely
removed.
Change also the device_type in sec_pmic_dev and sec_platform_data to
unsigned long to avoid any casts.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
In certain boards the source for the clk32k clock can be gated. In these
boards the clk32k clock can be provided to the driver and it is going to be
enabled/disabled when it is needed.
If the clk32k clock is not provided the driver will assume that it is always
running.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
All boards using twl6040 configures the i2c bus to 400KHz. While twl6040's
defaults to normal mode (100KHz). So far twl6040 has no problem with i2c
communication in this configuration it is safer to select fast i2c mode.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The valid gpio is GPIO0 ~ GPIO58, so ngpio should be 59.
This patch also renames RDC321X_MAX_GPIO to RDC321X_NUM_GPIO because it
actually means the number of available GPIOs.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Update the documentation for sec_pmic state container structure to
reflect current code.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The rtc-s5m driver does not support all of S2M and S5M chipsets
supported by main MFD sec-core driver. For such chipsets unsupported by
rtc-s5m, the MFD sec-core driver initialized regmap with default config.
This config in such cases wouldn't work at all.
The main MFD sec-core driver shouldn't initialize regmap for child
drivers which is not used by them and even not valid.
Move the allocation of RTC I2C dummy device and initialization of RTC
regmap from main MFD sec-core driver to the rtc-s5m driver. The rtc-s5m
driver will use proper regmap config for supported devices.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The *rdev[] is not used since commit 413be59e2f
"regulator: tps65218: Remove unnecessary regulator_unregister call".
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>