This patchset addresses a race which was described in the changelog for
5695be142e ("OOM, PM: OOM killed task shouldn't escape PM suspend"):
: PM freezer relies on having all tasks frozen by the time devices are
: getting frozen so that no task will touch them while they are getting
: frozen. But OOM killer is allowed to kill an already frozen task in order
: to handle OOM situtation. In order to protect from late wake ups OOM
: killer is disabled after all tasks are frozen. This, however, still keeps
: a window open when a killed task didn't manage to die by the time
: freeze_processes finishes.
The original patch hasn't closed the race window completely because that
would require a more complex solution as it can be seen by this patchset.
The primary motivation was to close the race condition between OOM killer
and PM freezer _completely_. As Tejun pointed out, even though the race
condition is unlikely the harder it would be to debug weird bugs deep in
the PM freezer when the debugging options are reduced considerably. I can
only speculate what might happen when a task is still runnable
unexpectedly.
On a plus side and as a side effect the oom enable/disable has a better
(full barrier) semantic without polluting hot paths.
I have tested the series in KVM with 100M RAM:
- many small tasks (20M anon mmap) which are triggering OOM continually
- s2ram which resumes automatically is triggered in a loop
echo processors > /sys/power/pm_test
while true
do
echo mem > /sys/power/state
sleep 1s
done
- simple module which allocates and frees 20M in 8K chunks. If it sees
freezing(current) then it tries another round of allocation before calling
try_to_freeze
- debugging messages of PM stages and OOM killer enable/disable/fail added
and unmark_oom_victim is delayed by 1s after it clears TIF_MEMDIE and before
it wakes up waiters.
- rebased on top of the current mmotm which means some necessary updates
in mm/oom_kill.c. mark_tsk_oom_victim is now called under task_lock but
I think this should be OK because __thaw_task shouldn't interfere with any
locking down wake_up_process. Oleg?
As expected there are no OOM killed tasks after oom is disabled and
allocations requested by the kernel thread are failing after all the tasks
are frozen and OOM disabled. I wasn't able to catch a race where
oom_killer_disable would really have to wait but I kinda expected the race
is really unlikely.
[ 242.609330] Killed process 2992 (mem_eater) total-vm:24412kB, anon-rss:2164kB, file-rss:4kB
[ 243.628071] Unmarking 2992 OOM victim. oom_victims: 1
[ 243.636072] (elapsed 2.837 seconds) done.
[ 243.641985] Trying to disable OOM killer
[ 243.643032] Waiting for concurent OOM victims
[ 243.644342] OOM killer disabled
[ 243.645447] Freezing remaining freezable tasks ... (elapsed 0.005 seconds) done.
[ 243.652983] Suspending console(s) (use no_console_suspend to debug)
[ 243.903299] kmem_eater: page allocation failure: order:1, mode:0x204010
[...]
[ 243.992600] PM: suspend of devices complete after 336.667 msecs
[ 243.993264] PM: late suspend of devices complete after 0.660 msecs
[ 243.994713] PM: noirq suspend of devices complete after 1.446 msecs
[ 243.994717] ACPI: Preparing to enter system sleep state S3
[ 243.994795] PM: Saving platform NVS memory
[ 243.994796] Disabling non-boot CPUs ...
The first 2 patches are simple cleanups for OOM. They should go in
regardless the rest IMO.
Patches 3 and 4 are trivial printk -> pr_info conversion and they should
go in ditto.
The main patch is the last one and I would appreciate acks from Tejun and
Rafael. I think the OOM part should be OK (except for __thaw_task vs.
task_lock where a look from Oleg would appreciated) but I am not so sure I
haven't screwed anything in the freezer code. I have found several
surprises there.
This patch (of 5):
This patch is just a preparatory and it doesn't introduce any functional
change.
Note:
I am utterly unhappy about lowmemory killer abusing TIF_MEMDIE just to
wait for the oom victim and to prevent from new killing. This is
just a side effect of the flag. The primary meaning is to give the oom
victim access to the memory reserves and that shouldn't be necessary
here.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce the basic control files to account, partition, and limit
memory using cgroups in default hierarchy mode.
This interface versioning allows us to address fundamental design
issues in the existing memory cgroup interface, further explained
below. The old interface will be maintained indefinitely, but a
clearer model and improved workload performance should encourage
existing users to switch over to the new one eventually.
The control files are thus:
- memory.current shows the current consumption of the cgroup and its
descendants, in bytes.
- memory.low configures the lower end of the cgroup's expected
memory consumption range. The kernel considers memory below that
boundary to be a reserve - the minimum that the workload needs in
order to make forward progress - and generally avoids reclaiming
it, unless there is an imminent risk of entering an OOM situation.
- memory.high configures the upper end of the cgroup's expected
memory consumption range. A cgroup whose consumption grows beyond
this threshold is forced into direct reclaim, to work off the
excess and to throttle new allocations heavily, but is generally
allowed to continue and the OOM killer is not invoked.
- memory.max configures the hard maximum amount of memory that the
cgroup is allowed to consume before the OOM killer is invoked.
- memory.events shows event counters that indicate how often the
cgroup was reclaimed while below memory.low, how often it was
forced to reclaim excess beyond memory.high, how often it hit
memory.max, and how often it entered OOM due to memory.max. This
allows users to identify configuration problems when observing a
degradation in workload performance. An overcommitted system will
have an increased rate of low boundary breaches, whereas increased
rates of high limit breaches, maximum hits, or even OOM situations
will indicate internally overcommitted cgroups.
For existing users of memory cgroups, the following deviations from
the current interface are worth pointing out and explaining:
- The original lower boundary, the soft limit, is defined as a limit
that is per default unset. As a result, the set of cgroups that
global reclaim prefers is opt-in, rather than opt-out. The costs
for optimizing these mostly negative lookups are so high that the
implementation, despite its enormous size, does not even provide
the basic desirable behavior. First off, the soft limit has no
hierarchical meaning. All configured groups are organized in a
global rbtree and treated like equal peers, regardless where they
are located in the hierarchy. This makes subtree delegation
impossible. Second, the soft limit reclaim pass is so aggressive
that it not just introduces high allocation latencies into the
system, but also impacts system performance due to overreclaim, to
the point where the feature becomes self-defeating.
The memory.low boundary on the other hand is a top-down allocated
reserve. A cgroup enjoys reclaim protection when it and all its
ancestors are below their low boundaries, which makes delegation
of subtrees possible. Secondly, new cgroups have no reserve per
default and in the common case most cgroups are eligible for the
preferred reclaim pass. This allows the new low boundary to be
efficiently implemented with just a minor addition to the generic
reclaim code, without the need for out-of-band data structures and
reclaim passes. Because the generic reclaim code considers all
cgroups except for the ones running low in the preferred first
reclaim pass, overreclaim of individual groups is eliminated as
well, resulting in much better overall workload performance.
- The original high boundary, the hard limit, is defined as a strict
limit that can not budge, even if the OOM killer has to be called.
But this generally goes against the goal of making the most out of
the available memory. The memory consumption of workloads varies
during runtime, and that requires users to overcommit. But doing
that with a strict upper limit requires either a fairly accurate
prediction of the working set size or adding slack to the limit.
Since working set size estimation is hard and error prone, and
getting it wrong results in OOM kills, most users tend to err on
the side of a looser limit and end up wasting precious resources.
The memory.high boundary on the other hand can be set much more
conservatively. When hit, it throttles allocations by forcing
them into direct reclaim to work off the excess, but it never
invokes the OOM killer. As a result, a high boundary that is
chosen too aggressively will not terminate the processes, but
instead it will lead to gradual performance degradation. The user
can monitor this and make corrections until the minimal memory
footprint that still gives acceptable performance is found.
In extreme cases, with many concurrent allocations and a complete
breakdown of reclaim progress within the group, the high boundary
can be exceeded. But even then it's mostly better to satisfy the
allocation from the slack available in other groups or the rest of
the system than killing the group. Otherwise, memory.max is there
to limit this type of spillover and ultimately contain buggy or
even malicious applications.
- The original control file names are unwieldy and inconsistent in
many different ways. For example, the upper boundary hit count is
exported in the memory.failcnt file, but an OOM event count has to
be manually counted by listening to memory.oom_control events, and
lower boundary / soft limit events have to be counted by first
setting a threshold for that value and then counting those events.
Also, usage and limit files encode their units in the filename.
That makes the filenames very long, even though this is not
information that a user needs to be reminded of every time they
type out those names.
To address these naming issues, as well as to signal clearly that
the new interface carries a new configuration model, the naming
conventions in it necessarily differ from the old interface.
- The original limit files indicate the state of an unset limit with
a very high number, and a configured limit can be unset by echoing
-1 into those files. But that very high number is implementation
and architecture dependent and not very descriptive. And while -1
can be understood as an underflow into the highest possible value,
-2 or -10M etc. do not work, so it's not inconsistent.
memory.low, memory.high, and memory.max will use the string
"infinity" to indicate and set the highest possible value.
[akpm@linux-foundation.org: use seq_puts() for basic strings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The unified hierarchy interface for memory cgroups will no longer use "-1"
to mean maximum possible resource value. In preparation for this, make
the string an argument and let the caller supply it.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit b2052564e6 ("mm: memcontrol: continue cache reclaim from
offlined groups") pages charged to a memory cgroup are not reparented when
the cgroup is removed. Instead, they are supposed to be reclaimed in a
regular way, along with pages accounted to online memory cgroups.
However, an lruvec of an offline memory cgroup will sooner or later get so
small that it will be scanned only at low scan priorities (see
get_scan_count()). Therefore, if there are enough reclaimable pages in
big lruvecs, pages accounted to offline memory cgroups will never be
scanned at all, wasting memory.
Fix this by unconditionally forcing scanning dead lruvecs from kswapd.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
next_zones_zonelist() returns a zoneref pointer, as well as a zone pointer
via extra parameter. Since the latter can be trivially obtained by
dereferencing the former, the overhead of the extra parameter is
unjustified.
This patch thus removes the zone parameter from next_zones_zonelist().
Both callers happen to be in the same header file, so it's simple to add
the zoneref dereference inline. We save some bytes of code size.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-105 (-105)
function old new delta
nr_free_zone_pages 129 115 -14
__alloc_pages_nodemask 2300 2285 -15
get_page_from_freelist 2652 2576 -76
add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
function old new delta
try_to_compact_pages 569 579 +10
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expand the usage of the struct alloc_context introduced in the previous
patch also for calling try_to_compact_pages(), to reduce the number of its
parameters. Since the function is in different compilation unit, we need
to move alloc_context definition in the shared mm/internal.h header.
With this change we get simpler code and small savings of code size and stack
usage:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
function old new delta
__alloc_pages_direct_compact 283 256 -27
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
function old new delta
try_to_compact_pages 582 569 -13
Stack usage of __alloc_pages_direct_compact goes from 24 to none (per
scripts/checkstack.pl).
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.
This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.
This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned. So the caller must be changed to
properly handle the returned tail pages.
We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.
Here is the reproducer:
$ cat movepages.c
#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
#define PS 0x1000
int main(int argc, char *argv[]) {
int i;
int nr_hp = strtol(argv[1], NULL, 0);
int nr_p = nr_hp * HPS / PS;
int ret;
void **addrs;
int *status;
int *nodes;
pid_t pid;
pid = strtol(argv[2], NULL, 0);
addrs = malloc(sizeof(char *) * nr_p + 1);
status = malloc(sizeof(char *) * nr_p + 1);
nodes = malloc(sizeof(char *) * nr_p + 1);
while (1) {
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 1;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 0;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
}
return 0;
}
$ cat hugepage.c
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
int main(int argc, char *argv[]) {
int nr_hp = strtol(argv[1], NULL, 0);
char *p;
while (1) {
p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (p != (void *)ADDR_INPUT) {
perror("mmap");
break;
}
memset(p, 0, nr_hp * HPS);
munmap(p, nr_hp * HPS);
}
}
$ sysctl vm.nr_hugepages=40
$ ./hugepage 10 &
$ ./movepages 10 $(pgrep -f hugepage)
Fixes: e632a938d9 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found it when I want to jump to the definition of MIGRATE_RESERVE ctags.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The complexity of memcg page stat synchronization is currently leaking
into the callsites, forcing them to keep track of the move_lock state and
the IRQ flags. Simplify the API by tracking it in the memcg.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The body of this function was removed by commit 0a31bc97c8 ("mm:
memcontrol: rewrite uncharge API").
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can
detect zero_page in /proc/kpageflags, and then do memory analysis more
accurately.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add VM_BUG_ON_PAGE() for slab pages. _mapcount is an union with slab
struct in struct page, so we must avoid accessing _mapcount if this page
is a slab page. Also remove the unneeded bracket.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we use lru.next/lru.prev plus cast to access or set
destructor and order of compound page.
Let's replace it with explicit fields in struct page.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch resolves the following warnings.
include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32
fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds FASTBOOT flag into checkpoint as follows.
- CP_UMOUNT_FLAG is set when system is umounted.
- CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
was done.
So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, there are several variables with Boolean type as below:
struct f2fs_sb_info {
...
int s_dirty;
bool need_fsck;
bool s_closing;
...
bool por_doing;
...
}
For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
up.
So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
to operate flags for sbi.
After that, above issues will be fixed.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.
Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload. Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the free and same_flow fields to be bit fields
(2 and 1 bit sized respectively). This frees up some space for u16's.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current meaning of CHECKSUM_PARTIAL for validating checksums
is that _all_ checksums in the packet are considered valid.
However, in the manner that CHECKSUM_PARTIAL is set only the checksum
at csum_start+csum_offset and any preceding checksums may
be considered valid. If there are checksums in the packet after
csum_offset it is possible they have not been verfied.
This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and
__skb_gro_checksum_validate_needed to only considered checksums
referring to csum_start and any preceding checksums (with starting
offset before csum_start) to be verified.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.
To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.
In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the normal {} instead of a macro to terminate an array.
Remove the macro too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the normal {} instead of a macro to terminate an array.
Remove the macro too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack. While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.
This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB(). Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.
Cc: <stable@vger.kernel.org> # 3.18
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
- Framework changes and enhancements:
- Passing -DDEBUG recursively to subdir drivers so we get
debug messages properly turned on.
- Infer map type from DT property in the groups parsing code
in the generic pinconfig code.
- Support for custom parameter passing in generic pin config.
This is used when you are using the generic pin config, but
want to add a few custom properties that no other driver
will use.
- New drivers:
- Driver for the Xilinx Zynq
- Driver for the AmLogic Meson SoCs
- New features in drivers:
- Sleep support (suspend/resume) for the Cherryview driver
- mvebeu a38x can now mux a UART on pins MPP19 and MPP20
- Migrated the qualcomm driver to generic pin config handling
of extended config options in the core code.
- Support BUS1 and AUDIO in the Exynos pin controller.
- Add some missing functions in the sun6i driver.
- Add support for the A31S variant in the sun6i driver.
- EMEv2 support in the Renesas PFC driver.
- Ass support for Qualcomm MSM8916 in the qcom driver.
- Deleted features
- Drop support for the SiRF Marco that was never released to
the market.
- Drop SH7372 support as the support for this platform is
removed from the kernel.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2w2GAAoJEEEQszewGV1z7kYQAKw4Y4SY9Mq9O97GBq0JWvzv
uLK4P8NvegHkgX0IDc/etAtHzBN6L+4axh7rDAsaDhug+42CbZxVZjXCfLGFClZP
kJ/gz4II28AGWiP7TZPNHspIJYgKUdWcVfg0cTZwpM22/AEdBAo9HQ2a/FltvrCn
eYwzFlAOKUUygmDdbfXHk5Z+ndrJw0ahLjXn8zjBe1HkD2QVaigM9ecA2aQHiG9a
QvABUJ2qVVs9rqTIxoVzSIGTLeLzrv8cezDLQhZ4KaEasAkxtWKM4kYQSMx/PoTB
mg+FZ5B8IXqlksnSljT+wOcSP1nmtRdjnED/MpsSLbo9RfJgHkA4Lu4Q8iqt7rZL
+k/kKi3+p9pTE2pIi56nSpHnfgF8JHgdRAYIXBea5Ug0YnBp83/5jLrU7Fmcjr7s
l0PH0Fk0iRFRdfn6crcs+SLrhQKtuP+Douwg+3ujVOQiKIW6m+b161GwEVYkvWlq
1JRWPSjncpsmyg5O8dEwZDwgtzPU65UMEsLgRk9wMNJYw0TqGPugEy4+2rBdJWLy
CYzJo2At9OcHbB2rT8UKwtErQF85IcWmfnMyfo2PANTLGaj5EFruhmSc2J0m7kOe
ExPXtWOWxGCtWG53ZDeJUYBg9ySMyleY10LYBP9fPQnotvNB3vfyAkBwV74fnBXs
ijxO6/Uamd4Rrs5LDRBL
=Nbzh
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pincontrol updates from Linus Walleij:
:This is the bulk of pin control changes for the v3.20 cycle:
Framework changes and enhancements:
- Passing -DDEBUG recursively to subdir drivers so we get debug
messages properly turned on.
- Infer map type from DT property in the groups parsing code in the
generic pinconfig code.
- Support for custom parameter passing in generic pin config. This
is used when you are using the generic pin config, but want to add
a few custom properties that no other driver will use.
New drivers:
- Driver for the Xilinx Zynq
- Driver for the AmLogic Meson SoCs
New features in drivers:
- Sleep support (suspend/resume) for the Cherryview driver
- mvebeu a38x can now mux a UART on pins MPP19 and MPP20
- Migrated the qualcomm driver to generic pin config handling of
extended config options in the core code.
- Support BUS1 and AUDIO in the Exynos pin controller.
- Add some missing functions in the sun6i driver.
- Add support for the A31S variant in the sun6i driver.
- EMEv2 support in the Renesas PFC driver.
- Add support for Qualcomm MSM8916 in the qcom driver.
Deleted features
- Drop support for the SiRF Marco that was never released to the
market.
- Drop SH7372 support as the support for this platform is removed
from the kernel"
* tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (40 commits)
sh-pfc: emev2 - Fix mangled author name
pinctrl: cherryview: Configure HiZ pins to be input when requested as GPIOs
pinctrl: imx25: fix numbering for pins
pinctrl: pinctrl-imx: don't use invalid value of conf_reg
pinctrl: qcom: delete pin_config_get/set pinconf operations
pinctrl: qcom: Add msm8916 pinctrl driver
DT: pinctrl: Document Qualcomm MSM8916 pinctrl binding
pinctrl: qcom: increase variable size for register offsets
pinctrl: hide PCONFDUMP in #ifdef
pinctrl: rockchip: Only mask interrupts; never disable
pinctrl: zynq: Fix usb0 pins
pinctrl: sh-pfc: sh7372: Remove DT binding documentation
pinctrl: sh-pfc: sh7372: Remove PFC support
sh-pfc: Add emev2 pinmux support
sh-pfc: add macro to define pinmux without function
pinctrl: add driver for Amlogic Meson SoCs
staging: drivers: pinctrl: Fixed checkpatch.pl warnings
pinctrl: exynos: Add AUDIO pin controller for exynos7
sh-pfc: r8a7790: add MLB+ pin group
sh-pfc: r8a7791: add MLB+ pin group
...
- GPIOLIB core changes:
- Create and use of_mm_gpiochip_remove() for removing
memory-mapped OF GPIO chips
- GPIO MMIO library suppports bgpio_set_multiple for
switching several lines at once, a feature merged in
the last cycle.
- New drivers:
- New driver for the APM X-gene standby GPIO controller
- New driver for the Fujitsu MB86S7x GPIO controller
- Cleanups:
- Moved rcar driver to use gpiolib irqchip
- Moxart converted to the GPIO MMIO library
- GE driver converted to GPIO MMIO library
- Move sx150x to irqdomain
- Move max732x to irqdomain
- Move vx855 to use managed resources
- Move dwapb to use managed resources
- Clean tc3589x from platform data
- Clean stmpe driver to use device tree only probe
- New subtypes:
- sx1506 support in the sx150x driver
- Quark 1000 SoC support in the SCH driver
- Support X86 in the Xilinx driver
- Support PXA1928 in the PXA driver
- Extended drivers:
- max732x supports device tree probe
- sx150x supports device tree probe
- Various minor cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2W1HAAoJEEEQszewGV1zuRkQAKBukQwx46zQuFH8oalSuxmy
H2/ESMnDmRlBI0zX+dAGDg4HlkO9VwIrdcyzMMe7uO8EFUu9d4/mu2E1f8cY2mxu
kwUSntVaYGVPVj/Vok3kzq1wW/pUQ9E2iMbNeDVZJH/cqEvylPPa2LZ3hZVri57J
orJzaYtZWf640y3McGTUDwIQokgxxdMMyWfm26P1iZkByjofUaYRHS1NIxhKVSVC
Y6uA/Ivvh56ezlPQykc7m6YEjoUS91AMllJca1A3KF3+qvQ3Hnc+i9mB7hAQbJ8L
6Iv2qCD4TgtWT5YXgP0eZsfSV/19kvlVlGX7QQT6o7uIFOLVNOmX9FJAc4n3SGNI
HbxsDdlF9XHVFh0XyKcBQfg0uUmRWUDQ/LsXn9KL6Yrpyx3QZGH/PTYl/XoCj1IO
IL6Bw51FCBXvCvLP5alXMouosAxrc19YpljcAuAMmjVTXe6RoULOZJs+lhTHMhvj
S6FWr6l0XDr4yKb0SXTOGI4hvRJvL8SWnIt5ezZrNrTKLsklL3Bi/o3BoKzzxmqS
6l/rxvcVUov455IrfqJ0ZEM6ZxC9/cvGec0RFyglopNSlKeTQsgWRNa6Pw2xZVk8
orT+G3L4jZ1+0hcjcSwfvHng5Nv6DuhIfUNPbLF/ZzmJcBPqGc/DuhGkFqOrBaXg
ey7Lua6+l+8zcDtugRRG
=yWrB
-----END PGP SIGNATURE-----
Merge tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO changes from Linus Walleij:
"This is the GPIO bulk changes for the v3.20 series:
GPIOLIB core changes:
- Create and use of_mm_gpiochip_remove() for removing memory-mapped
OF GPIO chips
- GPIO MMIO library suppports bgpio_set_multiple for switching
several lines at once, a feature merged in the last cycle.
New drivers:
- New driver for the APM X-gene standby GPIO controller
- New driver for the Fujitsu MB86S7x GPIO controller
Cleanups:
- Moved rcar driver to use gpiolib irqchip
- Moxart converted to the GPIO MMIO library
- GE driver converted to GPIO MMIO library
- Move sx150x to irqdomain
- Move max732x to irqdomain
- Move vx855 to use managed resources
- Move dwapb to use managed resources
- Clean tc3589x from platform data
- Clean stmpe driver to use device tree only probe
New subtypes:
- sx1506 support in the sx150x driver
- Quark 1000 SoC support in the SCH driver
- Support X86 in the Xilinx driver
- Support PXA1928 in the PXA driver
Extended drivers:
- max732x supports device tree probe
- sx150x supports device tree probe
Various minor cleanups and bug fixes"
* tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
gpio: kconfig: replace PPC_OF with PPC
gpio: pxa: add PXA1928 gpio type support
dt/bindings: gpio: add compatible string for marvell,pxa1928-gpio
gpio: pxa: remove mach IRQ includes
gpio: max732x: use an inline function for container cast
gpio: use sizeof() instead of hardcoded values
gpio: max732x: add set_multiple function
gpio: sch: Consolidate similar algorithms
gpio: tz1090-pdc: Use resource_size to fix off-by-one resource size calculation
gpio: ge: Convert to use devm_kstrdup
gpio: correctly use const char * const
gpio: sx150x: fixup OF support
gpio: mpc8xxx: Use of_mm_gpiochip_remove
gpio: Add Fujitsu MB86S7x GPIO driver
gpio: mpc8xxx: Convert to platform device interface.
gpio: zevio: Use of_mm_gpiochip_remove
gpio: gpio-mm-lantiq: Use of_mm_gpiochip_remove
gpio: gpio-mm-lantiq: Use of_property_read_u32
gpio: gpio-mm-lantiq: Do not replicate code
gpio :gpio-mm-lantiq: Use devm_kzalloc
...
- Support for MMC power sequences.
- SDIO function devicetree subnode parsing.
- Refactor the hardware reset routines and enable it for SD cards.
- Various code quality improvements, especially for slot-gpio.
MMC host:
- dw_mmc: Various fixes and cleanups.
- dw_mmc: Convert to mmc_send_tuning().
- moxart: Fix probe logic.
- sdhci: Various fixes and cleanups
- sdhci: Asynchronous request handling support.
- sdhci-pxav3: Various fixes and cleanups.
- sdhci-tegra: Fixes for T114, T124 and T132.
- rtsx: Various fixes and cleanups.
- rtsx: Support for SDIO.
- sdhi/tmio: Refactor and cleanup of header files.
- omap_hsmmc: Use slot-gpio and common MMC DT parser.
- Make all hosts to deal with errors from mmc_of_parse().
- sunxi: Various fixes and cleanups.
- sdhci: Support for Fujitsu SDHCI controller f_sdh30.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2b9MAAoJEP4mhCVzWIwp678P/2Hjoo17FDnCQT2qXCRWmMmx
98n7mrkPw20cVm6dlXyVxHFxrgRWan1eATiu1vBdnNmXkeUmThMbuGpATDi40fIT
C2g9wPDM1/naJ+Qg8mPGy0vEDQYHEzxHHlAyfOaeXdhxhll1iHqhk+Jb6cFQN5DP
/CvNmuL/7m9uuFhHlGJnqSNMyenLAFFXthIiVJrQeZeYq9NZ1ZZfW7+esHDmu2lP
EFkrZf+xYFmFWAqccyTR58QZsYKlDv4NS/0UMU941DkO7x7R8ZsQG8xFu9bIN5Wn
EJfgP7EfEXHlD5a1/QQ918IT1ifxhPGiCbBXpdfAUt7Xte6zYyASpTyAm8v7vT2I
2hot1T1BZgADALE2EHAP4kzK49ipfhQmlVZgFeYVsTpPKk8Nvczio7Y3LYlzNmBo
V0jaTUTtU7u7ICtGbo7OqOybW/Sm5E00xsq22txIXObURa7bPbZ4CnxJpstSaU2Z
nweZaa79HaHZE7xyUNh9kAbxfGC0pOT0oPoPYcTxcpk2vva+atULEYnLEHUULrgs
D4+m8tnbuwoZoGanlMKqgPXP8Xkau/meEdz4WaYrXQEIafrVIR2/kcXGQjhD8ucO
VkjUaZDKxNXTkwOzM/siOxJwj75Ka6GDHM7JGx4F30QHqgRTtg2wzInU9nsViuiA
02698dNk9CdP3JirDtbm
=ojsj
-----END PGP SIGNATURE-----
Merge tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Support for MMC power sequences.
- SDIO function devicetree subnode parsing.
- Refactor the hardware reset routines and enable it for SD cards.
- Various code quality improvements, especially for slot-gpio.
MMC host:
- dw_mmc: Various fixes and cleanups.
- dw_mmc: Convert to mmc_send_tuning().
- moxart: Fix probe logic.
- sdhci: Various fixes and cleanups
- sdhci: Asynchronous request handling support.
- sdhci-pxav3: Various fixes and cleanups.
- sdhci-tegra: Fixes for T114, T124 and T132.
- rtsx: Various fixes and cleanups.
- rtsx: Support for SDIO.
- sdhi/tmio: Refactor and cleanup of header files.
- omap_hsmmc: Use slot-gpio and common MMC DT parser.
- Make all hosts to deal with errors from mmc_of_parse().
- sunxi: Various fixes and cleanups.
- sdhci: Support for Fujitsu SDHCI controller f_sdh30"
* tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits)
mmc: sdhci-s3c: solve problem with sleeping in atomic context
mmc: pwrseq: add driver for emmc hardware reset
mmc: moxart: fix probe logic
mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state
mmc: pwrseq_simple: Add optional reference clock support
mmc: pwrseq: Document optional clock for the simple power sequence
mmc: pwrseq_simple: Extend to support more pins
mmc: pwrseq: Document that simple sequence support more than one GPIO
mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2
mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes
mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x
mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951
mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor
mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume
mmc: tegra: Write xfer_mode, CMD regs in together
mmc: Resolve BKOPS compatability issue
mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
mmc: dw_mmc: rockchip: remove incorrect __exit_p()
mmc: dw_mmc: exynos: remove incorrect __exit_p()
mmc: Fix menuconfig alignment of MMC_SDHCI_* options
...
This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates. There's also an
update to ufs which adds new phy drivers and finally a new logging
infrastructure for SCSI.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJU2Ty5AAoJEDeqqVYsXL0M9rAH/1xNpAxXuxQq+dW5Z+uOaX60
5RRIu7/xA1HEfzkT5FTHrolmogDjVqawu4PZS66iHDeo05RBVUlbTA8qCK+MlRcN
U6s0cLEw59eH3EaCfOGuYp/MnbhuV0eNxe0btmqJIQwuW3+gwZKGJdOq6LS2YasJ
k/DyIBVmkJAVsN56vm9q2vbtcZp+Bg+ngqBS+SC4TF7vV1WCtFmS6yaUf62PYW3D
+Irx37qHZntDR5wdw3dsuKDi5U8bl6myPjaVLnVJqg/WIF9RlCkjk5xpWT99AmVO
NmtYQxLLBlAQ5K+sIlBUwxZe+8q1l+Aj4TTmJHAfFtyfp25s7JR9I6/QtOyC5Kw=
=odol
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates.
There's also an update to ufs which adds new phy drivers and finally a
new logging infrastructure for SCSI"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
scsi_logging: return void for dev_printk() functions
scsi: print single-character strings with seq_putc
scsi: merge consecutive seq_puts calls
scsi: replace seq_printf with seq_puts
aha152x: replace seq_printf with seq_puts
advansys: replace seq_printf with seq_puts
scsi: remove SPRINTF macro
sg: remove an unused variable
hpsa: Use local workqueues instead of system workqueues
hpsa: add in P840ar controller model name
hpsa: add in gen9 controller model names
hpsa: detect and report failures changing controller transport modes
hpsa: shorten the wait for the CISS doorbell mode change ack
hpsa: refactor duplicated scan completion code into a new routine
hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
hpsa: do not use function pointers in fast path command submission
hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
hpsa: return failed from device reset/abort handlers
hpsa: check for ctlr lockup after command allocation in main io path
...
Pull input updates from Dmitry Torokhov:
"The first round of updates for the input subsystem.
A few new drivers (power button handler for AXP20x PMIC, tps65218
power button driver, sun4i keys driver, regulator haptic driver, NI
Ettus Research USRP E3x0 button, Alwinner A10/A20 PS/2 controller).
Updates to Synaptics and ALPS touchpad drivers (with more to come
later), brand new Focaltech PS/2 support, update to Cypress driver to
handle Gen5 (in addition to Gen3) devices, and number of other fixups
to various drivers as well as input core"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
Input: elan_i2c - fix wrong %p extension
Input: evdev - do not queue SYN_DROPPED if queue is empty
Input: gscps2 - fix MODULE_DEVICE_TABLE invocation
Input: synaptics - use dmax in input_mt_assign_slots
Input: pxa27x_keypad - remove unnecessary ARM includes
Input: ti_am335x_tsc - replace delta filtering with median filtering
ARM: dts: AM335x: Make charge delay a DT parameter for TSC
Input: ti_am335x_tsc - read charge delay from DT
Input: ti_am335x_tsc - remove udelay in interrupt handler
Input: ti_am335x_tsc - interchange touchscreen and ADC steps
Input: MT - add support for balanced slot assignment
Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias
Input: drv260x - remove wrong and unneeded drv260x-haptics modalias
Input: cap11xx - remove wrong and unneeded cap11xx modalias
Input: sun4i-ts - add support for touchpanel controller on A31
Input: serio - add support for Alwinner A10/A20 PS/2 controller
Input: gtco - use sign_extend32() for sign extension
Input: elan_i2c - verify firmware signature applying it
Input: elantech - remove stale comment from Kconfig
Input: cyapa - off by one in cyapa_update_fw_store()
...
In this batch, you can find lots of cleanups through the whole
subsystem, as our good New Year's resolution. Lots of LOCs and
commits are about LINE6 driver that was promoted finally from staging
tree, and as usual, there've been widely spread ASoC changes.
Here some highlights:
ALSA core changes
- Embedding struct device into ALSA core structures
- sequencer core cleanups / fixes
- PCM msbits constraints cleanups / fixes
- New SNDRV_PCM_TRIGGER_DRAIN command
- PCM kerneldoc fixes, header cleanups
- PCM code cleanups using more standard codes
- Control notification ID fixes
Driver cleanups
- Cleanups of PCI PM callbacks
- Timer helper usages cleanups
- Simplification (e.g. argument reduction) of many driver codes
HD-audio
- Hotkey and LED support on HP laptops with Realtek codecs
- Dock station support on HP laptops
- Toshiba Satellite S50D fixup
- Enhanced wallclock timestamp handling for HD-audio
- Componentization to simplify the linkage between i915 and hd-audio
drivers for Intel HDMI/DP
USB-audio
- Akai MPC Element support
- Enhanced timestamp handling
ASoC
- Lots of refactoringin ASoC core, moving drivers to more data
driven initialization and rationalizing a lot of DAPM usage
- Much improved handling of CDCLK clocks on Samsung I2S controllers
- Lots of driver specific cleanups and feature improvements
- CODEC support for TI PCM514x and TLV320AIC3104 devices
- Board support for Tegra systems with Realtek RT5677
- New driver for Maxim max98357a
- More enhancements / fixes for Intel SST driver
Others
- Promotion of LINE6 driver from staging along with lots of rewrites
and cleanups
- DT support for old non-ASoC atmel driver
- oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
- Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
- A few more ak411x fixes for ice1724 boards
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJU2ySIAAoJEGwxgFQ9KSmk/YMP/1v0r60aDt6VxlTbadt008R/
jTEIzD4oGEMkhFQdlmN8MegZlx+05vxQUCGrVy8PLelhfy/mnj6z/iUt9ohE1PqK
530eVr5FnlAbHs1JzP8Tm8Xbbtk8RXt5uvgohJvt7HBrc0Def9N/w37fUQ0ytO+s
Ot/0Xm8BNsdJ90DfMVLc0Ok9cAFn4Z70gylE/PuGxbBBzxQh8PYPXtJ6Q/s5lKLV
QC7VitJa0H7vsFYb+Ve7GU4cKMTt8uEPw8CdnQbDwb63ia93iWJJrlqKVUWYF2Gu
K+mX5Igdb88ToXbMPrLKXe73IfFcdpWNTbj8IAv+Rp9fArylzz+3GAYmrqTAdare
JEE5qAZTtJZEeD2vgNCnA4JpSbRzL0bHrEow21LnPONq3V9FB044NAeMSx3dI4j1
fk+SnqrpJMtlCtgj2PuWzIcqRiJ25F/Qax/xFeZHo7FwLIBF7z5pLu9DP4CfUSXj
fDEcB9aNF2VirJkQdbhHaPqTYVf2rHQ/ebDpDHBwkqFe865IHlJ8g8MrHnAFInKN
jQlSTOqi9V3two53U1JIKcB6QcBH3vh60w2JsWsQadsr45YYQ/bvBHGYNpQ00C3U
rbDBANhAHaF/hFncNnOQDsH65FqHj/ZlBQRhzX0LqxN4K0DM1FqGcLf2k6u/pzZU
09+QlcIOOzN8lbvHR8Qx
=/84r
-----END PGP SIGNATURE-----
Merge tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"In this batch, you can find lots of cleanups through the whole
subsystem, as our good New Year's resolution. Lots of LOCs and
commits are about LINE6 driver that was promoted finally from staging
tree, and as usual, there've been widely spread ASoC changes.
Here some highlights:
ALSA core changes
- Embedding struct device into ALSA core structures
- sequencer core cleanups / fixes
- PCM msbits constraints cleanups / fixes
- New SNDRV_PCM_TRIGGER_DRAIN command
- PCM kerneldoc fixes, header cleanups
- PCM code cleanups using more standard codes
- Control notification ID fixes
Driver cleanups
- Cleanups of PCI PM callbacks
- Timer helper usages cleanups
- Simplification (e.g. argument reduction) of many driver codes
HD-audio
- Hotkey and LED support on HP laptops with Realtek codecs
- Dock station support on HP laptops
- Toshiba Satellite S50D fixup
- Enhanced wallclock timestamp handling for HD-audio
- Componentization to simplify the linkage between i915 and hd-audio
drivers for Intel HDMI/DP
USB-audio
- Akai MPC Element support
- Enhanced timestamp handling
ASoC
- Lots of refactoringin ASoC core, moving drivers to more data driven
initialization and rationalizing a lot of DAPM usage
- Much improved handling of CDCLK clocks on Samsung I2S controllers
- Lots of driver specific cleanups and feature improvements
- CODEC support for TI PCM514x and TLV320AIC3104 devices
- Board support for Tegra systems with Realtek RT5677
- New driver for Maxim max98357a
- More enhancements / fixes for Intel SST driver
Others
- Promotion of LINE6 driver from staging along with lots of rewrites
and cleanups
- DT support for old non-ASoC atmel driver
- oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
- Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
- A few more ak411x fixes for ice1724 boards"
* tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits)
ALSA: line6: toneport: Use explicit type for firmware version
ALSA: line6: Use explicit type for serial number
ALSA: line6: Return EIO if read/write not successful
ALSA: line6: Return error if device not responding
ALSA: line6: Add delay before reading status
ASoC: Intel: Clean data after SST fw fetch
ALSA: hda - Add docking station support for another HP machine
ALSA: control: fix failure to return new numerical ID in 'replace' event data
ALSA: usb: update trigger timestamp on first non-zero URB submitted
ALSA: hda: read trigger_timestamp immediately after starting DMA
ALSA: pcm: allow for trigger_tstamp snapshot in .trigger
ALSA: pcm: don't override timestamp unconditionally
ALSA: off by one bug in snd_riptide_joystick_probe()
ASoC: rt5670: Set use_single_rw flag for regmap
ASoC: rt286: Add rt288 codec support
ASoC: max98357a: Fix build in !CONFIG_OF case
ASoC: Intel: fix platform_no_drv_owner.cocci warnings
ARM: dts: Switch Odroid X2/U2 to simple-audio-card
ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update
ALSA: control: fix failure to return numerical ID in 'add' event
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2NQQAAoJEAhfPr2O5OEV5BgQAIja/XsIgpeNhfN8kJ3GrdhL
Z+QRTcHNc6AWGm1dkI+YTl4B38/xLlmxhUYPKsDl19N7n1oKkqdUxYtLe1mLdecW
dvqMXMVBKQSCgyDP5sgZNHKlavEX1ZPTTtkrY8zYWaXbkcf4dOZyisbNQrmFdO3T
wt4zwaO8+ziCEYbotLsaI1VpEDKFZV6AVhKnLsWxc4ZoCnAqJbmA31jtANxrQ0tw
UgXRjJmf1uWrS+MWM5xFDi+v+FmZiUAHMJ5iksqWhp2pKj41geIqy7lAueytEN+Q
vQHZ9cfhnoF/7VrqDtqq5CaJZPKfA80PSxml9mbjc4wytvWLevoc4UxFtU+lohOf
YbM3nB5J3nAcq0bNF/cSpuYUoiGnK86FazuM6YAQy2CaucrVKALKHHmziWbK6gBv
1yA4qnDuRYKps3SQSQQKuNlv8dmcVTD/sVhf8EIx62son6xxeXf21nas61lw8k5P
lrUVH9nJxkwTkRJ7wMjlAZeh0pTyB/Ag1bSn81myziv0r4AsNyWJT5qxN8szmZDe
nXGIdQ1h5JkMQ0kCfhhLqgdIUwhx7dMXIlXcCfR/8a9uYm4StegPNCEZDybIi6co
8Ok3rPYt15PlrCyfMjXFOG/TYi/cZ/xIbffLbSFMOqnCUZElaA7RNpOnswNc9fc6
2WsY54Lb4ftC4bQ7hM90
=VH6m
-----END PGP SIGNATURE-----
Merge tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Some documentation updates and a few new pixel formats
- Stop btcx-risc abuse by cx88 and move it to bt8xx driver
- New platform driver: am437x
- New webcam driver: toptek
- New remote controller hardware protocols added to img-ir driver
- Removal of a few very old drivers that relies on old kABIs and are
for very hard to find hardware: parallel port webcam drivers
(bw-qcam, c-cam, pms and w9966), tlg2300, Video In/Out for SGI (vino)
- Removal of the USB Telegent driver (tlg2300). The company that
developed this driver has long gone and the hardware is hard to find.
As it relies on a legacy set of kABI symbols and nobody seems to care
about it, remove it.
- several improvements at rtl2832 driver
- conversion on cx28521 and au0828 to use videobuf2 (VB2)
- several improvements, fixups and board additions
* tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (321 commits)
[media] dvb_net: Convert local hex dump to print_hex_dump_debug
[media] dvb_net: Use standard debugging facilities
[media] dvb_net: Use vsprintf %pM extension to print Ethernet addresses
[media] staging: lirc_serial: adjust boolean assignments
[media] stb0899: use sign_extend32() for sign extension
[media] si2168: add support for 1.7MHz bandwidth
[media] si2168: return error if set_frontend is called with invalid parameters
[media] lirc_dev: avoid potential null-dereference
[media] mn88472: simplify bandwidth registers setting code
[media] dvb: tc90522: re-add symbol-rate report
[media] lmedm04: add read snr, signal strength and ber call backs
[media] lmedm04: Create frontend call back for read status
[media] lmedm04: create frontend callbacks for signal/snr/ber/ucblocks
[media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
[media] lmedm04: Increase Interupt due time to 200 msec
[media] cx88-dvb: whitespace cleanup
[media] rtl28xxu: properly initialize pdata
[media] rtl2832: declare functions as static
[media] rtl2830: declare functions as static
[media] rtl2832_sdr: add kernel-doc comments for platform_data
...
This patch is based on exynos-drm-next branch of Inki Dae's tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
DECON(Display and Enhancement Controller) is the new IP
in exynos7 SOC for generating video signals using pixel data.
DECON driver can be used to drive 2 different interfaces on Exynos7:
DECON-INT(video controller) and DECON-EXT(Mixer for HDMI)
The existing FIMD driver code was used as a template to create
DECON driver. Only DECON-INT is supported as of now, and
DECON-EXT support will be added later.
The current version of the driver supports video mode displays.
Changelog v2:
- Change config name, DRM_EXYNOS_DECON to DRM_EXYNOS7_DECON.
Signed-off-by: Akshu Agrawal <akshua@gmail.com>
Signed-off-by: Ajay Kumar <ajaykumar.rs@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Disappointing, as this was kind of neat (especially getting to use RCU
to manage the address -> eventfd mapping). But now the devices are PCI
handled in userspace, we get rid of both the NOTIFY hypercall and
the interface to connect an eventfd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the ptrace API struct, and we currently don't let them set
anything but the normal registers (we'd have to filter the others).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Flushing out my drm-misc queue with a few oddball things all over.
* tag 'topic/drm-misc-2015-02-06' of git://anongit.freedesktop.org/drm-intel:
drm: Use static attribute groups for managing connector sysfs entries
drm: remove DRM_FORMAT_NV12MT
drm/modes: Print the mode status in human readable form
drm/irq: Don't disable vblank interrupts when already disabled
The VIRTIO_F_ANY_LAYOUT and VIRTIO_F_NOTIFY_ON_EMPTY features are pre-1.0
only.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
This allows modern implementations to ensure they don't use legacy
feature bits or SCSI commands (which are not used in v1.0 non-legacy).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
This provides backdoor access to the device MMIOs, and every device should
have one. From the virtio 1.0 spec (CS03):
4.1.4.7.1 Device Requirements: PCI configuration access capability
The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
Pull trivial tree changes from Jiri Kosina:
"Patches from trivial.git that keep the world turning around.
Mostly documentation and comment fixes, and a two corner-case code
fixes from Alan Cox"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
kexec, Kconfig: spell "architecture" properly
mm: fix cleancache debugfs directory path
blackfin: mach-common: ints-priority: remove unused function
doubletalk: probe failure causes OOPS
ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
msdos_fs.h: fix 'fields' in comment
scsi: aic7xxx: fix comment
ARM: l2c: fix comment
ibmraid: fix writeable attribute with no store method
dynamic_debug: fix comment
doc: usbmon: fix spelling s/unpriviledged/unprivileged/
x86: init_mem_mapping(): use capital BIOS in comment
Pull live patching infrastructure from Jiri Kosina:
"Let me provide a bit of history first, before describing what is in
this pile.
Originally, there was kSplice as a standalone project that implemented
stop_machine()-based patching for the linux kernel. This project got
later acquired, and the current owner is providing live patching as a
proprietary service, without any intentions to have their
implementation merged.
Then, due to rising user/customer demand, both Red Hat and SUSE
started working on their own implementation (not knowing about each
other), and announced first versions roughly at the same time [1] [2].
The principle difference between the two solutions is how they are
making sure that the patching is performed in a consistent way when it
comes to different execution threads with respect to the semantic
nature of the change that is being introduced.
In a nutshell, kPatch is issuing stop_machine(), then looking at
stacks of all existing processess, and if it decides that the system
is in a state that can be patched safely, it proceeds insterting code
redirection machinery to the patched functions.
On the other hand, kGraft provides a per-thread consistency during one
single pass of a process through the kernel and performs a lazy
contignuous migration of threads from "unpatched" universe to the
"patched" one at safe checkpoints.
If interested in a more detailed discussion about the consistency
models and its possible combinations, please see the thread that
evolved around [3].
It pretty quickly became obvious to the interested parties that it's
absolutely impractical in this case to have several isolated solutions
for one task to co-exist in the kernel. During a dedicated Live
Kernel Patching track at LPC in Dusseldorf, all the interested parties
sat together and came up with a joint aproach that would work for both
distro vendors. Steven Rostedt took notes [4] from this meeting.
And the foundation for that aproach is what's present in this pull
request.
It provides a basic infrastructure for function "live patching" (i.e.
code redirection), including API for kernel modules containing the
actual patches, and API/ABI for userspace to be able to operate on the
patches (look up what patches are applied, enable/disable them, etc).
It's relatively simple and minimalistic, as it's making use of
existing kernel infrastructure (namely ftrace) as much as possible.
It's also self-contained, in a sense that it doesn't hook itself in
any other kernel subsystem (it doesn't even touch any other code).
It's now implemented for x86 only as a reference architecture, but
support for powerpc, s390 and arm is already in the works (adding
arch-specific support basically boils down to teaching ftrace about
regs-saving).
Once this common infrastructure gets merged, both Red Hat and SUSE
have agreed to immediately start porting their current solutions on
top of this, abandoning their out-of-tree code. The plan basically is
that each patch will be marked by flag(s) that would indicate which
consistency model it is willing to use (again, the details have been
sketched out already in the thread at [3]).
Before this happens, the current codebase can be used to patch a large
group of secruity/stability problems the patches for which are not too
complex (in a sense that they don't introduce non-trivial change of
function's return value semantics, they don't change layout of data
structures, etc) -- this corresponds to LEAVE_FUNCTION &&
SWITCH_FUNCTION semantics described at [3].
This tree has been in linux-next since December.
[1] https://lkml.org/lkml/2014/4/30/477
[2] https://lkml.org/lkml/2014/7/14/857
[3] https://lkml.org/lkml/2014/11/7/354
[4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt
[ The core code is introduced by the three commits authored by Seth
Jennings, which got a lot of changes incorporated during numerous
respins and reviews of the initial implementation. All the followup
commits have materialized only after public tree has been created,
so they were not folded into initial three commits so that the
public tree doesn't get rebased ]"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add missing newline to error message
livepatch: rename config to CONFIG_LIVEPATCH
livepatch: fix uninitialized return value
livepatch: support for repatching a function
livepatch: enforce patch stacking semantics
livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
livepatch: fix deferred module patching order
livepatch: handle ancient compilers with more grace
livepatch: kconfig: use bool instead of boolean
livepatch: samples: fix usage example comments
livepatch: MAINTAINERS: add git tree location
livepatch: use FTRACE_OPS_FL_IPMODIFY
livepatch: move x86 specific ftrace handler code to arch/x86
livepatch: samples: add sample live patching module
livepatch: kernel: add support for live patching
livepatch: kernel: add TAINT_LIVEPATCH
Pull HID updates from Jiri Kosina:
"Updates for HID code
- improveements of Logitech HID++ procotol implementation, from
Benjamin Tissoires
- support for composite RMI devices, from Andrew Duggan
- new driver for BETOP controller, from Huang Bo
- fixup for conflicting mapping in HID core between PC-101/103/104
and PC-102/105 keyboards from David Herrmann
- new hardware support and fixes in Wacom driver, from Ping Cheng
- assorted small fixes and device ID additions all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (33 commits)
HID: wacom: add support for Cintiq 27QHD and 27QHD touch
HID: wacom: consolidate input capability settings for pen and touch
HID: wacom: make sure touch arbitration is applied consistently
HID: pidff: Fix initialisation forMicrosoft Sidewinder FF Pro 2
HID: hyperv: match wait_for_completion_timeout return type
HID: wacom: Report ABS_MISC event for Cintiq Companion Hybrid
HID: Use Kbuild idiom in Makefiles
HID: do not bind to Microchip Pick16F1454
HID: hid-lg4ff: use DEVICE_ATTR_RW macro
HID: hid-lg4ff: fix sysfs attribute permission
HID: wacom: peport In Range event according to the spec
HID: wacom: process invalid Cintiq and Intuos data in wacom_intuos_inout()
HID: rmi: Add support for the touchpad in the Razer Blade 14 laptop
HID: rmi: Support touchpads with external buttons
HID: rmi: Use hid_report_len to compute the size of reports
HID: logitech-hidpp: store the name of the device in struct hidpp
HID: microsoft: add support for Japanese Surface Type Cover 3
HID: fixup the conflicting keyboard mappings quirk
HID: apple: fix battery support for the 2009 ANSI wireless keyboard
HID: fix Kconfig text
...
Merge misc updates from Andrew Morton:
"Bite-sized chunks this time, to avoid the MTA ratelimiting woes.
- fs/notify updates
- ocfs2
- some of MM"
That laconic "some MM" is mainly the removal of remap_file_pages(),
which is a big simplification of the VM, and which gets rid of a *lot*
of random cruft and special cases because we no longer support the
non-linear mappings that it used.
From a user interface perspective, nothing has changed, because the
remap_file_pages() syscall still exists, it's just done by emulating the
old behavior by creating a lot of individual small mappings instead of
one non-linear one.
The emulation is slower than the old "native" non-linear mappings, but
nobody really uses or cares about remap_file_pages(), and simplifying
the VM is a big advantage.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
memcg: zap memcg_slab_caches and memcg_slab_mutex
memcg: zap memcg_name argument of memcg_create_kmem_cache
memcg: zap __memcg_{charge,uncharge}_slab
mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check
mm: hugetlb: fix type of hugetlb_treat_as_movable variable
mm, hugetlb: remove unnecessary lower bound on sysctl handlers"?
mm: memory: merge shared-writable dirtying branches in do_wp_page()
mm: memory: remove ->vm_file check on shared writable vmas
xtensa: drop _PAGE_FILE and pte_file()-related helpers
x86: drop _PAGE_FILE and pte_file()-related helpers
unicore32: drop pte_file()-related helpers
um: drop _PAGE_FILE and pte_file()-related helpers
tile: drop pte_file()-related helpers
sparc: drop pte_file()-related helpers
sh: drop _PAGE_FILE and pte_file()-related helpers
score: drop _PAGE_FILE and pte_file()-related helpers
s390: drop pte_file()-related helpers
parisc: drop _PAGE_FILE and pte_file()-related helpers
openrisc: drop _PAGE_FILE and pte_file()-related helpers
nios2: drop _PAGE_FILE and pte_file()-related helpers
...
Pull quota interface unification and misc cleanups from Jan Kara:
"The first part of the series unifying XFS and VFS quota interfaces.
This part unifies turning quotas on and off so quota-tools and
xfs_quota can be used to manage any filesystem. This is useful so
that userspace doesn't have to distinguish which filesystem it is
working with. As a result we can then easily reuse tests for project
quotas in XFS for ext4.
This also contains minor cleanups and fixes for udf, isofs, and ext3"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (23 commits)
udf: remove bool assignment to 0/1
udf: use bool for done
quota: Store maximum space limit in bytes
quota: Remove quota_on_meta callback
ocfs2: Use generic helpers for quotaon and quotaoff
ext4: Use generic helpers for quotaon and quotaoff
quota: Add ->quota_{enable,disable} callbacks for VFS quotas
quota: Wire up ->quota_{enable,disable} callbacks into Q_QUOTA{ON,OFF}
quota: Split ->set_xstate callback into two
xfs: Remove some pointless quota checks
xfs: Remove some useless flags tests
xfs: Remove useless test
quota: Verify flags passed to Q_SETINFO
quota: Cleanup flags definitions
ocfs2: Move OLQF_CLEAN flag out of generic quota flags
quota: Don't store flags for v2 quota format
jbd: drop jbd_ENOSYS debug
udf: destroy sbi mutex in put_super
udf: Check length of extended attributes and allocation descriptors
udf: Remove repeated loads blocksize
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU1MYmAAoJEAAOaEEZVoIV/rAQAKoHj/PCOATTy05lF/NDhJlS
6NbNjupnC8HrbNPv6Z/cQ902eC1YRVH96gf6we4FeAm9Tjctpje6uEqvPQCUxpot
2jWgCG+g95OeEaQEjXQvR3x5ZfXvPUtwKVOnMF423L1p5Xfbj3kJfGi+dv2k8XOi
GArsUB7uCwqLyyz+L47RJ2Cz7s47M9O25HkVRfWlgYOv+4afq5OpADGKQAhMLL/s
CPhYgqw/7r1p+pLkjUE/x+5BAliDzUinFtDatgD4CeHOdq0RKlxzQ1rFg6uJVg/k
3ZttGOxWUtGIeGM4v5cosDFReLPCESax/TUzn58jxxFR702MjHAA+lHRgjZoWvW/
9EnShl0XlznQX1ns6f0rI1seWe4M5R3CWus8AcG0kDmdbTp8nARo+pBLFhCME/kZ
15GHLz4tDSRt5SNow6aqJdlYJR7p3WrsceKyM5aH9M7odM3eaB5vJxIJ0fljsZbS
Qtz4t+Ua1oVSYD7TX3y7EUiQVPVo8VKS3o6Ua73wCHIXNbSH7hZLOvPLFs6V1Psi
RKqRiad5iO3+iavVGuDDcs12zXZ5hmksE8oMh0NkjFZ6wJlO4Hf5iOt5thABNDmT
Km+40IBq1DYwclPTofaRpB+ytDOnWedMxdWfWdEWQ710zuuNY3cfi/XMXEX34kBY
fLhUMabqcyfUegpA6S0R
=6+UV
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.20-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes #1 from Jeff Layton:
"This patchset contains a fairly major overhaul of how file locks are
tracked within the inode. Rather than a single list, we now create a
per-inode "lock context" that contains individual lists for the file
locks, and a new dedicated spinlock for them.
There are changes in other trees that are based on top of this set so
it may be easiest to pull this in early"
* tag 'locks-v3.20-1' of git://git.samba.org/jlayton/linux:
locks: update comments that refer to inode->i_flock
locks: consolidate NULL i_flctx checks in locks_remove_file
locks: keep a count of locks on the flctx lists
locks: clean up the lm_change prototype
locks: add a dedicated spinlock to protect i_flctx lists
locks: remove i_flock field from struct inode
locks: convert lease handling to file_lock_context
locks: convert posix locks to file_lock_context
locks: move flock locks to file_lock_context
ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks
locks: add a new struct file_locking_context pointer to struct inode
locks: have locks_release_file use flock_lock_file to release generic flock locks
locks: add new struct list_head to struct file_lock
- Rework of the core ACPI resources parsing code to fix issues
in it and make using resource offsets more convenient and
consolidation of some resource-handing code in a couple of places
that have grown analagous data structures and code to cover the
the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng).
- ACPI-based IOAPIC hotplug support on top of the resources handling
rework (Jiang Liu, Yinghai Lu).
- ACPICA update to upstream release 20150204 including an interrupt
handling rework that allows drivers to install raw handlers for
ACPI GPEs which then become entirely responsible for the given GPE
and the ACPICA core code won't touch it (Lv Zheng, David E Box,
Octavian Purdila).
- ACPI EC driver rework to fix several concurrency issues and other
problems related to events handling on top of the ACPICA's new
support for raw GPE handlers (Lv Zheng).
- New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power
Subsystem) driver for Intel chips (Ken Xue).
- Two minor fixes of the ACPI LPSS driver (Heikki Krogerus,
Jarkko Nikula).
- Two new blacklist entries for machines (Samsung 730U3E/740U3E and
510R) where the native backlight interface doesn't work correctly
while the ACPI one does (Hans de Goede).
- Rework of the ACPI processor driver's handling of idle states
to make the code more straightforward and less bloated overall
(Rafael J Wysocki).
- Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht,
Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki,
Yaowei Bai).
- PCI core power management modification to avoid resuming (some)
runtime-suspended devices during system suspend if they are in
the right states already (Rafael J Wysocki).
- New SFI-based cpufreq driver for Intel platforms using SFI
(Srinidhi Kasagar).
- cpufreq core fixes, cleanups and simplifications (Viresh Kumar,
Doug Anderson, Wolfram Sang).
- SkyLake CPU support and other updates for the intel_pstate driver
(Kristen Carlson Accardi, Srinivas Pandruvada).
- cpufreq-dt driver cleanup (Markus Elfring).
- Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla).
- Generic power domains core code fixes and cleanups (Ulf Hansson).
- Operating Performance Points (OPP) core code cleanups and kernel
documentation update (Nishanth Menon).
- New dabugfs interface to make the list of PM QoS constraints
available to user space (Nishanth Menon).
- New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso).
- New devfreq class (devfreq_event) to provide raw utilization data
to devfreq governors (Chanwoo Choi).
- Assorted minor fixes and cleanups related to power management
(Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist,
Pavel Machek, Todd E Brandt, Wonhong Kwon).
- turbostat updates (Len Brown) and cpupower Makefile improvement
(Sriram Raghunathan).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJU2neOAAoJEILEb/54YlRx51QP/jrv1Wb5eMaemzMksPIWI5Zn
I8IbxzToxu7wDDsrTBRv+LuyllMPrnppFOHHvB35gUYu7Y6I066s3ErwuqeFlbmy
+VicmyGMahv3yN74qg49MXzWtaJZa8hrFXn8ItujiUIcs08yELi0vBQFlZImIbTB
PdQngO88VfiOVjDvmKkYUU//9Sc9LCU0ZcdUQXSnA1oNOxuUHjiARz98R03hhSqu
BWR+7M0uaFbu6XeK+BExMXJTpKicIBZ1GAF6hWrS8V4aYg+hH1cwjf2neDAzZkcU
UkXieJlLJrCq+ZBNcy7WEhkWQkqJNWei5WYiy6eoQeQpNoliY2V+2OtSMJaKqDye
PIiMwXstyDc5rgyULN0d1UUzY6mbcUt2rOL0VN2bsFVIJ1HWCq8mr8qq689pQUYv
tcH18VQ2/6r2zW28sTO/ByWLYomklD/Y6bw2onMhGx3Knl0D8xYJKapVnTGhr5eY
d4k41ybHSWNKfXsZxdJc+RxndhPwj9rFLfvY/CZEhLcW+2pAiMarRDOPXDoUI7/l
aJpmPzy/6mPXGBnTfr6jKDSY3gXNazRIvfPbAdiGayKcHcdRM4glbSbNH0/h1Iq6
HKa8v9Fx87k1X5r4ZbhiPdABWlxuKDiM7725rfGpvjlWC3GNFOq7YTVMOuuBA225
Mu9PRZbOsZsnyNkixBpX
=zZER
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"We have a few new features this time, including a new SFI-based
cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new
devfreq class for providing its governors with raw utilization data
and a new ACPI driver for AMD SoCs.
Still, the majority of changes here are reworks of existing code to
make it more straightforward or to prepare it for implementing new
features on top of it. The primary example is the rework of ACPI
resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with
support for IOAPIC hotplug implemented on top of it, but there is
quite a number of changes of this kind in the cpufreq core, ACPICA,
ACPI EC driver, ACPI processor driver and the generic power domains
core code too.
The most active developer is Viresh Kumar with his cpufreq changes.
Specifics:
- Rework of the core ACPI resources parsing code to fix issues in it
and make using resource offsets more convenient and consolidation
of some resource-handing code in a couple of places that have grown
analagous data structures and code to cover the the same gap in the
core (Jiang Liu, Thomas Gleixner, Lv Zheng).
- ACPI-based IOAPIC hotplug support on top of the resources handling
rework (Jiang Liu, Yinghai Lu).
- ACPICA update to upstream release 20150204 including an interrupt
handling rework that allows drivers to install raw handlers for
ACPI GPEs which then become entirely responsible for the given GPE
and the ACPICA core code won't touch it (Lv Zheng, David E Box,
Octavian Purdila).
- ACPI EC driver rework to fix several concurrency issues and other
problems related to events handling on top of the ACPICA's new
support for raw GPE handlers (Lv Zheng).
- New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power
Subsystem) driver for Intel chips (Ken Xue).
- Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko
Nikula).
- Two new blacklist entries for machines (Samsung 730U3E/740U3E and
510R) where the native backlight interface doesn't work correctly
while the ACPI one does (Hans de Goede).
- Rework of the ACPI processor driver's handling of idle states to
make the code more straightforward and less bloated overall (Rafael
J Wysocki).
- Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht,
Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei
Bai).
- PCI core power management modification to avoid resuming (some)
runtime-suspended devices during system suspend if they are in the
right states already (Rafael J Wysocki).
- New SFI-based cpufreq driver for Intel platforms using SFI
(Srinidhi Kasagar).
- cpufreq core fixes, cleanups and simplifications (Viresh Kumar,
Doug Anderson, Wolfram Sang).
- SkyLake CPU support and other updates for the intel_pstate driver
(Kristen Carlson Accardi, Srinivas Pandruvada).
- cpufreq-dt driver cleanup (Markus Elfring).
- Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla).
- Generic power domains core code fixes and cleanups (Ulf Hansson).
- Operating Performance Points (OPP) core code cleanups and kernel
documentation update (Nishanth Menon).
- New dabugfs interface to make the list of PM QoS constraints
available to user space (Nishanth Menon).
- New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso).
- New devfreq class (devfreq_event) to provide raw utilization data
to devfreq governors (Chanwoo Choi).
- Assorted minor fixes and cleanups related to power management
(Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel
Machek, Todd E Brandt, Wonhong Kwon).
- turbostat updates (Len Brown) and cpupower Makefile improvement
(Sriram Raghunathan)"
* tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits)
tools/power turbostat: relax dependency on APERF_MSR
tools/power turbostat: relax dependency on invariant TSC
Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
tools/power turbostat: relax dependency on root permission
ACPI / video: Add disable_native_backlight quirk for Samsung 510R
ACPI / PM: Remove unneeded nested #ifdef
USB / PM: Remove unneeded #ifdef and associated dead code
intel_pstate: provide option to only use intel_pstate with HWP
ACPI / EC: Add GPE reference counting debugging messages
ACPI / EC: Add query flushing support
ACPI / EC: Refine command storm prevention support
ACPI / EC: Add command flushing support.
ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag
ACPI: add AMD ACPI2Platform device support for x86 system
ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse()
ACPI / EC: Update revision due to raw handler mode.
ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode.
ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
...
mem_cgroup->memcg_slab_caches is a list of kmem caches corresponding to
the given cgroup. Currently, it is only used on css free in order to
destroy all caches corresponding to the memory cgroup being freed. The
list is protected by memcg_slab_mutex. The mutex is also used to protect
kmem_cache->memcg_params->memcg_caches arrays and synchronizes
kmem_cache_destroy vs memcg_unregister_all_caches.
However, we can perfectly get on without these two. To destroy all caches
corresponding to a memory cgroup, we can walk over the global list of kmem
caches, slab_caches, and we can do all the synchronization stuff using the
slab_mutex instead of the memcg_slab_mutex. This patch therefore gets rid
of the memcg_slab_caches and memcg_slab_mutex.
Apart from this nice cleanup, it also:
- assures that rcu_barrier() is called once at max when a root cache is
destroyed or a memory cgroup is freed, no matter how many caches have
SLAB_DESTROY_BY_RCU flag set;
- fixes the race between kmem_cache_destroy and kmem_cache_create that
exists, because memcg_cleanup_cache_params, which is called from
kmem_cache_destroy after checking that kmem_cache->refcount=0,
releases the slab_mutex, which gives kmem_cache_create a chance to
make an alias to a cache doomed to be destroyed.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of passing the name of the memory cgroup which the cache is
created for in the memcg_name_argument, let's obtain it immediately in
memcg_create_kmem_cache.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are simple wrappers around memcg_{charge,uncharge}_kmem, so let's
zap them and call these functions directly.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One bit in ->vm_flags is unused now!
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After removing vma->shared.nonlinear we have only one member of
vma->shared union, which doesn't make much sense.
This patch drops the union and move struct vma->shared.linear to
vma->shared.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't create non-linear mappings anymore. Let's drop code which
handles them in rmap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't create non-linear mappings anymore. Let's drop code which
handles them on page fault.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have remap_file_pages(2) emulation in -mm tree for few release cycles
and we plan to have it mainline in v3.20. This patchset removes rest of
VM_NONLINEAR infrastructure.
Patches 1-8 take care about generic code. They are pretty
straight-forward and can be applied without other of patches.
Rest patches removes pte_file()-related stuff from architecture-specific
code. It usually frees up one bit in non-present pte. I've tried to reuse
that bit for swap offset, where I was able to figure out how to do that.
For obvious reason I cannot test all that arch-specific code and would
like to see acks from maintainers.
In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial
kernel code. That's too much for functionality nobody uses.
Tested-by: Felipe Balbi <balbi@ti.com>
This patch (of 38):
We don't create non-linear mappings anymore. Let's drop code which
handles them on unmap/zap.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remap_file_pages(2) was invented to be able efficiently map parts of
huge file into limited 32-bit virtual address space such as in database
workloads.
Nonlinear mappings are pain to support and it seems there's no
legitimate use-cases nowadays since 64-bit systems are widely available.
Let's drop it and get rid of all these special-cased code.
The patch replaces the syscall with emulation which creates new VMA on
each remap_file_pages(), unless they it can be merged with an adjacent
one.
I didn't find *any* real code that uses remap_file_pages(2) to test
emulation impact on. I've checked Debian code search and source of all
packages in ALT Linux. No real users: libc wrappers, mentions in
strace, gdb, valgrind and this kind of stuff.
There are few basic tests in LTP for the syscall. They work just fine
with emulation.
To test performance impact, I've written small test case which
demonstrate pretty much worst case scenario: map 4G shmfs file, write to
begin of every page pgoff of the page, remap pages in reverse order,
read every page.
The test creates 1 million of VMAs if emulation is in use, so I had to
set vm.max_map_count to 1100000 to avoid -ENOMEM.
Before: 23.3 ( +- 4.31% ) seconds
After: 43.9 ( +- 0.85% ) seconds
Slowdown: 1.88x
I believe we can live with that.
Test case:
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#define MB (1024UL * 1024)
#define SIZE (4096 * MB)
int main(int argc, char **argv)
{
unsigned long *p;
long i, pass;
for (pass = 0; pass < 10; pass++) {
p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) {
perror("mmap");
return -1;
}
for (i = 0; i < SIZE / 4096; i++)
p[i * 4096 / sizeof(*p)] = i;
for (i = 0; i < SIZE / 4096; i++) {
if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096,
0, (SIZE - 4096 * (i + 1)) >> 12, 0)) {
perror("remap_file_pages");
return -1;
}
}
for (i = SIZE / 4096 - 1; i >= 0; i--)
assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1);
munmap(p, SIZE);
}
return 0;
}
[akpm@linux-foundation.org: fix spello]
[sasha.levin@oracle.com: initialize populate before usage]
[sasha.levin@oracle.com: grab file ref to prevent race while mmaping]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Armin Rigo <arigo@tunes.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
compound_head() is implemented with assumption that there would be race
condition when checking tail flag. This assumption is only true when we
try to access arbitrary positioned struct page.
The situation that virt_to_head_page() is called is different case. We
call virt_to_head_page() only in the range of allocated pages, so there
is no race condition on tail flag. In this case, we don't need to
handle race condition and we can reduce overhead slightly. This patch
implements compound_head_fast() which is similar with compound_head()
except tail flag race handling. And then, virt_to_head_page() uses this
optimized function to improve performance.
I saw 1.8% win in a fast-path loop over kmem_cache_alloc/free, (14.063
ns -> 13.810 ns) if target object is on tail page.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e9fd702a58 ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit. Audit code
wants to update inode number of an inode corresponding to watched name
in a directory. When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened. That confuses audit and it starts watching parent directory
instead of a file in a directory.
This can be observed for example by doing:
cd /tmp
touch foo bar
auditctl -w /tmp/foo
touch foo
mv bar foo
touch foo
In audit log we see events like:
type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
...
type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
...
and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch. However we start seeing events for unrelated stuff
happening in /tmp.
Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:
fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
kernel/audit_tree.c doesn't care about passed 'data' at all.
kernel/audit_watch.c expects moved inode as 'data'.
Fixes: e9fd702a58 ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Reworked handling for foreign (grant mapped) pages to simplify the
code, enable a number of additional use cases and fix a number of
long-standing bugs.
- Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
stable).
- Assorted other cleanup and minor bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJU2JC+AAoJEFxbo/MsZsTRIvAH/1lgQ0EQlxaZtEFWY8cJBzxY
dXaTMfyGQOddGYDCW0r42hhXJHeX7DWXSERSD3aW9DZOn/eYdneHq9gWRD4uPrGn
hEFQ26J4jZWR5riGXaja0LqI2gJKLZ6BhHIQciLEbY+jw4ynkNBLNRPFehuwrCsZ
WdBwJkyvXC3RErekncRl/aNhxdi4p1P6qeiaW/mo3UcSO/CFSKybOLwT65iePazg
XuY9UiTn2+qcRkm/tjx8K9heHK8SBEGNWuoTcWYF1to8mwwUfKIAc4NO2UBDXJI+
rp7Z2lVFdII15JsQ08ATh3t7xDrMWLzCX/y4jCzmF3DBXLbSWdHCQMgI7TWt5pE=
=PyJK
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen features and fixes from David Vrabel:
- Reworked handling for foreign (grant mapped) pages to simplify the
code, enable a number of additional use cases and fix a number of
long-standing bugs.
- Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
stable).
- Assorted other cleanup and minor bug fixes.
* tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits)
xen/manage: Fix USB interaction issues when resuming
xenbus: Add proper handling of XS_ERROR from Xenbus for transactions.
xen/gntdev: provide find_special_page VMA operation
xen/gntdev: mark userspace PTEs as special on x86 PV guests
xen-blkback: safely unmap grants in case they are still in use
xen/gntdev: safely unmap grants in case they are still in use
xen/gntdev: convert priv->lock to a mutex
xen/grant-table: add a mechanism to safely unmap pages that are in use
xen-netback: use foreign page information from the pages themselves
xen: mark grant mapped pages as foreign
xen/grant-table: add helpers for allocating pages
x86/xen: require ballooned pages for grant maps
xen: remove scratch frames for ballooned pages and m2p override
xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
mm: add 'foreign' alias for the 'pinned' page flag
mm: provide a find_special_page vma operation
x86/xen: cleanup arch/x86/xen/mmu.c
x86/xen: add some __init annotations in arch/x86/xen/mmu.c
x86/xen: add some __init and static annotations in arch/x86/xen/setup.c
x86/xen: use correct types for addresses in arch/x86/xen/setup.c
...
We no longer use it outside of blk-mq.c, so we can make it static
and stop exporting it. Additionally, kill the 'async' argument, as
there's only one used of it.
Signed-off-by: Jens Axboe <axboe@fb.com>
Userspace can opt to receive a device request notification,
indicating that the device should be released. This is setup
the same way as the error IRQ and also supports eventfd signaling.
Future support may forcefully remove the device from the user if
the request is ignored.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When a request is made to unbind a device from a vfio bus driver,
we need to wait for the device to become unused, ie. for userspace
to release the device. However, we have a long standing TODO in
the code to do something proactive to make that happen. To enable
this, we add a request callback on the vfio bus driver struct,
which is intended to signal the user through the vfio device
interface to release the device. Instead of passively waiting for
the device to become unused, we can now pester the user to give
it up.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* pm-cpufreq: (46 commits)
intel_pstate: provide option to only use intel_pstate with HWP
cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation
cpufreq: Create for_each_governor()
cpufreq: Create for_each_policy()
cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get|put}()
cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
intel_pstate: honor user space min_perf_pct override on resume
intel_pstate: respect cpufreq policy request
intel_pstate: Add num_pstates to sysfs
intel_pstate: expose turbo range to sysfs
intel_pstate: Add support for SkyLake
cpufreq: stats: drop unnecessary locking
cpufreq: stats: don't update stats on false notifiers
cpufreq: stats: don't update stats from show_trans_table()
cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update()
cpufreq: stats: create sysfs group once we are ready
cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications
cpufreq: stats: drop 'cpu' field of struct cpufreq_stats
cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy
cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats'
...
* pm-domains:
PM: Convert dev_pm_put_subsys_data() into a void function
PM: Update function header for dev_pm_get_subsys_data()
PM / Domains: Handle errors from genpd's ->attach_dev() callback
PM / Domains: Re-order initialization of generic_pm_domain_data
PM / Domains: Free pm_subsys_data in error path in __pm_genpd_add_device()
PM / Domains: Eliminate the mutex for the generic_pm_domain_data
PM / Domains: Don't check for an existing device when adding a new
PM / Domains: Don't allow an existing generic_pm_domain_data
PM / Domains: Remove reference counting for the generic_pm_domain_data
PM / Domains: Rename __pm_genpd_alloc|free_dev_data()
PM / Domains: Remove pm_genpd_dev_need_restore() API
* acpi-resources: (23 commits)
Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources
x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug
ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug
x86/PCI: Refine the way to release PCI IRQ resources
x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation
x86/PCI: Fix the range check for IO resources
PCI: Use common resource list management code instead of private implementation
resources: Move struct resource_list_entry from ACPI into resource core
ACPI: Introduce helper function acpi_dev_filter_resource_type()
ACPI: Add field offset to struct resource_list_entry
ACPI: Translate resource into master side address for bridge window resources
ACPI: Return translation offset when parsing ACPI address space resources
ACPI: Enforce stricter checks for address space descriptors
ACPI: Set flag IORESOURCE_UNSET for unassigned resources
ACPI: Normalize return value of resource parser functions
ACPI: Fix a bug in parsing ACPI Memory24 resource
ACPI: Add prefetch decoding to the address space parser
ACPI: Move the window flag logic to the combined parser
ACPI: Unify the parsing of address_space and ext_address_space
ACPI: Let the parser return false for disabled resources
...
* acpica:
ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model
ACPICA: Events: Introduce acpi_set_gpe()/acpi_finish_gpe() to reduce divergences
ACPICA: Events: Introduce ACPI_GPE_DISPATCH_RAW_HANDLER to fix 2 issues for the current GPE APIs
ACPICA: Update version to 20150204
ACPICA: Update Copyright headers to 2015
ACPICA: Hardware: Cast GPE enable_mask before storing
ACPICA: Events: Cleanup GPE dispatcher type obtaining code
ACPICA: Events: Cleanup to move acpi_gbl_global_event_handler invocation out of acpi_ev_gpe_dispatch()
ACPICA: Events: Cleanup of resetting the GPE handler to NULL before removing
ACPICA: Events: Fix uninitialized variable
ACPICA: Events: Remove acpi_ev_valid_gpe_event() due to current restriction
ACPICA: Events: Remove duplicated sanity check in acpi_ev_enable_gpe()
ACPICA: Events: Back port "ACPICA: Save current masks of enabled GPEs after enable register writes"
ACPICA: Resources: Provide common part for struct acpi_resource_address structures.
ACPI: Introduce acpi_unload_parent_table() usages in Linux kernel
ACPICA: take ACPI_MTX_INTERPRETER in acpi_unload_table_id()
Pull libata changes from Tejun Heo:
"Mostly driver-specific changes. Nothing too noteworthy.
This pull request contains three merges from for-3.19-fixes. The
first two are to pull ahci_xgene and sata_dwc_460ex fix commits which
are depended upon by later changes. The last one is to pull in a fix
patch which missed the v3.19-rc window"
* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits)
ahci_xgene: Fix the dma state machine lockup for the ATA_CMD_SMART PIO mode command.
ata: libahci: Use of_platform_device_create only if supported
sata_mv: Delete unnecessary checks before the function call "phy_power_off"
ata: Delete unnecessary checks before the function call "pci_dev_put"
ata: pata_platform: fix owner module reference mismatch for scsi host
ata: ahci_platform: fix owner module reference mismatch for scsi host
pata_pdc2027x: Use 64-bit timekeeping
ata: libahci: Fix devres cleanup on failure
ata: libahci: Allow using multiple regulators
Documentation: bindings: Add the regulator property to the sub-nodes AHCI bindings
ata: libahci: Clean-up the ahci_platform_en/disable_phys functions
sata_rcar: extend PM methods
sata_dwc_460ex: disable compilation on ARM and ARM64
ata: libata-core: Remove unused function
sata_dwc_460ex: convert to devm_kzalloc in ->probe()
sata_dwc_460ex: remove extra message
sata_dwc_460ex: use np local variable in ->probe()
sata_dwc_460ex: fix most of the sparse warnings
sata_dwc_460ex: enable COMPILE_TEST for the driver
sata_dwc_460ex: remove redundant dev_set_drvdata
...
Pull cgroup changes from Tejun Heo:
"Three mostly trivial patches. The biggest change is that blkio is now
initialized before memcg which will be needed to make memcg and blkcg
cooperate on writeback IOs"
* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add dummy css_put() for !CONFIG_CGROUPS
cgroup: reorder SUBSYS(blkio) in cgroup_subsys.h
Update of Documentation/cgroups/00-INDEX
Pull percpu changes from Tejun Heo:
"Nothing too interesting. One cleanup patch and another to add a
trivial state check function"
* 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu_ref: implement percpu_ref_is_dying()
percpu_ref: remove unnecessary ACCESS_ONCE() in percpu_ref_tryget_live()
Packetization Layer Path MTU Discovery works separately beside
Path MTU Discovery at IP level, different net namespace has
various requirements on which one to chose, e.g., a virutalized
container instance would require TCP PMTU to probe an usable
effective mtu for underlying tunnel, while the host would
employ classical ICMP based PMTU to function.
Hence making TCP PMTU mechanism per net namespace to decouple
two functionality. Furthermore the probe base MSS should also
be configured separately for each namespace.
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull EFI updates from Ingo Molnar:
"Main changes:
- Move efivarfs from the misc filesystem section to pseudo filesystem
- Expose firmware platform size in sysfs
- Improve robustness of get_memory_map() by removing assumptions on
the size of efi_memory_desc_t.
- various cleanups and fixes
The biggest risk is the get_memory_map() change, which changes the way
that both the arm64 and x86 EFI boot stub build the early memory map.
There are no known regressions with it at the moment, BYMMV"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Don't look for chosen@0 node on DT platforms
firmware: efi: Remove unneeded guid unparse
efi/libstub: Call get_memory_map() to obtain map and desc sizes
efi: Small leak on error in runtime map code
efi: rtc-efi: Mark UIE as unsupported
arm64/efi: efistub: Apply __init annotation
efi: Expose underlying UEFI firmware platform size to userland
efi: Rename efi_guid_unparse to efi_guid_to_str
efi: Update the URLs for efibootmgr
fs: Make efivarfs a pseudo filesystem, built by default with EFI
Pull x86 APIC updates from Ingo Molnar:
"Continued fallout of the conversion of the x86 IRQ code to the
hierarchical irqdomain framework: more cleanups, simplifications,
memory allocation behavior enhancements, mainly in the interrupt
remapping and APIC code"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
x86, init: Fix UP boot regression on x86_64
iommu/amd: Fix irq remapping detection logic
x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC
x86: Consolidate boot cpu timer setup
x86/apic: Reuse apic_bsp_setup() for UP APIC setup
x86/smpboot: Sanitize uniprocessor init
x86/smpboot: Move apic init code to apic.c
init: Get rid of x86isms
x86/apic: Move apic_init_uniprocessor code
x86/smpboot: Cleanup ioapic handling
x86/apic: Sanitize ioapic handling
x86/ioapic: Add proper checks to setp/enable_IO_APIC()
x86/ioapic: Provide stub functions for IOAPIC%3Dn
x86/smpboot: Move smpboot inlines to code
x86/x2apic: Use state information for disable
x86/x2apic: Split enable and setup function
x86/x2apic: Disable x2apic from nox2apic setup
x86/x2apic: Add proper state tracking
x86/x2apic: Clarify remapping mode for x2apic enablement
x86/x2apic: Move code in conditional region
...
Pull timer updates from Ingo Molnar:
"The main changes in this cycle were:
- rework hrtimer expiry calculation in hrtimer_interrupt(): the
previous code had a subtle bug where expiry caching would miss an
expiry, resulting in occasional bogus (late) expiry of hrtimers.
- continuing Y2038 fixes
- ktime division optimization
- misc smaller fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Make __hrtimer_get_next_event() static
rtc: Convert rtc_set_ntp_time() to use timespec64
rtc: Remove redundant rtc_valid_tm() from rtc_hctosys()
rtc: Modify rtc_hctosys() to address y2038 issues
rtc: Update rtc-dev to use y2038-safe time interfaces
rtc: Update interface.c to use y2038-safe time interfaces
time: Expose get_monotonic_boottime64 for in-kernel use
time: Expose getboottime64 for in-kernel uses
ktime: Optimize ktime_divns for constant divisors
hrtimer: Prevent stale expiry time in hrtimer_interrupt()
ktime.h: Introduce ktime_ms_delta
Pull scheduler updates from Ingo Molnar:
"The main scheduler changes in this cycle were:
- various sched/deadline fixes and enhancements
- rescheduling latency fixes/cleanups
- rework the rq->clock code to be more consistent and more robust.
- minor micro-optimizations
- ->avg.decay_count fixes
- add a stack overflow check to might_sleep()
- idle-poll handler fix, possibly resulting in power savings
- misc smaller updates and fixes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/Documentation: Remove unneeded word
sched/wait: Introduce wait_on_bit_timeout()
sched: Pull resched loop to __schedule() callers
sched/deadline: Remove cpu_active_mask from cpudl_find()
sched: Fix hrtick_start() on UP
sched/deadline: Avoid pointless __setscheduler()
sched/deadline: Fix stale yield state
sched/deadline: Fix hrtick for a non-leftmost task
sched/deadline: Modify cpudl::free_cpus to reflect rd->online
sched/idle: Add missing checks to the exit condition of cpu_idle_poll()
sched: Fix missing preemption opportunity
sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target
sched/debug: Print rq->clock_task
sched/core: Rework rq->clock update skips
sched/core: Validate rq_clock*() serialization
sched/core: Remove check of p->sched_class
sched/fair: Fix sched_entity::avg::decay_count initialization
sched/debug: Fix potential call to __ffs(0) in sched_show_task()
sched/debug: Check for stack overflow in ___might_sleep()
sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- AMD range breakpoints support:
Extend breakpoint tools and core to support address range through
perf event with initial backend support for AMD extended
breakpoints.
The syntax is:
perf record -e mem:addr/len:type
For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)
perf record -e mem:0x1000/512:w
- event throttling/rotating fixes
- various event group handling fixes, cleanups and general paranoia
code to be more robust against bugs in the future.
- kernel stack overhead fixes
User-visible tooling side changes:
- Show precise number of samples in at the end of a 'record' session,
if processing build ids, since we will then traverse the whole
perf.data file and see all the PERF_RECORD_SAMPLE records,
otherwise stop showing the previous off-base heuristicly counted
number of "samples" (Namhyung Kim).
- Support to read compressed module from build-id cache (Namhyung
Kim)
- Enable sampling loads and stores simultaneously in 'perf mem'
(Stephane Eranian)
- 'perf diff' output improvements (Namhyung Kim)
- Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
de Melo)
Tooling side infrastructure changes:
- Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)
- Support parsing parameterized events (Cody P Schafer)
- Add support for IP address formats in libtraceevent (David Ahern)
Plus other misc fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
perf: Decouple unthrottling and rotating
perf: Drop module reference on event init failure
perf: Use POLLIN instead of POLL_IN for perf poll data in flag
perf: Fix put_event() ctx lock
perf: Fix move_group() order
perf: Fix event->ctx locking
perf: Add a bit of paranoia
perf symbols: Convert lseek + read to pread
perf tools: Use perf_data_file__fd() consistently
perf symbols: Support to read compressed module from build-id cache
perf evsel: Set attr.task bit for a tracking event
perf header: Set header version correctly
perf record: Show precise number of samples
perf tools: Do not use __perf_session__process_events() directly
perf callchain: Cache eh/debug frame offset for dwarf unwind
perf tools: Provide stub for missing pthread_attr_setaffinity_np
perf evsel: Don't rely on malloc working for sz 0
tools lib traceevent: Add support for IP address formats
perf ui/tui: Show fatal error message only if exists
perf tests: Fix typo in sample-parsing.c
...
Pull core locking updates from Ingo Molnar:
"The main changes are:
- mutex, completions and rtmutex micro-optimizations
- lock debugging fix
- various cleanups in the MCS and the futex code"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Optimize setting task running after being blocked
locking/rwsem: Use task->state helpers
sched/completion: Add lock-free checking of the blocking case
sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state
locking/mutex: Explicitly mark task as running after wakeup
futex: Fix argument handling in futex_lock_pi() calls
doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants
locking/Documentation: Update code path
softirq/preempt: Add missing current->preempt_disable_ip update
locking/osq: No need for load/acquire when acquire-polling
locking/mcs: Better differentiate between MCS variants
locking/mutex: Introduce ww_mutex_set_context_slowpath()
locking/mutex: Move MCS related comments to proper location
locking/mutex: Checking the stamp is WW only
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
rcu: Initialize tiny RCU stall-warning timeouts at boot
rcu: Fix RCU CPU stall detection in tiny implementation
rcu: Add GP-kthread-starvation checks to CPU stall warnings
rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
rcu: Optionally run grace-period kthreads at real-time priority
ksoftirqd: Use new cond_resched_rcu_qs() function
ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
rcutorture: Add more diagnostics in rcu_barrier() test failure case
torture: Flag console.log file to prevent holdovers from earlier runs
torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
rcutorture: Handle different mpstat versions
rcutorture: Check from beginning to end of grace period
rcu: Remove redundant rcu_batches_completed() declaration
rcutorture: Drop rcu_torture_completed() and friends
rcu: Provide rcu_batches_completed_sched() for TINY_RCU
rcutorture: Use unsigned for Reader Batch computations
rcutorture: Make build-output parsing correctly flag RCU's warnings
rcu: Make _batches_completed() functions return unsigned long
rcutorture: Issue warnings on close calls due to Reader Batch blows
documentation: Fix smp typo in memory-barriers.txt
...
Make __ipv6_select_ident() static as it isn't used outside
the file.
Fixes: 0508c07f5e (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renamed the reserved1 member of struct ethtool_drvinfo to erom_version to get
expansion/option ROM version of the adapter if present.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver exposes interfaces that directly relate to HW state. Upon fatal
error, consumers of these interfaces (ULPs) that rely on completion of
all their posted work-request could hang, thereby introducing dependencies
in shutdown order. To prevent this from happening, we manage the
relevant resources (CQs, QPs) that are used by the device. Upon a fatal error,
we now generate simulated completions for outstanding WQEs that were not
completed at the time the HW was reset.
It includes invoking the completion event handler for all involved CQs so that
the ULPs will poll those CQs. When polled we return simulated CQEs with
IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and
not wait forever for completions upon receiving remove_one.
The above change requires an extra check in the data path to make sure that when
device is in error state, the simulated CQEs will be returned and no further
WQEs will be posted.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's
possible that when the work is executed, the pointer to the slave
becomes invalid. This can happen if between queuing the event and the
execution of the work, the net-device was un-ensvaled and re-enslaved.
Fix that by queuing a work with the data of the slave instead of the
slave structure.
Fixes: 69e6113343 ('net/bonding: Notify state change on slaves')
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This has not been a busy release for the regulator framework, though we
do have the first parts of some ongoing work from Bjorn Andersson to
allow us to support more complex modern systems with dynamic
configuration of regulators in suspend and idle states.
- Support for device-specific properties on regulator nodes when using
simplified DT parsing in the core from Krzysztof Kozlowski.
- Restructuring of the load tracking code, intended to support future
improvements in this area for more complex system designs.
- New drivers for Maxim MAX77843 and Mediatek MT6397.
- Lots of smaller fixes and improvements.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU2GajAAoJECTWi3JdVIfQa/IH/i2Ktdxq6Y5tu1CvUqmGlA+P
tgpL0CAKn5rH3rM4DnAZ5MFNKZ2QgYpEmdkqcO3qunT7V9luQDBIh1CILUDicCla
T+jcHSt2xypnteScxDUOYlzUpjoHhx2Y2L8qneQoYVgMCevbWx8BhSKNSwOnkJrb
1Tl0s3k3CKdU261aIBSdzZ/bipRV0QFG5jSnc70PXrwX+dRd+utKfvXIC1M4pi3E
AbRWY4G5YMrBPviNoTJRpaiZkvOaTwd6riETZWSSQfs3JQ8dpGCfS00gJHgk0A4E
+2cUIhikhKKkDNkNysVRGPZr4D9q3Fnhek9plzQmwHdSb7w7+j4wmB/p6bRR/ME=
=XPki
-----END PGP SIGNATURE-----
Merge tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This has not been a busy release for the regulator framework, though
we do have the first parts of some ongoing work from Bjorn Andersson
to allow us to support more complex modern systems with dynamic
configuration of regulators in suspend and idle states.
- Support for device-specific properties on regulator nodes when
using simplified DT parsing in the core from Krzysztof Kozlowski.
- Restructuring of the load tracking code, intended to support future
improvements in this area for more complex system designs.
- New drivers for Maxim MAX77843 and Mediatek MT6397.
- Lots of smaller fixes and improvements"
* tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (29 commits)
regulator: max77843: Add max77843 regulator driver
regulator: Fix build breakage on !REGULATOR
regulator: Build sysfs entries with static attribute groups
regulator: qcom-rpm: Make it possible to specify supply
regulator: core: Consolidate drms update handling
regulator: qcom-rpm: signedness bug in probe()
regulator: da9211: Add gpio control for enable/disable of buck
regulator: qcom_rpm: Don't update vreg->uV/mV if rpm_reg_write fails
regulator: lp872x: Remove **regulators from struct lp872x
regulator: da9211: fix unmatched of_node
regulator: Update documentation after renaming function argument
regulator: axp20x: Migrate to regulator core's simplified DT parsing code
regulator: axp20x: Fill regulators_node and of_match descriptor fields
regulator: pfuze100-regulator: add pfuze3000 support
regulator: max77686: Document gpio properties
regulator: Allow parsing custom properties when using simplified DT parsing
regulator: max77686: Add GPIO control
regulator: Copy config passed during registration
regulator: tps65023: Constify struct regmap_config and regulator_ops
regulator: max8649: Constify struct regmap_config and regulator_ops
...
The major highlight this release is a refactoring of the core to allow
us to run synchronous transfers in the context of the caller when there
is no contention for the bus. This improves performance in the very
common case by eliminating context switches and reducing the number of
hardware setup and teardown operations we need to perform.
Other changes:
- New drivers for DLN-2 USB-SPI adapter and ST SPI controllers.
- A big round of cleanups, performance and feature improvements
for the xilinx driver from Ricardo Ribalda Delgado.
- A wide range of smaller cleanups, fixes and feature improvements
throughout the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU2GNgAAoJECTWi3JdVIfQLiYH/0uLN43CunPp0gSWllQ2PY1O
R1QiqXg1fr1uZKRuGy59QF0TkU/JlWPY+tpGiOH1jrnDsoecnWsxDx3YEeuYdV6U
c//UrlK2uvESivbc48zVUTwCsgxsE8apG0JgqLjsfUpqZTEFxFpeSskepSJ2kIUz
bsXHU8Xi0WkLalsk/8Ik8aUvOwVi5EtRE9OMvnU6QPqQMCszgv1TH4UbwbhqwwzZ
U23WbNHQ262XDRwY2LKl/QROULeU5pd9F19wrveKMa42fkbu/e+kk6E3n7/Hd4mV
CUjv1wTCpPZvzh3bTk50uXwA9XQOzv6ddw6jqsgLcV6jS8Ju3Z3Beya3fmdhOl0=
=3ZQr
-----END PGP SIGNATURE-----
Merge tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"The major highlight this release is a refactoring of the core to allow
us to run synchronous transfers in the context of the caller when
there is no contention for the bus. This improves performance in the
very common case by eliminating context switches and reducing the
number of hardware setup and teardown operations we need to perform.
Other changes:
- New drivers for DLN-2 USB-SPI adapter and ST SPI controllers.
- A big round of cleanups, performance and feature improvements for
the xilinx driver from Ricardo Ribalda Delgado.
- A wide range of smaller cleanups, fixes and feature improvements
throughout the subsystem"
* tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits)
spi: mxs: cleanup wait_for_completion return handling
spi: ti-qspi: cleanup wait_for_completion return handling
spi: spi-imx: cleanup wait_for_completion handling
spi: sh-msiof: cleanup wait_for_completion return handling
spi: match var type to return type of wait_for_completion
spi: spi-pxa2xx: only include mach/dma.h for legacy DMA
spi: atmel: cleanup wait_for_completion return handling
spi: fsl-dspi: Remove possible memory leak of 'chip'
spi: sh-msiof: Update calculation of frequency dividing
spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n)
spi/xilinx: Fix access invalid memory on xilinx_spi_tx
spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers"
spi/xilinx: Check number of slaves range
spi/xilinx: Use polling mode on small transfers
spi/xilinx: Remove remaining_words driver data variable
spi/xilinx: Remove iowrite/ioread wrappers
spi/xilinx: Convert bits_per_word in bytes_per_word
spi/xilinx: Convert remainding_bytes in remaining words
spi/xilinx: Make spi_tx and spi_rx simmetric
spi/xilinx: Remove rx_fn and tx_fn pointer
...
Add functionality for safely appending string data to a TLV without
keeping write count in the caller.
Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a framework for transcoding legacy nl action into actions
(.doit) calls from the new nl API. This is done by converting the
incoming TLV data into netlink data with nested netlink attributes.
Unfortunately due to the randomness of the legacy API we can't do this
generically so each legacy netlink command requires a specific
transcoding recipe. In this case for bearer enable and bearer disable.
Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit
compat calls.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a framework for dumping netlink data from the new netlink
API and formatting it to the old legacy API format. This is done by
looping the dump data and calling a format handler for each entity, in
this case a bearer.
We dump until either all data is dumped or we reach the limited buffer
size of the legacy API. Remember, the legacy API doesn't scale.
In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat
layer.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A very quiet release for regmap this time around:
- Fix an endianness issue for I2C devices connected via SMBus where
we were getting two layers both trying to do endianness handling.
- Use a union to reduce the size of the regmap struct.
- A couple of smaller fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU2GDZAAoJECTWi3JdVIfQId0H/3dsylUYN0uYzDA1fKy0At5E
7dmKbWkp1Ab62wQYC59a5KH1OgdZe/mWf3x9kxEZlqcmS15OPEuqs7g+BNoAi3Fj
ysmIbKRpJIoZs+XrgBGPCunchLq6rOmtcVaQEUQZZnKg31kQCBhMoPip1QCT5Qn7
UGoYgwP3O7TPVxJDZXA3gCGZER/Jr4Bhrkb3qJKzofpoRvzqSeqKuDJdwOTy7dzx
w1xMQWUlrPuCKHe3HA5uD4ZPHtNsN23APd7spmu/HhqXtN1nCIxh5/0z4qKVGvLO
h3xSCTtOTJZI2kYEFDn3/6DFWQokp90fwBKaG8EtixtD/8M9OQA7tuW1mtsUgOE=
=nPfp
-----END PGP SIGNATURE-----
Merge tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"A very quiet release for regmap this time around:
- Fix an endianness issue for I2C devices connected via SMBus where
we were getting two layers both trying to do endianness handling.
- Use a union to reduce the size of the regmap struct.
- A couple of smaller fixes"
* tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Fix i2c word access when using SMBus access functions
regmap: Export regmap_get_val_endian
regmap: ac97: Clean up indentation
regmap: correct the description of structure element in reg_field
regmap: Move spinlock_flags into the union
iwlwifi:
* more work for new devices (4165 / 8260)
* cleanups / improvemnts in rate control
* fixes for TDLS
* major statistics work from Johannes - more to come
* improvements for the fw error dump infrastructure
* usual amount of small fixes here and there (scan, D0i3 etc...)
* add support for beamforming
* enable stuck queue detection for iwlmvm
* a few fixes for EBS scan
* fixes for various failure paths
* improvements for TDLS Offchannel
wil6210:
* performance tuning
* some AP features
brcm80211:
* rework some code in SDIO part of the brcmfmac driver related to
suspend/resume that were found doing stress testing
* in PCIe part scheduling of worker thread needed to be relaxed
* minor fixes and exposing firmware revision information to
user-space, ie. ethtool.
mwifiex:
* enhancements for change virtual interface handling
* remove coupling between netdev and FW supported interface
combination, now conversion from any type of supported interface
types to any other type is possible
* DFS support in AP mode
ath9k:
* fix calibration issues on some boards
* Wake-on-WLAN improvements
ath10k:
* add support for qca6174 hardware
* enable RX batching to reduce CPU load
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJU1fUoAAoJEG4XJFUm622bnjMH/25f2kyLAJSLJmiIhpEcYlNJ
CJMAYsSTcqhKaMEbx742StUUDH3pa1oV4mQ5csVa2QJIsS90WzPpB1eRnNtQ69Nj
89Zfwa6bbX0TXDgw1Aa28NewPY/3xWpCjI03HBQaMIncToWv3/dzNUr0bEmIuBds
wsr5Y+fy80VKnkoXG7XzGFOqmxxFwNS+UF3M1WNtQ+xcr9rK3/LLxyWFy84S9UIe
4lVOb+df91YFIJeLs28/hfTiRhvV0fWIbupGv8UhuBMho+F0dpLvra+3xksEqALu
4+AzI0j9GjqFfMzbmRBcPcT4Rl37nLmvQ+7XokrFgAYS5zp7OGylOMU3nHyvkuY=
=iL8v
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2015-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Major changes:
iwlwifi:
* more work for new devices (4165 / 8260)
* cleanups / improvemnts in rate control
* fixes for TDLS
* major statistics work from Johannes - more to come
* improvements for the fw error dump infrastructure
* usual amount of small fixes here and there (scan, D0i3 etc...)
* add support for beamforming
* enable stuck queue detection for iwlmvm
* a few fixes for EBS scan
* fixes for various failure paths
* improvements for TDLS Offchannel
wil6210:
* performance tuning
* some AP features
brcm80211:
* rework some code in SDIO part of the brcmfmac driver related to
suspend/resume that were found doing stress testing
* in PCIe part scheduling of worker thread needed to be relaxed
* minor fixes and exposing firmware revision information to
user-space, ie. ethtool.
mwifiex:
* enhancements for change virtual interface handling
* remove coupling between netdev and FW supported interface
combination, now conversion from any type of supported interface
types to any other type is possible
* DFS support in AP mode
ath9k:
* fix calibration issues on some boards
* Wake-on-WLAN improvements
ath10k:
* add support for qca6174 hardware
* enable RX batching to reduce CPU load
Conflicts:
drivers/net/wireless/rtlwifi/pci.c
Conflict resolution is to get rid of the 'end' label and keep
the rest.
Signed-off-by: David S. Miller <davem@davemloft.net>
For blk-mq request-based DM the responsibility of allocating a cloned
request is transfered from DM core to the target type. Doing so
enables the cloned request to be allocated from the appropriate
blk-mq request_queue's pool (only the DM target, e.g. multipath, can
know which block device to send a given cloned request to).
Care was taken to preserve compatibility with old-style block request
completion that requires request-based DM _not_ acquire the clone
request's queue lock in the completion path. As such, there are now 2
different request-based DM target_type interfaces:
1) the original .map_rq() interface will continue to be used for
non-blk-mq devices -- the preallocated clone request is passed in
from DM core.
2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
blk-mq devices -- blk_get_request() and blk_put_request() are used
respectively from these hooks.
dm_table_set_type() was updated to detect if the request-based target is
being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
DM core disallows switching the DM table's type after it is set. This
means that there is no mixing of non-blk-mq and blk-mq devices within
the same request-based DM table.
[This patch was started by Keith and later heavily modified by Mike]
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Don't use generic snapshot of trigger_tstamp if low-level driver or
hardware can get a more precise value for better audio/system time
synchronization.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
09c32aaa36 ("ahci_xgene: Fix the dma state machine lockup for the
ATA_CMD_SMART PIO mode command.") missed 3.19 release. Fold it into
for-3.20.
Signed-off-by: Tejun Heo <tj@kernel.org>
Make sure root user does not try something stupid.
Also make sure mask field in struct rps_sock_flow_table
does not share a cache line with the potentially often dirtied
flow table.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 567e4b7973 ("net: rfs: add hash collision detection")
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead we rely on SO_REUSEPORT to provide the reconnection semantics
that we need for NFSv2/v3.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.
This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
out a real bug. During suspend and resume the tlb_flush tracepoint is
called when the CPU is going offline. As the CPU has been noted as offline,
RCU is ignoring that CPU, which means that it can not use RCU protected
locks. When tracepoints are activated, they require RCU locking, and
if RCU is ignoring a CPU that runs a tracepoint, there is a chance that
the tracepoint could cause corruption.
The solution was to change the tracepoint into a TRACE_EVENT_CONDITION()
which allows us to check a condition to determine if the tracepoint
should be called or not. If the condition is not met, the rcu protected
code will not be executed. By adding the condition
"cpu_online(smp_processor_id())", this will prevent the RCU protected
code from being executed if the CPU is marked offline.
After adding this, another bug was discovered. As RCU checks rcu callers,
if a rcu call is not done, there is no check (obviously). We found that
tracepoints could be added in RCU ignored locations and not have lockdep
complain until the tracepoint is activated. This missed places where
tracepoints were added in places they should not have been. To fix this,
code was added in 3.18 that if lockdep is enabled, any tracepoint will
still call the rcu checks even if the tracepoint is not enabled. The bug
here, is that the check does not take the CONDITION into account. As the
condition may prevent tracepoints from being activated in RCU ignored
areas (as the one patch does), we get false positives when we enable
lockdep and hit a tracepoint that the condition prevents it from being
called in a RCU ignored location. The fix for this is to add the
CONDITION to the rcu checks, even if the tracepoint is not enabled.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU1rQfAAoJEEjnJuOKh9ld19UH/juFLZFjpYBgtRmbCZa/54Zk
i2Fa3U8jQe8MHHEYOjCLT9MQTYo/42btmJhr7kWKtIoUgDEli4lkOpbs+H0qar5y
Vv9+1cLeNFQzgIE3nwV7cjAw7Jufoyzd1lstDqIQvcmzZnQ5sNyyVeigMcxGv8Ls
4FyqzG6zCVgiDL4LyYNHdNcMr6qLs3KTFDEqp+kQreeO7R1r3ZEpq3JoWaEUgoPP
qrYv/rqVosLBUGA0pd7RmiGOxhjeKm15qz1GkiPeeus6DDWC6bvPC8cAc/FfkXH0
hYpoQghSZVnXGy0LzVsd44gj7tYx1FHEpYy1s8G6d5WJcNOGZ6OoZOdOZMyjPVw=
=PeL2
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fixes from Steven Rostedt:
"During testing Sedat Dilek hit a "suspicious RCU usage" splat that
pointed out a real bug. During suspend and resume the tlb_flush
tracepoint is called when the CPU is going offline. As the CPU has
been noted as offline, RCU is ignoring that CPU, which means that it
can not use RCU protected locks. When tracepoints are activated, they
require RCU locking, and if RCU is ignoring a CPU that runs a
tracepoint, there is a chance that the tracepoint could cause
corruption.
The solution was to change the tracepoint into a
TRACE_EVENT_CONDITION() which allows us to check a condition to
determine if the tracepoint should be called or not. If the condition
is not met, the rcu protected code will not be executed. By adding
the condition "cpu_online(smp_processor_id())", this will prevent the
RCU protected code from being executed if the CPU is marked offline.
After adding this, another bug was discovered. As RCU checks rcu
callers, if a rcu call is not done, there is no check (obviously). We
found that tracepoints could be added in RCU ignored locations and not
have lockdep complain until the tracepoint is activated. This missed
places where tracepoints were added in places they should not have
been. To fix this, code was added in 3.18 that if lockdep is enabled,
any tracepoint will still call the rcu checks even if the tracepoint
is not enabled. The bug here, is that the check does not take the
CONDITION into account. As the condition may prevent tracepoints from
being activated in RCU ignored areas (as the one patch does), we get
false positives when we enable lockdep and hit a tracepoint that the
condition prevents it from being called in a RCU ignored location.
The fix for this is to add the CONDITION to the rcu checks, even if
the tracepoint is not enabled"
* tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/tlb/trace: Do not trace on CPU that is offline
tracing: Add condition check to RCU lockdep checks
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.
Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)
This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.
I use a 32bit value, and dynamically split it in two parts.
For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.
Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.
If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).
This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.
This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is
represented by a tcp_timewait_sock, we rate limit dupacks in response
to incoming packets (a) with TCP timestamps that fail PAWS checks, or
(b) with sequence numbers that are out of the acceptable window.
We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that in state ESTABLISHED, where the connection is represented
by a tcp_sock, we rate limit dupacks in response to incoming packets
(a) with TCP timestamps that fail PAWS checks, or (b) with sequence
numbers or ACK numbers that are out of the acceptable window.
We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.
There is already a similar (although global) rate-limiting mechanism
for "challenge ACKs". When deciding whether to send a challence ACK,
we first consult the new per-connection rate limit, and then the
global rate limit.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the SYN_RECV state, where the TCP connection is represented by
tcp_request_sock, we now rate-limit SYNACKs in response to a client's
retransmitted SYNs: we do not send a SYNACK in response to client SYN
if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
since we last sent a SYNACK in response to a client's retransmitted
SYN.
This allows the vast majority of legitimate client connections to
proceed unimpeded, even for the most aggressive platforms, iOS and
MacOS, which actually retransmit SYNs 1-second intervals for several
times in a row. They use SYN RTO timeouts following the progression:
1,1,1,1,1,2,4,8,16,32.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.
This patch includes:
- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending
The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.
We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.
We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.
The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.
The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OVS userspace already probes the openvswitch kernel module for
OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module
implementation of masked set actions.
The existing set action sets many fields at once. When only a subset
of the IP header fields, for example, should be modified, all the IP
fields need to be exact matched so that the other field values can be
copied to the set action. A masked set action allows modification of
an arbitrary subset of the supported header bits without requiring the
rest to be matched.
Masked set action is now supported for all writeable key types, except
for the tunnel key. The set tunnel action is an exception as any
input tunnel info is cleared before action processing starts, so there
is no tunnel info to mask.
The kernel module converts all (non-tunnel) set actions to masked set
actions. This makes action processing more uniform, and results in
less branching and duplicating the action processing code. When
returning actions to userspace, the fully masked set actions are
converted back to normal set actions. We use a kernel internal action
code to be able to tell the userspace provided and converted masked
set actions apart.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the second NFC pull request for 3.20.
It brings:
- NCI NFCEE (NFC Execution Environment, typically an embedded or
external secure element) discovery and enabling/disabling support.
In order to communicate with an NFCEE, we also added NCI's logical
connections support to the NCI stack.
- HCI over NCI protocol support. Some secure elements only understand
HCI and thus we need to send them HCI frames when they're part of
an NCI chipset.
- NFC_EVT_TRANSACTION userspace API addition. Whenever an application
running on a secure element needs to notify its host counterpart,
we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
NFC netlink socket.
- Secure element and HCI transaction event support for the st21nfcb
chipset.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU06ozAAoJEIqAPN1PVmxKdtkP/2CeQ0+xJnJWgyh4xjkVxfPb
wPYrz3cHeHdnVX36mnIdRoQZRjxh/Ef/vgK8RsXohcldzHA9D8GjxRpj0GrkVGQ9
xoDkiHsFDdsZHeezSrlnXNnObvZrkeMsmnUDJxfwLtcX1nzFKzBgW2GaDiNjUazC
r5Ip4YIfoCaVLq5MSqYRqJ6uhplXd/ePdtPoo54L/E81y4ptQi8pUsQBy4K3GrFn
FbXoemcymhi9AVBGl+OlppZ93t6bdOV1D5tcOwpytAbfa3jp2oUnJPZay7EoWruR
aj8l5v3P+SSphAJwaGSbny0L83tTS7sq+N4QG+VxxdMkw2YjpmxgN3AGguNBtPGG
SUThpZV6QhrkAOnwzP2BJnh68WSU1eJrGvk8zUWW0SanA8k2jaHO/VzXCA3/ZkC4
IFcXl4Jr6R3iUxDFgS/6MB5E9EGedzFTacsWoHVyZPtaDAQp4WVdAIRQ4QKgJCLw
1yRmCeVunTsrs6O0aenU1xHinPlnx+tkuiNh4H64APQYTz54WwXH1/S04OL1SkqI
mTzUvFgWqJvSIOuU19g0HBw+xZ/9vvX8LLkm9kSTbe7tcmFH5CI7ZCN/hNDHvMSd
sjNYOyag/SAHJ01tTKcSPAgWO49bHWPodI04zP0lK0gMLjKvRknVMdTgIDfu49HN
2da9E23iBmNNgG79iVHN
=8caD
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
NFC: 3.20 second pull request
This is the second NFC pull request for 3.20.
It brings:
- NCI NFCEE (NFC Execution Environment, typically an embedded or
external secure element) discovery and enabling/disabling support.
In order to communicate with an NFCEE, we also added NCI's logical
connections support to the NCI stack.
- HCI over NCI protocol support. Some secure elements only understand
HCI and thus we need to send them HCI frames when they're part of
an NCI chipset.
- NFC_EVT_TRANSACTION userspace API addition. Whenever an application
running on a secure element needs to notify its host counterpart,
we send an NFC_EVENT_SE_TRANSACTION event to userspace through the
NFC netlink socket.
- Secure element and HCI transaction event support for the st21nfcb
chipset.
Signed-off-by: David S. Miller <davem@davemloft.net>
The system suspend calls (used to synchronize between system suspend of
the CPU and it's PMIC) are called from board code but not stubbed when
regulator is disabled causing build failures in such cases. This patch
fixes those build failures.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU0orTAAoJECTWi3JdVIfQnjEH/0dF1gfaCbZJidzPnsZWexm8
sofDP5Lmzd/oym2M5LT06i4FGL/1jg08X8SEqGmZK84F/HKvTgZneHyusF952WAj
Rf4HZ+QgVpItJH0Try+O6Ww+QEzwDjrFhJcmTwhX9vOuHKBRFG+K4d8U3HaIWNu/
j5nggRzXKO4jcuPFBOda7CoiHb4bk8YBVrmmm0uA41/RKM1c80+sTLASpb4JIoL2
VOcktUGpfIwtZKTZAX+xsbDDYQ+4fmVApRlldCrJqJVMHcUsPjFV5PUIatUs2ZVE
+t9kafuLyXpA8ZNHQA22TyOD8UXMn/Iyc9Xkp+EieoFamCErTdnBjk1Im3g3BAA=
=yXkZ
-----END PGP SIGNATURE-----
Merge tag 'regulator-v3.19-rc7' into regulator-linus
regulator: Fix !REGULATOR builds of systems using system suspend calls
The system suspend calls (used to synchronize between system suspend of
the CPU and it's PMIC) are called from board code but not stubbed when
regulator is disabled causing build failures in such cases. This patch
fixes those build failures.
# gpg: Signature made Thu 05 Feb 2015 05:10:43 HKT using RSA key ID 5D5487D0
# gpg: WARNING: digest algorithm MD5 is deprecated
# gpg: please see http://www.gnupg.org/faq/weak-digest-algos.html for more information
# gpg: Oops: keyid_from_fingerprint: no pubkey
# gpg: key AF88CD16: no public key for trusted key - skipped
# gpg: key AF88CD16 marked as ultimately trusted
# gpg: key 5621E907: no public key for trusted key - skipped
# gpg: key 5621E907 marked as ultimately trusted
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg: aka "Mark Brown <broonie@debian.org>"
# gpg: aka "Mark Brown <broonie@kernel.org>"
# gpg: aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg: aka "Mark Brown <broonie@linaro.org>"
# gpg: aka "Mark Brown <Mark.Brown@linaro.org>"
When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.
Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:
...
Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting
===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting
...
By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Cc: stable@vger.kernel.org # 3.17+
Fixes: d17d8f9ded "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
When a CPU is offline, RCU is no longer watching that CPU and since the
tracepoint is protected by RCU, it must not be called. To prevent the
tlb_flush tracepoint from being called when the CPU is offline, it was
converted to a TRACE_EVENT_CONDITION where the condition checks if the
CPU is online before calling the tracepoint.
Unfortunately, this was not enough to stop lockdep from complaining about
it. Even though the RCU protected code of the tracepoint will never be
called, the condition is hidden within the tracepoint, and even though the
condition prevents RCU code from being called, the lockdep checks are
outside the tracepoint (this is to test tracepoints even when they are not
enabled).
Even though tracepoints should be checked to be RCU safe when they are not
enabled, the condition should still be considered when checking RCU.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Fixes: 3a630178fd "tracing: generate RCU warnings even when tracepoints are disabled"
Cc: stable@vger.kernel.org # 3.18+
Acked-by: Dave Hansen <dave@sr71.net>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
- Yann realized that the previous revert of new userspace ABI did not
go far enough, and we're still exposing a change that we don't want.
Revert even closer to 3.18 interface to make sure we get things right
in the long run.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJU1S+NAAoJEENa44ZhAt0hbq0P/0EhrI204nNULnK7UUV8nUAk
E5lN45Er0uorsX4dcX4nM+YinNSmlrV7uX5B7Nn9/B6MyTvT02Ws4WDjHnvjrhbN
3oCuOhwT4lFldAPFIobYMoK0zZVpEffzXOgZrnabEK1PuqI4jSw/wNupGAnvRyxs
t+m3qhKgRDpQRylIC3HT9zamf5n/r0ouehzTJvT5jLTpcxIW/CTyO6JVn1YWRMNC
Weh3cFhMpZXnuxMo02Jtvd9SIJXoI+oD5hws1/oomNZYXiqSG//u5g9FMZ2jkdhI
KJmuuTfp7SG7f7XCElguEbsSXTO0z3Bfq+OjH/9JH4FYOS2AUK2zxC9ftGzF7hky
/qxT0LKEvBUbNLEihFaWRs9O0W40nrcCldRtpeX/0ChSNVQxhO72O4WbSi/rzAyD
VCzlSkEI9Dz3XESxG5C7DuY057wWVnVPWByrjoo4EL3C/A/PGwGQH1vTXeChkWHQ
pV0FAJTRWvUtWWWJMyma+ZjmTn8upguj5eYXYkH28uLfAs2IOhwDxQBEN/+8nIQ5
iT4DvqM26FQ0kh9YJYOJkSxgWqHhjzb4v0PHm/Gx3KNiO1mwN8zQNsI7Karwmdt1
L/LpexwYfwWB8BJHAcKnuvSAVAhyYzDSzjwUJp26iwyhDLHczA5+dofpslK5/Pat
0N2HI9xdAmstONTNNcZz
=5lkC
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull one more infiniband revert from Roland Dreier:
"One more last-second RDMA change for 3.19: Yann realized that the
previous revert of new userspace ABI did not go far enough, and we're
still exposing a change that we don't want. Revert even closer to
3.18 interface to make sure we get things right in the long run"
Yann Droneaud pipes up:
"I hope this could go in v3.19 as, at this stage, we don't want to
expose any bits of this ABI in a released kernel"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
Revert "IB/core: Add support for extended query device caps"
This is the last missing piece to get a kernel booting to a prompt in qemu-cris.
Signed-off-by: Niklas Cassel <nks@flawful.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull scheduler fixes from Ingo Molnar:
"Misc fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix deadline parameter modification handling
sched/wait: Remove might_sleep() from wait_event_cmd()
sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask
sched/fair: Avoid using uninitialized variable in preferred_group_nid()
Hopefully the final pull request for 3.19: this ended up with a
slightly higher volume than wished, but I put them all as they are
either stable or 3.19 regression fixes.
Most of commits are from ASoC, and have been stewed for a while in
linux-next. The only change in the common code is the regression
fixes for ASoC AC97 stuff wrt device registrations. The rest are
device-specific, mostly small fixes in various ASoC drivers and
ak411x on ice1724 boards.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJU1IACAAoJEGwxgFQ9KSmk3doP/AvQw++5VviePzYisnwWqiVF
zZ4hLTHXKyVTz00Lvhuf31Xnv7MUBQ1m8ww+b6jaUWKd1ZZRshITzIhYtP8HHE5k
LEDPuY1UeHiu0syidEVTCLZ1kXlNxKZ0F8da0rSl6/yN35l1lmiscPV0bpxXOsZH
wxo8HItzkB8Jl6KrvjX5Ftlu30no6qw475Ud0DmIXLzE9l2J5pcV/TYLiyLrKV1M
VnPvBFx5eCyIe2c9+V+29auQ2pdMmdqod66pnFDPSFPtGUF1cX/UuULRParEQWys
iMG/zMBBIa5H+7HP9DT4hjipZSuFyWM9nNx+l1Hlwr5GoaK/Q0wyunULjLtuE4x5
ScWA89D6okYpwJsiV6VNuzhk8qe7UUzUU/8CT8rtr8vbyJrFmGm9g7a3wmfPz2LV
GNFApYkz5q4eHViqiXE15k8obJRgQH/bXjlhGtZMFSDx1RFgbacZTxsIjlSO6krG
Wwt2mRMLQ/du2T1lIXcRYB6EwxxHFDdy3Yp8VzY3Yrf4Wg7Exyb3V2YrxtCDuONI
4DADJNk4xWII6ApniF94PTjiOkyX1juLvuQKAhVz/scteCOmAZbGaV7fNs4unwhn
nD2unzH+0EcOe77Ej/QS5I7X85R6k3aAQzhZdlQVQ46TdrlFBCvmMMhGJL8EtdXF
IeOnw/QI7KVo3mSE7XyS
=JNb4
-----END PGP SIGNATURE-----
Merge tag 'sound-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the final pull request for 3.19: this ended up with a
slightly higher volume than wished, but I put them all as they are
either stable or 3.19 regression fixes.
Most of commits are from ASoC, and have been stewed for a while in
linux-next. The only change in the common code is the regression
fixes for ASoC AC97 stuff wrt device registrations. The rest are
device-specific, mostly small fixes in various ASoC drivers and ak411x
on ice1724 boards"
* tag 'sound-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: Intel: fix sst firmware path for cht-bsw-rt5672
ARM: dts: Fix I2S1, I2S2 compatible for exynos4 SoCs
ASoC: sgtl5000: add delay before first I2C access
MAINTAINERS: ASoC: add maintainer for Intel BDW/HSW ASoC driver
ASoC: atmel_ssc_dai: fix the setting for DSP mode
ASoC: sgtl5000: Use shift mask when setting codec mode
ASoC: tlv320aic3x: Fix data delay configuration
ALSA: ak411x: Fix stall in work callback
ASoC: Intel: Used lock version to update shim registers
ASoC: wm8731: init mutex in i2c init path
ASoC: atmel_ssc_dai: fix start event for I2S mode
ASoC: rt5640: Add RT5642 ACPI ID for Intel Baytrail
ASoC: wm97xx: Reset AC'97 device before registering it
ASoC: Add support for allocating AC'97 device before registering it
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- adds coupled cpuidle support for exynos4210
: fix for Exynos platform PM code preparing it for the coupled
cpuidle support and adds coupled cpuidle AFTR mode on exynos4210
Note this is mostrly based on earlier cpuidle-exynos4210 driver
from Daniel Lezcano and Bart updated.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJU0ikrAAoJEA0Cl+kVi2xqLoUQAI2JZi6XyeyJ+0YNGn+TpGEp
OBMks8hVyUXDIk+juuijRLh1MTzOkguMNKjS8y84IHpM8SvVUiPzh/b4pPBr7i14
c6zaocm17HePZwF9tR5VwLAAiyV+0f9rRgUUBSaVTyc5/tMFJES6iuNVzPwa7fqf
5u7YjAuV1N349NC8148wvvrg/ICktpdCydYmfOfOTsjgVNwiyd4kF3ofC7Xt9MrN
fNgdl+OQZxOIWFKkBw3p3+exWybgjd9GE5K9kuASmGWrIOXjRO4KxO/UqRBNJIhH
bVjnkEq88kczf4Z999YHO33fMaMC3y/h++luRgkzxOZge0KVbqusikq+dCc5je3D
k/TPn0tnc8A2nbbcvziuXOe/EhudqXWD+QZXhRcmUAeRFDA1DNsEfSHzujD1fbyn
TA9+RjLodCv2LT5wQq/EHs5yiAYChiUz/TGOBwhHzMUNvXKs3cSBtATabXgWrCtd
pYtI+o1FYekccafF1bUWMIR6BDHxAOb41FDQVZYzi7+ez5yHHYEwZGqPXLmY3x/f
8Nvyy2PgOouB5cHSxn9r0wtMM0nATWB1WNUIf3cPRCHZKpxpHOhrH4CEDsQG02FS
qhs8QARvDA2pCV+m9hFFKDt/VNMpzHnw6uC0XWlICrDk9sHPzwXVuB1MHcjr/n6w
wJhgNOVyfTEnIDQi2mGt
=nadx
-----END PGP SIGNATURE-----
Merge tag 'samsung-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/drivers
Merge "Samsung CPUIdle updates for v3.20" from Kukjin Kim:
- adds coupled cpuidle support for exynos4210
: fix for Exynos platform PM code preparing it for the coupled
cpuidle support and adds coupled cpuidle AFTR mode on exynos4210
Note this is mostrly based on earlier cpuidle-exynos4210 driver
from Daniel Lezcano and Bart updated.
* tag 'samsung-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
cpuidle: exynos: add coupled cpuidle support for exynos4210
ARM: EXYNOS: apply S5P_CENTRAL_SEQ_OPTION fix only when necessary
Signed-off-by: Olof Johansson <olof@lixom.net>
For assigning sysfs entries for a card device from the driver,
introduce a new helper function, snd_card_add_dev_attr(). In this
way, we can avoid the possible race between the device registration
and the sysfs addition / removal.
The driver can pass a new attribute group to add freely. This has to
be called before snd_card_register().
Currently, up to two extra groups can be added. More than that, it'll
return an error.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While commit 7e36ef8205 ("IB/core: Temporarily disable
ex_query_device uverb") is correct as it makes the extended
QUERY_DEVICE uverb (which came as part of commit 5a77abf9a9
("IB/core: Add support for extended query device caps") and commit
860f10a799 ("IB/core: Add flags for on demand paging support")) not
available to userspace, it doesn't address the initial issue regarding
ib_copy_to_udata() [1][2].
Additionally, further discussions around this new uverb seems to
conclude it would require a different data structure than the one
currently described in <rdma/ib_user_verbs.h> [3].
Both of these issues require a revert of the changes, so this patch
partially reverts commit 8cdd312cfe ("IB/mlx5: Implement the ODP
capability query verb") and commit 860f10a799 ("IB/core: Add flags
for on demand paging support") and fully reverts commit 5a77abf9a9
("IB/core: Add support for extended query device caps").
[1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
http://mid.gmane.org/1418733236.2779.26.camel@opteya.com
[2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
http://mid.gmane.org/1423067503.3030.83.camel@opteya.com
[3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com
Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
variant of the rk3288-evb and the setting of a clock for the watchdog.
Also the lcd and hdmi controllers on both the firefly and the evb get
enabled and let us now boot into fbcon console sucessfully.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJUy+f+AAoJEPOmecmc0R2BdFcH/i+9hgt9ElJxKbpPPVPVv81O
QzmIi3g+B4WqAoKBfzkug5l82mSZ4FAvmQK5ad8BYHsBaGl+GRjwwaqRqZkPUOzl
umXk0GneZ5sPK+S1UdA/S/v7hTwgEpXOLnGUCGUXbLt7Gqda99ume08jgCe2JbH5
Tfjoia5uRlOgdZIf3KBr4TkPR+LHog1i/DI9N3XrAsjLtj/igHpoRmFydBmv7noo
OuRuPY0afg1jKs/MextjM0W1JzQbGyv4Jp4NjVFv60+KZrMgS64vTDZamxYEK4Sv
wn0pQUZgevQhd3ewdLtqrcbo4wIGiLWBXQK0vzonmo5tCk8r+59QFKxeMiq2C7Q=
=ZVrD
-----END PGP SIGNATURE-----
Merge tag 'v3.20-rockchip-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/dt
Merge "ARM: rockchip: third (and last) batch of dts updates for 3.20" from
Heiko Stübner:
Change are regulator nodes for the cpu and gpu regulators on the act8846
variant of the rk3288-evb and the setting of a clock for the watchdog.
Also the lcd and hdmi controllers on both the firefly and the evb get
enabled and let us now boot into fbcon console sucessfully.
* tag 'v3.20-rockchip-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: move the hdmi ddc-i2c-bus property to the actual boards
ARM: dts: rockchip: enable vops and hdmi output on rk3288-firefly and -evb
ARM: dts: rockchip: housekeeping off i2c0 on rk3288-evb boards
ARM: dts: rockchip: add cpu and gpu regulators to rk3288-evb-act8846
ARM: dts: rockchip: add rk3288 watchdog clock
clk: rockchip: add id for watchdog pclk on rk3288
clk: rockchip: add clock IDs for the PVTM clocks
clk: rockchip: add clock ID for usbphy480m_src
Signed-off-by: Olof Johansson <olof@lixom.net>
If we're sending an asynchronous layoutreturn, then we need to ensure
that the inode and the super block remain pinned.
Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
If we're sending an asynchronous layoutcommit, then we need to ensure
that the inode and the super block remain pinned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
RFC 4429 ("Optimistic DAD") states that optimistic addresses
should be treated as deprecated addresses. From section 2.1:
Unless noted otherwise, components of the IPv6 protocol stack
should treat addresses in the Optimistic state equivalently to
those in the Deprecated state, indicating that the address is
available for use but should not be used if another suitable
address is available.
Optimistic addresses are indeed avoided when other addresses are
available (i.e. at source address selection time), but they have
not heretofore been available for things like explicit bind() and
sendmsg() with struct in6_pktinfo, etc.
This change makes optimistic addresses treated more like
deprecated addresses than tentative ones.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
These are rather too large for this late in the release cycle but
they're clear, well understood and have been tested to fix a regression
which was introduced for v3.19. The details are all in Lars' changelog
and they've been cooking in -next for a while, to a large extent out
of conservatism about the size.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU08iuAAoJECTWi3JdVIfQ9cUH/1ZNkoS67zEj/wsvcS+co/Dp
hInq7mAwO/usFfFWMzb68Lofc4g5yTGE+vtTAhggmzMGuqjXEMCAS/tO85EoA65d
2f0YNw7gEhUyp0oYPe5tS3I0F8O7ES+u27PwmM8sGO0KdCc9qrb3ppZN3BkEcr2U
akg854atOcKwBvGRLNSyj79WPkjNyMcM/kWkJCEHFLysu8rSilZ+UaekH+CUw02E
hstLM+PlM0LE/9rKmQvRyx6cnVqKqPWxalG0mCtBwpdHLYMn4yDFc3nn71lMn3oQ
/TIcHSfDCRgx3VVW8daBjNHqLgfg4XCcZ5W646zrpyyrlXRCCPJT8N0SY9jZUU4=
=ScU7
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-ac97-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: AC'97 fixes
These are rather too large for this late in the release cycle but
they're clear, well understood and have been tested to fix a regression
which was introduced for v3.19. The details are all in Lars' changelog
and they've been cooking in -next for a while, to a large extent out
of conservatism about the size.
A few last minute fixes for v3.19, all driver specific. None of them
stand out particularly - it's all the standard people who are affected
will care stuff.
The Samsung fix is a DT only fix for the audio controller, it's being
merged via the ASoC tree due to process messups (the submitter sent it
at the end of a tangentally related series rather than separately to the
ARM folks) in order to make sure that it gets to people sooner.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU08eCAAoJECTWi3JdVIfQJxYH/RfwfI9/PitCr1Ry4lqwqVj6
nq4aq4kRS3Lr9zfUnGphkWDbvS8XWko8fBtUehYQmjHunVTc4vtcmJ7Smbdh9nXs
tKbNd0hCZHo2kIpDqDQxYD1FJCB/CdibWfIUSRbfCD6BVkWftz1itbdY5kZ7L1oH
iWdGREBPLxsd80r87I0y3pxaBMHnq/BPt5RkG6LQuFPPR4eueufU2CE8VzwCCirC
hWR4C5XEY8gN9wdF2Imfc1H9PDuafvrVNR6WSUwYi0ynP85XRHJem/DcZ1ngKoZv
KpbVokpobmfCBPxICG4Aqpn4Udz4T5Y3P1lB/1exjiQu4rKpTLMgp1+F4gKl76M=
=xQoH
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v3.19
A few last minute fixes for v3.19, all driver specific. None of them
stand out particularly - it's all the standard people who are affected
will care stuff.
The Samsung fix is a DT only fix for the audio controller, it's being
merged via the ASoC tree due to process messups (the submitter sent it
at the end of a tangentally related series rather than separately to the
ARM folks) in order to make sure that it gets to people sooner.
Pull networking fixes from David Miller:
1) Stretch ACKs can kill performance with Reno and CUBIC congestion
control, largely due to LRO and GRO. Fix from Neal Cardwell.
2) Fix userland breakage because we accidently emit zero length netlink
messages from the bridging code. From Roopa Prabhu.
3) Carry handling in generic csum_tcpudp_nofold is broken, fix from
Karl Beldan.
4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas
Dichtel.
5) Make sure PPP deflation never returns a length greater then the
output buffer, otherwise we overflow and trigger skb_over_panic().
Fix from Florian Westphal.
6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd
Bergmann.
7) Don't increase route cached MTU on datagram too big ICMPs. From Li
Wei.
8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso.
9) Fix bitmask handling regression in netlink that broke things like
acpi userland tools. From Pablo Neira Ayuso.
10) Wrong header pointer passed to param_type2af() in SCTP code, from
Saran Maruti Ramanara.
11) Stacked vlans not handled correctly by vlan_get_protocol(), from
Toshiaki Makita.
12) Add missing DMA memory barrier to xgene driver, from Iyappan
Subramanian.
13) Fix crash in rate estimators, from Eric Dumazet.
14) We've been adding various workarounds, one after another, for the
change which added the per-net tcp_sock. It was meant to reduce
socket contention but added lots of problems.
Reduce this instead to a proper per-cpu socket and that rids us of
all the daemons.
From Eric Dumazet.
15) Fix memory corruption and OOPS in mlx4 driver, from Jack
Morgenstein.
16) When we disabled UFO in the virtio_net device, it introduces some
serious performance regressions. The orignal problem was IPV6
fragment ID generation, so fix that properly instead. From Vlad
Yasevich.
17) sr9700 driver build breaks on xtensa because it defines macros with
the same name as those used by the arch code. Use more unique
names. From Chen Gang.
18) Fix endianness in new virio 1.0 mode of the vhost net driver, from
Michael S Tsirkin.
19) Several sysctls were setting the maxlen attribute incorrectly, from
Sasha Levin.
20) Don't accept an FQ scheduler quantum of zero, that leads to crashes.
From Kenneth Klette Jonassen.
21) Fix dumping of non-existing actions in the packet scheduler
classifier. From Ignacy Gawędzki.
22) Return the write work_done value when doing TX work in the qlcnic
driver.
23) ip6gre_err accesses the info field with the wrong endianness, from
Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
sit: fix some __be16/u16 mismatches
ipv6: fix sparse errors in ip6_make_flowlabel()
net: remove some sparse warnings
flow_keys: n_proto type should be __be16
ip6_gre: fix endianness errors in ip6gre_err
qlcnic: Fix NAPI poll routine for Tx completion
amd-xgbe: Set RSS enablement based on hardware features
amd-xgbe: Adjust for zero-based traffic class count
cls_api.c: Fix dumping of non-existing actions' stats.
pkt_sched: fq: avoid hang when quantum 0
net: rds: use correct size for max unacked packets and bytes
vhost/net: fix up num_buffers endian-ness
gianfar: correct the bad expression while writing bit-pattern
net: usb: sr9700: Use 'SR_' prefix for the common register macros
Revert "drivers/net: Disable UFO through virtio"
Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
ipv6: Select fragment id during UFO segmentation if not set.
xen-netback: stop the guest rx thread after a fatal error
net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
isdn: off by one in connect_res()
...
And also remove the unused bdev argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Make use of a new interface provided by iov_iter, backed by
scatter-gather list of iovec, instead of the old interface based on
sg_iovec. Also use iov_iter_advance() instead of manual iteration.
This commit should contain only literal replacements, without
functional changes.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
[hch: fixed to do a deep clone of the iov_iter, and to properly use
the iov_iter direction]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
ACPICA commit 199cad16530a45aea2bec98e528866e20c5927e1
Since whether the GPE should be disabled/enabled/cleared should only be
determined by the GPE driver's state machine:
1. GPE should be disabled if the driver wants to switch to the GPE polling
mode when a GPE storm condition is indicated and should be enabled if
the driver wants to switch back to the GPE interrupt mode when all of
the storm conditions are cleared. The conditions should be protected by
the driver's specific lock.
2. GPE should be enabled if the driver has accepted more than one request
and should be disabled if the driver has completed all of the requests.
The request count should be protected by the driver's specific lock.
3. GPE should be cleared either when the driver is about to handle an edge
triggered GPE or when the driver has completed to handle a level
triggered GPE. The handling code should be protected by the driver's
specific lock.
Thus the GPE enabling/disabling/clearing operations are likely to be
performed with the driver's specific lock held while we currently cannot do
this. This is because:
1. We have the acpi_gbl_gpe_lock held before invoking the GPE driver's
handler. Driver's specific lock is likely to be held inside of the
handler, thus we can see some dead lock issues due to the reversed
locking order or recursive locking. In order to solve such dead lock
issues, we need to unlock the acpi_gbl_gpe_lock before invoking the
handler. BZ 1100.
2. Since GPE disabling/enabling/clearing should be determined by the GPE
driver's state machine, we shouldn't perform such operations inside of
ACPICA for a GPE handler to mess up the driver's state machine. BZ 1101.
Originally this patch includes a logic to flush GPE handlers, it is dropped
due to the following reasons:
1. This is a different issue;
2. Linux OSL has fixed this by flushing SCI in acpi_os_wait_events_complete().
We will pick up this topic when the Linux OSL fix turns out to be not
sufficient.
Note that currently the internal operations and the acpi_gbl_gpe_lock are
also used by ACPI_GPE_DISPATCH_METHOD and ACPI_GPE_DISPATCH_NOTIFY. In
order not to introduce regressions, we add one
ACPI_GPE_DISPATCH_RAW_HANDLER type to be distiguished from
ACPI_GPE_DISPATCH_HANDLER. For which the acpi_gbl_gpe_lock is unlocked before
invoking the GPE handler and the internal enabling/disabling operations are
bypassed to allow drivers to perform them at a proper position using the
GPE APIs and ACPI_GPE_DISPATCH_RAW_HANDLER users should invoke acpi_set_gpe()
instead of acpi_enable_gpe()/acpi_disable_gpe() to bypass the internal GPE
clearing code in acpi_enable_gpe(). Lv Zheng.
Known issues:
1. Edge-triggered GPE lost for frequent enablings
On some buggy silicon platforms, GPE enable line may not be directly
wired to the GPE trigger line. In that case, when GPE enabling is
frequently performed for edge-triggered GPEs, GPE status may stay set
without being triggered.
This patch may maginify this problem as it allows GPE enabling to be
parallel performed during the process the GPEs are handled.
This is an existing issue, because:
1. For task context:
Current ACPI_GPE_DISPATCH_METHOD practices have proven that this
isn't a real issue - we can re-enable edge-triggered GPE in a work
queue where the GPE status bit might already be set.
2. For IRQ context:
This can even happen when the GPE enabling occurs before returning
from the GPE handler and after unlocking the GPE lock.
Thus currently no code is included to protect this.
Link: https://github.com/acpica/acpica/commit/199cad16
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit e06b1624b02dc8317d144e9a6fe9d684c5fa2f90
Version 20150204.
Link: https://github.com/acpica/acpica/commit/e06b1624
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 8990e73ab2aa15d6a0068b860ab54feff25bee36
Link: https://github.com/acpica/acpica/commit/8990e73a
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 7926d5ca9452c87f866938dcea8f12e1efb58f89
There is an issue in acpi_install_gpe_handler() and acpi_remove_gpe_handler().
The code to obtain the GPE dispatcher type from the Handler->original_flags
is wrong:
if (((Handler->original_flags & ACPI_GPE_DISPATCH_METHOD) ||
(Handler->original_flags & ACPI_GPE_DISPATCH_NOTIFY)) &&
ACPI_GPE_DISPATCH_NOTIFY is 0x03 and ACPI_GPE_DISPATCH_METHOD is 0x02, thus
this statement is TRUE for the following dispatcher types:
0x01 (ACPI_GPE_DISPATCH_HANDLER): not expected
0x02 (ACPI_GPE_DISPATCH_METHOD): expected
0x03 (ACPI_GPE_DISPATCH_NOTIFY): expected
There is no functional issue due to this because Handler->original_flags is
only set in acpi_install_gpe_handler(), and an earlier checker has excluded
the ACPI_GPE_DISPATCH_HANDLER:
if ((gpe_event_info->Flags & ACPI_GPE_DISPATCH_MASK) ==
ACPI_GPE_DISPATCH_HANDLER)
{
Status = AE_ALREADY_EXISTS;
goto free_and_exit;
}
...
Handler->original_flags = (u8) (gpe_event_info->Flags &
(ACPI_GPE_XRUPT_TYPE_MASK | ACPI_GPE_DISPATCH_MASK));
We need to clean this up before modifying the GPE dispatcher type values.
In order to prevent such issue from happening in the future, this patch
introduces ACPI_GPE_DISPATCH_TYPE() macro to be used to obtain the GPE
dispatcher types. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/7926d5ca
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We need to parse APIC ID for IOAPIC registration for IOAPIC hotplug.
ACPI _MAT method and MADT table are used to figure out IOAPIC ID, just
like parsing CPU APIC ID for CPU hotplug.
[ tglx: Fixed docbook comment ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1414387308-27148-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use common resource list management data structure and interfaces
instead of private implementation.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently ACPI, PCI and pnp all implement the same resource list
management with different data structure. We need to transfer from
one data structure into another when passing resources from one
subsystem into another subsystem. So move struct resource_list_entry
from ACPI into resource core and rename it as resource_entry,
then it could be reused by different subystems and avoid the data
structure conversion.
Introduce dedicated header file resource_ext.h instead of embedding
it into ioport.h to avoid header file inclusion order issues.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cevt-r4k driver used to call into the GIC driver to find whether the
timer was pending, but only with External Interrupt Controller (EIC)
mode, where the Cause.IP bits can't be used as they encode the interrupt
priority level (Cause.RIPL) instead.
However commit e9de688dac ("irqchip: mips-gic: Support local
interrupts") changed the condition from cpu_has_veic to gic_present.
This fails on cores such as P5600 which have a GIC but the local
interrupts aren't routable by the GIC, causing c0_compare_int_usable()
to consider the interrupt unusable so r4k_clockevent_init() fails.
The previous behaviour, added in commit 98b67c37db ("MIPS: Add EIC
support for GIC."), wasn't really correct either as far as I can tell,
since P5600 apparently supports EIC mode too, and in any case the use of
Cause.TI with r2 should have been sufficient anyway since commit
010c108d7a ("MIPS: PowerTV: Fix support for timer interrupts with > 64
external IRQs").
Therefore drop the call into the gic driver altogether, and add a
comment in c0_compare_int_pending() to clarify that Cause.TI does get
checked since MIPS r2.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Fixes: e9de688dac ("irqchip: mips-gic: Support local interrupts")
Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Steven J. Hill <steven.hill@imgtec.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9077/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Add three methods to allow exporting pnfs block layout volumes:
- get_uuid: get a filesystem unique signature exposed to clients
- map_blocks: map and if nessecary allocate blocks for a layout
- commit_blocks: commit blocks in a layout once the client is done with them
For now we stick the external pnfs block layout interfaces into s_export_op to
avoid mixing them up with the internal interface between the NFS server and
the layout drivers. Once we've fully internalized the latter interface we
can redecide if these methods should stay in s_export_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22: got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22: left side has type restricted __be32
include/net/ipv6.h:719:22: right side has type unsigned int
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(struct flow_keys)->n_proto is in network order, use
proper type for this.
Fixes following sparse errors :
net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.
Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME. We can eventually make this go away once util-linux has
added support.
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new function find_inode_nowait() which is an even more general
version of ilookup5_nowait(). It is designed for callers which need
very fine grained control over when the function is allowed to block
or increment the inode's reference count.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.
For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead of using magic number in the code the patch provides
DW_DMA_MAX_NR_MASTERS constant.
While here, restrict the reading of data width array by amount of the actual
number of AHB masters.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The dma_dev field is widely used in filter functions to mach with a proper DMA
controller device. Thus it's not deprecated. The patch fixes the description of
that field. There is no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate million of flows structure (104 bytes each) pretty
easily. Not only host can OOM, but lookup in RB trees can take
too much cpu and memory resources.
This patch adds a new attribute, orphan_mask, that is adding
possibility of having a stochastic hash for orphaned skb.
Its default value is 1024 slots, to mimic SFQ behavior.
Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.
This patch also handles the specific case of SYNACK messages:
They are attached to the listener socket, and therefore all map
to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
hoping to have new accepted socket inherit this rate, SYNACK
might be paced and even dropped.
This is very similar to an internal patch Google have used more
than one year.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More updates for v3.20:
- Lots of refactoring from Lars-Peter Clausen, moving drivers to more
data driven initialization and rationalizing a lot of DAPM usage.
- Much improved handling of CDCLK clocks on Samsung I2S controllers.
- Lots of driver specific cleanups and feature improvements.
- CODEC support for TI PCM514x and TLV320AIC3104 devices.
- Board support for Tegra systems with Realtek RT5677.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU0ok8AAoJECTWi3JdVIfQ3ccH/2zb+b477H4GNhUw5AEzdHtC
L7CI+5Q9VOGJQmZTjayRdUcPUHx4sTpYdQVTI56Q6CJk0OFljKNJcRrbzPqNgJ46
yOrMTIpNvzFEv46f7rX2uKfvAFOVRAA2f+gl34AMLXqxL5aydbZZBtoe2jP9lL8z
fIZ/s7qXHn3xnvxqNwOz3pnu6wFDrxbG34lXZaTaFQOueZ3fsthWLsz0xtO6a7aI
J9tA+wejd9qf+D3i6svsi+MhB6OehYMh9Fbm+ODj6NMWZGCIA3SJ/PD+gbCg318+
Xo0FNOyiw0fSAeBcHrEcfoaWBJrswxXjXaNbrIXc1/0gTj8AOJHCRus3w0OQU+Q=
=Yw2l
-----END PGP SIGNATURE-----
Merge tag 'asoc-v3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v3.20
More updates for v3.20:
- Lots of refactoring from Lars-Peter Clausen, moving drivers to more
data driven initialization and rationalizing a lot of DAPM usage.
- Much improved handling of CDCLK clocks on Samsung I2S controllers.
- Lots of driver specific cleanups and feature improvements.
- CODEC support for TI PCM514x and TLV320AIC3104 devices.
- Board support for Tegra systems with Realtek RT5677.
Conflicts:
sound/soc/intel/sst-mfld-platform-pcm.c
When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.
All TCP had to do was to compute sk->pacing_rate using simple formula:
sk->pacing_rate = 2 * cwnd * mss / rtt
It works well for senders (bulk flows), but not very well for receivers
or even RPC :
cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.
Really, only the sender should pace, according to its own logic.
Instead of adding a new bit in skb, or call yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use new helper and not pace pure ack.
Note this also helps TCP small queue, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload)
This helps to reduce tx completion overhead, ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.
This has no impact in the case packets are sent to loopback interface,
as we do not coalesce ack packets (were we would detect skb->truesize
lie)
In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.
This patch is a combination of two patches we used for about one year at
Google.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some existing rhashtable users get too intimate with it by walking
the buckets directly. This prevents us from easily changing the
internals of rhashtable.
This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.
They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump. The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
drm-intel-next-2015-01-30:
- chv rps improvements from Ville
- atomic state handling prep work from Ander
- execlist request tracking refactoring from Nick Hoath
- forcewake code consolidation from Chris&Mika
- fastboot plane config refactoring and skl support from Damien
- some more skl pm patches all over (Damien)
- refactor dsi code to use drm dsi helpers and drm_panel infrastructure (Jani)
- first cut at experimental atomic plane updates (Matt Roper)
- piles of smaller things all over, as usual
* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (102 commits)
drm/i915: Remove bogus locking check in the hangcheck code
drm/i915: Update DRIVER_DATE to 20150130
drm/i915: Use pipe_config's cpu_transcoder for reading encoder hw state
drm/i915: Fix a use-after-free in intel_execlists_retire_requests
drm/i915: Split shared dpll setup out of __intel_set_mode()
drm/i915: Don't do posting reads on getting forcewake
drm/i915: Do uncore early sanitize after domain init
drm/i915: Handle CHV in vlv_set_rps_idle()
drm/i915: Remove nested work in gpu error handling
drm/i915/documentation: Add intel_uncore.c to drm.tmpl
drm/i915/dsi: remove intel_dsi_cmd.c and the unused functions therein
drm/i915/dsi: move dpi_send_cmd() to intel_dsi.c and make it static
drm/i915/dsi: remove old read/write functions in favor of new stuff
drm/i915/dsi: make the vbt panel driver use mipi_dsi_device for transfers
drm/i915/dsi: add drm mipi dsi host support
drm/i915/dsi: switch to drm_panel interface
drm/i915/skl: Enabling PSR on Skylake
Revert "drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES"
drm/i915: Be consistent on printing seqnos
drm/i915: Display current hangcheck status in debugfs
...
Supply interface functions to bond and unbond ports of a mlx4 internal
interfaces. Example for such an interface is the one registered by the
mlx4 IB driver under RoCE.
There are
1. Functions to go in/out to/from bonded mode
2. Function to remap virtual ports to physical ports
The bond_mutex prevents simultaneous access to data that keep status of
the device in bonded mode.
The upper mlx4 interface marks to the mlx4 core module that they
want to be subject for such bonding by setting the MLX4_INTFF_BONDING
flag. Interface which goes to/from bonded mode is re-created.
The mlx4 Ethernet driver does not set this flag when registering the
interface, the IB driver does.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the hardware interface required for port aggregation.
1. Disable RX port check on receive - don't perform a validity check
that matches to QP's port and the port where the packet is received.
2. Virtual to physical port remap - configure virtual to physical port
mapping. Port remap capability for virtual functions.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use notifier chain to dispatch an event upon a change in slave state.
Event is dispatched with slave specific info.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move slave state changes to a helper function, this is a pre-step for adding
functionality of dispatching an event when this helper is called.
This commit doesn't add new functionality.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add event which provides an indication on a change in the state
of a bonding slave. The event handler should cast the pointer to the
appropriate type (struct netdev_bonding_info) in order to get the
full info about the slave.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* revert a patch that caused a regression with mesh userspace (Bob)
* fix a number of suspend/resume related races
(from Emmanuel, Luca and myself - we'll look at backporting later)
* add software implementations for new ciphers (Jouni)
* add a new ACPI ID for Broadcom's rfkill (Mika)
* allow using netns FD for wireless (Vadim)
* some other cleanups (various)
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJU0NEaAAoJEDBSmw7B7bqr2DAP/3nbk6WB2Lz4Vwi8fh9C4y6X
4ZzN2v7NHaimC+Wpxg3wP2wCdX2VG3ZWwF3yLm6qgVHGnE35RFLxURen1eqNeW77
Yf5gvZ066nCGEQ8l8J6YK9vrLX4qp5c4lyE00bbxpZTA4Qq71SgTg+rmGC1be8uX
vacfaLwfDWffuOOnjBAPfanj7f4AQaUEY2uN1WkBFC7iEeOtPcWVkHAFVIeGjJfQ
vfgQJcwOjgWjYbwZdQQi7Aj+k1Yzda4pg1yEWn3CkZ8zyeCYGs1gk2ovoBgPgXBP
yT9ypIdFsN242VJvy7nkFnCKA8mhKyltMQ1Xjs0Q9lAxWdaq9U+iqt5cUn4jxVIG
T9Vi3PCbx/nVOqcfR81dBTQ3uDU5AyosPsmh2YTxi5lpRBrsjNY2FtaKE0sm2Om3
wRiSPOdPrXBeEnU0KssI9e6euXgS4JQV78Naq85OWZDd2yZ1fT5U2fi8y4drRGlz
rucbbobdVhQch5L4FStPz1uW5pNuJrhekXeZIE8MruTNg2A2oBAK3ApO7hxn68sE
RnbAnkxVLwgedC9042JF5eiS1PDIU46w4e782j/+XskVKMEakqd23iJycx3tmgHZ
cxDi/qKZ2RCE74YsT61o/9ErSVPvCfNPL3+CVov918jidQQg2WHfKm/jFlIxHIxk
4wBP7p2VGlENMgw/R8GJ
=wOaR
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Last round of updates for net-next:
* revert a patch that caused a regression with mesh userspace (Bob)
* fix a number of suspend/resume related races
(from Emmanuel, Luca and myself - we'll look at backporting later)
* add software implementations for new ciphers (Jouni)
* add a new ACPI ID for Broadcom's rfkill (Mika)
* allow using netns FD for wireless (Vadim)
* some other cleanups (various)
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-02-03
Here's what's likely the last bluetooth-next pull request for 3.20.
Notable changes include:
- xHCI workaround + a new id for the ath3k driver
- Several new ids for the btusb driver
- Support for new Intel Bluetooth controllers
- Minor cleanups to ieee802154 code
- Nested sleep warning fix in socket accept() code path
- Fixes for Out of Band pairing handling
- Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Improvements to data we expose through debugfs
- Proper handling of Hardware Error HCI events
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.
Updated vxlan and gue to use these functions.
Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typical qdisc setup is the following :
bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
XPS allows to spread packets on specific tx queues, based on the cpu
doing the send.
Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.
CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
CPUB -> dequeue packet P1 from bond0
enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.
P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)
Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)
To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.
We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the include of mach/dma.h to the legacy PXA DMA code where it is used.
This enables building spi-pxa2xx on ARM64.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Here's the big pull request for Gadgets and PHYs. It's
a total of 217 non-merge commits with pretty much everything
being touched.
The most important bits are a ton of new documentation for
almost all usb gadget functions, a new isp1760 UDC driver,
several improvements to the old net2280 UDC driver, and
some minor tracepoint improvements to dwc3.
Other than that, a big list of minor cleanups, smaller bugfixes
and new features all over the place.
Signed-off-by: Felipe Balbi <balbi@ti.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU0lRFAAoJEIaOsuA1yqRE17sP/R4iPwrPQGVQBaqg5AOHZGEe
dKf9GqZZIPzNIs4146Ua5W/9d4U6zQKndy+fRQaNEVc2SR3Tm0IwOSokvSaC3FYr
NEGHMoRnTWd/JWSVB/6sy0qn8rKRMbkxR7u9lG9M/JACUymn3NJfH4D0jq85ewPR
0Tjv4g5wGnv3YEmnWgR5ieFgn0OxgUBiGUF7QufgMp7G3F2hjmeligBD0jt3w6tD
G4oMHp+pRfPCcm8mcdiHoP3aXOtNJ824rI+b1EZkKBKeo7FxRDIe48Vl107XOpOB
yUFnQVGZazh1Oi6Vxmh9O1mmjpNOir/4dni7gZfh1uGC7cJ7tSkOfbN4jH4Ycsay
Ckt8XQkmf/z9VWTONsAkDwfPhnMbxCafz8Fi/UdOXsoR69YV1MKnt1zRN5dzgNq9
7EIqDwPPJi6qwLACoqxVYknSmXQqhW8B0IMPpMqEByvR1mnIOWomlFot63AufMaQ
+uS7JGJguUmMvkyP1FJRKcPsd9u4PYll5JzymPsvSB6xtDisVFqYb3BbfieZHpBn
+/ZFqltT71pQ3TxIx2ZiTk1e91PiKJUbEQikV6TBiLhgtkpn2J8obHtF50K4+xHh
wXOU3VHFd2ZONN+WB5F5EoVtZiwsd3pARr8QJRcVhdXltTWElJ2qsA4Z1+5QVhAy
mqXYcwsvBe9C+5p2pYwR
=bGMq
-----END PGP SIGNATURE-----
Merge tag 'usb-for-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
Felipe writes:
usb: patches for v3.20 merge window
Here's the big pull request for Gadgets and PHYs. It's
a total of 217 non-merge commits with pretty much everything
being touched.
The most important bits are a ton of new documentation for
almost all usb gadget functions, a new isp1760 UDC driver,
several improvements to the old net2280 UDC driver, and
some minor tracepoint improvements to dwc3.
Other than that, a big list of minor cleanups, smaller bugfixes
and new features all over the place.
Signed-off-by: Felipe Balbi <balbi@ti.com>
This function coresight_is_bit_set() isn't called, so we should
remove it.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It took me a few tries to figure out what this code did; lets rewrite
it into a more regular form.
The thing that makes this one 'special' is the BSG_F_BLOCK flag, if
that is not set we're not supposed/allowed to block and should spin
wait for completion.
The (new) io_wait_event() will never see a false condition in case of
the spinning and we will therefore not block.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
dev_printk() is now a void function, so the related functions
scmd_printk() and sdev_prefix_printk() should be made void, too.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If we're using NFSv4.1, then we have the ability to let the server know
whether or not we believe that returning a delegation as part of our OPEN
request would be useful.
The feature needs to be used with care, since the client sending the request
doesn't necessarily know how other clients are using that file, and how
they may be affected by the delegation.
For this reason, our initial use of the feature will be to let the server
know when the client believes that handing out a delegation would not be
useful.
The first application for this function is when opening the file using
O_DIRECT.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Add missing stubs for regulator_suspend_prepare() and
regulator_suspend_finish() to fix exynos_defconfig build without
REGULATOR:
arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_finish':
arch/arm/mach-exynos/suspend.c:537: undefined reference to `regulator_suspend_finish'
arch/arm/mach-exynos/built-in.o: In function `exynos_suspend_prepare':
arch/arm/mach-exynos/suspend.c:520: undefined reference to `regulator_suspend_prepare'
make: *** [vmlinux] Error 1
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reported-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of
the config and the code more consistent.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
So this has been merged originally in
commit 83052d4d5c
Author: Seung-Woo Kim <sw0312.kim@samsung.com>
Date: Thu Dec 15 15:40:55 2011 +0900
drm: Add multi buffer plane pixel formats
which hasn't seen a lot of review really. The problem is that it's not
a real pixel format, but just a different way to lay out NV12 pixels
in macroblocks, i.e. a tiling format.
The new way of doing this is with the soon-to-be-merged fb modifiers.
This was brough up in some long irc discussion around the entire
topic, as an example of where things have gone wrong. Luckily we can
correct the mistake:
- The kms side support for NV12MT is all dead code because
format_check in drm_crtc.c never accepted NV12MT.
- The gem side for the gsc support doesn't look better: The code
forgets to set the pixel format and makes a big mess with the tiling
mode bits, inadvertedly setting them all.
Conclusion: This never really worked (at least not in upstream) and
hence we can safely correct our mistake here.
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: Daniel Stone <daniel@fooishbar.org>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Joonyoung Shim <jy0922.shim@samsung.com>
Acked-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
conn_info is currently allocated only after nfcee_discovery_ntf
which is not generic enough for logical connection other than
NFCEE. The corresponding conn_info is now created in
nci_core_conn_create_rsp().
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
For consistency sake change nci_core_conn_create_rsp structure
credits field to credits_cnt.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The current implementation limits nci_core_conn_create_req()
to only manage NCI_DESTINATION_NFCEE.
Add new parameters to nci_core_conn_create() to support all
destination types described in the NCI specification.
Because there are some parameters with variable size dynamic
buffer allocation is needed.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The NCI_STATIC_RF_CONN_ID logical connection is the most used
connection. Keeping it directly accessible in the nci_dev
structure will simplify and optimize the access.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Currently the adjusments made as part of perf_event_task_tick() use the
percpu rotation lists to iterate over any active PMU contexts, but these
are not used by the context rotation code, having been replaced by
separate (per-context) hrtimer callbacks. However, some manipulation of
the rotation lists (i.e. removal of contexts) has remained in
perf_rotate_context(). This leads to the following issues:
* Contexts are not always removed from the rotation lists. Removal of
PMUs which have been placed in rotation lists, but have not been
removed by a hrtimer callback can result in corruption of the rotation
lists (when memory backing the context is freed).
This has been observed to result in hangs when PMU drivers built as
modules are inserted and removed around the creation of events for
said PMUs.
* Contexts which do not require rotation may be removed from the
rotation lists as a result of a hrtimer, and will not be considered by
the unthrottling code in perf_event_task_tick.
This patch fixes the issue by updating the rotation ist when events are
scheduled in/out, ensuring that each rotation list stays in sync with
the HW state. As each event holds a refcount on the module of its PMU,
this ensures that when a PMU module is unloaded none of its CPU contexts
can be in a rotation list. By maintaining a list of perf_event_contexts
rather than perf_event_cpu_contexts, we don't need separate paths to
handle the cpu and task contexts, which also makes the code a little
simpler.
As the rotation_list variables are not used for rotation, these are
renamed to active_ctx_list, which better matches their current function.
perf_pmu_rotate_{start,stop} are renamed to
perf_pmu_ctx_{activate,deactivate}.
Reported-by: Johannes Jensen <johannes.jensen@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <Will.Deacon@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostej
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
We now consider a fragment id of 0 as unset and if id selection
process returns 0 (after all the pertrubations), we set it to
0x80000000, thus giving us ample space not to create collisions
with the next packet we may have to fragment.
When doing UFO integrity checking, we also select the
fragment id if it has not be set yet. This is stored into
the skb_shinfo() thus allowing UFO to function correclty.
This patch also removes duplicate fragment id generation code
and moves ipv6_select_ident() into the header as it may be
used during GSO.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new wait_on_bit_timeout() helper, basically the same as
wait_on_bit() except that it also takes a 'timeout' parameter.
All the building blocks like bit_wait_timeout() and
out_of_line_wait_on_bit_timeout() are already in place so the
addition is rather simple.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: davem@davemloft.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422616476-2917-2-git-send-email-johan.hedberg@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
it has just verified that it asks no more than the length of the
first segment of iovec.
And with that the last user of stuff in lib/iovec.c is gone.
RIP.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With that, all ->sendmsg() instances are converted to iov_iter primitives
and are agnostic wrt the kind of iov_iter they are working with.
So's the last remaining ->recvmsg() instance that wasn't kind-agnostic yet.
All ->sendmsg() and ->recvmsg() advance ->msg_iter by the amount actually
copied and none of them modifies the underlying iovec, etc.
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
That takes care of the majority of ->sendmsg() instances - most of them
via memcpy_to_msg() or assorted getfrag() callbacks. One place where we
still keep memcpy_fromiovecend() is tipc - there we potentially read the
same data over and over; separate patch, that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
patch is actually smaller than it seems to be - most of it is unindenting
the inner loop body in tcp_sendmsg() itself...
the bit in tcp_input.c is going to get reverted very soon - that's what
memcpy_from_msg() will become, but not in this commit; let's keep it
reasonably contained...
There's one potentially subtle change here: in case of short copy from
userland, mainline tcp_send_syn_data() discards the skb it has allocated
and falls back to normal path, where we'll send as much as possible after
rereading the same data again. This patch trims SYN+data skb instead -
that way we don't need to copy from the same place twice.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert skb_add_data() to iov_iter; allows to get rid of the explicit
messing with iovec in its only caller - skb_add_data() will keep advancing
->msg_iter for us, so there's no need to similate that manually.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently it is not easy to grep for the definition of struct of_device_id.
This is trivially fixed by moving the brace to the right place.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
sram_get_gpool is only used for legacy, non-DT MMP/PXA platforms. Provide
an empty version in order to build on ARM64.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The I2O layer deals with a technology that to say the least didn't catch on
in the market.
The only relevant products are some of the AMI MegaRAID - which supported I2O
and its native mode (The native mode is faster and runs on Linux), an
obscure crypto ethernet card that's now so many years out of date nobody
would use it, the old DPT controllers, which speak their own dialect and
have their own driver - and ermm.. thats about it.
We also know the code isn't in good shape as recently a patch was proposed
and queried as buggy, which in turn showed the existing code was broken
already by prior "clean up" and nobody had noticed that either.
It's coding style robot code nothing more. Like some forgotten corridor
cleaned relentlessly by a lost Roomba but where no user has trodden in years.
Move it to staging and then to /dev/null.
The headers remain as they are shared with dpt_i2o.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The big issue here is:
of_property_read_u32(np, "flow_cntrl", (u32 *)&dt_pdata->flow_cntrl);
"->flow_cntrl" is a char so when we write a 32 bit number to it then it
corrupts past the end of the char. It's probably hard to notice because
the struct has padding so the code works on little endian systems. But
on a big endian system the code would fail and on a 64 bit, big endian
systems then "nshutdown_gpio" and "baud_rate" would be buggy as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clients using the dev_pm_put_subsys_data() API isn't interested of a
return value. They care only of decreasing a reference to the device's
pm_subsys_data. So, let's convert the API to a void function, which
anyway seems like reasonable thing to do.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While adding devices to their PM domains, dev_pm_qos_add_notifier() was
invoked while allocating the generic_pm_domain_data for the device.
Since the generic_pm_domain_data's device pointer will be assigned
after allocation, the ->genpd_dev_pm_qos_notifier() callback could be
called prior having a valid pointer to the device. Similar scenario
existed while removing a device from a genpd.
To cope with these scenarios a mutex was used to protect the pointer to
the device.
By re-order the sequence for when dev_pm_qos_add|remove_notifier() are
invoked, we make sure the ->genpd_dev_pm_qos_notifier() callback are
always called with a valid device pointer available.
In this way, we eliminate the need for protecting the pointer and thus
we can remove the mutex from the struct generic_pm_domain_data.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The reference counting was needed when genpd supported PM domain device
callbacks. Since this option has been removed, let's also remove the
reference counting of the struct generic_pm_domain_data.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.
Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable@vger.kernel.org # 3.18
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Introduce helper function acpi_dev_filter_resource_type(), which may
be used by acpi_dev_get_resources() to filer out resource based on
resource type.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add field offset to struct resource_list_entry to host address space
translation offset so it could be used to represent bridge resources.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change function acpi_dev_resource_address_space() and
acpi_dev_resource_ext_address_space() to return address space
translation offset.
It's based on a patch from Yinghai Lu <yinghai@kernel.org>.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* flexfiles: (53 commits)
pnfs: lookup new lseg at lseg boundary
nfs41: .init_read and .init_write can be called with valid pg_lseg
pnfs: Update documentation on the Layout Drivers
pnfs/flexfiles: Add the FlexFile Layout Driver
nfs: count DIO good bytes correctly with mirroring
nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
nfs/flexfiles: send layoutreturn before freeing lseg
nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
nfs41: allow async version layoutreturn
nfs41: add range to layoutreturn args
pnfs: allow LD to ask to resend read through pnfs
nfs: add nfs_pgio_current_mirror helper
nfs: only reset desc->pg_mirror_idx when mirroring is supported
nfs41: add a debug warning if we destroy an unempty layout
pnfs: fail comparison when bucket verifier not set
nfs: mirroring support for direct io
nfs: add mirroring support to pgio layer
pnfs: pass ds_commit_idx through the commit path
...
Conflicts:
fs/nfs/pnfs.c
fs/nfs/pnfs.h
TSC interrupt handler had udelay to avoid reporting of false pen-up
interrupt to user space. This patch implements workaround suggesting in
Advisory 1.0.31 of silicon errata for am335x, thus eliminating udelay and
touchscreen lag. This also improves performance of touchscreen and
eliminates sudden jump of cursor at touch release.
IDLECONFIG and CHARGECONFIG registers are to be configured with same values
in order to eliminate false pen-up events. This workaround may result in
false pen-down to be detected, hence considerable charge step delay needs
to be added. The charge delay is set to 0xB000 (in terms of ADC clock
cycles) by default.
TSC steps are disabled at the end of every sampling cycle and EOS bit is
set. Once the EOS bit is set, the TSC steps need to be re-enabled to begin
next sampling cycle.
Signed-off-by: Brad Griffis <bgriffis@ti.com>
[vigneshr@ti.com: Ported the patch from v3.12 to v3.19rc1]
Signed-off-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Tao Peng <bergwolf@primarydata.com>
So that callers can specify which range to return.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.
The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
So that it is possible to return a specific iomode layouts.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
Flexfiles layout would want to use them to report DS IO status.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>