Commit Graph

589590 Commits

Author SHA1 Message Date
Alexander Potapenko 33334e2576 lib/stackdepot.c: allow the stack trace hash to be zero
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.

Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Vladimir Zapolskiy 99f23c2cde rapidio: fix potential NULL pointer dereference
The change fixes improper check for a returned error value by
class_create() function, which on error returns ERR_PTR() value, thus the
original check always results in a dead code on error path.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Konstantin Khlebnikov c2e7e00b71 mm/memory-failure: fix race with compound page split/merge
get_hwpoison_page() must recheck relation between head and tail pages.

n-horiguchi said: without this recheck, the race causes kernel to pin an
irrelevant page, and finally makes kernel crash for refcount mismatch.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
xuejiufei b73413647e ocfs2/dlm: return zero if deref_done message is successfully handled
dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.

Fixes: 60d663cb52 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Ananth N Mavinakayanahalli a320817c68 Ananth has moved
The current ID is going away soon... update email address

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Andrey Ryabinin 36f05ae8bc kcov: don't profile branches in kcov
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to
unbound recursion and crash:

	__sanitizer_cov_trace_pc() ->
		ftrace_likely_update ->
			__sanitizer_cov_trace_pc() ...

Define DISABLE_BRANCH_PROFILING to disable this tracer.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
James Morse bdab42dfc9 kcov: don't trace the code coverage code
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block.  Ftrace patches in a call to _mcount() to each
function it has annotated.

Letting these mechanisms annotate each other is a bad thing.  Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.

This patch lets arm64 with KCOV and STACK_TRACER boot.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Vlastimil Babka fd901c9538 mm: wake kcompactd before kswapd's short sleep
When kswapd goes to sleep it checks if the node is balanced and at first
it sleeps only for HZ/10 time, then rechecks if the node is still
balanced and nobody has woken it during the initial sleep.  Only then it
goes fully sleep until an allocation slowpath wakes it up again.

For higher-order allocations, waking up kcompactd is done only before
the full sleep.  This turns out to be an issue in case another
high-order allocation fails during the initial sleep.  It will wake
kswapd up, however kswapd considers the zone balanced from the order-0
perspective, and will just quickly try to sleep again.  So if there's a
longer stream of high-order allocations hitting the slowpath and waking
up kswapd, it might never actually wake up kcompactd, which may be
considered a regression from kswapd-based compaction.  In the worst
case, it might be that a single allocation that cannot direct
reclaim/compact itself is waking kswapd in the retry loop and preventing
kcompactd from being woken up and unblocking it.

This patch makes sure kcompactd is woken up in such situations by simply
moving the wakeup before the short initial sleep.  More efficient
solution would be to wake kcompactd immediately instead of kswapd if the
node is already order-0 balanced, but in that case we should also move
reset_isolation_suitable() call to kcompactd so it's not adding to the
allocator's latency.  Since it's late in the 4.6 cycle, let's go with
the simpler change for now.

Fixes: accf62422b ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Frank Rowand eeb68d1d2d .mailmap: add Frank Rowand
Set current email address to replace obsolete email addresses.

Signed-off-by: Frank Rowand <frank.rowand@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Minchan Kim d7e69488bd mm/hwpoison: fix wrong num_poisoned_pages accounting
Currently, migration code increses num_poisoned_pages on *failed*
migration page as well as successfully migrated one at the trial of
memory-failure.  It will make the stat wrong.  As well, it marks the
page as PG_HWPoison even if the migration trial failed.  It would mean
we cannot recover the corrupted page using memory-failure facility.

This patches fixes it.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Minchan Kim b06bad17c7 mm: call swap_slot_free_notify() with page lock held
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info.  The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache.  Otherwise, the page can be removed from swapcache.

  Kernel BUG at c00f9040 [verbose debug info unavailable]
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
  Modules linked in:
  CPU: 4 PID: 13446 Comm: RenderThread Tainted: G        W 3.10.84-g9f14aec-dirty #73
  task: c3b73200 ti: dd192000 task.ti: dd192000
  PC is at page_swap_info+0x10/0x2c
  LR is at swap_slot_free_notify+0x18/0x6c
  pc : [<c00f9040>]    lr : [<c00f5560>]    psr: 400f0113
  sp : dd193d78  ip : c2deb1e4  fp : da015180
  r10: 00000000  r9 : 000200da  r8 : c120fe08
  r7 : 00000000  r6 : 00000000  r5 : c249a6c0  r4 : = c249a6c0
  r3 : 00000000  r2 : 40080009  r1 : 200f0113  r0 : = c249a6c0
  ..<snip> ..
  Call Trace:
    page_swap_info+0x10/0x2c
    swap_slot_free_notify+0x18/0x6c
    swap_readpage+0x90/0x11c
    read_swap_cache_async+0x134/0x1ac
    swapin_readahead+0x70/0xb0
    handle_pte_fault+0x320/0x6fc
    handle_mm_fault+0xc0/0xf0
    do_page_fault+0x11c/0x36c
    do_DataAbort+0x34/0x118

Fixes: 3f2b1a04f4 ("zram: revive swap_slot_free_notify")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Minchan Kim 7bf52fb891 mm: vmscan: reclaim highmem zone if buffer_heads is over limit
We have been reclaimed highmem zone if buffer_heads is over limit but
commit 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
although buffer_heads is over the limit.  This patch restores the logic.

Fixes: 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Gerald Schaefer 28093f9f34 numa: fix /proc/<pid>/numa_maps for THP
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture.  On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.

On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available.  In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped.  On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.

This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Konstantin Khlebnikov 3486b85a29 mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.

This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.

Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes: 78f11a2557 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Krzysztof Kozlowski 314e9b75b3 mailmap: fix Krzysztof Kozlowski's misspelled name
Patchwork introduced a garbled Polish character in commit 1e3012d0fd
("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix
the mail mapping.  Additionally prefer to use kernel.org account for
personal work, instead of my gmail address.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Kirill A. Shutemov aa88b68c3b thp: keep huge zero page pinned until tlb flush
Andrea has found[1] a race condition on MMU-gather based TLB flush vs
split_huge_page() or shrinker which frees huge zero under us (patch 1/2
and 2/2 respectively).

With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the
page pinned until flush is complete and the pin prevents the page from
being split under us.

We still need patch 2/2.  This is simplified version of Andrea's patch.
We don't need fancy encoding.

[1] http://lkml.kernel.org/r/1447938052-22165-1-git-send-email-aarcange@redhat.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Steve Capper 66ee95d16a mm: exclude HugeTLB pages from THP page_mapped() logic
HugeTLB pages cannot be split, so we use the compound_mapcount to track
rmaps.

Currently page_mapped() will check the compound_mapcount, but will also
go through the constituent pages of a THP compound page and query the
individual _mapcount's too.

Unfortunately, page_mapped() does not distinguish between HugeTLB and
THP compound pages and assumes that a compound page always needs to have
HPAGE_PMD_NR pages querying.

For most cases when dealing with HugeTLB this is just inefficient, but
for scenarios where the HugeTLB page size is less than the pmd block
size (e.g.  when using contiguous bit on ARM) this can lead to crashes.

This patch adjusts the page_mapped function such that we skip the
unnecessary THP reference checks for HugeTLB pages.

Fixes: e1534ae950 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Atsushi Kumagai d7f53518f7 kexec: export OFFSET(page.compound_head) to find out compound tail page
PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail
page's page->mapping has just a poisoned data since commit 1c290f6421
("mm: sanitize page->mapping for tail pages").

If makedumpfile checks page->mapping of a compound tail page to
distinguish anonymous page as usual, it must fail in newer kernel.  So
it's necessary to export OFFSET(page.compound_head) to avoid checking
compound tail pages.

The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.5.x and later.  This means that extra disk space would
be consumed.  It's a problem, but not critical.

Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Atsushi Kumagai 8639a847b0 kexec: update VMCOREINFO for compound_order/dtor
makedumpfile refers page.lru.next to get the order of compound pages for
page filtering.

However, now the order is stored in page.compound_order, hence
VMCOREINFO should be updated to export the offset of
page.compound_order.

The fact is, page.compound_order was introduced already in kernel 4.0,
but the offset of it was the same as page.lru.next until kernel 4.3, so
this was not actual problem.

The above can be said also for page.lru.prev and page.compound_dtor,
it's necessary to detect hugetlbfs pages.  Further, the content was
changed from direct address to the ID which means dtor.

The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.4.x and later.  This means that extra disk space would
be consumed.  It's a problem, but not critical.

Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28 19:34:04 -07:00
Linus Torvalds b75a2bf899 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will soon be at least one execution of the work
  item - where "after" means that the execution instance would be able
  to see all the changes that the queuer has made prior to the queueing
  attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v3.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
2016-04-27 12:03:59 -07:00
Linus Torvalds 763cfc86ee Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two patches to fix a deadlock which can be easily triggered if memcg
  charge moving is used.

  This bug was introduced while converting threadgroup locking to a
  global percpu_rwsem and is caused by cgroup controller task migration
  path depending on the ability to create new kthreads.  cpuset had a
  similar issue which was fixed by performing heavy-lifting operations
  asynchronous to task migration.  The two patches fix the same issue in
  memcg in a similar way.  The first patch makes the mechanism generic
  and the second relocates memcg charge moving outside the migration
  path.

  Given that we don't want to perform heavy operations while
  writelocking threadgroup lock anyway, moving them out of the way is a
  desirable solution.  One thing to note is that the problem was
  difficult to debug because lockdep couldn't figure out the deadlock
  condition.  Looking into how to improve that"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  memcg: relocate charge moving from ->attach to ->post_attach
  cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
2016-04-27 11:41:14 -07:00
Linus Torvalds 3118e5f966 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
  patches"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
  i2c: cpm: Fix build break due to incompatible pointer types
  i2c: ismt: Add Intel DNV PCI ID
  i2c: xlp9xx: add support for Broadcom Vulcan
  i2c: rk3x: add support for rk3228
2016-04-27 11:34:45 -07:00
Linus Torvalds 24131a61ec ARC fixes for 4.6-rc6
- LOCKDEP now words for ARCv2 builds
  - Enabling DT reserved-memory binding to work (for forthcoming HDMI driver)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs
 xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv
 wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc
 4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ
 xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU
 cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls
 I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S
 a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD
 Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH
 scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ
 0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl
 E2sBDIJl2Xu2E7amTJss
 =rlDa
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - lockdep now works for ARCv2 builds

 - enable DT reserved-memory binding (for forthcoming HDMI driver)

* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: add support for reserved memory defined by device tree
  ARC: support generic per-device coherent dma mem
  Documentation: dt: arc: fix spelling mistakes
  ARCv2: Enable LOCKDEP
2016-04-27 09:46:21 -07:00
Linus Torvalds 508fea71c6 nios2 fix for v4.6
nios2: memset: use the right constraint modifier for the %4 output operand
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIH+FAAoJEFWoEK+e3syCfRoQAMPXIiWR/V/dLn3OX8f8CeA6
 I9duVqMrKrVh/a+bwxzmVJkumm0xzqYnOyhOpX5fZd3Nx44Q4NynJakwgWpMDVAI
 +xXxNtHZUhjcRC4EuqJW677plR0Uq8bWY2UibpARPHfB9d0arJOCuL11vdGCAjkg
 lWeVzUFg7iB9n0tRFwvsN29EcRZDo7+WbJh3cGIfTYNbcihJfiAAlmNyXS7XFiBY
 DqSYyTXIc8scH1q66gArTnyDryvM7cEZ+zyYoX9v8/E/+xPLLLhogthtqf3u4opn
 J/70k1LBgzgHCYrlEG8vvd1kCr114PLo7RlgkwJqdAtVhtMAtcGZFfkVQkSo7R3h
 gHQHf8f0exg3JKp0VesB443FyaIvpCNkth3eGdNMWunhCPB4bXE5W+hg5J5gZBNi
 1Ft9VB9Ug/8sh9Es4muinNX1kR5Fc8IWQqIa2U/OCt4O2wR1aFanJvaRqeCozbES
 SpbRAOoXtzOZ0xRPZGPQpqP6ggfizq9Zil2ZTeXbNPBRybFvmEpgewdJYqvrNwZj
 pbgB1+7zVcsfGTiMhJ+d1rLUX/oeMuUWT+eHY1jM1k+gTQVUo6k8KNVoaiNKa9z5
 cZh+XqgWKE6Qv4DVRnj+ouvDuglhwqIAyI3oXElghrYmCWxjvVKCXowQhnSq727M
 th2iyocnFMjIVWd5u3b7
 =PRJ4
 -----END PGP SIGNATURE-----

Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull arch/nios2 fix from Ley Foon Tan:
 "memset: use the right constraint modifier for the %4 output operand"

* tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: memset: use the right constraint modifier for the %4 output operand
2016-04-27 09:33:24 -07:00
Linus Torvalds 9453203bf8 platform-drivers-x86 for 4.6-3
toshiba_acpi:
  - Fix regression caused by hotkey enabling value
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXIEZSAAoJEKbMaAwKp364mLwH/3j01EDn0JF1FIIP+kxVgeeL
 g8xI+0tlFzxmdcBqW3n4q0apzVuCmHr0pbOik289l3dv7hQ5PEvdmK/VhVPYmJDL
 2u/4EWmW7cvYMUAVhGB499pKac38fMUN5y97dkmoikiTQO6VaWsvdczvXuhuz/dP
 OcQzRR/UttCLMe/ERxz3xh4R9kbY5Hzh4slW8Ay/sGDRrgOUFRLT8Zg3Uo7MY27i
 Kq++SrH96edL1dW6XkWFIqO7NzWGlbBxTMlTlh+xmGUkOtVxUyzAID3NEDIaw6zC
 7QU61eyfIJToa2SxHZ/mT9bEFNHNbJR4KoLREG6K2LbRyMhsQfMxaTym8MNzT/Q=
 =+IXa
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Fix regression caused by hotkey enabling value in toshiba_acpi"

* tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  toshiba_acpi: Fix regression caused by hotkey enabling value
2016-04-27 08:57:11 -07:00
Alexey Brodkin 1b10cb21d8 ARC: add support for reserved memory defined by device tree
Enable reserved memory initialization from device tree.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27 17:06:56 +05:30
Alexey Brodkin 32ed9a0e0d ARC: support generic per-device coherent dma mem
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27 17:06:55 +05:30
Romain Perier a8950e49bd nios2: memset: use the right constraint modifier for the %4 output operand
Depending on the size of the area to be memset'ed, the nios2 memset implementation
either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
rather than 1-byte stores to speed up memset.

However, we discovered that on our nios2 platform, memset() was not properly setting the
buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:

0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...

Which is obviously incorrect. Our investigation has revealed that the problem lies in the
incorrect constraints used in the inline assembly.

The following piece of assembly, from the nios2 memset implementation, is supposed to
create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:

/* fill8 %3, %5 (c & 0xff) */
"       slli    %4, %5, 8\n"
"       or      %4, %4, %5\n"
"       slli    %3, %4, 16\n"
"       or      %3, %3, %4\n"

However, depending on the compiler and optimization level, this code might be compiled as:

34:	280a923a 	slli	r5,r5,8
38:	294ab03a 	or	r5,r5,r5
3c:	2808943a 	slli	r4,r5,16
40:	2148b03a 	or	r4,r4,r5

This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.

%4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from
using the same register for an output operand (%4) and input operand (%5). By using the
constraint modifier '&', we indicate that the register should be used for output only. With this
change, we get the following assembly output:

34:	2810923a 	slli	r8,r5,8
38:	4150b03a 	or	r8,r8,r5
3c:	400e943a 	slli	r7,r8,16
40:	3a0eb03a 	or	r7,r7,r8

Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.

It is worth mentioning the observed consequence of this bug: we were hitting the kernel
BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
after the initialization was finding some bits to be set to 0, which didn't make sense since the
bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
bitmap was in fact initialized to 0xff00ff00.

Thanks to Marek Vasut for his help and feedback.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Marek Vasut <marex@denx.de>
Acked-by: Ley Foon Tan <lftan@altera.com>
2016-04-27 16:35:55 +08:00
Linus Torvalds f28f20da70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig
    Gallak.

 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks,
    missing locking around netdev list traversal, etc.) from Sabrina
    Dubroca.

 3) Fix handling of host routes on ifdown in ipv6, from David Ahern.

 4) Fix double-fdput in bpf verifier.  From Jann Horn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  bpf: fix double-fdput in replace_map_fd_with_map_ptr()
  net: ipv6: Delete host routes on an ifdown
  Revert "ipv6: Revert optional address flusing on ifdown."
  net/mlx4_en: fix spurious timestamping callbacks
  net: dummy: remove note about being Y by default
  cxgbi: fix uninitialized flowi6
  ipv6: Revert optional address flusing on ifdown.
  ipv4/fib: don't warn when primary address is missing if in_dev is dead
  net/mlx5: Add pci shutdown callback
  net/mlx5_core: Remove static from local variable
  net/mlx5e: Use vport MTU rather than physical port MTU
  net/mlx5e: Fix minimum MTU
  net/mlx5e: Device's mtu field is u16 and not int
  net/mlx5_core: Add ConnectX-5 to list of supported devices
  net/mlx5e: Fix MLX5E_100BASE_T define
  net/mlx5_core: Fix soft lockup in steering error flow
  qlcnic: Update version to 5.3.64
  net: stmmac: socfpga: Remove re-registration of reset controller
  macsec: fix netlink attribute validation
  macsec: add missing macsec prefix in uapi
  ...
2016-04-26 16:25:51 -07:00
Linus Torvalds 91ea692f87 Here are the latest bug fixes for ARM SoCs, mostly addressing
recent regressions. Changes are across several platforms, so
 I'm listing every change separately here.
 
 Regressions since 4.5:
 
  - A correction of the psci firmware DT binding, to prevent
    users from relying on unintended semantics
 
  - Actually getting the newly merged clock driver for some OMAP
    platforms to work
 
  - A revert of patches for the Qualcomm BAM, these need to be
    reworked for 4.7 to avoid breaking boards other than the one
    they were intended for
 
  - A correction for the I2C device nodes on the Socionext Uniphier
    platform
 
  - i.MX SDHCI was broken for non-DT platforms due to a change
    with the setting of the DMA mask
 
  - A revert of a patch that accidentally added a nonexisting
    clock on the Rensas "Porter" board
 
  - A couple of OMAP fixes that are all related to suspend after
    the power domain changes for dra7
 
  - On Mediatek, revert part of the power domain initialization
    changes that broke mt8173-evb
 
 Fixes for older bugs:
 
  - Workaround for an "external abort" in the omap34xx
    suspend/resume code.
 
  - The USB1/eSATA should not be listed as an excon device on
    am57xx-beagle-x15 (broken since v4.0)
 
  - A v4.5 regression in the TI AM33xx and AM43XX DT specifying
    incorrect DMA request lines for the GPMC
 
  - The jiffies calibration on Renesas platforms was incorrect
    for some modern CPU cores.
 
  - A hardware errata woraround for clockdomains on TI DRA7
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVx+5v2CrR//JCVInAQJ/ZBAArI3ZiR+Jj2dZCm9c7+PjlDWngJpBME3V
 o4aF9CeyuA/eyx+QtKAq1ScG2eRIbfab03XGBMEHXpKmmiTXYFIcLFHewwSGBYsy
 XUsNO+ZKsw92ImSdcX9p45BjkAADJvUwX5BzDlfOQ5mNX+o0Godb/8Mi2Y6RIqTK
 5C0xQ0YE8ZN7xtyNzFylaI+CL6wsVLy6PUKig7UIrOOXQK3Tzt4mEz2ksrSBJzON
 RiG7kPLf+Zd013WyF/ZUdC3VErDOP7C1Z+YRcK+2rxjlL+4oJUznsoaBYJgLUV+T
 GmcD0TZNwt6x6FWF6cSiUa+gl+6oWRZwTGfUooS1zEcuLHBsONdMtVat4Z01RYos
 rdMvFgZ6bxG7n4tajI2jg1gokGfyMfYuKwnHuA8Ynzn4N/VcnnbfxPRyV/RMLN0W
 ad/e12SlLMX1XahrD9uo/oH/X73gHPnbHlLLzWfDfnyvNGvWiW3SNklFT03q/Yn+
 fgfB0OnzG8+a3c/LHZbtAo/yYYLdqIuOg8I40AizN3CKHamUWPAjgFfdHdQADVV8
 yC5ugVB6x7RYID/49IPT1C3n/SjoypYyRbo30ipqyz2dTf6kz35SY/YjYNSaIYvY
 QfnGFuywsKsTprGAzI+x/fGo61Ve0/XkK9RPt0opU1+WdYr3sE+ufGVLVn4g4Cw3
 wfd20UTVwGs=
 =YgL2
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are the latest bug fixes for ARM SoCs, mostly addressing recent
  regressions.  Changes are across several platforms, so I'm listing
  every change separately here.

  Regressions since 4.5:

   - A correction of the psci firmware DT binding, to prevent users from
     relying on unintended semantics

   - Actually getting the newly merged clock driver for some OMAP
     platforms to work

   - A revert of patches for the Qualcomm BAM, these need to be reworked
     for 4.7 to avoid breaking boards other than the one they were
     intended for

   - A correction for the I2C device nodes on the Socionext Uniphier
     platform

   - i.MX SDHCI was broken for non-DT platforms due to a change with the
     setting of the DMA mask

   - A revert of a patch that accidentally added a nonexisting clock on
     the Rensas "Porter" board

   - A couple of OMAP fixes that are all related to suspend after the
     power domain changes for dra7

   - On Mediatek, revert part of the power domain initialization changes
     that broke mt8173-evb

  Fixes for older bugs:

   - Workaround for an "external abort" in the omap34xx suspend/resume
     code.

   - The USB1/eSATA should not be listed as an excon device on
     am57xx-beagle-x15 (broken since v4.0)

   - A v4.5 regression in the TI AM33xx and AM43XX DT specifying
     incorrect DMA request lines for the GPMC

   - The jiffies calibration on Renesas platforms was incorrect for some
     modern CPU cores.

   - A hardware errata woraround for clockdomains on TI DRA7"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems
  arm64: dts: uniphier: fix I2C nodes of PH1-LD20
  ARM: shmobile: timer: Fix preset_lpj leading to too short delays
  Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins"
  ARM: dts: r8a7791: Don't disable referenced optional clocks
  Revert "ARM: OMAP: Catch callers of revision information prior to it being populated"
  ARM: OMAP3: Fix external abort on 36xx waking from off mode idle
  ARM: dts: am57xx-beagle-x15: remove extcon_usb1
  ARM: dts: am437x: Fix GPMC dma properties
  ARM: dts: am33xx: Fix GPMC dma properties
  Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators"
  ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask
  ARM: DRA7: clockdomain: Implement timer workaround for errata i874
  ARM: OMAP: Catch callers of revision information prior to it being populated
  ARM: dts: dra7: Correct clock tree for sys_32k_ck
  ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap
  ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen
  Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node"
  Revert "dts: msm8974: Add blsp2_bam dma node"
  ARM: dts: Add clocks for dm814x ADPLL
2016-04-26 16:17:01 -07:00
Linus Torvalds 8ead9dd547 devpts: more pty driver interface cleanups
This is more prep-work for the upcoming pty changes.  Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode.  That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code.  Not only is the end result smaller,
it's simpler and easier to understand.

The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.

That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".

The new code just does

	dentry->d_fsdata = NULL;

in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously.  Don't do something complex and subtle when the
obvious straightforward approach will do.

The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-26 15:47:32 -07:00
Jann Horn 8358b02bf6 bpf: fix double-fdput in replace_map_fd_with_map_ptr()
When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode
references a non-map file descriptor as a map file descriptor, the error
handling code called fdput() twice instead of once (in __bpf_map_get() and
in replace_map_fd_with_map_ptr()). If the file descriptor table of the
current task is shared, this causes f_count to be decremented too much,
allowing the struct file to be freed while it is still in use
(use-after-free). This can be exploited to gain root privileges by an
unprivileged user.

This bug was introduced in
commit 0246e64d9a ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only
exploitable since
commit 1be7f75d16 ("bpf: enable non-root eBPF programs") because
previously, CAP_SYS_ADMIN was required to reach the vulnerable code.

(posted publicly according to request by maintainer)

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 17:37:21 -04:00
David Ahern 38bd10c447 net: ipv6: Delete host routes on an ifdown
It was a simple idea -- save IPv6 configured addresses on a link down
so that IPv6 behaves similar to IPv4. As always the devil is in the
details and the IPv6 stack as too many behavioral differences from IPv4
making the simple idea more complicated than it needs to be.

The current implementation for keeping IPv6 addresses can panic or spit
out a warning in one of many paths:

1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
   rt6_fill_node while handling a route dump request.

2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
   fib6_del

3. Panic in fib6_purge_rt because rt6i_ref count is not 1.

The root cause of all these is references related to the host route for
an address that is retained.

So, this patch deletes the host route every time the ifdown loop runs.
Since the host route is deleted and will be re-generated an up there is
no longer a need for the l3mdev fix up. On the 'admin up' side move
addrconf_permanent_addr into the NETDEV_UP event handling so that it
runs only once versus on UP and CHANGE events.

All of the current panics and warnings appear to be related to
addresses on the loopback device, but given the catastrophic nature when
a bug is triggered this patch takes the conservative approach and evicts
all host routes rather than trying to determine when it can be re-used
and when it can not. That can be a later optimizaton if desired.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 11:48:26 -04:00
David S. Miller 6a923934c3 Revert "ipv6: Revert optional address flusing on ifdown."
This reverts commit 841645b5f2.

Ok, this puts the feature back.  I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 11:47:41 -04:00
Roman Pen 346c09f804 workqueue: fix ghost PENDING flag while doing MQ IO
The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
[  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
[  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
[  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
[  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
[  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
[  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
[  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
[  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
[  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
[  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
[  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
[  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debug it became clear that stalled request is always inserted in
the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  reqeust A inserted
  STORE hctx->ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  <schedule to kblockd workqueue>
                                         request B inserted
                                         STORE hctx->ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result request B pended forever.

This behaviour can be explained by speculative LOAD of hctx->ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier <mfence>, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Michael Wang <yun.wang@profitbricks.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-04-26 11:23:22 -04:00
Sudeep Holla 978fa43623 drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems
Currently ARM CPUs DT bindings allows different enable-method value for
PSCI based systems. On ARM 64-bit this property is required and must be
"psci" while on ARM 32-bit systems this property is optional and must
be "arm,psci" if present.

However, "arm,psci" has always been the compatible string for the PSCI
node, and was never intended to be the enable-method. So this is a bug
in the binding and not a deliberate attempt at specifying 32-bit
differently.

This is problematic if 32-bit OS is run on 64-bit system which has
"psci" as enable-method rather than the expected "arm,psci".

So let's unify the value into "psci" and remove support for "arm,psci"
before it finds any users.

Reported-by: Soby Mathew <Soby.Mathew@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2016-04-26 12:46:08 +02:00
Eric Dumazet fc96256c90 net/mlx4_en: fix spurious timestamping callbacks
When multiple skb are TX-completed in a row, we might incorrectly keep
a timestamp of a prior skb and cause extra work.

Fixes: ec693d4701 ("net/mlx4_en: Add HW timestamping (TS) support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 01:13:18 -04:00
Ivan Babrou 9f5db53507 net: dummy: remove note about being Y by default
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 01:11:55 -04:00
Jiri Benc 3d6d30d60a cxgbi: fix uninitialized flowi6
ip6_route_output looks into different fields in the passed flowi6 structure,
yet cxgbi passes garbage in nearly all those fields. Zero the structure out
first.

Fixes: fc8d0590d9 ("libcxgbi: Add ipv6 api to driver")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 16:20:49 -04:00
Tejun Heo 264a0ae164 memcg: relocate charge moving from ->attach to ->post_attach
Hello,

So, this ended up a lot simpler than I originally expected.  I tested
it lightly and it seems to work fine.  Petr, can you please test these
two patches w/o the lru drain drop patch and see whether the problem
is gone?

Thanks.
------ 8< ------
If charge moving is used, memcg performs relabeling of the affected
pages from its ->attach callback which is called under both
cgroup_threadgroup_rwsem and thus can't create new kthreads.  This is
fragile as various operations may depend on workqueues making forward
progress which relies on the ability to create new kthreads.

There's no reason to perform charge moving from ->attach which is deep
in the task migration path.  Move it to ->post_attach which is called
after the actual migration is finished and cgroup_threadgroup_rwsem is
dropped.

* move_charge_struct->mm is added and ->can_attach is now responsible
  for pinning and recording the target mm.  mem_cgroup_clear_mc() is
  updated accordingly.  This also simplifies mem_cgroup_move_task().

* mem_cgroup_move_task() is now called from ->post_attach instead of
  ->attach.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Debugged-and-tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: 1ed1328792 ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
Cc: <stable@vger.kernel.org> # 4.4+
2016-04-25 15:45:14 -04:00
Tejun Heo 5cf1cacb49 cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
Since e93ad19d05 ("cpuset: make mm migration asynchronous"), cpuset
kicks off asynchronous NUMA node migration if necessary during task
migration and flushes it from cpuset_post_attach_flush() which is
called at the end of __cgroup_procs_write().  This is to avoid
performing migration with cgroup_threadgroup_rwsem write-locked which
can lead to deadlock through dependency on kworker creation.

memcg has a similar issue with charge moving, so let's convert it to
an official callback rather than the current one-off cpuset specific
function.  This patch adds cgroup_subsys->post_attach callback and
makes cpuset register cpuset_post_attach_flush() as its ->post_attach.

The conversion is mostly one-to-one except that the new callback is
called under cgroup_mutex.  This is to guarantee that no other
migration operations are started before ->post_attach callbacks are
finished.  cgroup_mutex is one of the outermost mutex in the system
and has never been and shouldn't be a problem.  We can add specialized
synchronization around __cgroup_procs_write() but I don't think
there's any noticeable benefit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> # 4.4+ prerequisite for the next patch
2016-04-25 15:45:14 -04:00
David S. Miller 841645b5f2 ipv6: Revert optional address flusing on ifdown.
This reverts the following three commits:

70af921db6
799977d9aa
f1705ec197

The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.

Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 15:33:55 -04:00
Azael Avalos a30b8f81d9 toshiba_acpi: Fix regression caused by hotkey enabling value
Commit 52cbae0127 ("toshiba_acpi: Change default Hotkey enabling value")
changed the hotkeys enabling value, as it was the same value Windows uses,
however, it turns out that the value tells the EC that the driver will now
take care of the hardware events like the physical RFKill switch or the
pointing device toggle button.

This patch reverts such commit by changing the default hotkey enabling
value to 0x09, which enables hotkey events only, making the hardware
buttons working again.

Fixes bugs 113331 and 114941.

Signed-off-by: Azael Avalos <coproscefalo@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-04-25 10:31:59 -07:00
Linus Torvalds bcc981e9ed Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a couple of regressions in the talitos driver that were
  introduced back in 4.3.

  The first bug causes a crash when the driver's AEAD functionality is
  used while the second bug prevents its AEAD feature from working once
  you get past the first bug"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: talitos - fix AEAD tcrypt tests
  crypto: talitos - fix crash in talitos_cra_init()
2016-04-25 09:32:45 -07:00
Kevin Hilman 004cb62efd Enable dm814x and dra62x clock driver. This branch has a dependency
to the clk-ti branch from the Linux clk tree for the ADPLL clock driver.
 Otherwise things won't keep booting properly when we flip over to use
 the clock driver instead of fixed clocks set up by the bootloader.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW2MN/AAoJEBvUPslcq6VzVqEP/A2qxFhMYyf/q0Za54RNRHjj
 iHbpxmfPUrFIGO/IQqkk7bl5Ufgs+L+uW+mjoe0CvIYGEoVDY/cfe3AZW734cGMO
 rLbs3CovoHst2UTZbcKVRs/QjkN+R9nvowvvqK87vSbpAbMX7pRrEcfZSN77T4ej
 vRbtRyD0msNCm8s0TgdpQ6ObK0GHmfqtq3GFA/g5Rs5m1X7h9PaD+PWUsVDugKyM
 9bEmGZSyOaRN5qGrOX/PTTKK3OiCOSJXB/8tnRgNW7DaISop+KJwxzMhBHyx2iKP
 JaD5lDJk/ArE+4Za0FmSpug8muUHLHH+htCu76uU6qi/s7+q8g9UR9ewzfLg7/rG
 wYl8IwDBVxWbi5PLnHPRrghGheEkM2ykiqEp5DqlZ1B7vfGv4Wl/3ZxiL1ZO5FZv
 jre1yHm5sguLCtA6BErWi69SCI3rpi4GrDQuYnlAbg9ABRH+YBVZ+3lCqtBVyh6J
 5yHexIELEEOUHHT2KLPNi7HexzK6xtMpsh4QlP1Yxt2o6dVI2LmTS+oKtU6QAjSn
 WE5pbLd9XZUBaDO4jw3T01edhGVpf2OxmtGSOd/ptF5JjZoDA2Kg+NcDE0eTPL72
 v7yVlNVU3g8PlD1EfRENi/KL0K4GLNq3eT2WCFQYp/nYuN1dVkk5xraEv3w7N0iZ
 vGlh9twbt7VvjiGoImcF
 =2Lew
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Enable dm814x and dra62x clock driver. This branch has a dependency
to the clk-ti branch from the Linux clk tree for the ADPLL clock driver.
Otherwise things won't keep booting properly when we flip over to use
the clock driver instead of fixed clocks set up by the bootloader.

* tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Add clocks for dm814x ADPLL
2016-04-25 08:55:17 -07:00
Eric Engestrom da67e68c0e Documentation: dt: arc: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-25 14:21:09 +05:30
Paolo Abeni 391a20333b ipv4/fib: don't warn when primary address is missing if in_dev is dead
After commit fbd40ea018 ("ipv4: Don't do expensive useless work
during inetdev destroy.") when deleting an interface,
fib_del_ifaddr() can be executed without any primary address
present on the dead interface.

The above is safe, but triggers some "bug: prim == NULL" warnings.

This commit avoids warning if the in_dev is dead

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 23:26:29 -04:00
Linus Torvalds 02da2d7217 Linux 4.6-rc5 2016-04-24 16:17:05 -07:00
David S. Miller 91e019de2f Merge branch 'mlx5-fixes'
Saeed Mahameed says:

====================
mlx5 driver updates and fixes

Changes from V0:
	- Dropped: ("net/mlx5e: Reset link modes upon setting speed to zero")
	- Fixed compilation issue introduced to mlx5_ib driver.
	- Rebased to df63719390 ('Revert "Prevent NUll pointer dereference with two PHYs on cpsw"')

This series has few bug fixes for mlx5 core and ethernet driver.

Eli fixed a wrong static local variable declaration in flow steering API.
Majd added the support of ConnectX-5 PF and VF and added the support
for kernel shutdown pci callback for more robust reboot procedures.
Maor fixed a soft lockup in flow steering.
Rana fixed a wrog speed define in mlx5 EN driver.
I also had the chance to introduce some bug fixes in mlx5 EN mtu
reporting and handling.

For -stable:
	net/mlx5_core: Fix soft lockup in steering error flow
	net/mlx5e: Device's mtu field is u16 and not int
	net/mlx5e: Fix minimum MTU
	net/mlx5e: Use vport MTU rather than physical port MTU
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:50 -04:00
Majd Dibbiny 5fc7197d3a net/mlx5: Add pci shutdown callback
This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.

In addition, remove unregister netdev from shutdown flow. This will
allow a clean shutdown, even if some netdev clients did not release their
reference from this netdev. Releasing The HW resources only is enough as
the kernel is shutting down

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00