Commit Graph

676139 Commits

Author SHA1 Message Date
Geert Uytterhoeven 5c4e679898 lib: add module support to array-based sort tests
Allow the array-based sort test code to be compiled either as a loadable
module or built into the kernel.

Link: http://lkml.kernel.org/r/1488287219-15832-3-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
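For reference, a minimal, hedged sketch of what a test that can be built either as a module or built-in typically looks like once its Kconfig symbol is a tristate; the names below are illustrative, not the actual lib/test_sort.c code:

  #include <linux/init.h>
  #include <linux/module.h>
  #include <linux/printk.h>

  static int __init sort_test_init(void)
  {
          /* run the array-based sort test here and report the result */
          pr_info("sort test: ok\n");
          return 0;
  }

  static void __exit sort_test_exit(void)
  {
          /* nothing to clean up for a pure self-test */
  }

  module_init(sort_test_init);
  module_exit(sort_test_exit);
  MODULE_LICENSE("GPL");

With module_init()/module_exit() in place, the same source builds as a loadable module when the option is set to "m" and still runs at boot when built in.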
Geert Uytterhoeven ebd03a9aac Revert "lib/test_sort.c: make it explicitly non-modular"
Patch series "lib: add module support to sort tests".

This patch series allows the array-based and linked-list sort test code
to be compiled either as loadable modules or built into the kernel.

It's very valuable to have modular tests, so you can run them just by
insmodding the test modules, instead of needing a separate kernel that
runs them at boot.

This patch (of 3):

This reverts commit 8893f51933.

It's very valuable to have modular tests, so you can run them just by
insmodding the test modules, instead of needing a separate kernel that
runs them at boot.

Link: http://lkml.kernel.org/r/1488287219-15832-2-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Dan Carpenter 8128a31eaa drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
c2port_device_register() never returns NULL; it returns error pointers instead.

Link: http://lkml.kernel.org/r/20170412083321.GC3250@mwanda
Fixes: 65131cd52b ("c2port: add c2port support for Eurotech Duramar 2150")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
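As a hedged illustration of the pattern fixed above (the function and variable names around the call are assumptions, not the driver's actual code), registration functions that return error pointers must be checked with IS_ERR(), not compared against NULL:

  #include <linux/c2port.h>
  #include <linux/err.h>

  static struct c2port_ops my_c2port_ops;          /* assumed defined elsewhere */
  static struct c2port_device *my_c2port_dev;

  static int __init my_c2port_init(void)
  {
          my_c2port_dev = c2port_device_register("duramar2150",
                                                 &my_c2port_ops, NULL);
          /* returns an ERR_PTR() on failure, never NULL */
          if (IS_ERR(my_c2port_dev))
                  return PTR_ERR(my_c2port_dev);
          return 0;
  }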
Dan Carpenter 146180c052 drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests
The "DIV_ROUND_UP(size, PAGE_SIZE)" operation can overflow if "size" is
more than ULLONG_MAX - PAGE_SIZE.

Link: http://lkml.kernel.org/r/20170322111950.GA11279@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
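A hedged sketch of the kind of guard the commit above describes; the helper name and the max_pages limit are made up for illustration:

  #include <linux/errno.h>
  #include <linux/kernel.h>   /* DIV_ROUND_UP, ULLONG_MAX */
  #include <linux/mm.h>       /* PAGE_SIZE */

  static int check_queue_size(u64 size, u64 max_pages)
  {
          /*
           * DIV_ROUND_UP(size, PAGE_SIZE) expands to
           * (size + PAGE_SIZE - 1) / PAGE_SIZE, so the addition wraps
           * for sizes above ULLONG_MAX - PAGE_SIZE; reject those first.
           */
          if (size > ULLONG_MAX - PAGE_SIZE)
                  return -EINVAL;
          if (DIV_ROUND_UP(size, PAGE_SIZE) > max_pages)
                  return -EINVAL;
          return 0;
  }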
Tetsuo Handa 780cbcf287 kernel/hung_task.c: defer showing held locks
When I was running my test case, which can block hundreds of threads on
fs locks, I got a lockup due to the output from debug_show_all_locks()
added by commit b2d4c2edb2 ("locking/hung_task: Show all locks").

For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state
and 500 out of those 1000 threads hold some lock, debug_show_all_locks()
called from the for_each_process_thread() loop will report the locks
held by those 500 threads 1000 times.  This is far too much noise.

In order to make sure rcu_lock_break() is called frequently, we should
avoid calling debug_show_all_locks() from inside the
for_each_process_thread() loop, because debug_show_all_locks()
effectively runs its own for_each_process_thread() loop.  Let's defer
calling debug_show_all_locks() until just before panic() or until after
leaving the for_each_process_thread() loop.

Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
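A simplified, hedged sketch of the deferral pattern described above (not the literal kernel/hung_task.c code): note inside the task walk that a lock dump is wanted, and emit it once after the walk:

  #include <linux/debug_locks.h>
  #include <linux/rcupdate.h>
  #include <linux/sched/signal.h>

  static bool want_lock_dump;

  static void check_hung_uninterruptible_tasks(void)
  {
          struct task_struct *g, *t;

          rcu_read_lock();
          for_each_process_thread(g, t) {
                  if (t->state == TASK_UNINTERRUPTIBLE) {
                          /* report the hung task here, but do not dump
                           * locks yet; that would re-walk all tasks */
                          want_lock_dump = true;
                  }
          }
          rcu_read_unlock();

          /* dump held locks once, after leaving the task walk */
          if (want_lock_dump)
                  debug_show_all_locks();
  }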
Randy Dunlap 31b8cc8077 make help: add tools help target
Add a top-level Makefile help target for Userspace tools.

Also make each help "heading" end with a colon ':'.

Link: http://lkml.kernel.org/r/55c986ff-3966-3e47-2984-7349da2cce51@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Matthias Kaehlcke 7c30f352c8 jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp
jiffies_64 is defined in kernel/time/timer.c with
____cacheline_aligned_in_smp; however, this macro is not part of the
declaration of jiffies and jiffies_64 in jiffies.h.

As a result, clang generates the following warning:

  kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection]
  __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
                           ^
  include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp'
                                     ^
  include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned'
                   __section__(".data..cacheline_aligned")))
                   ^
  include/linux/jiffies.h:77:12: note: previous attribute is here
  extern u64 __jiffy_data jiffies_64;
             ^
  include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data'

Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Michael Davidson <md@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
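The principle behind the fix above, as a hedged and simplified sketch (not the literal patch): clang's -Wsection warning goes away when the declaration in the header carries the same cacheline/section attribute as the definition:

  #include <linux/cache.h>     /* __cacheline_aligned_in_smp */
  #include <linux/types.h>

  /* header: the declaration carries the attribute ... */
  extern u64 __cacheline_aligned_in_smp jiffies_64;

  /* timer code: ... and the definition uses the same attribute, so clang
   * sees matching sections for declaration and definition */
  u64 __cacheline_aligned_in_smp jiffies_64 = 0;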
Lorenzo Stoakes 3d88936f35 drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked()
Moving from get_user_pages() to get_user_pages_unlocked() simplifies the
code and takes advantage of VM_FAULT_RETRY functionality when faulting
in pages.

Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Gao Feng 63259457a2 proc/sysctl: fix the int overflow for jiffies conversion
do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to
avoid overflow.  But *valp is actually of int type, so the conversion
can still overflow.

For example,

  echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time

Then,

  cat ./sys/net/ipv4/tcp_keepalive_time

The output is "-1", which is not expected.

Now use INT_MAX/HZ as the max value instead of LONG_MAX/HZ to fix it.

Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
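A hedged sketch of the bounds check the commit above describes; the helper is hypothetical, but it shows why INT_MAX/HZ is the right limit when the stored value is an int:

  #include <linux/errno.h>
  #include <linux/jiffies.h>   /* HZ */
  #include <linux/kernel.h>    /* INT_MAX */

  /* hypothetical helper: convert a user-supplied number of seconds into
   * jiffies stored in an int-sized sysctl slot */
  static int seconds_to_int_jiffies(long secs, int *valp)
  {
          /* LONG_MAX/HZ as the limit still lets secs * HZ overflow an int */
          if (secs < 0 || secs > INT_MAX / HZ)
                  return -EINVAL;
          *valp = secs * HZ;
          return 0;
  }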
Tobin C. Harding f245e1c17a fs/proc/inode.c: remove cast from memory allocation
Coccinelle emits this warning:

  WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless.

Remove unnecessary cast.

Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
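As a hedged illustration (the allocation call mirrors fs/proc/inode.c but is shown here only as an example), kmalloc-family functions return void *, which converts implicitly in C, so the cast adds nothing:

  /* before: the cast of the void * return value is useless */
  ei = (struct proc_inode *)kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);

  /* after: plain assignment, same behaviour */
  ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);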
Vlastimil Babka baf6a9a1db mm, compaction: finish whole pageblock to reduce fragmentation
The main goal of direct compaction is to form a high-order page for
allocation, but it should also help against long-term fragmentation when
possible.

Most lower-than-pageblock-order compactions are for non-movable
allocations, which means that if we compact in a movable pageblock and
terminate as soon as we create the high-order page, it's unlikely that
the fallback heuristics will claim the whole block.  Instead there might
be a single unmovable page in a pageblock full of movable pages, and the
next unmovable allocation might pick another pageblock and increase
long-term fragmentation.

To help against such scenarios, this patch changes the termination
criteria for compaction so that the current pageblock is finished even
when the high-order page already exists.  Note that it might be
possible that the high-order page formed elsewhere in the zone due to
parallel activity, but this patch doesn't try to detect that.

This is only done with sync compaction, because async compaction is
limited to pageblocks of the same migratetype, where it cannot result in
a migratetype fallback.  (Async compaction also eagerly skips
order-aligned blocks where isolation fails, which is against the goal of
migrating away as much of the pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be
reduced.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 20%.  The number

Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Vlastimil Babka 282722b0d2 mm, compaction: restrict async compaction to pageblocks of same migratetype
The migrate scanner in async compaction is currently limited to
MIGRATE_MOVABLE pageblocks.  This is a heuristic intended to reduce
latency, based on the assumption that non-MOVABLE pageblocks are
unlikely to contain movable pages.

However, with the exception of THPs, most high-order allocations are
not movable.  Should the async compaction succeed, this increases the
chance that the non-MOVABLE allocations will fall back to a MOVABLE
pageblock, making the long-term fragmentation worse.

This patch attempts to help the situation by changing async direct
compaction so that the migrate scanner only scans the pageblocks of the
requested migratetype.  If it's a non-MOVABLE type and there are such
pageblocks that do contain movable pages, chances are that the
allocation can succeed within one of such pageblocks, removing the need
for a fallback.  If that fails, the subsequent sync attempt will ignore
this restriction.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 30%.  The number of movable allocations falling back is reduced by
12%.

Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Vlastimil Babka d39773a062 mm, compaction: add migratetype to compact_control
Preparation patch.  We are going to need migratetype at lower layers
than compact_zone() and compact_finished().

Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Vlastimil Babka b682debd97 mm, compaction: change migrate_async_suitable() to suitable_migration_source()
Preparation for making the decisions more complex and depending on
compact_control flags.  No functional change.

Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Vlastimil Babka 02aa0cdd72 mm, page_alloc: count movable pages when stealing from pageblock
When stealing pages from a pageblock of a different migratetype, we
count how many free pages were stolen, and change the pageblock's
migratetype if more than half of the pageblock was free.  This might be too
conservative, as there might be other pages that are not free, but were
allocated with the same migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely
(at least without the page_owner functionality enabled), we can count
pages that compaction would try to isolate for migration - those are
either on LRU or __PageMovable().  The rest can be assumed to be
MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
distinguish.  This counting can be done as part of free page stealing
with little additional overhead.

The page stealing code is changed so that it considers free pages plus
pages of the "good" migratetype for the decision whether to change the
pageblock's migratetype.

The result should be a more accurate migratetype of pageblocks wrt the
actual pages in the pageblocks when stealing from semi-occupied
pageblocks.  This should help the efficiency of page grouping by
mobility.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 47%.  The number of movable allocations falling back to other
pageblocks is increased by 55%, but these events don't cause permanent
fragmentation, so the tradeoff should be positive.  Later patches also
offset the movable fallback increase to some extent.

[akpm@linux-foundation.org: merge fix]
Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:10 -07:00
Vlastimil Babka 3bc48f96cf mm, page_alloc: split smallest stolen page in fallback
The __rmqueue_fallback() function is called when there's no free page of
requested migratetype, and we need to steal from a different one.

There are various heuristics to make this event infrequent and reduce
permanent fragmentation.  The main one is to try stealing from a
pageblock that has the most free pages, and possibly steal them all at
once and convert the whole pageblock.  Precise searching for such
pageblock would be expensive, so instead the heuristics walks the free
lists from MAX_ORDER down to requested order and assumes that the block
with highest-order free page is likely to also have the most free pages
in total.

Chances are that together with the highest-order page, we also steal
pages of lower orders from the same block.  But then we still split the
highest-order page.  This is wasteful and can contribute to
fragmentation instead of avoiding it.

This patch thus changes __rmqueue_fallback() to just steal the page(s)
and put them on the freelist of the requested migratetype, and only
report whether it was successful.  Then we pick (and eventually split)
the smallest page with __rmqueue_smallest().  This all happens under
zone lock, so nobody can steal it from us in the process.  This should
reduce fragmentation due to fallbacks.  At worst we are only stealing a
single highest-order page and waste some cycles by moving it between
lists and then removing it, but fallback is not exactly hot path so that
should not be a concern.  As a side benefit the patch removes some
duplicate code by reusing __rmqueue_smallest().

[vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
  Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:09 -07:00
Vlastimil Babka 228d7e3390 mm, compaction: remove redundant watermark check in compact_finished()
When detecting whether compaction has succeeded in forming a high-order
page, __compact_finished() employs a watermark check, followed by its
own search for a suitable page in the freelists.  This is not ideal for
two reasons:

 - The watermark check also searches high-order freelists, but with
   less strict criteria wrt fallback. It's therefore redundant and a
   waste of cycles. This was different in the past, when the high-order
   watermark check attempted to apply reserves to high-order pages.

 - The watermark check might actually fail due to a lack of order-0
   pages. Compaction can't help with that, so there's no point in
   continuing just because of that. It's still possible that a
   high-order page exists, in which case compaction can terminate.

This patch therefore removes the watermark check.  This should save some
cycles and terminate compaction sooner in some cases.

Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:09 -07:00
Vlastimil Babka f25ba6dccc mm, compaction: reorder fields in struct compact_control
Patch series "try to reduce fragmenting fallbacks", v3.

Last year, Johannes Weiner reported a regression in page mobility
grouping [1] and, while the exact cause was not found, I've come up with
some ways to improve it by reducing the number of allocations falling
back to a different migratetype and causing permanent fragmentation.

The series was tested with mmtests stress-highalloc modified to do
GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
wasn't woken up) on a UMA machine with 4GB of memory.  There were 5
repeats of each run, as the extfrag stats are quite volatile (note the
stats below are sums, not averages, as it was less perl hacking for me).

Success rates are the same, already high due to the low allocation
order used, so I'm not including them.

Compaction stats:
(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)

                                     patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
  Compaction stalls                    22449       24680       24846       19765       22059       17480
  Compaction success                   12971       14836       14608       10475       11632        8757
  Compaction failures                   9477        9843       10238        9290       10426        8722
  Page migrate success               3109022     3370438     3312164     1695105     1608435     2111379
  Page migrate failure                911588     1149065     1028264     1112675     1077251     1026367
  Compaction pages isolated          7242983     8015530     7782467     4629063     4402787     5377665
  Compaction migrate scanned       980838938   987367943   957690188   917647238   947155598  1018922197
  Compaction free scanned          557926893   598946443   602236894   594024490   541169699   763651731
  Compaction cost                      10243       10578       10304        8286        8398        9440

Compaction stats are mostly within noise until patch 4, which decreases
the number of compactions and migrations.  Part of that could be due to
more pageblocks marked as unmovable, and async compaction skipping
those.  This changes a bit with patch 7, but not so much.  Patch 8
increases free scanner stats and migrations, which comes from the
changed termination criteria.  Interestingly, the number of compactions
decreases - probably the fully compacted pageblock satisfies multiple
subsequent allocations, so it amortizes.

Next comes the extfrag tracepoint, where "fragmenting" means that an
allocation had to fallback to a pageblock of another migratetype which
wasn't fully free (which is almost all of the fallbacks).  I have
locally added another tracepoint for "Page steal" into
steal_suitable_fallback() which triggers in situations where we are
allowed to do move_freepages_block().  If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock", with a
breakdown of which allocation migratetype we are stealing for and from
which fallback migratetype.  The last part, "due to counting", comes from
patch 4 and counts the events where the counting of movable pages
allowed us to change pageblock's migratetype, while the number of free
pages alone wouldn't be enough to cross the threshold.

                                                       patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
  Page alloc extfrag event                            10155066     8522968    10164959    15622080    13727068    13140319
  Extfrag fragmenting                                 10149231     8517025    10159040    15616925    13721391    13134792
  Extfrag fragmenting for unmovable                     159504      168500      184177       97835       70625       56948
  Extfrag fragmenting unmovable placed with movable     153613      163549      172693       91740       64099       50917
  Extfrag fragmenting unmovable placed with reclaim.      5891        4951       11484        6095        6526        6031
  Extfrag fragmenting for reclaimable                     4738        4829        6345        4822        5640        5378
  Extfrag fragmenting reclaimable placed with movable     1836        1902        1851        1579        1739        1760
  Extfrag fragmenting reclaimable placed with unmov.      2902        2927        4494        3243        3901        3618
  Extfrag fragmenting for movable                      9984989     8343696     9968518    15514268    13645126    13072466
  Pages steal                                           179954      192291      210880      123254       94545       81486
  Pages steal with pageblock                             22153       18943       20154       33562       29969       33444
  Pages steal with pageblock for unmovable               14350       12858       13256       20660       19003       20852
  Pages steal with pageblock for unmovable from mov.     12812       11402       11683       19072       17467       19298
  Pages steal with pageblock for unmovable from recl.     1538        1456        1573        1588        1536        1554
  Pages steal with pageblock for movable                  7114        5489        5965       11787       10012       11493
  Pages steal with pageblock for movable from unmov.      6885        5291        5541       11179        9525       10885
  Pages steal with pageblock for movable from recl.        229         198         424         608         487         608
  Pages steal with pageblock for reclaimable               689         596         933        1115         954        1099
  Pages steal with pageblock for reclaimable from unmov.   273         219         537         658         547         667
  Pages steal with pageblock for reclaimable from mov.     416         377         396         457         407         432
  Pages steal with pageblock due to counting                                                 11834       10075        7530
  ... for unmovable                                                                           8993        7381        4616
  ... for movable                                                                             2792        2653        2851
  ... for reclaimable                                                                           49          41          63

What we can see is that "Extfrag fragmenting for unmovable" and "...
placed with movable" drop with almost every patch, which is good, as we
are polluting fewer movable pageblocks with unmovable pages.

The most significant change is patch 4 with movable page counting.  On
the other hand it increases "Extfrag fragmenting for movable" by 50%.
"Pages steal" drops though, so these movable allocation fallbacks find
only small free pages and are not allowed to steal whole pageblocks
back.  "Pages steal with pageblock" raises, because the patch increases
the chances of pageblock migratetype changes to happen.  This affects
all migratetypes.

The summary is that patch 4 is not a clear win wrt these stats, but I
believe that the tradeoff it makes is a good one.  There's less
pollution of movable pageblocks by unmovable allocations.  There's less
stealing between pageblocks, and the steals that remain have a higher
chance of also changing the migratetype of the pageblock itself, so the
pageblock should more faithfully reflect the migratetype of the pages
within it.
The increase of movable allocations falling back to unmovable pageblock
might look dramatic, but those allocations can be migrated by compaction
when needed, and other patches in the series (7-9) improve that aspect.

Patches 7 and 8 continue the trend of reduced unmovable fallbacks and
also reduce the impact on movable fallbacks from patch 4.

[1] https://www.spinics.net/lists/linux-mm/msg114237.html

This patch (of 8):

While there are currently (mostly by accident) no holes in struct
compact_control (on x86_64), we are going to add more bool flags, so
place them all together at the end of the structure.  While at it, just
order all fields from largest to smallest.

Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:09 -07:00
Jon Mason 922c60e89d net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
If an error is encountered in mdio_mux_init(), the error path will call
mdiobus_free().  Since mdiobus_register() has been called prior to
mdio_mux_init(), the bus->state will not be MDIOBUS_UNREGISTERED.  This
causes a BUG_ON() in mdiobus_free().  To correct this issue, add an
error path for mdio_mux_init() which calls mdiobus_unregister() prior to
mdiobus_free().

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Fixes: 98bc865a1e ("net: mdio-mux: Add MDIO mux driver for iProc SoCs")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 17:59:33 -04:00
Mikulas Patocka acfead32f3 ide: don't call memcpy with the same source and destination
The parisc architecture recently reimplemented the memcpy function, and
its reimplementation crashed when the source and destination overlapped.

The crash happened in the function ide_complete_cmd, where memcpy is called
with the same source and destination pointer. According to the C
specification, memcpy behavior is undefined if the source and destination
ranges overlap. This patch fixes the undefined behavior.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 17:36:39 -04:00
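A hedged sketch of the general fix pattern (dst and src are placeholders, not the actual ide_complete_cmd() variables): skip the copy when both pointers name the same object, since overlapping memcpy() is undefined behaviour:

  #include <linux/string.h>

  /* copying an object onto itself is a no-op anyway, so guard it */
  if (dst != src)
          memcpy(dst, src, sizeof(*dst));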
Geliang Tang 1a1df90ae1 ide: use setup_timer
Use setup_timer() instead of init_timer() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 17:36:39 -04:00
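For illustration, a hedged sketch of the conversion (the timer and callback names are made up), valid for kernels of this era where timer callbacks still took an unsigned long cookie:

  #include <linux/timer.h>

  static struct timer_list wd_timer;

  static void wd_timer_fn(unsigned long data)
  {
          /* timer expiry work goes here */
  }

  static void wd_timer_setup(unsigned long cookie)
  {
          /* replaces init_timer() plus manual .function/.data assignment */
          setup_timer(&wd_timer, wd_timer_fn, cookie);
  }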
Grygorii Strashko 48f5bccc60 net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
When users set flow control using ethtool the bits are set properly in the
CPGMAC_SL MACCONTROL register, but the FIFO depth in the respective Port n
Maximum FIFO Blocks (Pn_MAX_BLKS) registers remains set to the minimum size
reset value. When receive flow control is enabled on a port, the port's
associated FIFO block allocation must be adjusted. The port RX allocation
must increase to accommodate the flow control runout. The TRM recommends
numbers of 5 or 6.

Hence, apply required Port FIFO configuration to
Pn_MAX_BLKS.Pn_TX_MAX_BLKS=0xF and Pn_MAX_BLKS.Pn_RX_MAX_BLKS=0x5 during
interface initialization.

Cc: Schuyler Patton <spatton@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 17:33:19 -04:00
WANG Cong 242d3a49a2 ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
   loopback registers

Unfortunately the last one still happens in the wrong order, because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, thus we have to do that after we add the
idev to loopback. However, this notifier has priority == 0, the same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier, so it is actually called after
ip6_route_dev_notifier. This is similar to commit 2f460933f5
("ipv6: initialize route null entry in addrconf_init()") which
fixes init_net.

Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.

Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 17:31:24 -04:00
David S. Miller 29cee56c0b A couple more fixes:
* don't try to authenticate during reconfiguration, which causes
    drivers to get confused
  * fix a kernel-doc warning for a recently merged change
  * fix MU-MIMO group configuration (relevant only for monitor mode)
  * more rate flags fix: remove stray RX_ENC_FLAG_40MHZ
  * fix IBSS probe response allocation size
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlkQhDgACgkQa3t4Rpy0
 AB07Pg/+MGBW/OFoxdsQtq7eGPzVuQXC4NCXjzBunueY/cTpExuzoOWvKTRA+p7o
 pxgqxhngO3l8u/FqhWE7jh+aGxydmq8BhQefrAMi6VgkH7oh4JwP6Mjxf4xGpxZk
 W15oNncNaxLiC4U+GaVUZ0oEsc0fCFuqsmAEGas25VOOQyr4NNJ9jivecOI70bHH
 b1wvCilwDAIeg7CKAtxja40/81ldnm9A7mSABGM6M13AJ1yiNnf9FEteSxAr7s4r
 xx6flFQQzlT+pzoVZeEg0u6yGWqucL/4V9OGcjJcoyLVnbey+1hlypLef4n+Cgol
 yP8yR5n1I3RWsJEsOftfVvjG54e/UAIR4xkGe+LHiWn7XjIK6EbCrmYt1uxknZU7
 LTkFj6b4EHgGPRL0BrIRA4FpQdLeslbwsMF96gWP5VQJE3T4gocuTv4McG2g9Isl
 suiB1zn4Y24UuEHdg+mDlIiEuOr5h4+XvIBOHAw1ZfbVKSKBDLFbqvYF/sHbgsDF
 uR6CvMuGeRM2wOxD8QXFLueNGS7Znrd2ETVx2hx35/qAR/X54nu5kEqcsvjgEamY
 vPUr0RO/+plltQgiBBtTrr6x6uGJg1AsNEyrlMng5hP/mT3yLziU2G8qBozU5e7c
 mDA/7Lz7wk3a6MKVTzt8vaCIcpC42oVhvwS/6kKQ/BZ4GbajdBM=
 =FcxE
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple more fixes:
 * don't try to authenticate during reconfiguration, which causes
   drivers to get confused
 * fix a kernel-doc warning for a recently merged change
 * fix MU-MIMO group configuration (relevant only for monitor mode)
 * more rate flags fix: remove stray RX_ENC_FLAG_40MHZ
 * fix IBSS probe response allocation size
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 16:02:23 -04:00
Jim Baxter aeca3a77b1 net: cdc_ncm: Fix TX zero padding
The zero padding that is added to NTBs does
not zero the memory correctly.  This is because
skb_put() modifies the value of skb_out->len,
which results in the memset() call not setting
any memory to zero, as
(ctx->tx_max - skb_out->len) == 0.

I have resolved this by storing the size of
the memory to be zeroed before the skb_put()
call and using that value in the memset() call.

Signed-off-by: Jim Baxter <jim_baxter@mentor.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 16:01:10 -04:00
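A hedged sketch of the fix described above (ctx, skb_out and padding_count stand in for the driver's internals): compute the padding length before skb_put() grows skb_out->len, then zero exactly that many bytes:

  /* buggy: by the time the length argument is evaluated, skb_put() may
   * already have grown skb_out->len, so (ctx->tx_max - skb_out->len) == 0
   *
   * memset(skb_put(skb_out, ctx->tx_max - skb_out->len), 0,
   *        ctx->tx_max - skb_out->len);
   */

  /* fixed: remember the padding size before skb_put() changes skb_out->len */
  padding_count = ctx->tx_max - skb_out->len;
  memset(skb_put(skb_out, padding_count), 0, padding_count);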
Linus Torvalds 2d3e4866de * ARM: HYP mode stub supports kexec/kdump on 32-bit; improved PMU
support; virtual interrupt controller performance improvements; support
 for userspace virtual interrupt controller (slower, but necessary for
 KVM on the weird Broadcom SoCs used by the Raspberry Pi 3)
 
 * MIPS: basic support for hardware virtualization (ImgTec
 P5600/P6600/I6400 and Cavium Octeon III)
 
 * PPC: in-kernel acceleration for VFIO
 
 * s390: support for guests without storage keys; adapter interruption
 suppression
 
 * x86: usual range of nVMX improvements, notably nested EPT support for
 accessed and dirty bits; emulation of CPL3 CPUID faulting
 
 * generic: first part of VCPU thread request API; kvm_stat improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH
 9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG
 0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/
 2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG
 Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk
 32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU=
 =IsiZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - HYP mode stub supports kexec/kdump on 32-bit
   - improved PMU support
   - virtual interrupt controller performance improvements
   - support for userspace virtual interrupt controller (slower, but
     necessary for KVM on the weird Broadcom SoCs used by the Raspberry
     Pi 3)

  MIPS:
   - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
     and Cavium Octeon III)

  PPC:
   - in-kernel acceleration for VFIO

  s390:
   - support for guests without storage keys
   - adapter interruption suppression

  x86:
   - usual range of nVMX improvements, notably nested EPT support for
     accessed and dirty bits
   - emulation of CPL3 CPUID faulting

  generic:
   - first part of VCPU thread request API
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  kvm: nVMX: Don't validate disabled secondary controls
  KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
  Revert "KVM: Support vCPU-based gfn->hva cache"
  tools/kvm: fix top level makefile
  KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
  KVM: Documentation: remove VM mmap documentation
  kvm: nVMX: Remove superfluous VMX instruction fault checks
  KVM: x86: fix emulation of RSM and IRET instructions
  KVM: mark requests that need synchronization
  KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
  KVM: add explicit barrier to kvm_vcpu_kick
  KVM: perform a wake_up in kvm_make_all_cpus_request
  KVM: mark requests that do not need a wakeup
  KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
  KVM: x86: always use kvm_make_request instead of set_bit
  KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
  s390: kvm: Cpu model support for msa6, msa7 and msa8
  KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  kvm: better MWAIT emulation for guests
  KVM: x86: virtualize cpuid faulting
  ...
2017-05-08 12:37:56 -07:00
Linus Torvalds 9c6ee01ed5 Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Lots of little things this time:

   - allow modules to be autoloaded according to the HWCAP feature bits
     (used primarily for crypto modules)

   - split module core and init PLT sections, since the core code and
     init code could be placed far apart, and the PLT sections need to
     be local to the code block.

   - three patches from Chris Brandt to allow Cortex-A9 L2 cache
     optimisations to be disabled where a SoC didn't wire up the out of
     band signals.

   - NoMMU compliance fixes, avoiding corruption of vector table which
     is not being used at this point, and avoiding possible register
     state corruption when switching mode.

   - fixmap memory attribute compliance update.

   - remove unnecessary locking from update_sections_early()

   - ftrace fix for DEBUG_RODATA with !FRAME_POINTER"

* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8672/1: mm: remove tasklist locking from update_sections_early()
  ARM: 8671/1: V7M: Preserve registers across switch from Thread to Handler mode
  ARM: 8670/1: V7M: Do not corrupt vector table around v7m_invalidate_l1 call
  ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
  ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap
  ARM: 8663/1: wire up HWCAP/HWCAP2 feature bits to the CPU modalias
  ARM: 8666/1: mm: dump: Add domain to output
  ARM: 8662/1: module: split core and init PLT sections
  ARM: 8661/1: dts: r7s72100: add l2 cache
  ARM: 8660/1: shmobile: r7s72100: Enable L2 cache
  ARM: 8659/1: l2c: allow CA9 optimizations to be disabled
2017-05-08 12:32:00 -07:00
Linus Torvalds d9dc089583 Xtensa updates for v4.12
- clearly mark references to spilled register locations with SPILL_SLOT
   macros;
 - clean up xtensa ptrace: use generic tracehooks, move internal kernel
   definitions from uapi/asm to asm, make locally-used functions static,
   fix code style and alignment;
 - use command line parameters passed to ISS as kernel command line.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZD1w5AAoJEFH5zJH4P6BEFAwP+gM22rd9BA5QVHbosqlGVwiv
 /GLfMSTbeO9OEIF9+YOakfzwYWFUXMrT3SjQ8WPE1e7sY61rdggUMfIvXafbgXiE
 Rj/pNXSQmN4jQpeDvRuGBgdl0V5QCsQ+czmQupv3xREgt/Q5iJiFQKvPprawaFMe
 HlXS8+RV6n0D+8C7xxyDqYuahfXmqPZB9fwV/AB2z/ccnARnIs72C6qzOQlffTKF
 mTxQWPQ6bvTTgHdNozWNVUfCNfy9Npf6sFg9jrx317r0CW1dBcaaNokFHYD+UGcC
 HBKMtmHBdV1KpwNwBl5Ucj7efsVj0JdQOhM37WCfQhx+GPhPbyw4JQ5Cr4JvFHCA
 PmYj1AaeCaeQfAfSqZQcORSFJUCFAkk8uydzWTIzp9pzItwTsYnkG2ETNRbHABPv
 puTPnvrfGvkY8VFm6DdaF+ga470rUnHWtDE+BcmPImpG2bj/LGW7NaOtR0xrxRKe
 ztQbNAN9w8n1J00FGnEc6UO0sroLFObzP0AujHEdjaCozC2HpI0H1nuiAPtEHTTs
 MSynWyPaE/lA9DsRknfe3BiPVYXCK5BKSCorKZhzMFydpLHD7AxlqPJBnpaSK0aG
 xZ6umtZ2bGrnXWXRrFREZ6YfrjQbQ1Ffv+NTYZsLREPWDa3gE3/jDmhWkETnz1iN
 u+DcKJnTh0gYDAV/B98q
 =Jcjj
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20170507' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa updates from Max Filippov:

 - clearly mark references to spilled register locations with SPILL_SLOT
   macros

 - clean up xtensa ptrace: use generic tracehooks, move internal kernel
   definitions from uapi/asm to asm, make locally-used functions static,
   fix code style and alignment

 - use command line parameters passed to ISS as kernel command line.

* tag 'xtensa-20170507' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: clean up access to spilled registers locations
  xtensa: use generic tracehooks
  xtensa: move internal ptrace definitions from uapi/asm to asm
  xtensa: clean up xtensa/kernel/ptrace.c
  xtensa: drop unused fast_io_protect function
  xtensa: use ITLB_HIT_BIT instead of hardcoded number
  xtensa: ISS: update kernel command line in platform_setup
  xtensa: ISS: add argc/argv simcall definitions
  xtensa: ISS: cleanup setup.c
2017-05-08 12:29:46 -07:00
Linus Torvalds 70ef8f0d37 for-f2fs-4.12
In this round, we've focused on enhancing performance with regards to block
 allocation, GC, and discard/in-place-update IO controls. There are a bunch
 of clean-ups as well as minor bug fixes.
 
 = Enhancement
 - disable heap-based allocation by default
 - issue small-sized discard commands by default
 - change the policy of data hotness for logging
 - distinguish IOs in terms of size and wbc type
 - start SSR earlier to avoid foreground GC
 - enhance data structures managing discard commands
 - enhance in-place update flow
 - add some more fault injection routines
 - secure one more xattr entry
 
 = Bug fix
 - calculate victim cost for GC correctly
 - remain correct victim segment number for GC
 - race condition in nid allocator and initializer
 - stale pointer produced by atomic_writes
 - fix missing REQ_SYNC for flush commands
 - handle missing errors in more corner cases
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZEKXrAAoJEEAUqH6CSFDSJJ8P/1Zy0NS9TM/PFtT7Sevb6vgC
 LcKLtX1bVhUuX9wAt5Q6BZ9927tCQPt5vLEYUxtniqEQaC0fsJAMbRYot+gR/dvN
 4bGgv1TeVST5pKbmctzhAL30PvZ1w4QS6dLvPMm2sPQSrPKGUGt0J8wPiHHZuvH4
 pygKzDxbrIJTeMhLm9tgFg7dWTJXV3VDb57WpA1AM1LAFVsIPF4vZnryLv3GsRmY
 eGRxgZEtt/90hCRbEcPirPZrtpv/O5f12K4Vp/NPw+4XGMEk+nTYndq6rlUWVNjg
 iPEDuxONyk/yb274SqB6sbNDuxHOqn7stGJepdUpSbprIsLZ0RmMaYWjSNsLU3Vh
 p4fAzRqvfSqAHCt0FEL/vT8M9ST5xQRVr9P/l0kDK5Ww95RROd05bEaGm/sKc7NB
 PHiWUoMIFFmuVsoCi6sM0AKps53ZGON8GEUyVKyM7NWTw1oWLPWifGMthEkysmwm
 08SdU5+XqbCeyMPAA2GURqMA5A8ssuA8+F0Citf4JPckQHPPj5pAydmx2wVlfBlc
 /bneR7T/8OsUbxgG8JSbdHUiPcjb20F0GTxSOTXiV/AaZAMCtyETnw64K2V6E0n7
 uraKcYYhypyphCj/IYc4vnQ3dCu3U2/NvTYEVX8DBvboN38/JVqmNWgQx9g+tLzj
 +r5s7PqTDuXv5Cfzc5NC
 =SBUb
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've focused on enhancing performance with regards to
  block allocation, GC, and discard/in-place-update IO controls. There
  are a bunch of clean-ups as well as minor bug fixes.

  Enhancements:
   - disable heap-based allocation by default
   - issue small-sized discard commands by default
   - change the policy of data hotness for logging
   - distinguish IOs in terms of size and wbc type
   - start SSR earlier to avoid foreground GC
   - enhance data structures managing discard commands
   - enhance in-place update flow
   - add some more fault injection routines
   - secure one more xattr entry

  Bug fixes:
   - calculate victim cost for GC correctly
   - remain correct victim segment number for GC
   - race condition in nid allocator and initializer
   - stale pointer produced by atomic_writes
   - fix missing REQ_SYNC for flush commands
   - handle missing errors in more corner cases"

* tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits)
  f2fs: fix a mount fail for wrong next_scan_nid
  f2fs: enhance scalability of trace macro
  f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
  f2fs: Make flush bios explicitely sync
  f2fs: show available_nids in f2fs/status
  f2fs: flush dirty nats periodically
  f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
  f2fs: allow cpc->reason to indicate more than one reason
  f2fs: release cp and dnode lock before IPU
  f2fs: shrink size of struct discard_cmd
  f2fs: don't hold cmd_lock during waiting discard command
  f2fs: nullify fio->encrypted_page for each writes
  f2fs: sanity check segment count
  f2fs: introduce valid_ipu_blkaddr to clean up
  f2fs: lookup extent cache first under IPU scenario
  f2fs: reconstruct code to write a data page
  f2fs: introduce __wait_discard_cmd
  f2fs: introduce __issue_discard_cmd
  f2fs: enable small discard by default
  f2fs: delay awaking discard thread
  ...
2017-05-08 12:24:17 -07:00
David S. Miller b210aeaedf Merge branch 'stmmac-pci-fix-crash-on-Intel-Galileo-Gen2'
Andy Shevchenko says:

====================
stmmac: pci: Fix crash on Intel Galileo Gen2

Due to a misconfiguration of the PCI driver for Intel Quark, the user
will get a kernel crash:

udhcpc: started, v1.26.2
stmmaceth 0000:00:14.6 eth0: device MAC address 98:4f:ee:05:ac:47
Generic PHY stmmac-a6:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=stmmac-a6:01, irq=-1)
stmmaceth 0000:00:14.6 eth0: IEEE 1588-2008 Advanced Timestamp supported
stmmaceth 0000:00:14.6 eth0: registered PTP clock
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

udhcpc: sending discover

stmmaceth 0000:00:14.6 eth0: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: stmmac_xmit+0xf1/0x1080

Fix this by adding necessary settings.

P.S. I split the fix into three patches according to what each of them adds.
====================

Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:15:03 -04:00
Andy Shevchenko 70fe4432bb stmmac: pci: split out common_default_data() helper
A new helper is added in order to prevent the misconfiguration that
happened for one of the platforms when the configuration data is
expanded.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:15:02 -04:00
Andy Shevchenko efcd24147f stmmac: pci: RX queue routing configuration
The commit abe80fdc6e

    ("net: stmmac: RX queue routing configuration")

missed Intel Quark configuration. Append it here.

Fixes: abe80fdc6e ("net: stmmac: RX queue routing configuration")
Cc: Joao Pinto <Joao.Pinto@synopsys.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:15:02 -04:00
Andy Shevchenko a1437e57af stmmac: pci: TX and RX queue priority configuration
The commit a8f5102af2

	("net: stmmac: TX and RX queue priority configuration")

missed Intel Quark configuration. Append it here.

Fixes: a8f5102af2 ("net: stmmac: TX and RX queue priority configuration")
Cc: Joao Pinto <Joao.Pinto@synopsys.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:15:02 -04:00
Andy Shevchenko 05c5d00419 stmmac: pci: set default number of rx and tx queues
The commit 26d6851fd2

	("net: stmmac: set default number of rx and tx queues in stmmac_pci")

missed Intel Quark configuration. Append it here.

Fixes: 26d6851fd2 ("net: stmmac: set default number of rx and tx queues in stmmac_pci")
Cc: Joao Pinto <Joao.Pinto@synopsys.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Joao Pinto <jpinto@synopsys.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:15:02 -04:00
Hangbin Liu 8ed508fd4b vti: check nla_put_* return value
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:10:31 -04:00
Daniel Borkmann 0d0e57697f bpf: don't let ldimm64 leak map addresses on unprivileged
The patch fixes two things at once:

1) It checks env->allow_ptr_leaks and only prints the map address to
   the log if we have the privileges to do so; otherwise it just dumps 0,
   as we would when kptr_restrict is enabled for %pK. Given the latter is
   off by default and not every distro sets it, I don't want to rely on
   this, hence the 0 by default for unprivileged.

2) Printing of ldimm64 in the verifier log is currently broken in that
   we don't print the full immediate, but only the 32 bit part of the
   first insn part for ldimm64. Thus, fix this up as well; it's okay to
   access, since we verified all ldimm64 earlier already (including just
   constants) through replace_map_fd_with_map_ptr().

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: cbd3570086 ("bpf: verifier (add ability to receive verification log)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:06:46 -04:00
Geliang Tang 871ff2ebe0 yam: use memdup_user
Use memdup_user() helper instead of open-coding to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:02:10 -04:00
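As a hedged illustration of the conversion (the surrounding helper is hypothetical), memdup_user() folds the usual kmalloc() + copy_from_user() + error-cleanup sequence into one call that returns an ERR_PTR() on failure:

  #include <linux/err.h>
  #include <linux/string.h>
  #include <linux/uaccess.h>

  static void *fetch_ioctl_buf(const void __user *arg, size_t len)
  {
          /* replaces kmalloc(len, GFP_KERNEL) + copy_from_user() +
           * kfree() on the error path; returns an ERR_PTR() on failure,
           * so callers check the result with IS_ERR() */
          return memdup_user(arg, len);
  }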
Geliang Tang 294316a4af net/hippi/rrunner: use memdup_user
Use memdup_user() helper instead of open-coding to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:02:09 -04:00
Ganesh Goudar 3bb4858fda cxgb4: avoid disabling FEC by default
Recent Chelsio firmware started using a few port capability bits to
manage FEC, and as the driver was not aware of the FEC changes, those
bits were zeroed, consequently disabling FEC.

Avoid zeroing those bits and default to whatever the firmware
tells us the link is currently advertising.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 15:00:57 -04:00
Christophe Jaillet 8ce7aaaa97 net: dsa: loop: Check for memory allocation failure
If 'devm_kzalloc' fails, a NULL pointer will be dereferenced.
Return -ENOMEM instead, as done for some other memory allocation just a
few lines above.

Fixes: 98cd1552ea ("net: dsa: Mock-up driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:58:36 -04:00
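The pattern in question, as a hedged sketch (ptr, dev and size are placeholders): devm_kzalloc() can return NULL, so bail out with -ENOMEM instead of dereferencing the result:

  ptr = devm_kzalloc(dev, size, GFP_KERNEL);
  if (!ptr)
          return -ENOMEM;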
Hangbin Liu d62844a825 bonding: check nla_put_be32 return value
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:57:05 -04:00
Dan Carpenter ac45bd93a5 bnxt_en: allocate enough space for ->ntp_fltr_bmap
We have the number of longs, but we need to calculate the number of
bytes required.

Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:47:31 -04:00
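A hedged sketch of the sizing fix described above (bp and the constant mirror the driver, but the snippet is illustrative): BITS_TO_LONGS() yields a count of longs, so the byte size passed to kzalloc() needs an extra sizeof(long):

  /* bytes, not longs: number of longs times the size of a long */
  bp->ntp_fltr_bmap = kzalloc(BITS_TO_LONGS(BNXT_NTP_FLTR_MAX_FLTR) *
                              sizeof(long), GFP_KERNEL);
  if (!bp->ntp_fltr_bmap)
          rc = -ENOMEM;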
Kees Cook df5303a8aa qlge: Avoid reading past end of buffer
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:41:42 -04:00
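A hedged, generic illustration of the pattern fixed in this and the two bna commits below (the helper and buffer names are made up):

  #include <string.h>

  static void fill_stat_string(char *value, size_t len, const char *str)
  {
          /* memcpy(value, str, len) would read past the end of a short
           * str and copy adjacent rodata into value; strncpy() stops at
           * the terminating NUL and zero-fills the remainder */
          strncpy(value, str, len);
  }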
Kees Cook 4dc69c1c1f bna: ethtool: Avoid reading past end of buffer
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:41:42 -04:00
Kees Cook 9e4eb1ce47 bna: Avoid reading past end of buffer
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:41:41 -04:00
Linus Torvalds 677375cef8 Only bug fixes and cleanups for this merge window.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlkPYHkACgkQ8vlZVpUN
 gaM97ggAlOm8n/tlbcdonX/+HHjlnqcy5uYD7A9AH/JordpRzy4eqcMbxMG39p1R
 DBtjo9Y0i3iFEGajRc0h7KXDLeTBUQ/JZpR8H60MFfAQHnTowuI91eb3/6QeZiHh
 CN/2KKzpYitPIEUfEHnVeYKOfvrzR7je5hrEiAwEkPeKv7XyrNVM0LHQ/jKpbQwg
 ntIzHvxjQyo8plx/m5S4Yew7tqjYpNiq4plmyk/Vxtw2FmB/FC76UxYeadoB3EI5
 etw+bCORB0tFZO27o56kXywg+mDcp7HEtVvq9LG28oEuBDAVKNoeKEvV7SiOBlZp
 +HnqIz5Hx1UTxOlTAc10IjvEhriEuw==
 =qCDl
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Only bug fixes and cleanups for this merge window"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: correct collision claim for digested names
  MAINTAINERS: fscrypt: update mailing list, patchwork, and git
  ext4: clean up ext4_match() and callers
  f2fs: switch to using fscrypt_match_name()
  ext4: switch to using fscrypt_match_name()
  fscrypt: introduce helper function for filename matching
  fscrypt: avoid collisions when presenting long encrypted filenames
  f2fs: check entire encrypted bigname when finding a dentry
  ubifs: check for consistent encryption contexts in ubifs_lookup()
  f2fs: sync f2fs_lookup() with ext4_lookup()
  ext4: remove "nokey" check from ext4_lookup()
  fscrypt: fix context consistency check when key(s) unavailable
  fscrypt: Remove __packed from fscrypt_policy
  fscrypt: Move key structure and constants to uapi
  fscrypt: remove fscrypt_symlink_data_len()
  fscrypt: remove unnecessary checks for NULL operations
2017-05-08 11:40:34 -07:00
Vlad Yasevich 8403debeea vlan: Keep NETIF_F_HW_CSUM similar to other software devices
Vlan devices, like all other software devices, enable the
NETIF_F_HW_CSUM feature.  However, unlike all the other
software devices, vlans will switch to using the IP|IPV6_CSUM
features if the underlying device uses them.  In these situations,
checksum offload features on the vlan device can't be controlled
via ethtool.

This patch makes vlans keep HW_CSUM feature if the underlying
device supports checksum offloading.  This makes vlan devices
behave like other software devices, and restores control to the
user.

A side-effect is that some offload settings (typically UFO)
may be enabled on the vlan device while being disabled on the HW.
However, the GSO code will correctly process the packets. This
actually results in slightly better raw throughput.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:39:19 -04:00
Wei Wang 1b1fc3fdda tcp: make congestion control optionally skip slow start after idle
Congestion control modules that want full control over congestion
control behavior do not want the cwnd modifications controlled by
the sysctl_tcp_slow_start_after_idle code path.
So skip those code paths for CC modules that use the cong_control()
API.
As an example, those cwnd effects are not desired for the BBR congestion
control algorithm.

Fixes: c0402760f5 ("tcp: new CC hook to set sending rate with rate_sample in any CA state")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:37:07 -04:00
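A rough, hedged sketch of the gating described above (not the literal patch): the after-idle cwnd restart is skipped whenever the attached congestion control module implements the cong_control() hook:

  #include <net/tcp.h>

  static inline bool tcp_wants_slow_start_after_idle(const struct sock *sk)
  {
          const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;

          /* modules using the cong_control() API manage cwnd themselves,
           * so leave their cwnd alone after an idle period */
          return !ca_ops->cong_control;
  }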
WANG Cong 82486aa6f1 ipv4: restore rt->fi for reference counting
An IPv4 dst could use fi->fib_metrics to store metrics, but fib_info
itself is refcnt'ed, so without taking a refcnt, fi and
fi->fib_metrics could be freed while the dst metrics still point to
them.  This triggers a use-after-free, as reported by Andrey twice.

This patch reverts commit 2860583fe8 ("ipv4: Kill rt->fi") to
restore this reference counting. It is a quick fix for -net and
-stable, for -net-next, as Eric suggested, we can consider doing
reference counting for metrics itself instead of relying on fib_info.

IPv6 is very different, it copies or steals the metrics from mx6_config
in fib6_commit_metrics() so probably doesn't need a refcnt.

Decnet has already done the refcnt'ing, see dn_fib_semantic_match().

Fixes: 2860583fe8 ("ipv4: Kill rt->fi")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08 14:35:03 -04:00
Linus Torvalds dd727dad37 Add GETFSMAP support; some performance improvements for very large
file systems and for random write workloads into a preallocated file;
 bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlkPYB8ACgkQ8vlZVpUN
 gaP1HwgApoMQGegtRIbCZKUzKBJ2S6vwIoPAMz62JuwngOyWygJ1T1TliKTitG04
 XvijKpUHtEggMO/ZsUOCoyr2LzJlpVvvrJZsavEubO12LKreYMpvNraZF1GACYTb
 lIZpdWkpcEz5WnPV/PXW/dEMcSMhnKe8tbmHXMyAouSC6a55F5Wp456KF/plqkHU
 zkWTCDbEOtHThzpL8cthUL71ji62I3Op5jn/qOfKCm6/JtUlw5pYjWkRUNqqjSQE
 uQqMpqLxI/VjOdEiBPxEF6A+ZudZmoBQKY15ibWCcHUPFOPqk4RdYz6VivRI7zrg
 KrrKcdFT29MtKnRfAAoJcc0nJ4e1Iw==
 =il74
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:

 - add GETFSMAP support

 - some performance improvements for very large file systems and for
   random write workloads into a preallocated file

 - bug fixes and cleanups.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: cleanup write flags handling from jbd2_write_superblock()
  ext4: mark superblock writes synchronous for nobarrier mounts
  ext4: inherit encryption xattr before other xattrs
  ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
  ext4: avoid unnecessary transaction stalls during writeback
  ext4: preload block group descriptors
  ext4: make ext4_shutdown() static
  ext4: support GETFSMAP ioctls
  vfs: add common GETFSMAP ioctl definitions
  ext4: evict inline data when writing to memory map
  ext4: remove ext4_xattr_check_entry()
  ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
  ext4: merge ext4_xattr_list() into ext4_listxattr()
  ext4: constify static data that is never modified
  ext4: trim return value and 'dir' argument from ext4_insert_dentry()
  jbd2: fix dbench4 performance regression for 'nobarrier' mounts
  jbd2: Fix lockdep splat with generic/270 test
  mm: retry writepages() on ENOMEM when doing an data integrity writeback
2017-05-08 11:30:05 -07:00