Commit Graph

816562 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo c3b81a500f perf beauty msg_flags: Add missing %s lost when adding prefix suppression logic
When the prefix suppresion/enabling logic was added, I forgot to add an
extra %, which ended up chopping off the strings:

Before:

  # perf trace -e *mmsg --map-dump syscalls
  [299] = 1,
  [307] = 1,
  DNS Res~ver #3/14587 sendmmsg(106<socket:[3462393]>, 0x7f252b0fcaf0, 2, MSG_) = 2
  chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_, NULL) = 1
  DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
  DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
  DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
  DNS Res~ver #3/14587 sendmmsg(148<socket:[3460636]>, 0x7f252b0fcaf0, 2, MSG_) = 2
  DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
  ^C#

After:

  # perf trace -e *mmsg --map-dump syscalls
  [299] = 1,
  [307] = 1,
  NetworkManager/17467 sendmmsg(22<socket:[3466493]>, 0x7f28927f9bb0, 2, MSG_NOSIGNAL) = 2
  pool/17478 sendmmsg(10<socket:[3466523]>, 0x7f2769f95e90, 2, MSG_NOSIGNAL) = 2
  DNS Res~ver #3/14587 sendmmsg(121<socket:[3466132]>, 0x7f252b0fcaf0, 2, MSG_NOSIGNAL) = 2
  chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_DONTWAIT, NULL) = 1
  Socket Thread/17433 sendmmsg(121<socket:[3460903]>, 0x7f252668baf0, 2, MSG_NOSIGNAL) = 2
  ^C#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: c65c83ffe9 ("perf trace: Allow asking for not suppressing common string prefixes")
Link: https://lkml.kernel.org/n/tip-t2eu1rqx710k6jr4814mlzg7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 15:45:35 -03:00
Adrian Hunter ae8b887c00 perf scripts python: exported-sql-viewer.py: Add call tree
Add a new report to display a call tree. The Call Tree report is very
similar to the Context-Sensitive Call Graph, but the data is not
aggregated. Also the 'Count' column, which would be always 1, is replaced
by the 'Call Time'.

Committer testing:

  $ cat simple-retpoline.c
  /*

    https://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com

  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  */

  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~acme/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  $ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

And in the GUI select:

    "Reports"
      "Call Tree"

    Call Path                 | Object          | Call Time (ns) | Time (ns) | Time (%) | Branch Count | Brach Count (%) |
    > simple-retpolin
      > PID:TID
        > _start                ld-2.28.so       2193855505777      156267      100.0       10602          100.0
            unknown             unknown          2193855506010        2276        1.5           1            0.0
          > _dl_start           ld-2.28.so       2193855508286      137047       87.7       10088           95.2
          > _dl_init            ld-2.28.so       2193855645444        9142        5.9         326            3.1
          > _start              simple-retpoline 2193855654587        7457        4.8         182            1.7
            > __libc_start_main <SNIP>
              <SNIP>
              > main            simple-retpoline 2193855657493          32        0.5          12            6.7
                > foo           simple-retpoline 2193855657493          14       43.8           5           41.7
              <SNIP>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-enf0w96gqzfpv4fi16pw9ovc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 15:04:16 -03:00
Adrian Hunter 254c0d820b perf scripts python: exported-sql-viewer.py: Factor out CallGraphModelBase
Factor out a base class CallGraphModelBase from CallGraphModel, so that
CallGraphModelBase can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-76eybebzjwvgnadkm2oufrqi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:56:17 -03:00
Adrian Hunter a448ba232a perf scripts python: exported-sql-viewer.py: Improve TreeModel abstraction
Instead of passing the tree root, get it from a method that can be
implemented in any derived class.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-ovcv28bg4mt9swk36ypdyz14@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:55:32 -03:00
Adrian Hunter a731cc4c99 perf scripts python: exported-sql-viewer.py: Factor out TreeWindowBase
Factor out a base class TreeWindowBase from CallGraphWindow, so that
TreeWindowBase can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-ifirw0c0mhkwxg6l12lk6k4p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:54:39 -03:00
Adrian Hunter febce6dc1f perf scripts python: export-to-postgresql.py: Export calls parent_id
Export to the 'calls' table the newly created 'parent_id' and create an
index for it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-eybd6fnk6j9r7g643lsideoo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:53:33 -03:00
Adrian Hunter 07c5ebead8 perf scripts python: export-to-postgresql.py: Fix invalid input syntax for integer error
Fix SQL query error "invalid input syntax for integer":

  Traceback (most recent call last):
    File "tools/perf/scripts/python/export-to-postgresql.py", line 465, in <module>
      do_query(query, 'CREATE VIEW calls_view AS '
    File "tools/perf/scripts/python/export-to-postgresql.py", line 274, in do_query
      raise Exception("Query failed: " + q.lastError().text())
  Exception: Query failed: ERROR:  invalid input syntax for integer: ""
  LINE 1: ...ch_count,call_id,return_id,CASE WHEN flags=0 THEN '' WHEN fl...
                                                               ^
  (22P02) QPSQL: Unable to create query
  Error running python script tools/perf/scripts/python/export-to-postgresql.py

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: f08046cb30 ("perf thread-stack: Represent jmps to the start of a different symbol")
Link: https://lkml.kernel.org/n/tip-strfpdozrvg7bi1xzrivxzqt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:53:12 -03:00
Adrian Hunter 8ce9a7251d perf scripts python: export-to-sqlite.py: Export calls parent_id
Export to the 'calls' table the newly created 'parent_id'.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-b09oukl48rsl9azkp2wmh0bl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:52:29 -03:00
Adrian Hunter f435887ec0 perf db-export: Add calls parent_id to enable creation of call trees
The call_path can be used to find the parent symbol for a call but not
the exact parent call. To do that add parent_id to the call_return
export. This enables the creation of a call tree from the exported data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:50:47 -03:00
Adrian Hunter 076333870c perf intel-pt: Fix divide by zero when TSC is not available
When TSC is not available, "timeless" decoding is used but a divide by
zero occurs if perf_time_to_tsc() is called.

Ensure the divisor is not zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:48:30 -03:00
Adrian Hunter c1c49204b0 perf auxtrace: Improve address filter error message when there is no DSO
The message does not indicate the possibility that the symbol is not
found because the file does not exist.

Before:

  $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
  Symbol 'strcmp' not found.
  Note that symbols must be functions.
  Failed to parse address filter: 'filter strcmp / strcpy @ foo '
  Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
  Where multiple filters are separated by space or comma.

After:

  $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
  File 'foo' not found or has no symbols.
  Symbol 'strcmp' not found.
  Note that symbols must be functions.
  Failed to parse address filter: 'filter strcmp / strcpy @ foo '
  Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
  Where multiple filters are separated by space or comma.

Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/n/tip-dvngzxd0jkplzw1ary69dilb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 14:47:06 -03:00
Linus Torvalds a215ce8f0e IOMMU Fix for Linux v5.0-rc8
One important patch:
 
 	- Fix for a memory corruption issue in the Intel VT-d driver
 	  that triggers on hardware with deep PCI hierarchies
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJceWO/AAoJECvwRC2XARrjBtQQAJ5w8QuiuPsukHSE6Qt7OqAM
 2W0B6J4Mb2LoY08Zfh0eG59WrJZU+vZSgi/NoKU7VJJFQmlD/Y+DMFdkzCpBfGXq
 NP1lCvKlcAaKi81PnAr2LFR7jMr4n5j7kPl6DVizFuBztwZcoYXXesNcbemMJLfo
 NMCYDlo8qyAf9+ETY6UKa5rbr6lJZCGodiNQazkAsVAAs5LNlYoXZKlD4/SYElmL
 Kj7OFeKD2TgtmD1MKA6LdP3MfjP3HvI9nXO/+20LZdDxJoBIkD/4SWCrjxtzny84
 z85ypxwGZahKRZwKNSvjKMNaQJfu/S9uiN2yx4IZI8prYG5Po7lB4bI/Ol420Ze5
 oKdMjFQ4mfKswm3fBOwlMpCJr41jwY1oWjdtLwHNXX3iwCh7EoGTKzQpjqH7GAa2
 iSzlVhQS/o2FS7OcklYtmHnOwgbM1t3J4r3viAPpyVjkQh1RCnw3RBAsxM6ta3Rn
 BFVzoTdf3oDAyTVhBpfbPIGCmy3BD7KBanarYPXAKdSUn94UNj2qe6e1ePsGtESk
 Xmj9eHxO2eJXo1CndWB+kElGrdGT0WuRE2kYI9A3PzNPgR504yqyPcTUaJ7A74l6
 iubggAE6SY6QyLVTLUe25x9papjudLELlG+bAFKnjpQ9IxL34X2ExIZEYc9K9jsu
 OrgXdHKFjnhHvlMUNs/b
 =H9KF
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "One important fix for a memory corruption issue in the Intel VT-d
  driver that triggers on hardware with deep PCI hierarchies"

* tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/dmar: Fix buffer overflow during PCI bus notification
2019-03-01 09:13:04 -08:00
Linus Torvalds 2d28e01dca Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "2 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlbfs: fix races and page leaks during migration
  kasan: turn off asan-stack for clang-8 and earlier
2019-03-01 09:04:59 -08:00
Mike Kravetz cb6acd01e2 hugetlbfs: fix races and page leaks during migration
hugetlb pages should only be migrated if they are 'active'.  The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked.  Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code.  This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
then migrates the pages associated with the file from one node to
another.  When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

That is as expected.  2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem.  If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use.  The only way to 'fix' the filesystem
accounting is to unmount the filesystem

If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field.  At migration
time, this information is not preserved.  To fix, simply transfer
page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and
migration.  When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page().  A page could be migrated
while in this state.  However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page.  If
the condition is detected, return EBUSY for the page.

Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc5422230 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
  Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
  Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01 09:02:33 -08:00
Arnd Bergmann 6baec880d7 kasan: turn off asan-stack for clang-8 and earlier
Building an arm64 allmodconfig kernel with clang results in over 140
warnings about overly large stack frames, the worst ones being:

  drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
  drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
  drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
  drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
  drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
  drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'

None of these happen with gcc today, and almost all of these are the
result of a single known issue in llvm.  Hopefully it will eventually
get fixed with the clang-9 release.

In the meantime, the best idea I have is to turn off asan-stack for
clang-8 and earlier, so we can produce a kernel that is safe to run.

I have posted three patches that address the frame overflow warnings
that are not addressed by turning off asan-stack, so in combination with
this change, we get much closer to a clean allmodconfig build, which in
turn is necessary to do meaningful build regression testing.

It is still possible to turn on the CONFIG_ASAN_STACK option on all
versions of clang, and it's always enabled for gcc, but when
CONFIG_COMPILE_TEST is set, the option remains invisible, so
allmodconfig and randconfig builds (which are normally done with a
forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.

Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01 09:02:33 -08:00
Linus Torvalds 6357c8127b drm amdgfx, bochs and one core fix
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJceMUnAAoJEAx081l5xIa+n40P/iFOvzFXDdCaEf21XNdtH1Jk
 gjMBeMwxKdrQCsknaQhADtX/IwoSoUyu50jRhkdTQrhCwprkFFnq3LBt11QmYv/k
 Wj94D0yjIQAgnt+/rXi8ZKvH+MDZxmwGR+l9gRfKLywtreNaeLqjn875wkhzy6mB
 DgyfB4Xuyrpgg9K0yDbVdSQrxmHJuEx3VYVF/tvbL8HLFbxjhP2PV3eYhNxGC9Kj
 Ft1ulGZCfT8xQnxc1n/Bb+CHkQX9ZH2sVv8uezkaItEA7okboU71RCkMd5MjZ4AD
 HcmJEEpYv2dJPdGpnjrn2xfHKB/HEnBaGeFAIsfmhat5paYh7DV9gATFwQ8ZcCQ8
 HzX5/nD7rFA8G9gGjjVXYzCgX6egNtY6Oj35/IHfFJnKoqLeJvv7qkpX4OMUdcP8
 dB5TR+S6bD1VCDj6Wi7lFfA9uRBkpLbyDUhDnFS/fKvD0Ecw/BopwRnhTPn+lqtA
 50//WT5QZGQ+aRzzSxlIyfXJAyZBT2lr09r97G5noE0bal5iT8pc0xRiz6EHVrAd
 y9J2yLOpLrPym1av5RB5Nnj7Pohup9WPf29T7T5XbQ8kxuk+sK5EJt8NqD8KIyU6
 3RUG3cWlJGlWmX+V9GiMV/DnrfSZMnif0HUhxCxsDPmI0BHbQ3kn8vWtw9SAbHN2
 nFe8B5Ok057cb3FR8uYC
 =TtJZ
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Three final fixes, one for a feature that is new in this kernel, one
  bochs fix for qemu riscv and one atomic modesetting fix.

  I've left a few of the other late fixes until next as I didn't want to
  throw in anything that wasn't really necessary"

* tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm:
  drm/bochs: Fix the ID mismatch error
  drm: Block fb changes for async plane updates
  drm/amd/display: Use vrr friendly pageflip throttling in DC.
2019-03-01 08:44:11 -08:00
Martin Schwidefsky 9fe567d09f s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
The dasd_eckd_restore_device() function calls dasd_generic_read_dev_chars
with a temporary buffer on the stack. With CONFIG_VMAP_STACK=y this is
a vmalloc address but dasd_generic_restore_device() uses the address of
the buffer as I/O address. Circumvent this by using the already allocated
cqr->data buffer for the RDC data.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 16:23:56 +01:00
Martin Schwidefsky c8e8ed386a s390/suspend: fix prefix register reset in swsusp_arch_resume
The reset of the prefix to zero in swsusp_arch_resume uses a 4 byte stack
slot. With CONFIG_VMAP_STACK=y this is now in the vmalloc area, this works
only with DAT enabled. Move the DAT disable in swsusp_arch_resume after
the prefix reset.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 16:23:27 +01:00
Peng Sun 352d20d611 bpf: drop refcount if bpf_map_new_fd() fails in map_create()
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.

Fixes: bd5f5f4ecb ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun <sironhide0null@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 16:04:29 +01:00
Arnd Bergmann 6089e65618 Qualcomm ARM64 Fixes for 5.0-rc8
* Fix TZ memory area size to avoid crashes during boot
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcdiEpAAoJEFKiBbHx2RXVQLAP/i9ZU6p6TD8KVYVnxbec5pIz
 bB6/gNo0VfM4/4CZmznJ0FQcGbxGekaMSCtNJDlgfzIM/VC2nXdlPzZi/Lo8tubX
 NgP4KYVccqnyA7Ppb/7lMaITsclljofdSaGVX4Vj4o60DzrEkA8reauJ4a+X8GDF
 d0kh006/Q0aRD9rrIV9ZaRWE8mzThbuCwMDZt6iV5kUDGKK1qleJV7wkBlSYKMTg
 rB+9ObD02kZxw7FWNcf0LH/4Ans9coPkM6I0CE4MCH5ozEDg+AN7RRXTkeniXYdi
 kljx2BilgN0SyV9zuGLwYW/FRXcMOV1Rmmdw2CUU84wNnnTkcLG0O43fqR8S8nBb
 cF0Wd9zPHlXWhIKa5twgFubvluZjsaFOciThixtAELwU4wrxq9h5NUxW3XnYQVUO
 riEIPUIBl8e3wY2SHk0HUBvHUcXr8m033V0oKN5tIO69OuWRROMBCNp0Su6JraSs
 7hC2wWvo+wtqQIC3JHszSRyfA1ceF7ldYit5dRLJPycMk49fgzXF+Coel7W0ifV5
 PV62/1SQXkajeRywdamFCT6ccdRQcqVzKSUrQdD0U0+Z5hC4BjgK9TOMHSak6YEY
 iKt4PwpdE6zdenybmflUw7wcmdt2plrYHGJYLelFE3TKZMhV1x33LtNeb2FPXSZo
 u/RxwNuDNdyw9kGUk1+u
 =iA1y
 -----END PGP SIGNATURE-----

Merge tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/fixes

Qualcomm ARM64 Fixes for 5.0-rc8

* Fix TZ memory area size to avoid crashes during boot

* tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
  arm64: dts: qcom: msm8998: Extend TZ reserved memory area
2019-03-01 15:08:16 +01:00
Jin Yao 284c4e18f5 perf time-utils: Refactor time range parsing code
Jiri points out that we don't need any time checking and time string
parsing if the --time option is not set. That makes sense.

This patch refactors the time range parsing code, move the duplicated
code from perf report and perf script to time_utils and check if --time
option is set before parsing the time string. This patch is no logic
change expected. So the usage of --time is same as before.

For example:

Select the first and second 10% time slices:
  perf report --time 10%/1,10%/2
  perf script --time 10%/1,10%/2

Select the slices from 0% to 10% and from 30% to 40%:
  perf report --time 0%-10%,30%-40%
  perf script --time 0%-10%,30%-40%

Select the time slices from timestamp 3971 to 3973
  perf report --time 3971,3973
  perf script --time 3971,3973

Committer testing:

Using the above examples, check before and after to see if it remains
the same:

  $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
  $
  $ perf report --time 10%/1,10%/2 > /tmp/report.before.1
  $ perf script --time 10%/1,10%/2 > /tmp/script.before.1
  $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
  $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
  $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
  $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3

For example, the 3rd test produces this slice:

  $ cat /tmp/script.before.3
        cat  3147 180457.375844:   2143 cycles:uppp:      7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
        cat  3147 180457.375986:   2245 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
        cat  3147 180457.376012:   2164 cycles:uppp:      7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
        cat  3147 180457.376140:   2921 cycles:uppp:      558b70f3a554 [unknown] (/usr/bin/cat)
        cat  3147 180457.376296:   2844 cycles:uppp:      7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
        cat  3147 180457.376431:   2717 cycles:uppp:      558b70f3b0ca [unknown] (/usr/bin/cat)
        cat  3147 180457.376667:   2630 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
        cat  3147 180457.376795:   2442 cycles:uppp:      7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
        cat  3147 180457.376927:   2376 cycles:uppp:  ffffffff9aa00163 [unknown] ([unknown])
        cat  3147 180457.376954:   2307 cycles:uppp:      7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
        cat  3147 180457.377116:   3091 cycles:uppp:      7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
        cat  3147 180457.377362:   2945 cycles:uppp:      558b70f3a3b0 [unknown] (/usr/bin/cat)
        cat  3147 180457.377517:   2727 cycles:uppp:      558b70f3a9aa [unknown] (/usr/bin/cat)
  $

Install 'coreutils-debuginfo' to see cat's guts (symbols), but then, the
above chunk translates into this 'perf report' output:

  $ cat /tmp/report.before.3
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 13  of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
  # Event count (approx.): 33552
  #
  # Overhead  Command  Shared Object     Symbol
  # ........  .......  ................  ......................
  #
      17.69%  cat      libc-2.28.so      [.] malloc
      14.53%  cat      cat               [.] 0x000000000000586e
      13.33%  cat      libc-2.28.so      [.] _int_malloc
       8.78%  cat      cat               [.] 0x00000000000023b0
       8.71%  cat      cat               [.] 0x0000000000002554
       8.13%  cat      cat               [.] 0x00000000000029aa
       8.10%  cat      cat               [.] 0x00000000000030ca
       7.28%  cat      libc-2.28.so      [.] read
       7.08%  cat      [unknown]         [k] 0xffffffff9aa00163
       6.39%  cat      libc-2.28.so      [.] cfree@GLIBC_2.2.5

  #
  # (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
  #
  $

Now lets see after applying this patch, nothing should change:

  $ perf report --time 10%/1,10%/2 > /tmp/report.after.1
  $ perf script --time 10%/1,10%/2 > /tmp/script.after.1
  $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
  $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
  $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
  $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
  $ diff -u /tmp/report.before.1 /tmp/report.after.1
  $ diff -u /tmp/script.before.1 /tmp/script.after.1
  $ diff -u /tmp/report.before.2 /tmp/report.after.2
  --- /tmp/report.before.2	2019-03-01 11:01:53.526094883 -0300
  +++ /tmp/report.after.2	2019-03-01 11:09:18.231770467 -0300
  @@ -352,5 +352,5 @@

   #
  -# (Tip: Generate a script for your data: perf script -g <lang>)
  +# (Tip: Treat branches as callchains: perf report --branch-history)
   #
  $ diff -u /tmp/script.before.2 /tmp/script.after.2
  $ diff -u /tmp/report.before.3 /tmp/report.after.3
  --- /tmp/report.before.3	2019-03-01 11:03:08.890045588 -0300
  +++ /tmp/report.after.3	2019-03-01 11:09:40.660224002 -0300
  @@ -22,5 +22,5 @@

   #
  -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
  +# (Tip: List events using substring match: perf list <keyword>)
   #
  $ diff -u /tmp/script.before.3 /tmp/script.after.3
  $

Cool, just the 'perf report' tips changed, QED.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 11:03:53 -03:00
Arnd Bergmann 36baa6ed1c OP-TEE driver
- add missing of_node_put after of_device_is_available
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCgA4FiEEcK3MsDvGvFp6zV9ztbC4QZeP7NMFAlx30DMaHGplbnMud2lr
 bGFuZGVyQGxpbmFyby5vcmcACgkQtbC4QZeP7NNDtQ//WLr0R5ltH7YdR11b8f83
 6rLlYwRXULRdNUEHwm5/tIWsq2poUmNXUiYkrJ2j3PcBq1C9bLQbBubx1ZM9ukfc
 rS/vRXOvzNFUhnQUasaGy8mXJyMB0orFcOUlhCEMEnhoLuOAwqIVuoCbu+V9dim0
 AhL79ytaYS6dyubuRS5GuSRSkOM+0hbkm0l4s5N4Q0eT1zlGtFToBWV7od0+F64Q
 v4vsYIl04TgLtbPJpZ6ZOJ+LfuRE3OLkqAunEQsxiS9IwMZ0Ypx6ubZ4DlbCK/zk
 /lFOBHo5oI0yw8VQ20Q+fELiBNSBIU+SV0wTI6qJyY1cpfCAnddUrk2rXdEiBCzF
 1/ibjSRmakFCxmLDuSm1FSaZEIGfV7ZylUyyuPU7NMKEzRCFRnJ9re24yuhIHdVZ
 VaYFYcezn+lREbi/h5MOUY5BW7zG2QlmOkTnxqyWNiNhcrB/A+DJ5DowEPYWjlEP
 Ii+d8NS5aqhQtyvDZdB/M+WP1jYlqQncfIemXW8HhPHDIi6WHarBAojitoHmu224
 grcDYITAPYugRmZ3pzoX5LFjF3pr5nnbjmGERgHe5p935naw/x97PA/ixLePkpyc
 URh3KttRjCYP71FVhaYScLgHTT25dexvGQaJLBBR0zPQiLxLrj8ma46tpxPcLQjp
 6Pd0kWWTlnQw2ziz2jgn5sI=
 =AZfB
 -----END PGP SIGNATURE-----

Merge tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes

OP-TEE driver
- add missing of_node_put after of_device_is_available

* tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  tee: optee: add missing of_node_put after of_device_is_available
2019-03-01 14:59:40 +01:00
Gustavo A. R. Silva 10c3405f06 perf: Mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

  kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
  kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
      kernel = 1;
      ~~~~~~~^~~
  kernel/events/core.c:9156:3: note: here
     case IF_SRC_FILEADDR:
     ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190212205430.GA8446@embeddedor
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01 10:54:00 -03:00
Florian Westphal db8ab38880 netfilter: nf_tables: merge ipv4 and ipv6 nat chain types
Merge the ipv4 and ipv6 nat chain type. This is the last
missing piece which allows to provide inet family support
for nat in a follow patch.

The kconfig knobs for ipv4/ipv6 nat chain are removed, the
nat chain type will be built unconditionally if NFT_NAT
expression is enabled.

Before:
   text	   data	    bss	    dec	    hex	filename
   1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
   1697     896       0    2593     a21 nft_chain_nat_ipv6.ko

After:
   text	   data	    bss	    dec	    hex	filename
   1832     896       0    2728     aa8 nft_chain_nat.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:59 +01:00
Florian Westphal a9ce849e78 netfilter: nf_tables: nat: merge nft_masq protocol specific modules
The family specific masq modules are way too small to warrant
an extra module, just place all of them in nft_masq.

before:
  text	   data	    bss	    dec	    hex	filename
   1001	    832	      0	   1833	    729	nft_masq.ko
    766	    896	      0	   1662	    67e	nft_masq_ipv4.ko
    764	    896	      0	   1660	    67c	nft_masq_ipv6.ko

after:
   2010	    960	      0	   2970	    b9a	nft_masq.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:59 +01:00
Florian Westphal c78efc99c7 netfilter: nf_tables: nat: merge nft_redir protocol specific modules
before:
 text	   data	    bss	    dec	    hex	filename
 990	    832	      0	   1822	    71e nft_redir.ko
 697	    896	      0	   1593	    639 nft_redir_ipv4.ko
 713	    896	      0	   1609	    649	nft_redir_ipv6.ko

after:
 text	   data	    bss	    dec	    hex	filename
 1910	    960	      0	   2870	    b36	nft_redir.ko

size is reduced, all helpers from nft_redir.ko can be made static.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:58 +01:00
Sami Tolvanen 20fdaf6e1e netfilter: xt_IDLETIMER: fix sysfs callback function type
Use struct device_attribute instead of struct idletimer_tg_attr, and
the correct callback function type to avoid indirect call mismatches
with Control Flow Integrity checking.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:57 +01:00
Li RongQing 2e7b162c5e netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2
CONNTRACK_LOCKS is divisor when computer array index, if it is power of
2, compiler will optimize modulo operation as bitwise AND, or else
modulo will lower performance.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:36:46 +01:00
Li RongQing a9f5e78c40 netfilter: nf_tables: check the result of dereferencing base_chain->stats
Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.

base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.

And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.

And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.

Thanks for Eric's help to finish this patch.

Fixes: 009240940e ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:34:24 +01:00
David Ahern cd6428988b netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave
Followup to a173f066c7 ("netfilter: bridge: Don't sabotage nf_hook
calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
to not drop packets for slave devices too. Currently, neighbor
solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
dropped. This patch enables IPv6 communications for bridges with an
address that are enslaved to a VRF.

Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:28:45 +01:00
Xin Long f52a40fb41 ipvs: get sctphdr by sctphoff in sctp_csum_check
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls
skb_make_writable() to ensure sctphdr to be linearized.

So there's no need to get sctphdr by calling skb_header_pointer()
in sctp_csum_check().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:28:44 +01:00
Li RongQing 11d4dd0b20 netfilter: convert the proto argument from u8 to u16
The proto in struct xt_match and struct xt_target is u16, when
calling xt_check_target/match, their proto argument is u8,
and will cause truncation, it is harmless to ip packet, since
ip proto is u8

if a etable's match/target has proto that is u16, will cause
the check failure.

and convert be16 to short in bridge/netfilter/ebtables.c

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:28:43 +01:00
wenxu 3e511d5652 netfilter: nft_tunnel: Add dst_cache support
The metadata_dst does not initialize the dst_cache field, this causes
problems to ip_md_tunnel_xmit() since it cannot use this cache, hence,
Triggering a route lookup for every packet.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:25:06 +01:00
Florian Westphal be0502a3f2 netfilter: conntrack: tcp: only close if RST matches exact sequence
TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e.  whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
   changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:19:31 +01:00
Andrea Claudi f25a9b8515 ipvs: change some data types from int to bool
Change the data type of the following variables from int to bool
across ipvs code:

  - found
  - loop
  - need_full_dest
  - need_full_svc
  - payload_csum

Also change the following functions to use bool full_entry param
instead of int:

  - ip_vs_genl_parse_dest()
  - ip_vs_genl_parse_service()

This patch does not change any functionality but makes the source
code slightly easier to read.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01 14:19:04 +01:00
Jiong Wu d4721339dc mmc:fix a bug when max_discard is 0
The original purpose of the code I fix is to replace max_discard with
max_trim if max_trim is less than max_discard. When max_discard is 0
we should replace max_discard with max_trim as well, because
max_discard equals 0 happens only when the max_do_calc_max_discard
process is overflowed, so if mmc_can_trim(card) is true, max_discard
should be replaced by an available max_trim.
However, in the original code, there are two lines of code interfere
the right process.
1) if (max_discard && mmc_can_trim(card))
when max_discard is 0, it skips the process checking if max_discard
needs to be replaced with max_trim.
2) if (max_trim < max_discard)
the condition is false when max_discard is 0. it also skips the process
that replaces max_discard with max_trim, in fact, we should replace the
0-valued max_discard with max_trim.

Signed-off-by: Jiong Wu <Lohengrin1024@gmail.com>
Fixes: b305882fbc (mmc: core: optimize mmc_calc_max_discard)
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-03-01 09:50:10 +01:00
Vasily Gorbik 6d85dac2ab s390: warn about clearing als implied facilities
Add a warning about removing required architecture level set facilities
via "facilities=" command line option.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 08:00:42 +01:00
Vasily Gorbik b5e804598d s390: allow overriding facilities via command line
Add "facilities=" command line option which allows to override
facility bits returned by stfle. The main purpose of that is debugging
aids which allows to test specific kernel behaviour depending on
specific facilities presence. It also affects CPU alternatives.

"facilities=" command line option format is comma separated list of
integer values to be additionally set or cleared (if value is starting
with "!"). Values ranges are also supported. e.g.:

facilities=!130-160,159,167-169

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 08:00:41 +01:00
Vasily Gorbik d8901f2b2d s390: clean up redundant facilities list setup
Facilities list in the lowcore is initially set up by verify_facilities
from als.c and later initializations are redundant, so cleaning them up.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 08:00:39 +01:00
Vasily Gorbik 96d3b64b52 s390/als: remove duplicated in-place implementation of stfle
Reuse __stfle call instead of in-place implementation. __stfle is using
memcpy and memset functions but they are safe to use, since mem.S is
built with -march=z900.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-01 08:00:37 +01:00
Daniel Borkmann 3860d38f28 Merge branch 'bpf-dedup-fixes'
Andrii Nakryiko says:

====================
This patchset fixes a bug in btf_dedup() algorithm, which under specific
hash collision causes infinite loop. It also exposes ability to tune BTF
deduplication table size, with double purpose of allowing applications to
adjust size according to the size of BTF data, as well as allowing a simple
way to force hash collisions by setting table size to 1.

- Patch #1 fixes bug in btf_dedup testing code that's checking strings
- Patch #2 fixes pointer arg formatting in btf.h
- Patch #3 adds option to specify custom dedup table size
- Patch #4 fixes aforementioned bug in btf_dedup
- Patch #5 adds test that validates the fix

v1->v2:
- remove "Fixes" from formatting change patch
- extract roundup_pow2_max func for dedup table size
- btf_equal_struct -> btf_shallow_equal_struct
- explain in comment why we can't rely on just btf_dedup_is_equiv
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:49 +01:00
Andrii Nakryiko 7c7a4890c8 selftests/bpf: add btf_dedup test of FWD/STRUCT resolution
This patch adds a btf_dedup test exercising logic of STRUCT<->FWD
resolution and validating that STRUCT is not resolved to a FWD. It also
forces hash collisions, forcing both FWD and STRUCT to be candidates for
each other. Previously this condition caused infinite loop due to FWD
pointing to STRUCT and STRUCT pointing to its FWD.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:48 +01:00
Andrii Nakryiko 91097fbee4 btf: fix bug with resolving STRUCT/UNION into corresponding FWD
When checking available canonical candidates for struct/union algorithm
utilizes btf_dedup_is_equiv to determine if candidate is suitable. This
check is not enough when candidate is corresponding FWD for that
struct/union, because according to equivalence logic they are
equivalent. When it so happens that FWD and STRUCT/UNION end in hashing
to the same bucket, it's possible to create remapping loop from FWD to
STRUCT and STRUCT to same FWD, which will cause btf_dedup() to loop
forever.

This patch fixes the issue by additionally checking that type and
canonical candidate are strictly equal (utilizing btf_equal_struct).

Fixes: d5caef5b56 ("btf: add BTF types deduplication algorithm")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:48 +01:00
Andrii Nakryiko 51edf5f6e0 btf: allow to customize dedup hash table size
Default size of dedup table (16k) is good enough for most binaries, even
typical vmlinux images. But there are cases of binaries with huge amount
of BTF types (e.g., allyesconfig variants of kernel), which benefit from
having bigger dedup table size to lower amount of unnecessary hash
collisions. Tools like pahole, thus, can tune this parameter to reach
optimal performance.

This change also serves double purpose of allowing tests to force hash
collisions to test some corner cases, used in follow up patch.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:47 +01:00
Andrii Nakryiko 1baabdc108 libbpf: fix formatting for btf_ext__get_raw_data
Fix invalid formatting of pointer arg.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:47 +01:00
Andrii Nakryiko 8054d51f76 selftests/bpf: fix btf_dedup testing code
btf_dedup testing code doesn't account for length of struct btf_header
when calculating the start of a string section. This patch fixes this
problem.

Fixes: 49b57e0d01 ("tools/bpf: remove btf__get_strings() superseded by raw data API")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 01:31:47 +01:00
Dan Carpenter 3d8669e637 tools/libbpf: signedness bug in btf_dedup_ref_type()
The "ref_type_id" variable needs to be signed for the error handling
to work.

Fixes: d5caef5b56 ("btf: add BTF types deduplication algorithm")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:56:06 +01:00
Daniel Borkmann 74b3881908 Merge branch 'bpf-samples-improvements'
Jakub Kicinski says:

====================
This set is next part of a quest to get rid of the bpf_load
ELF loader.  It fixes some minor issues with the samples and
starts the conversion.

First patch fixes ping invocations, ping localhost defaults
to IPv6 on modern setups. Next load_sock_ops sample is removed
and users are directed towards using bpftool directly.

Patch 4 removes the use of bpf_load from samples which don't
need the auto-attachment functionality at all.

Patch 5 improves symbol counting in libbpf, it's not currently
an issue but it will be when anyone adds a symbol with a long
name. Let's make sure that person doesn't have to spend time
scratching their head and wondering why .a and .so symbol
counts don't match.

v2: - specify prog_type where possible (Andrii).
====================

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:47 +01:00
Jakub Kicinski 771744f9dc tools: libbpf: make sure readelf shows full names in build checks
readelf truncates its output by default to attempt to make it more
readable.  This can lead to function names getting aliased if they
differ late in the string.  Use --wide parameter to avoid
truncation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:46 +01:00
Jakub Kicinski 1a9b268c90 samples: bpf: use libbpf where easy
Some samples don't really need the magic of bpf_load,
switch them to libbpf.

v2: - specify program types.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:45 +01:00