Commit Graph

724216 Commits

Author SHA1 Message Date
Wei Wang 4512c43eac ipv6: remove null_entry before adding default route
In the current code, when creating a new fib6 table, tb6_root.leaf gets
initialized to net->ipv6.ip6_null_entry.
If a default route is being added with rt->rt6i_metric = 0xffffffff,
fib6_add() will add this route after net->ipv6.ip6_null_entry. As
null_entry is shared, it could cause problem.

In order to fix it, set fn->leaf to NULL before calling
fib6_add_rt2node() when trying to add the first default route.
And reset fn->leaf to null_entry when adding fails or when deleting the
last default route.

syzkaller reported the following issue which is fixed by this commit:

WARNING: suspicious RCU usage
4.15.0-rc5+ #171 Not tainted
-----------------------------
net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
4 locks held by swapper/0/0:
 #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
 #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310
 #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
 #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007
 #2:  (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560
 #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline]
 #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
 fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701
 fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892
 fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815
 fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863
 fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933
 __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949
 fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline]
 fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016
 fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033
 call_timer_fn+0x228/0x820 kernel/time/timer.c:1320
 expire_timers kernel/time/timer.c:1357 [inline]
 __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660
 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:540 [inline]
 smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904
 </IRQ>

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 66f5d6ce53 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 12:33:55 -05:00
David S. Miller 22dd8e6bd8 Merge branch 'Ether-fixes-for-the-SolutionEngine771x-boards'
Sergei Shtylyov says:

====================
Ether fixes for the SolutionEngine771x boards

Here's the series of 2 patches against Linus' repo. This series should
(hoplefully) fix the Ether support on the SolutionEngine771x boards...

[1/2] SolutionEngine771x: fix Ether platform data
[2/2] SolutionEngine771x: add Ether TSU resource
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 12:21:14 -05:00
Sergei Shtylyov f9a531d673 SolutionEngine771x: add Ether TSU resource
After the  Ether platform data is fixed, the driver probe() method would
still fail since the 'struct sh_eth_cpu_data' corresponding  to SH771x
indicates the presence of TSU but the memory resource for it is absent.
Add the missing TSU resource  to both Ether devices and fix the harmless
off-by-one error in the main memory resources, while at it...

Fixes: 4986b99688 ("net: sh_eth: remove the SH_TSU_ADDR")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 12:21:14 -05:00
Sergei Shtylyov 195e2addbc SolutionEngine771x: fix Ether platform data
The 'sh_eth' driver's probe() method would fail  on the SolutionEngine7710
board and crash on SolutionEngine7712 board  as the platform code is
hopelessly behind the driver's platform data --  it passes the PHY address
instead of 'struct sh_eth_plat_data *'; pass the latter to the driver in
order to fix the bug...

Fixes: 71557a37ad ("[netdrvr] sh_eth: Add SH7619 support")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 12:21:14 -05:00
Mike Rapoport 2fdd18118d docs-rst: networking: wire up msg_zerocopy
Fix the following 'make htmldocs' complaint:

Documentation/networking/msg_zerocopy.rst:: WARNING: document isn't included in any toctree.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 12:18:51 -05:00
Nicolai Stange 20b50d7997 net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
Commit 8f659a03a0 ("net: ipv4: fix for a race condition in
raw_sendmsg") fixed the issue of possibly inconsistent ->hdrincl handling
due to concurrent updates by reading this bit-field member into a local
variable and using the thus stabilized value in subsequent tests.

However, aforementioned commit also adds the (correct) comment that

  /* hdrincl should be READ_ONCE(inet->hdrincl)
   * but READ_ONCE() doesn't work with bit fields
   */

because as it stands, the compiler is free to shortcut or even eliminate
the local variable at its will.

Note that I have not seen anything like this happening in reality and thus,
the concern is a theoretical one.

However, in order to be on the safe side, emulate a READ_ONCE() on the
bit-field by doing it on the local 'hdrincl' variable itself:

	int hdrincl = inet->hdrincl;
	hdrincl = READ_ONCE(hdrincl);

This breaks the chain in the sense that the compiler is not allowed
to replace subsequent reads from hdrincl with reloads from inet->hdrincl.

Fixes: 8f659a03a0 ("net: ipv4: fix for a race condition in raw_sendmsg")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:59:16 -05:00
Xiongfeng Wang 3dc2fa4754 net: caif: use strlcpy() instead of strncpy()
gcc-8 reports

net/caif/caif_dev.c: In function 'caif_enroll_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]

net/caif/cfctrl.c: In function 'cfctrl_linkup_request':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]

net/caif/cfcnfg.c: In function 'caif_connect_client':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]

The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:52:18 -05:00
Ilya Dryomov 21acdf45f4 rbd: set max_segments to USHRT_MAX
Commit d3834fefcf ("rbd: bump queue_max_segments") bumped
max_segments (unsigned short) to max_hw_sectors (unsigned int).
max_hw_sectors is set to the number of 512-byte sectors in an object
and overflows unsigned short for 32M (largest possible) objects, making
the block layer resort to handing us single segment (i.e. single page
or even smaller) bios in that case.

Cc: stable@vger.kernel.org
Fixes: d3834fefcf ("rbd: bump queue_max_segments")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-01-09 17:40:48 +01:00
Florian Margaine edd8ca8015 rbd: reacquire lock should update lock owner client id
Otherwise, future operations on this RBD using exclusive-lock are
going to require the lock from a non-existent client id.

Cc: stable@vger.kernel.org
Fixes: 14bb211d32 ("rbd: support updating the lock cookie without releasing the lock")
Link: http://tracker.ceph.com/issues/19929
Signed-off-by: Florian Margaine <florian@platform.sh>
[idryomov@gmail.com: rbd_set_owner_cid() call, __rbd_lock() helper]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-09 17:40:21 +01:00
Andrii Vladyka b8fd0823e0 net: core: fix module type in sock_diag_bind
Use AF_INET6 instead of AF_INET in IPv6-related code path

Signed-off-by: Andrii Vladyka <tulup@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:28:58 -05:00
Linus Torvalds ef7f8cec80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
    Alexander Duyck.

 2) Undo unintentional UAPI change in netfilter conntrack, from Florian
    Westphal.

 3) Revert a change to how error codes are returned from
    dev_get_valid_name(), it broke some apps.

 4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
    dual-stack. From Eli Cooper.

 5) Fix missed PMTU updates in geneve, from Xin Long.

 6) Cure double free in macvlan, from Gao Feng.

 7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
    Mohamed Ghannam.

 8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
    deferral of probe when the regulator is not ready yet).

 9) Missing DMA mapping error checks in 3c59x, from Neil Horman.

10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.

11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
    isn't initialized. From Andrei Vagin.

12) Fix crashes in fib6_add(), from Wei Wang.

13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  sh_eth: fix TXALCR1 offsets
  mdio-sun4i: Fix a memory leak
  phylink: mark expected switch fall-throughs in phylink_mii_ioctl
  sctp: fix the handling of ICMP Frag Needed for too small MTUs
  sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
  xen-netfront: enable device after manual module load
  bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
  bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
  sh_eth: fix SH7757 GEther initialization
  net: fec: free/restore resource in related probe error pathes
  uapi/if_ether.h: prevent redefinition of struct ethhdr
  ipv6: fix general protection fault in fib6_add()
  RDS: null pointer dereference in rds_atomic_free_op
  sh_eth: fix TSU resource handling
  net: stmmac: enable EEE in MII, GMII or RGMII only
  rtnetlink: give a user socket to get_target_net()
  MAINTAINERS: Update my email address.
  can: ems_usb: improve error reporting for error warning and error passive
  can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
  can: gs_usb: fix return value of the "set_bittiming" callback
  ...
2018-01-08 20:21:39 -08:00
Linus Torvalds 44596f8682 Fourth pull request for 4.15-rc
- One line fix to mlx4 error flow (same as mlx5 fix in last pull request,
   just in the mlx4 driver)
 - Fix a race condition in the IPoIB driver.  This patch is larger than
   just a one line fix, but resolves a race condition in a fairly
   straight forward manner
 - Fix a locking issue in the RDMA netlink code.  This patch is also
   larger than I would like for a late -rc.  It has, however, had a week
   to bake in the rdma tree prior to this pull request
 - One line fix to fix granting remote machine access to memory that they
   don't need and shouldn't have
 - One line fix to correct the fact that our sgid/dgid pair is swapped
   from what you would expect when receiving an incoming connection
   request
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaU+ZkAAoJELgmozMOVy/dLw8P/1f27k9c7Bg91VfuyQeIcSxA
 kyRDdzlkRzuI/6QJ4ErK+IkOH8ADG6UGmQa+fOv1dxG8do+YwVflcY7gEgjJA7fP
 k0oPuGjiq8wrEWZrFGinln38ou0KALYd4F2C32unVYrsIohQLHSr1D6Ttw0W5FA6
 NQG4nVn9FzmilgjqtkW2zOGKw4jdAn57J47tUp49KufuPBTUcxjmZCdaV5AmiuzN
 5JpZUieL49Zoc18pcm1OreqDPZcj5LV1XquDNV+AZgU9+uGKoIb932k6hQjBRuml
 FSePxpPjdN8zX/KVaa4HQHX4U4uMBp0HcRHYME1bDsKwTh/d9xKM/yTPzzCtJz+r
 wmGJ9TPr2nq8blJJq17nSXbaJ4LmzlScCwork3LomdZJi880JwWJlvjFG3M/Yir9
 HvS2zIOUJm+xZBNCDVEayYcBMkXew5XjxETtDwOvfYX8FM419LLk1WOp2y/4LKDD
 hIR8QYkZMl37lMYqWZUghNjR7Rov6jdd30KDiCGdOAO/qszlNyTSL+icWyzc1t/X
 VT4ai7vc0RTicPWwb8H8o8/dQNj8Ed8w5NnMq3hjen+KrTKShkZTMuW+or/E9jZN
 ha9jIzSPLRfOvX6mZRrQVe6hiY3fOWMZXdw7gtehUy2hX7LCSwwbn2v6FcsDxyMQ
 UW6ZVG3ccP9YSY+tBWKg
 =kUnv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:

 - One line fix to mlx4 error flow (same as mlx5 fix in last pull
   request, just in the mlx4 driver)

 - Fix a race condition in the IPoIB driver. This patch is larger than
   just a one line fix, but resolves a race condition in a fairly
   straight forward manner

 - Fix a locking issue in the RDMA netlink code. This patch is also
   larger than I would like for a late -rc. It has, however, had a week
   to bake in the rdma tree prior to this pull request

 - One line fix to fix granting remote machine access to memory that
   they don't need and shouldn't have

 - One line fix to correct the fact that our sgid/dgid pair is swapped
   from what you would expect when receiving an incoming connection
   request

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/srpt: Fix ACL lookup during login
  IB/srpt: Disable RDMA access by the initiator
  RDMA/netlink: Fix locking around __ib_get_device_by_index
  IB/ipoib: Fix race condition in neigh creation
  IB/mlx4: Fix mlx4_ib_alloc_mr error flow
2018-01-08 16:17:31 -08:00
Alexei Starovoitov b2157399cc bpf: prevent out-of-bounds speculation
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-09 00:53:49 +01:00
Linus Torvalds d32da5841b platform-drivers-x86 for v4.15-4
Address a wmi initcall ordering race resulting in a difficult to
 reproduce boot failure.
 
 The following is an automated git shortlog grouped by driver:
 
 wmi:
  -  Call acpi_wmi_init() later
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaU72UAAoJEKbMaAwKp364/xYH/28Hv4EBwwMMWDMCdqhRUqaZ
 W0w5OiOZYXeFp48IE336udfONQs2IKOSnkq4A5C+vkcKSHAfM1aNY32Muk0E0Dpt
 fXuI2SnvhDTUc9jvN7n3RiRw9KpQ2rReEPRcUWhAZ89HkuYBSJ6/uG+aKtbouOWg
 Mxy1sVsKLxmWm4D+i2PNBo6b7zqAs7TL7FKeeaNx7SPe3sQb/xFNf+wFiD5VPW5S
 ecEc2djOOBkVQ83xmrNSi5c9RpwlviKRwFDD+3LFX5RuXqdK+unbOq4iPUiCNSso
 eAybSqohPPCcIgT1EpTsnXjtUGw0wfbkOz5aj7MEDodmTWGVVTvCEyq5fVHKjFY=
 =SKD2
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Address a wmi initcall ordering race resulting in a difficult to
  reproduce boot failure"

* tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: wmi: Call acpi_wmi_init() later
2018-01-08 11:52:24 -08:00
Sergei Shtylyov 50f3d740d3 sh_eth: fix TXALCR1 offsets
The  TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error.  Luckily, the driver never uses this
register. :-)

Fixes: 4a55530f38 ("net: sh_eth: modify the definitions of register")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:31:38 -05:00
Christophe JAILLET 56c0290202 mdio-sun4i: Fix a memory leak
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.

Fixes: 4bdcb1dd9f ("net: Add MDIO bus driver for the Allwinner EMAC")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:30:28 -05:00
Gustavo A. R. Silva 46cd750364 phylink: mark expected switch fall-throughs in phylink_mii_ioctl
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1463447 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:21:58 -05:00
David S. Miller 313c86da2d Merge branch 'SCTP-PMTU-discovery-fixes'
Marcelo Ricardo Leitner says:

====================
SCTP PMTU discovery fixes

This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.

Please consider these to stable.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:20:41 -05:00
Marcelo Ricardo Leitner b6c5734db0 sctp: fix the handling of ICMP Frag Needed for too small MTUs
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512

That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).

As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.

The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.

Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.

Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad391919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
  resulted in a change or not
Changes from v2:
none

See-also: https://lkml.org/lkml/2017/12/22/811
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:19:13 -05:00
Marcelo Ricardo Leitner cc35c3d1ed sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.

The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.

Changes from v2:
- updated stale comment, noticed by Xin Long

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:19:13 -05:00
Eduardo Otubo b707fda2df xen-netfront: enable device after manual module load
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.

The guest would face errors like:

  [root@guest ~]# ethtool -i eth0
  Cannot get driver information: No such device

  [root@guest ~]# ifconfig eth0
  eth0: error fetching interface information: Device not found

This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:17:03 -05:00
David S. Miller bde2191589 Merge branch 'bnxt_en_fixes'
Michael Chan says:

====================
bnxt_en: 2 small bug fixes.

The first one fixes the TC Flower flow parameter passed to firmware.  The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:13:45 -05:00
Venkat Duvvuru 78f3000493 bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.

Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:13:45 -05:00
Sunil Challa 7deea450eb bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.

Fixes: db1d36a273 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 14:13:44 -05:00
Linus Torvalds 29f7e49941 Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "This contains fixes for the following two non-trivial issues:

   - The task iterator got broken while adding thread mode support for
     v4.14. It was less visible because it only triggers when both
     cgroup1 and cgroup2 hierarchies are in use. The recent versions of
     systemd uses cgroup2 for process management even when cgroup1 is
     used for resource control exposing this issue.

   - cpuset CPU hotplug path could deadlock when racing against exits.

  There also are two patches to replace unlimited strcpy() usages with
  strlcpy()"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
  cgroup: Fix deadlock in cpu hotplug path
  cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
  cgroup: avoid copying strings longer than the buffers
2018-01-08 11:13:08 -08:00
Rafael J. Wysocki 98b8e4e5c1 platform/x86: wmi: Call acpi_wmi_init() later
Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
issues to appear on some systems and they are difficult to reproduce,
because there is no guaranteed ordering between subsys_initcall()
calls, so they may occur in different orders on different systems.

In particular, commit 86d9f48534 (mm/slab: fix kmemcg cache
creation delayed issue) exposed one of these issues where genl_init()
and acpi_wmi_init() are both called at the same initcall level, but
the former must run before the latter so as to avoid a NULL pointer
dereference.

For this reason, move the acpi_wmi_init() invocation to the
initcall_sync level which should still be early enough for things
to work correctly in the WMI land.

Link: https://marc.info/?t=151274596700002&r=1&w=2
Reported-by: Jonathan McDowell <noodles@earth.li>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-01-08 10:47:48 -08:00
Takashi Iwai 900498a34a ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
PCM OSS read/write loops keep taking the mutex lock for the whole
read/write, and this might take very long when the exceptionally high
amount of data is given.  Also, since it invokes with mutex_lock(),
the concurrent read/write becomes unbreakable.

This patch tries to address these issues by replacing mutex_lock()
with mutex_lock_interruptible(), and also splits / re-takes the lock
at each read/write period chunk, so that it can switch the context
more finely if requested.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-01-08 16:40:26 +01:00
Jiri Olsa 28a0b39877 perf script: Add support to display sample misc field
Adding support to display sample misc field in form
of letter for each bit:

  # perf script -F +misc ...
   sched-messaging  1414 K     28690.636582:       4590 cycles ...
   sched-messaging  1407 U     28690.636600:     325620 cycles ...
   sched-messaging  1414 K     28690.636608:      19473 cycles ...
  misc field  __________/

The misc bits are assigned to following letters:

  PERF_RECORD_MISC_KERNEL        K
  PERF_RECORD_MISC_USER          U
  PERF_RECORD_MISC_HYPERVISOR    H
  PERF_RECORD_MISC_GUEST_KERNEL  G
  PERF_RECORD_MISC_GUEST_USER    g
  PERF_RECORD_MISC_MMAP_DATA*    M
  PERF_RECORD_MISC_COMM_EXEC     E
  PERF_RECORD_MISC_SWITCH_OUT    S

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:39:50 -03:00
Jiri Olsa 972c148847 perf: Update PERF_RECORD_MISC_* comment for perf_event_header::misc bit 13
The perf_event_header::misc bit 13 is shared on different events and
next patch is adding yet another bit 13 user.  Updating the comment to
make it more structured and clear which events use bit 13.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-8-jolsa@kernel.org
[ Update the tools/include/uapi/linux copy ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:37:54 -03:00
Jiri Olsa 99e818cc88 perf: Return empty callchain instead of NULL
It simplifies the code a bit, because we dump the callchain
Link: http://lkml.kernel.org/n/tip-uqp7qd6aif47g39glnbu95yl@git.kernel.org
even if it's empty. With 'empty' callchain we can remove
all the NULL-checking code paths.

Original-patch-from: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:35:01 -03:00
Jiri Olsa 8cf7e0e224 perf: Make perf_callchain function static
And move it to core.c, because there's no caller of this function other
than the one in core.c

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:33:01 -03:00
Jiri Olsa 81df978c49 perf: Add sample_id to PERF_RECORD_ITRACE_START event comment
Adding missing sample_id line into PERF_RECORD_ITRACE_START
event comment.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-5-jolsa@kernel.org
[ Update the tools/include/uapi/linux copy ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:32:25 -03:00
Jiri Olsa 313ccb9615 perf: Allocate context task_ctx_data for child event
Currently we use perf_event_context::task_ctx_data to save and restore
the LBR status when the task is scheduled out and in.

We don't allocate it for child contexts, which results in shorter task's
LBR stack, because we don't save the history from previous run and start
over every time we schedule the task in.

I made a test to generate samples with LBR call stack and got higher
numbers on bigger chain depths:

                            before:     after:
  LBR call chain: nr: 1       60561     498127
  LBR call chain: nr: 2           0          0
  LBR call chain: nr: 3      107030       2172
  LBR call chain: nr: 4      466685      62758
  LBR call chain: nr: 5     2307319     878046
  LBR call chain: nr: 6       48713     495218
  LBR call chain: nr: 7        1040       4551
  LBR call chain: nr: 8         481        172
  LBR call chain: nr: 9         878        120
  LBR call chain: nr: 10       2377       6698
  LBR call chain: nr: 11      28830     151487
  LBR call chain: nr: 12      29347     339867
  LBR call chain: nr: 13          4         22
  LBR call chain: nr: 14          3         53

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 4af57ef28c ("perf: Add pmu specific data for perf task context")
Link: http://lkml.kernel.org/r/20180107160356.28203-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:26:46 -03:00
Jiri Olsa db9fc765e8 perf tools: Display perf_event_attr::namespaces debug info
Display namespaces bit in -vv debug display:

  $ perf record -vv --namespaces ...
  ...
  perf_event_attr:
    size                             112
    ...
    namespaces                       1

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:15:19 -03:00
Jiri Olsa 24787afbcd perf tools: Enable LIBBABELTRACE by default
There's no reason anymore to treat babel trace in a special way, because
a) we no longer display its state b) the needed babeltrace library is
now out and well adopted among distros.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:10:21 -03:00
Jin Yao 2ab046cd01 perf script: Support time percent and multiple time ranges
perf script has a --time option to limit the time range of output.  It
only supports absolute time.

Now this option is extended to support multiple time ranges and support
the percent of time.

For example:

1. Select the first and second 10% time slices:

   perf script --time 10%/1,10%/2

2. Select from 0% to 10% and 30% to 40% slices:

   perf script --time 0%-10%,30%-40%

Changelog:

v6: Fix the merge issue with latest perf/core branch.
    No functional changes.

v5: Add checking of first/last sample time to detect if it's recorded
    in perf.data. If it's not recorded, returns error message to user.

v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample

v3: Since the definitions of first_sample_time/last_sample_time
    are moved from perf_session to perf_evlist so change the
    related code.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:07:06 -03:00
Jin Yao 5b969bc766 perf report: Support time percent and multiple time ranges
perf report has a --time option to limit the time range of output.  It
only supports absolute time.

Now this option is extended to support multiple time ranges and support
the percent of time.

For example:

1. Select the first and second 10% time slices:

perf report --time 10%/1,10%/2

2. Select from 0% to 10% and 30% to 40% slices:

perf report --time 0%-10%,30%-40%

Changelog:

v6: Fix the merge issue with latest perf/core branch.
    No functional changes.

v5: Add checking of first/last sample time to detect if it's recorded
    in perf.data. If it's not recorded, returns error message to user.

v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample

v3: Since the definitions of first_sample_time/last_sample_time
    are moved from perf_session to perf_evlist so change the
    related code.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao.jin@linux.intel.com
[ Add missing colons at end of examples in the man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 12:06:20 -03:00
Jin Yao 9a9b8b4b22 perf tools: Create function to perform multiple time range checking
Previous patch supports the multiple time range.

For example, select the first and second 10% time slices.
perf report --time 10%/1,10%/2

We need a function to check if a timestamp is in the ranges of
[0, 10%) and [10%, 20%].

Note that it includes the last element in [10%, 20%] but it doesn't
include the last element in [0, 10%). It's to avoid the overlap.

This patch implments a new function perf_time__ranges_skip_sample
for this checking.

Change log:

v4: Let perf_time__ranges_skip_sample be compatible with
    perf_time__skip_sample when only one time range.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:41:06 -03:00
Jin Yao 13a70f3506 perf tools: Create function to parse time percent
Current perf report/script/... have a --time option to limit the time
range of output. But right now it only supports absolute time, add
support for time percentage.

For example:

1. Select the second 10% time slice
   perf report --time 10%/2

2. Select from 0% to 10% time slice
   perf report --time 0%-10%

It also support the multiple time ranges.

3. Select the first and second 10% time slices
   perf report --time 10%/1,10%/2

4. Select from 0% to 10% and 30% to 40% slices
   perf report --time 0%-10%,30%-40%

Changelog:

v4: An issue is found. Following passes.
    perf script --time 10%/10x12321xsdfdasfdsafdsafdsa

    Now it uses strtol to replace atoi.

Committer notes:

This just puts in place the infrastructure, so the examples in this cset
comment will only work later, after more patches in this series are
applied.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:39:09 -03:00
Jin Yao 68588baf8d perf record: Record the first and last sample time in the header
In the default 'perf record' configuration, all samples are processed,
to create the HEADER_BUILD_ID table. So it's very easy to get the
first/last samples and save the time to perf file header via the
function write_sample_time().

Later, at post processing time, perf report/script will fetch the time
from perf file header.

Committer testing:

  # perf record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ]
  [root@jouet home]# perf report --header | grep "time of "
  # time of first sample : 22947.909226
  # time of last sample : 22948.910704
  #
  # perf report -D | grep PERF_RECORD_SAMPLE\(
  0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0
  0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0
  <SNIP>
  3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0
  0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0
  2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0
  #

Changelog:

v7: Just update the patch description according to Arnaldo's suggestion.

v6: Currently '--buildid-all' is not enabled at default. So the walking
    on all samples is the default operation. There is no big overhead
    to calculate the timestamp boundary in process_sample_event handler
    once we already go through all samples. So the timestamp boundary
    calculation is enabled by default when '--buildid-all' is not enabled.

    While if '--buildid-all' is enabled, we creates a new option
    "--timestamp-boundary" for user to decide if it enables the
    timestamp boundary calculation.

v5: There is an issue that the sample walking can only work when
    '--buildid-all' is not enabled. So we need to let the walking
    be able to work even if '--buildid-all' is enabled and let the
    processing skips the dso hit marking for this case.

    At first, I want to provide a new option "--record-time-boundaries".
    While after consideration, I think a new option is not very
    necessary.

v3: Remove the definitions of first_sample_time and last_sample_time
    from struct record and directly save them in perf_evlist.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:20:56 -03:00
Jin Yao 6011518db3 perf header: Add infrastructure to record first and last sample time
perf report/script/... have a --time option to limit the time range of
output. That's very useful to slice large traces, e.g. when processing
the output of perf script for some analysis.

But right now --time only supports absolute time. Also there is no fast
way to get the start/end times of a given trace except for looking at
it.  This makes it hard to e.g. only decode the first half of the trace,
which is useful for parallelization of scripts

Another problem is that perf records are variable size and there is no
synchronization mechanism. So the only way to find the last sample
reliably would be to walk all samples. But we want to avoid that in perf
report/...  because it is already quite expensive. That is why storing
the first sample time and last sample time in perf record is better.

This patch creates a new header feature type HEADER_SAMPLE_TIME and
related ops. Save the first sample time and the last sample time to the
feature section in perf file header. That will be done when, for
instance, processing build-ids, where we already have to process all
samples to create the build-id table, take advantage of that to further
amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf
report/script' faster when using --time.

Committer testing:

After this patch is applied the header is written with zeroes, we need
the next patch, for "perf record" to actually write the timestamps:

  # perf report -D | grep PERF_RECORD_SAMPLE\(
  22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21be8c5 period: 1 addr: 0
  <SNIP>
  22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21ffd50 period: 2828043 addr: 0
  # perf report --header | grep "time of "
  # time of first sample : 0.000000
  # time of last sample : 0.000000
  #

Changelog:

v7: 1. Rebase to latest perf/core branch.

    2. Add following clarification in patch description according to
       Arnaldo's suggestion.

       "That will be done when, for instance, processing build-ids,
	where we already have to process all samples to create the
	build-id table, take advantage of that to further amortize
	that processing by storing HEADER_SAMPLE_TIME to make
	'perf report/script' faster when using --time."

v4: Use perf script time style for timestamp printing. Also add with
    the printing of sample duration.

v3: Remove the definitions of first_sample_time/last_sample_time from
    perf_session. Just define them in perf_evlist

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:20:51 -03:00
Takashi Iwai 29159a4ed7 ALSA: pcm: Abort properly at pending signal in OSS read/write loops
The loops for read and write in PCM OSS emulation have no proper check
of pending signals, and they keep processing even after user tries to
break.  This results in a very long delay, often seen as RCU stall
when a huge unprocessed bytes remain queued.  The bug could be easily
triggered by syzkaller.

As a simple workaround, this patch adds the proper check of pending
signals and aborts the loop appropriately.

Reported-by: syzbot+993cb4cfcbbff3947c21@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-01-08 15:16:52 +01:00
Jin Yao 40c39e3046 perf report: Fix a no annotate browser displayed issue
When enabling '-b' option in perf record, for example,

  perf record -b ...
  perf report

and then browsing the annotate browser from perf report (press 'A'), it
would fail (annotate browser can't be displayed).

It's because the '.add_entry_cb' op of struct report is overwritten by
hist_iter__branch_callback() in builtin-report.c. But this function doesn't do
something like mapping symbols and sources. So next, do_annotate() will return
directly.

        notes = symbol__annotation(act->ms.sym);
        if (!notes->src)
                return 0;

This patch adds the lost code to hist_iter__branch_callback (refer to
hist_iter__report_callback).

v2:

Fix a crash bug when perform 'perf report --stdio'.

The reason is that we init the symbol annotation only in browser mode, it
doesn't allocate/init resources for stdio mode.

So now in hist_iter__branch_callback(), it will return directly if it's not in
browser mode.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1514284963-18587-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:57 -03:00
Jin Yao 935f5a9d45 perf report: Fix a wrong offset issue when using /proc/kcore
When a valid vmlinux is not found, 'perf report' falls back to look at
/proc/kcore. In this case, it will report the impossible large offset.

For example:

  # perf record -b -e cycles:k find /etc/ > /dev/null
  # perf report --stdio --branch-history

    22.77%  _vm_normal_page+18446603336221188162
            |
            ---page_remove_rmap +18446603336221188324
               page_remove_rmap +18446603336221188487 (cycles:5)
               unlock_page_memcg +18446603336221188096
               page_remove_rmap +18446603336221188327 (cycles:1)

The issue is the value which is passed to parameter 'addr' in
__get_srcline() is the objdump address. It's not correct if we calculate
the offset by using 'addr - sym->start'.

This patch creates a new parameter 'ip' in __get_srcline(). It is not
converted to objdump address.

With this patch, the perf report output is:

    22.77%  _vm_normal_page+66
            |
            ---page_remove_rmap +228
               page_remove_rmap +391 (cycles:5)
               unlock_page_memcg +0
               page_remove_rmap +231 (cycles:1)
               page_remove_rmap +236

Committer testing:

Make sure you get any valid vmlinux out of the way, using '-v' on the
'perf report' case and deleting it from places where perf searches them,
like your kernel build dir and the build-id cache, in ~/.debug/.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1514564812-17344-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:57 -03:00
Wang Nan 44df1afdb1 perf tools: Fix compile error with libunwind x86
Fix a compile error:

 ...
   CC       util/libunwind/x86_32.o
 In file included from util/libunwind/x86_32.c:33:0:
 util/libunwind/../../arch/x86/util/unwind-libunwind.c: In function 'libunwind__x86_reg_id':
 util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: error: 'EINVAL' undeclared (first use in this function)
    return -EINVAL;
            ^
 util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: note: each undeclared identifier is reported only once for each function it appears in
 mv: cannot stat 'util/libunwind/.x86_32.o.tmp': No such file or directory
 make[4]: *** [util/libunwind/x86_32.o] Error 1
 make[3]: *** [util] Error 2
 make[2]: *** [libperf-in.o] Error 2
 make[1]: *** [sub-make] Error 2
 make: *** [all] Error 2

It happens when libunwind-x86 feature is detected.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20171206015040.114574-1-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:57 -03:00
Arnaldo Carvalho de Melo e0337f4f9a perf test bpf: Hook on epoll_pwait()
The 'perf test bpf' was hooking a eBPF program on the SyS_epoll_wait()
kernel function, that was what the epoll_wait() glibc function ended up
calling, but since at least glibc 2.26, the one that comes with, for
instance, Fedora 27, glibc ends up calling SyS_epoll_pwait() when
epoll_wait() is used, causing this 'perf test' entry to fail.

So switch to using epoll_pwait() and hook the eBPF program to the
SyS_epoll_pwait() kernel function to make it work on a wider range of
glibc and kernel versions.

Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-zynvquy63er8s5mrgsz65pto@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:57 -03:00
Arnaldo Carvalho de Melo 13cb2d0f51 perf test bpf: Use designated struct field initializers
To follow standard practice in the kernel sources, documenting the
initialization better and helping quickly finding the value for some
field in a struct with many entries.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-syn3hz9hz7ukxlxbx5x6hv20@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:56 -03:00
Arnaldo Carvalho de Melo 6703c9771d perf test bpf: Improve message about expected samples
When failing on one of the BPF tests we were just stating:

  BPF filter result incorrect

Add some more info to help figuring out the problem:

 BPF filter result incorrect, expected 56, got 0 samples

This came out while investigating this failure, first seen after
updating the kernel to the 4.15.0-rc6 tag:

  [root@jouet ~]# perf test bpf
  39: BPF filter               :
  39.1: Basic BPF filtering    : FAILED!
  39.2: BPF pinning            : Skip
  39.3: BPF prologue generation: Skip
  39.4: BPF relocation checker : Skip
  [root@jouet ~]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-403npu7daupv6b2bmxliv5pk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08 11:11:56 -03:00
Christoph Hellwig 1125203c13
riscv: rename SR_* constants to match the spec
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-01-07 15:14:39 -08:00
Christoph Hellwig c163fb38ca
riscv: remove CONFIG_MMU ifdefs
The RISC-V port doesn't suport a nommu mode, so there is no reason
to provide some code only under a CONFIG_MMU ifdef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-01-07 15:14:39 -08:00