Commit Graph

811236 Commits

Author SHA1 Message Date
Josef Bacik 3ec9a4c81c btrfs: run delayed iputs before committing
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:27:21 +01:00
Josef Bacik 74d5d229b1 btrfs: wait on ordered extents on abort cleanup
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening.  Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.

generic/475 can produce this warning:

 [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G        W 5.0.0-rc1-default+ #394
 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
 [ 8531.202707] FS:  00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
 [ 8531.204851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
 [ 8531.207516] Call Trace:
 [ 8531.208175]  btrfs_free_fs_roots+0xdb/0x170 [btrfs]
 [ 8531.210209]  ? wait_for_completion+0x5b/0x190
 [ 8531.211303]  close_ctree+0x157/0x350 [btrfs]
 [ 8531.212412]  generic_shutdown_super+0x64/0x100
 [ 8531.213485]  kill_anon_super+0x14/0x30
 [ 8531.214430]  btrfs_kill_super+0x12/0xa0 [btrfs]
 [ 8531.215539]  deactivate_locked_super+0x29/0x60
 [ 8531.216633]  cleanup_mnt+0x3b/0x70
 [ 8531.217497]  task_work_run+0x98/0xc0
 [ 8531.218397]  exit_to_usermode_loop+0x83/0x90
 [ 8531.219324]  do_syscall_64+0x15b/0x180
 [ 8531.220192]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 [ 8531.221286] RIP: 0033:0x7fc68e5e4d07
 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000

Leaving a tree in the rb-tree:

3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855         iput(root->ino_cache_inode);
3856         WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));

CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add stacktrace ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:24:19 +01:00
Josef Bacik 31890da0bf btrfs: handle delayed ref head accounting cleanup in abort
We weren't doing any of the accounting cleanup when we aborted
transactions.  Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.

The test generic/475 on a 2G VM can trigger the problems eg.:

  [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
  [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
  [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
  [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
  [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206
  [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000
  [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400
  [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000
  [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108
  [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100
  [ 8502.170296] FS:  00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000
  [ 8502.172439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0
  [ 8502.175094] Call Trace:
  [ 8502.175759]  close_ctree+0x17f/0x350 [btrfs]
  [ 8502.176721]  generic_shutdown_super+0x64/0x100
  [ 8502.177702]  kill_anon_super+0x14/0x30
  [ 8502.178607]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8502.179602]  deactivate_locked_super+0x29/0x60
  [ 8502.180595]  cleanup_mnt+0x3b/0x70
  [ 8502.181406]  task_work_run+0x98/0xc0
  [ 8502.182255]  exit_to_usermode_loop+0x83/0x90
  [ 8502.183113]  do_syscall_64+0x15b/0x180
  [ 8502.183919]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Corresponding to

  release_global_block_rsv() {
  ...
  WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);

CC: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add log dump ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:10:04 +01:00
David Sterba 77b7aad195 Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
This reverts commit e73e81b6d0.

This patch causes a few problems:

- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
  more work from btrfs_btree_balance_dirty_nodelay could end up in the
  same workque, effectively deadlocking

12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Transaction commit will wait on the freespace cache:

838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

And then writepages ends up waiting on transaction commit:

9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.

The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.

Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
Reported-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org # 4.18+
CC: ethanlien <ethanlien@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18 17:09:55 +01:00
Rafael J. Wysocki 11ee2a3808 Merge branch 'acpi-pci'
* acpi-pci:
  drivers: thermal: int340x_thermal: Make PCI dependency explicit
  x86/intel/lpss: Make PCI dependency explicit
  platform/x86: apple-gmux: Make PCI dependency explicit
  platform/x86: intel_pmc: Make PCI dependency explicit
  platform/x86: intel_ips: make PCI dependency explicit
  vga-switcheroo: make PCI dependency explicit
  ata: pata_acpi: Make PCI dependency explicit
  ACPI / LPSS: Make PCI dependency explicit
2019-01-18 11:17:16 +01:00
Masahiro Yamada d311e0c27b mtd: rawnand: denali: get ->setup_data_interface() working again
Commit 7a08dbaedd ("mtd: rawnand: Move ->setup_data_interface() to
nand_controller_ops") missed to invert the if-conditonal for denali.
Since then, the Denali NAND driver cannnot invoke setup_data_interface.

Fixes: 7a08dbaedd ("mtd: rawnand: Move ->setup_data_interface() to nand_controller_ops")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
2019-01-18 10:27:01 +01:00
Luc Van Oostenryck 01eeb927bb mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
The function jz_nand_ioremap_resource() needs a pointer to an __iomem
pointer as its last argument but this argument is declared as:
	void * __iomem *base

Fix this by using the correct declaration:
	void __iomem **base
which then also removes the following Sparse's warnings:
  282:15: warning: incorrect type in assignment (different address spaces)
  282:15:    expected void *[noderef] <asn:2>
  282:15:    got void [noderef] <asn:2> *
  322:57: warning: incorrect type in argument 4 (different address spaces)
  322:57:    expected void *[noderef] <asn:2> *base
  322:57:    got void [noderef] <asn:2> **
  402:67: warning: incorrect type in argument 4 (different address spaces)
  402:67:    expected void *[noderef] <asn:2> *base
  402:67:    got void [noderef] <asn:2> **

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Boris Brezillon <bbrezillon@kernel.org>
2019-01-18 10:26:46 +01:00
David S. Miller 435f3f2677 Merge branch 'tcp_openreq_child'
Eric Dumazet says:

====================
tcp: remove code from tcp_create_openreq_child()

tcp_create_openreq_child() is essentially cloning a listener, then
must initialize some fields that can not be inherited.

Listeners are either fresh sockets, or sockets that came through
tcp_disconnect() after a session that dirtied many fields.

By moving code to tcp_disconnect(), we can shorten time taken
to create a clone, since tcp_disconnect() operation is very
unlikely.
====================

Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 6bcdc40ddd tcp: move rx_opt & syn_data_acked init to tcp_disconnect()
If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 792c4354a5 tcp: move tp->rack init to tcp_disconnect()
If we make sure all listeners have proper tp->rack value,
then a clone will also inherit proper initial value.

Note that fresh sockets init tp->rack from tcp_init_sock()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 6cda8b7493 tcp: move app_limited init to tcp_disconnect()
If we make sure all listeners have app_limited set to ~0U,
then a clone will also inherit proper initial value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 5c701549c9 tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect()
If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 5d83676462 tcp: do not clear urg_data in tcp_create_openreq_child
All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets have also a zero value here.

So a clone will inherit a zero value here.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet 3a9a57f637 tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect()
Passive connections can inherit proper value by cloning,
if we make sure all listeners have the proper values there.

tcp_disconnect() was setting snd_cwnd to 2, which seems
quite obsolete since IW10 adoption.

Also remove an obsolete comment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet b9e2e689aa tcp: move mdev_us init to tcp_disconnect()
If we make sure a listener always has its mdev_us
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet a0070e463f tcp: do not clear srtt_us in tcp_create_openreq_child
All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets have also a zero value here.

So a clone will inherit a zero value here.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:05 -08:00
Eric Dumazet eb2c80ca87 tcp: do not clear packets_out in tcp_create_openreq_child()
New sockets have this field cleared, and tcp_disconnect()
calls tcp_write_queue_purge() which among other things
also clear tp->packets_out

So a listener is guaranteed to have this field cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:04 -08:00
Eric Dumazet 6a408147ea tcp: move icsk_rto init to tcp_disconnect()
If we make sure a listener always has its icsk_rto
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:04 -08:00
Eric Dumazet b84235e291 tcp: do not set snd_ssthresh in tcp_create_openreq_child()
New sockets get the field set to TCP_INFINITE_SSTHRESH in tcp_init_sock()
In case a socket had this field changed and transitions to TCP_LISTEN
state, tcp_disconnect() also makes sure snd_ssthresh is set to
TCP_INFINITE_SSTHRESH.

So a listener has this field set to TCP_INFINITE_SSTHRESH already.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:19:04 -08:00
Yang Wei bf97403ac4 macvlan: replace kfree_skb by consume_skb for drop profiles
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:09:09 -08:00
Yang Wei 87fff3cacd neighbour: Do not perturb drop profiles when neigh_probe
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:08:14 -08:00
Lendacky, Thomas 5ab3121bee amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
The XGBE hardware has support for performing MDIO operations using an
MDIO command request. The driver mistakenly uses the mdio port address
as the MDIO command request device address instead of the MDIO command
request port address. Additionally, the driver does not properly check
for and create a clause 45 MDIO command.

Check the supplied MDIO register to determine if the request is a clause
45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
address and register number from the supplied MDIO register and use them
to set the MDIO command request device address and register number fields.
For a clause 22 operation, the MDIO request device address is set to zero
and the MDIO command request register number is set to the supplied MDIO
register. In either case, the supplied MDIO port address is used as the
MDIO command request port address.

Fixes: 732f2ab7af ("amd-xgbe: Add support for MDIO attached PHYs")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:06:54 -08:00
YueHaibing bec03debe2 net/mlx4: remove unneeded semicolon
Remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:05:42 -08:00
YueHaibing 5c423d7114 net: ethernet: ti: cpsw-phy-sel: remove unneeded semicolon
Remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:05:16 -08:00
YueHaibing d4fb30f6f1 tipc: remove unneeded semicolon in trace.c
Remove unneeded semicolon

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:04:43 -08:00
Camelia Groza 40f89ebfbd net: phy: add missing phy driver features
The phy drivers for CS4340 and TN2020 are missing their
features attributes. Add them.

Fixes: 719655a149 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <oss@buserror.net>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:03:25 -08:00
Madalin Bucur c6ddfb9a96 dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
As txq_trans_update() only updates trans_start when the lock is held,
trans_start does not get updated if NETIF_F_LLTX is declared.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 22:00:00 -08:00
YueHaibing 8b59bfe83c qed: remove duplicated include from qed_if.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:57:45 -08:00
Yunjian Wang 28c1382fa2 net: bridge: Fix ethernet header pointer before check skb forwardable
The skb header should be set to ethernet header before using
is_skb_forwardable. Because the ethernet header length has been
considered in is_skb_forwardable(including dev->hard_header_len
length).

To reproduce the issue:
1, add 2 ports on linux bridge br using following commands:
$ brctl addbr br
$ brctl addif br eth0
$ brctl addif br eth1
2, the MTU of eth0 and eth1 is 1500
3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4)
from eth0 to eth1

So the expect result is packet larger than 1500 cannot pass through
eth0 and eth1. But currently, the packet passes through success, it
means eth1's MTU limit doesn't take effect.

Fixes: f6367b4660 ("bridge: use is_skb_forwardable in forward path")
Cc: bridge@lists.linux-foundation.org
Cc: Nkolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:55:15 -08:00
Colin Ian King 6394d98df6 sb1000: fix a couple of indentation issues and remove assignment in if statements
There is an if statement and a return statement that are incorrectly
indented. Fix these.  Also replace the assignment-in-if statements
to assignment followed by an if to keep to the coding style.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:51:36 -08:00
Jason Wang cc5e710759 vhost: log dirty page correctly
Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
   get HVA, for writable descriptor, get HVA through iovec. For used
   ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
   through GPA. Pay attention this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:43:24 -08:00
Jakub Kicinski f655f8b818 Documentation: timestamping: correct path to net_tstamp.h
net_tstamp.h is an UAPI header, so it was moved under include/uapi.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 21:39:59 -08:00
Dave Airlie 9420151d88 Merge branch 'linux-4.21' of git://github.com/skeggsb/linux into drm-fixes
nouveau support for TU102 (RTX 2080 Ti)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA=mQsRr0BpRpv3n6UjthHush4u_kQR3oUGHkBtAHTmyCYw@mail.gmail.com
2019-01-18 15:38:18 +10:00
Linus Torvalds d7393226d1 First 5.0 rc pull request
Not much so far, but I'm feeling like the 2nd PR -rc will be larger than
 this. We have the usual batch of bugs and two fixes to code merged this cycle.
 
 - Restore valgrind support for the ioctl verbs interface merged this window,
   and fix a missed error code on an error path from that conversion
 
 - A user reported crash on obsolete mthca hardware
 
 - pvrdma was using the wrong command opcode toward the hypervisor
 
 - NULL pointer crash regression when dumping rdma-cm over netlink
 
 - Be conservative about exposing the global rkey
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxBTeMACgkQOG33FX4g
 mxrOIQ//YdZdU9J825DM4ppH/MWRoPgayI+cca5sW2EG/nkgsvFJoiVDDK5/ka1g
 ge5Q21ZLMSPCBR0Iu/e/JOq6fJI4fsbcJGZURbyKgRZqyCBCf6qJbhiZKifpQMVb
 w7RP8kRFRdaiQzkAYfZSv9TP93JLvTDLg6zZ74r4vc8YphIzkI410v568hs6FiVu
 MIcb53pBWUswpCAnBVB+54sw+phJyjd02kmY4xTlWmiEzwHBb0JQ+Kps72/G0IWy
 0vOlDI1UjwqoDfThzyT7mcXqnSbXxg/e8EecMpyFzlorQyxgZ5TsJgQ8ubSYxuiQ
 7+dZ4rsdoZD++3MGtpmqDMQzKSPb989WzJT8WLp5oSw4ryAXeJJ+tys/APLtvPkf
 EgKgVyEqfxMDXn02/ENwDPpZyKLZkhcHFLgvfYmxtlDvtai/rvTLmzV1mptEaxlF
 +2pwSQM4/E/8qrLglN9kdFSfjBMb7Bvd2NYQqZ9vah2omb7gPsaTEEpVw6l/E0NX
 oOxFKPEzb0nP9KmJmwO8KLCvcrruuRL8kpmhc6sQMQJ6z0h4hmZrHF5EZZH92g0p
 maHyrx66vqw/Yl+TLvAb/T6FV1ax5c1TauiNErAjnag2wgVWW42Q7lQzSFLFI8su
 GU8oRlbIclDQ/1bszsf0IShq0r9G17+2n6yyTX39rj62YioiDlI=
 =ymZq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes frfom Jason Gunthorpe:
 "Not much so far. We have the usual batch of bugs and two fixes to code
  merged this cycle:

   - Restore valgrind support for the ioctl verbs interface merged this
     window, and fix a missed error code on an error path from that
     conversion

   - A user reported crash on obsolete mthca hardware

   - pvrdma was using the wrong command opcode toward the hypervisor

   - NULL pointer crash regression when dumping rdma-cm over netlink

   - Be conservative about exposing the global rkey"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT
  RDMA/mthca: Clear QP objects during their allocation
  RDMA/vmw_pvrdma: Return the correct opcode when creating WR
  RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type
  RDMA/nldev: Don't expose unsafe global rkey to regular user
  RDMA/uverbs: Fix post send success return value in case of error
2019-01-18 17:17:20 +12:00
Linus Torvalds 1092a94fcb drm amdgpu, i915 gvt, sun4i, meson, rockchip, qxl, virtio fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcQRsZAAoJEAx081l5xIa+oKMP/A6jc+SmQgss3TD3WsrIMNUJ
 QoeKh5KQZqPDsz4sSQ27J6UIDKWuX9ZkzhOOgpKLECKK8CK5xpf14cKKwacyfW05
 8E9gB1kq4SRRNQxUvgrh6g3o3DHSanGtZUywMFN56MGfSxPMWytfzmQNDZ5XvAE5
 OrpRW+Nizs8uSrgvfoZoKOVuCaNVkZFxOTYXWwIPSJxmSuxGoX0nTnTdl0lNCSdE
 gSK82TIAxibfkeJ0K1MXCLbYTWXIvuoZY/JCWJ6wDAd21eK8IszsmVmn7Ou6q/sY
 aFNbKlaPuzdVe9MMRScAQLOBaoZSbiaIVA9UXXK/XR12K3Sqb6XU+NiRtrMWE5XF
 7Z8fkPCrfDG8oelcZW1iRRuZyL82I91xh7j+B20X+GMdHs+A+fT2YKLxbJBk1BMT
 3S/FdGfAnMqezgXDpqeoeYXoEsCaYtIls442FVcXSvQdOt7BjGlzXr+FBLFmvums
 4JL0yvcSTgo85N/hcM3FWLlBYVD6D65+fM42wPqyl05FgUTHF+Ev9503EjxsxQF6
 yCU4bsixnhxE21/v/Tw5Vhe3DU+zZqBWNpSocaWCfj7cyl1rDirZPbr/Tr6yuWgx
 mTR/B4tNGFiXsiPdQwmIGRSKWOWz+wDT7B9apKdIyn++hyFz6Rh/IScZFzqso5w8
 dq522vKrWbP5Jl/sA5GX
 =hgg/
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2019-01-18' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "The rc3 fixes are a bit scattered:

   - meson, sun4i and rockchip all had missing of_node_put.

   - qxl and virtio both were advertising dma-buf to userspace when they
     really shouldn't have.

  Otherwise:

  meson:
   - modesetting regression fix

  i915 GVT:
   - one cmd parser failure fix
   - region cleanup fix in vGPU destroy

  amdgpu:
   - KFD fixes for arm64 mixed APU/DGPU
   - vega12 powerplay fix
   - raven DC fixes
   - freesync fix"

* tag 'drm-fixes-2019-01-18' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Detach backlight from stream
  drm/sun4i: backend: add missing of_node_puts
  Revert "drm/amdgpu: validate user pitch alignment"
  Revert "drm/amdgpu: validate user GEM object size"
  drm/meson: Fix atomic mode switching regression
  drm/i915/gvt: Fix mmap range check
  drm/i915/gvt: free VFIO region space in vgpu detach
  drm/amd/display: Fix disabled cursor on top screen edge
  drm/amd/display: fix warning on raven hotplug
  drm/amd/display: fix PME notification not working in RV desktop
  drm/amd/display: Only get the connector state for VRR when toggled
  drm/amd/display: Pack DMCU iRAM alignment
  drm/amd/powerplay: run acg btc for Vega12
  drm/amdkfd: Don't assign dGPUs to APU topology devices
  drm/amdkfd: Allow building KFD on ARM64 (v2)
  drm/meson: add missing of_node_put
  drm/virtio: drop prime import/export callbacks
  drm/qxl: drop prime import/export callbacks
  drm/i915/gvt: Allow F_CMD_ACCESS on mmio 0x21f0
  drm/rockchip: add missing of_node_put
2019-01-18 17:14:02 +12:00
Linus Torvalds 2451f3717c LED fix for 5.0-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXED1agAKCRBiEGxRG/Sl
 2125AP4+hSXiVYvxgQg4zeAHzd00GKAIcYTudrzZ/iX5E19UlQD7B3h7HiTgvNIo
 QOvU+0PChsk/qwg1+Ztw8Gw3WZxl/wM=
 =I9ym
 -----END PGP SIGNATURE-----

Merge tag 'led-fix-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED fix from Jacek Anaszewski.

* tag 'led-fix-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: lp5523: fix a missing check of return value of lp55xx_read
2019-01-18 16:58:07 +12:00
Linus Torvalds 0a2fbed84a hwmon fixes for v5.0-rc3
Minor fixes/regressions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcQOvDAAoJEMsfJm/On5mBKZgP/22R1XvbJtr7UYCukAe4auoq
 qnwFClb4CXn2sw0z7nLid4lOHEaw9QRu6QycDH8oUCb4osO/TzHAuRhROfiie7cV
 0l54IdTsQ206qw+zKuYExg0l5rRrdqlGgyRvW22R/hhseMFG7BDMIfL+osT3RY+r
 q/rJyRl2S+XojEvi58zPgEr0bVpxudkvmppXNJl5fdPVeo1lschcNgMPrCrAB+4D
 WhQserDi4biW50SWLMWF/cvmM50IfdYmwMM1fHJi6Win29SkM1cDKAOb4HR4z5K1
 xvRDLnwHTz60nOWSEcvjVH68uJB/bRuQUjolw8hjRUmvU1Iv/PyMvQy8n0XYLsMn
 5AtMwUKFc16kdclvRWljRpBx2TUtthgHEDllpiK0Z/u0INMEYyxJ6sZAEr6VQWTR
 yIQyZHAx1vm3+b51MrLlJDETDaAVKO/bbw/jEDyR9wOGePW79al2HTusBiCjaMbo
 5XdnpvSDv60c52nc72UlI7XqkttAYO9EpJwlwtv9WbZNdKIzlfCgJZumTG81GNo4
 tLsmcUnCPL4W1zUV//g0CEoaUtU3s95mW01sjf9GBCxbAhFtOTg0yNzXfRF0/sYz
 ekMPePOQw5MinCUaN4n8ACPDRkVI5HAvrGT2JfQ85Hgyz+yIwSCjLrsTeOb/LjEs
 qHgyRy6drkUkgzlfdTqy
 =kzWH
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Minor fixes/regressions"

* tag 'hwmon-for-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
  hwmon: (occ) Fix potential integer overflow
  hwmon: (lm80) Fix missing unlock on error in set_fan_div()
  hwmon: (nct6775) Enable IO mapping for NCT6797D and NCT6798D
  hwmon: (nct6775) Fix chip ID for NCT6798D
2019-01-18 16:55:49 +12:00
Thomas Gleixner 38197ca176 block: Cleanup license notice
Remove the imprecise and sloppy:

  "This files is licensed under the GPL."

license notice in the top level comment.

1) The file already contains a SPDX license identifier which clearly
   states that the license of the file is GPL V2 only

2) The notice resolves to GPL v1 or later for scanners which is just
   contrary to the intent of SPDX identifiers to provide clear and non
   ambiguous license information. Aside of that the value add of this
   notice is below zero,

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Fixes: 6a5ac98465 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-17 21:21:40 -07:00
Ben Skeggs 7ebec5f431 drm/nouveau/core: recognise TU102
Would usually do this split-out, verifying each component indivitually, but
this has been squashed together to be more palatable for merging in 5.0-rc.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-01-18 14:18:08 +10:00
Nicolas Dichtel 88a8121dc1 af_packet: fix raw sockets over 6in4 tunnel
Since commit cb9f1b7838, scapy (which uses an AF_PACKET socket in
SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel:

Here is a example of the setup:
$ ip link set ntfp2 up
$ ip addr add 10.125.0.1/24 dev ntfp2
$ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2
$ ip addr add fd00:cafe:cafe::1/128 dev tun1
$ ip link set dev tun1 up
$ ip route add fd00:200::/64 dev tun1
$ scapy
>>> p = []
>>> p += IPv6(src='fd00💯:1', dst='fd00:200::1')/ICMPv6EchoRequest()
>>> send(p, count=1, inter=0.1)
>>> quit()
$ ip -s link ls dev tun1 | grep -A1 "TX.*errors"
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        1       0       0       0

The problem is that the network offset is set to the hard_header_len of the
output device (tun1, ie 14 + 20) and in our case, because the packet is
small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes
(ipv6 header) starting from the network offset).

This problem is more generally related to device with variable hard header
length. To avoid a too intrusive patch in the current release, a (ugly)
workaround is proposed in this patch. It has to be cleaned up in net-next.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=993675a3100b1
Link: http://patchwork.ozlabs.org/patch/1024489/
Fixes: cb9f1b7838 ("ip: validate header length on virtual device xmit")
CC: Willem de Bruijn <willemb@google.com>
CC: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:54:45 -08:00
Peter Oskolkov 22c2ad616b net: add a route cache full diagnostic message
In some testing scenarios, dst/route cache can fill up so quickly
that even an explicit GC call occasionally fails to clean it up. This leads
to sporadically failing calls to dst_alloc and "network unreachable" errors
to the user, which is confusing.

This patch adds a diagnostic message to make the cause of the failure
easier to determine.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:37:25 -08:00
Ioana Ciocoi Radulescu 68d7431553 dpaa2-eth: Fix ndo_stop routine
In the current implementation, on interface down we disabled NAPI and
then manually drained any remaining ingress frames. This could lead
to a situation when, under heavy traffic, the data availability
notification for some of the channels would not get rearmed correctly.

Change the implementation such that we let all remaining ingress frames
be processed as usual and only disable NAPI once the hardware queues
are empty.

We also add a wait on the Tx side, to allow hardware time to process
all in-flight Tx frames before issueing the disable command.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:37:02 -08:00
Colin Ian King 5191673b69 wan: dscc4: fix various indentation issues
There are some lines that have indentation issues, fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:36:31 -08:00
Yuchung Cheng e224c390a6 bpf: fix SO_MAX_PACING_RATE to support TCP internal pacing
If sch_fq packet scheduler is not used, TCP can fallback to
internal pacing, but this requires sk_pacing_status to
be properly set.

Fixes: 8c4b4c7e9f ("bpf: Add setsockopt helper function to bpf")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-18 00:30:34 +01:00
Peter Oskolkov f4924f24da bpf: bpf_setsockopt: reset sock dst on SO_MARK changes
In sock_setsockopt() (net/core/sock.h), when SO_MARK option is used
to change sk_mark, sk_dst_reset(sk) is called. The same should be
done in bpf_setsockopt().

Fixes: 8c4b4c7e9f ("bpf: Add setsockopt helper function to bpf")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-18 00:27:47 +01:00
David S. Miller 039d52e15e Merge branch 'vxlan-FDB-veto'
Petr Machata says:

====================
vxlan: Allow vetoing FDB operations

mlxsw does not implement handling of the more advanced types of VXLAN
FDB entries. In order to provide visibility to users, it is important to
be able to reject such FDB entries, ideally with an explanation passed
in extended ack. This patch set implements this.

In patches #1-#4, vxlan is gradually transformed to support vetoing of
FDB entries added (or modified) through vxlan_fdb_update(), and the
default FDB entry added in __vxlan_dev_create().

Patches #5-#7 deal with vxlan_changelink(). The existing code recognizes
that vxlan_fdb_update() may fail, but doesn't attempt to keep things
intact if it does. These patches change the function in several steps to
gracefully handle vetoes (or other failures).

Then in patches #8-#11, extack arguments are added, respectively, to
ndo_fdb_add(), mlxsw's mlxsw_sp_nve_ops.fdb_replay, the functions that
connect to the VXLAN vetoing code, and call_switchdev_notifiers(). Note
that call_switchdev_blocking_notifiers() already does support extack.

Finally in patch #12, mlxsw is extended to add extack messages to
rejected FDB entries. In patch #13, the functionality is tested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata 7e1046fd1f selftests: mlxsw: Test veto of unsupported VXLAN FDBs
mlxsw doesn't implement offloading of all types of FDB entries that the
VXLAN driver supports. Test that such FDB entries are rejected. That
makes sure that the decision made by the existing validation code in
mlxsw propagates up the stack. It also exercises rollback functionality
in VXLAN, and tests that extack is returned.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata a40313d956 mlxsw: spectrum: Add extack messages to VXLAN FDB rejection
Annotate the rejections in mlxsw_sp_switchdev_vxlan_work_prepare() with
textual reasons.

Because this code ends up being invoked for FDB replay as well, drop the
default message from there, so that the more accurate error message is
not overwritten.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata 6685987c29 switchdev: Add extack argument to call_switchdev_notifiers()
A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00
Petr Machata 4c59b7d160 vxlan: Add extack to switchdev operations
There are four sources of VXLAN switchdev notifier calls:

- the changelink() link operation, which already supports extack,
- ndo_fdb_add() which got extack support in a previous patch,
- FDB updates due to packet forwarding,
- and vxlan_fdb_replay().

Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the
switchdev message that it sends, and propagate the argument upwards to
the callers. For the first two cases, pass in the extack gotten through
the operation. For case #3, pass in NULL.

To cover the last case, extend vxlan_fdb_replay() to take extack
argument, which might come from whatever operation necessitated the FDB
replay.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17 15:18:47 -08:00