Commit Graph

700045 Commits

Author SHA1 Message Date
Eric Dumazet c1d1b43781 net: convert (struct ubuf_info)->refcnt to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

v2: added the change in drivers/vhost/net.c as spotted
by Willem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:22:03 -07:00
Eric Dumazet db5bce32fb net: prepare (struct ubuf_info)->refcnt conversion
In order to convert this atomic_t refcnt to refcount_t,
we need to init the refcount to one to not trigger
a 0 -> 1 transition.

This also removes one atomic operation in fast path.

v2: removed dead code in sock_zerocopy_put_abort()
as suggested by Willem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:22:03 -07:00
Florian Fainelli 487234cc19 net: systemport: Correctly set TSB endian for host
Similarly to how we configure the RSB (Receive Status Block) we also
need to set the TSB (Transmit Status Block) based on the host endian.
This was missing from the commit indicated below.

Fixes: 389a06bc53 ("net: systemport: Set correct RSB endian bits based on host")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 20:19:32 -07:00
David S. Miller d49e3a9f5e Merge branch 'inet_diag-TCP-MD5'
Ivan Delalande says:

====================
inet_diag: report TCP MD5 signing keys and addresses

Allow userspace to retrieve MD5 signature keys and addresses configured
on TCP sockets through inet_diag.

Thanks to Eric Dumazet and Stephen Hemminger for their useful
explanations and feedback.

v5: - memset the whole netlink payload after it has been nla_reserve-d
      in tcp_diag_put_md5sig (a third memset had to be added for
      tcpm_key so we might as well have just one for entire region).
    - move the nla_total_size call from inet_sk_attr_size to the
      idiag_get_aux_size defined by protocols as they could add multiple
      netlink attributes,
    - add check for net_admin in tcp_diag_get_aux_size.

v4: - add new struct tcp_diag_md5sig to report the data instead of
      tcp_md5sig to avoid wasting 112 bytes on every tcpm_addr,
    - memset tcpm_addr on IPv4 addresses to avoid leaks,
    - style fix in inet_diag_dump_one_icsk.

v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for
      consistency,
    - don't lock the socket in tcp_diag_put_md5sig,
    - add checks on md5sig_count in tcp_diag_put_md5sig to not create
      the netlink attribute if the list is empty, and to avoid overflows
      or memory leaks if the list has changed in the meantime.

v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow
      protocols to provide additional data on INET_DIAG_INFO,
    - lock socket before calling tcp_diag_put_md5sig.

I also have a patch for iproute2/ss to test this change, making it print
this new attribute. I'm planning to polish and send it if this series
gets applied.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:38:09 -07:00
Ivan Delalande c03fa9bcac tcp_diag: report TCP MD5 signing keys and addresses
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.

Signed-off-by: Ivan Delalande <colona@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:38:09 -07:00
Ivan Delalande b37e88407c inet_diag: allow protocols to provide additional data
Extend inet_diag_handler to allow individual protocols to report
additional data on INET_DIAG_INFO through idiag_get_aux. The size
can be dynamic and is computed by idiag_get_aux_size.

Signed-off-by: Ivan Delalande <colona@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:38:09 -07:00
Thomas Meyer 6391c4f67a ipv6: sr: Use ARRAY_SIZE macro
Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first
candidates.
Maybe a coccinelle can catch all of those.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:35:23 -07:00
Colin Ian King 5debc53ffe net: qualcomm: rmnet: remove unused variable priv
priv is being assigned but is never used, so remove it.

Cleans up clang build warning:
"warning: Value stored to 'priv' is never read"

Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:34:36 -07:00
Colin Ian King 33c8182166 net: phy: bcm7xxx: make array bcm7xxx_suspend_cfg static, reduces object code size
Don't populate the array bcm7xxx_suspend_cfg A on the stack, instead
make it static.  Makes the object code smaller by over 300 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
   6351	   8146	      0	  14497	   38a1	drivers/net/phy/bcm7xxx.o

After:
   text	   data	    bss	    dec	    hex	filename
   5986	   8210	      0	  14196	   3774	drivers/net/phy/bcm7xxx.o

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:30:50 -07:00
Colin Ian King d05071ed4a fsl/fman: make arrays port_ids static, reduces object code size
Don't populate the arrays port_ids on the stack, instead make them static.
Makes the object code smaller by over 700 bytes:

Before:
   text	   data	    bss	    dec	    hex	filename
  28785	   5832	    192	  34809	   87f9	fman.o

After:
   text	   data	    bss	    dec	    hex	filename
  27921	   5992	    192	  34105	   8539	fman.o

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 18:21:09 -07:00
David S. Miller 6026e043d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 17:42:05 -07:00
Eric Dumazet 4cc5b44b29 inetpeer: fix RCU lookup()
Excess of seafood or something happened while I cooked the commit
adding RB tree to inetpeer.

Of course, RCU rules need to be respected or bad things can happen.

In this particular loop, we need to read *pp once per iteration, not
twice.

Fixes: b145425f26 ("inetpeer: remove AVL implementation in favor of RB tree")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 17:33:17 -07:00
Linus Torvalds 54f70f52e3 ARM: SoC fixes for 4.13
A couple of late-arriving fixes before final 4.13:
 
  - A few reverts of DT bindings on Allwinner for their ethernet
    driver. Discussion didn't converge, and since bindings are considered
    ABI it makes sense to revert instead of having to support two bindings
    long-term.
  - A fix to enumerate GPIOs properly on Marvell Armada AP806
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZqfLaAAoJEIwa5zzehBx3magP/Aukku5Lw8s0jyFryZIszZP9
 6fxHz4B6pyCbfSGFTBRvltxxMwthXxfmnI2hkw3ISC40wJKKMmA6Bc3RpFNzZEwx
 Vmm5LAvbizV5KIgZ8O/DnZNfkq2Fx4rOZSSIOyjfW1N4Jzg4V7JFr+vz+vtiHovr
 o9VIeheXOVMLUKtYNP1lCKnoAjMo028qtPprUJ24urjwyd4W7RqGvAUSXZYIHGOi
 wCE8fGK18CEkNadWY+EeYnmEGb2nNtlUsqgYwdJzP6D6I0We+KM3Uu9GNjC+hApj
 BsrYkxq+BZesSrZz5vz3tvCMbwvZe7sOUNkr7MEEvVxRT7PLlN/4WwVo42nxbWQr
 tlN2FTU/fOkfZAoaO9Incn68daDhrcxzms61kVBc7yAt3cBcbzB8rColFF2gQWLo
 vgOP60T+MlSq3HAKcmx979bH0AHxvFEUOer83mDwJpdalltC1ZIrCOwnNk3buXU+
 V23JMDhWSsaWPRLyOFx0Rdp8yk9K/doiVfhXffYIboDG9ip8Yn39R4bzT9Ep6HkA
 NepcuUwbL8SFfTksiOxwxPRq82CN/aPBYh4OmXSlLHLblqYrzFFNcexTp7b3ymsK
 1AIhAuzwsj34sstT7CL+7GIuyvN7mASPo2sw3g4LuA1ICh67vsRJ71P2D5qmcP6u
 C9wDsDytdDiq2Nu9iGv8
 =3QBk
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A couple of late-arriving fixes before final 4.13:

   - A few reverts of DT bindings on Allwinner for their ethernet
     driver. Discussion didn't converge, and since bindings are
     considered ABI it makes sense to revert instead of having to
     support two bindings long-term.

   - A fix to enumerate GPIOs properly on Marvell Armada AP806"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: marvell: fix number of GPIOs in Armada AP806 description
  arm: dts: sunxi: Revert EMAC changes
  arm64: dts: allwinner: Revert EMAC changes
  dt-bindings: net: Revert sun8i dwmac binding
2017-09-01 17:16:40 -07:00
Olof Johansson 6f71a92576 mvebu fixes for 4.13 (part 3)
Fix number of GPIOs in AP806 description for Armada 7K/8K
 -----BEGIN PGP SIGNATURE-----
 
 iIEEABECAEEWIQQYqXDMF3cvSLY+g9cLBhiOFHI71QUCWal5XSMcZ3JlZ29yeS5j
 bGVtZW50QGZyZWUtZWxlY3Ryb25zLmNvbQAKCRALBhiOFHI71W2pAJ90lo/XfZ3O
 y7YizLM864H+4XGl9gCeOfmhngOW/lj1gPAVny35hw/ua00=
 =2qOU
 -----END PGP SIGNATURE-----

Merge tag 'mvebu-fixes-4.13-3' of git://git.infradead.org/linux-mvebu into fixes

mvebu fixes for 4.13 (part 3)

Fix number of GPIOs in AP806 description for Armada 7K/8K

* tag 'mvebu-fixes-4.13-3' of git://git.infradead.org/linux-mvebu:
  arm64: dts: marvell: fix number of GPIOs in Armada AP806 description

Signed-off-by: Olof Johansson <olof@lixom.net>
2017-09-01 16:37:02 -07:00
Linus Torvalds f8c6d7246a Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "The ismt driver had a problem with a rarely used transaction type and
  the designware driver was made even more robust against non standard
  ACPI tables"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: designware: Round down ACPI provided clk to nearest supported clk
  i2c: ismt: Return EMSGSIZE for block reads with bogus length
  i2c: ismt: Don't duplicate the receive length for block reads
2017-09-01 15:03:13 -07:00
Loic Poulain 65bce46298 Bluetooth: make baswap src const
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-09-01 22:49:47 +02:00
Darrick J. Wong d897246df9 fsmap: fix documentation of FMR_OF_LAST
The FMR_OF_LAST flag is set on the last fsmap record being returned for
the dataset requested, contrary to what the header file says.  Fix the
docs to reflect the behavior of all fsmap implementations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-09-01 13:08:26 -07:00
Darrick J. Wong 4cc1ee5e65 xfs: simplify the rmap code in xfs_bmse_merge
In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code
does more work than it needs to (because map-extent auto-merges
records).  Remove the unnecessary unmap and save ourselves a deferred
op.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-09-01 13:08:26 -07:00
Eric Sandeen f91fb956f2 xfs: remove unused flags arg from xfs_file_iomap_begin_delay
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Amir Goldstein 47c7d0b195 xfs: fix incorrect log_flushed on fsync
When calling into _xfs_log_force{,_lsn}() with a pointer
to log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffers

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

 Task A                    Task B
 -------------------------------------------------------
 xfs_file_fsync()
   _xfs_log_force_lsn()
     xlog_sync()
        [submit PREFLUSH]
                           xfs_file_fsync()
                             file_write_and_wait_range()
                               [submit WRITE X]
                               [endio  WRITE X]
                             _xfs_log_force_lsn()
                               xlog_wait()
        [endio  PREFLUSH]

The write X is not guarantied to be on persistent storage
when PREFLUSH request in completed, because write A was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
  the location in the log
- Then replay log onto device for each mark, mount fs and compare
  md5 of file to stored value

Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <jbacik@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig 742d842907 xfs: disable per-inode DAX flag
Currently flag switching can be used to easily crash the kernel.  Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig 8bfadd8d03 xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves
Use the existing functionality instead of directly poking into the extent
list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig e17a5c6f0e xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent
This avoids poking into the internals of the extent list.  Also return
the number of extents as the return value instead of an additional
by reference argument, and make it available to callers outside of
xfs_bmap_util.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:26 -07:00
Christoph Hellwig 4c35445b59 xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at
This abstracts the function away from details of the low-level extent
list implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Christoph Hellwig 4da6b514ea xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents
This abstracts the function away from details of the low-level extent
list implementation.

Note that it seems like the previous implementation of rmap for
the merge case was completely broken, but it no seems appear to
trigger that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Christoph Hellwig 05b7c8ab2b xfs: move some code around inside xfs_bmap_shift_extents
For the first right move we need to look up next_fsb.  That means
our last fsb that contains next_fsb must also be the current extent,
so take advantage of that by moving the code around a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 13:08:25 -07:00
Oleg Nesterov 138e4ad67a epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
The race was introduced by me in commit 971316f050 ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead").  I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().

Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.

TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.

Hopefully this explains use-after-free reported by syzcaller:

	BUG: KASAN: use-after-free in debug_spin_lock_before
	...
	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148

this is spin_lock(eventpoll->lock),

	...
	Freed by task 17774:
	...
	 kfree+0xe8/0x2c0 mm/slub.c:3883
	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865

Fixes: 971316f050 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-01 13:07:35 -07:00
Linus Torvalds 8cf9f2a29f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel
    Borkmann.

 2) IPSEC ESP error paths leak memory, from Steffen Klassert.

 3) We need an RCU grace period before freeing fib6_node objects, from
    Wei Wang.

 4) Must check skb_put_padto() return value in HSR driver, from FLorian
    Fainelli.

 5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew
    Jeffery.

 6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric
    Dumazet.

 7) Use after free when tcf_chain_destroy() called multiple times, from
    Jiri Pirko.

 8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli.

 9) Fix leak of uninitialized memory in sctp_get_sctp_info(),
    inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill().
    From Stefano Brivio.

10) L2TP tunnel refcount fixes from Guillaume Nault.

11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi
    Kauperman.

12) Revert a PHY layer change wrt. handling of PHY_HALTED state in
    phy_stop_machine(), it causes regressions for multiple people. From
    Florian Fainelli.

13) When packets are sent out of br0 we have to clear the
    offload_fwdq_mark value.

14) Several NULL pointer deref fixes in packet schedulers when their
    ->init() routine fails. From Nikolay Aleksandrov.

15) Aquantium devices cannot checksum offload correctly when the packet
    is <= 60 bytes. From Pavel Belous.

16) Fix vnet header access past end of buffer in AF_PACKET, from
    Benjamin Poirier.

17) Double free in probe error paths of nfp driver, from Dan Carpenter.

18) QOS capability not checked properly in DCB init paths of mlx5
    driver, from Huy Nguyen.

19) Fix conflicts between firmware load failure and health_care timer in
    mlx5, also from Huy Nguyen.

20) Fix dangling page pointer when DMA mapping errors occur in mlx5,
    from Eran Ben ELisha.

21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly,
    from Michael Chan.

22) Missing MSIX vector free in bnxt_en, also from Michael Chan.

23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo
    Colitti.

24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann.

25) bpf_setsockopts() erroneously always returns -EINVAL even on
    success. Fix from Yuchung Cheng.

26) tipc_rcv() needs to linearize the SKB before parsing the inner
    headers, from Parthasarathy Bhuvaragan.

27) Fix deadlock between link status updates and link removal in netvsc
    driver, from Stephen Hemminger.

28) Missed locking of page fragment handling in ESP output, from Steffen
    Klassert.

29) Fix refcnt leak in ebpf congestion control code, from Sabrina
    Dubroca.

30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value,
    from Christophe Jaillet.

31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during
    early demux, from Paolo Abeni.

32) Several info leaks in xfrm_user layer, from Mathias Krause.

33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio.

34) Properly propagate obsolete state of route upwards in ipv6 so that
    upper holders like xfrm can see it. From Xin Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits)
  udp: fix secpath leak
  bridge: switchdev: Clear forward mark when transmitting packet
  mlxsw: spectrum: Forbid linking to devices that have uppers
  wl1251: add a missing spin_lock_init()
  Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
  net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
  kcm: do not attach PF_KCM sockets to avoid deadlock
  sch_tbf: fix two null pointer dereferences on init failure
  sch_sfq: fix null pointer dereference on init failure
  sch_netem: avoid null pointer deref on init failure
  sch_fq_codel: avoid double free on init failure
  sch_cbq: fix null pointer dereferences on init failure
  sch_hfsc: fix null pointer deref and double free on init failure
  sch_hhf: fix null pointer dereference on init failure
  sch_multiq: fix double free on init failure
  sch_htb: fix crash on init failure
  net/mlx5e: Fix CQ moderation mode not set properly
  net/mlx5e: Fix inline header size for small packets
  net/mlx5: E-Switch, Unload the representors in the correct order
  net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address
  ...
2017-09-01 12:49:03 -07:00
Linus Torvalds b8a78bb4d1 ceph fscache page locking fix from Zheng, marked for stable.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZqYGlAAoJEEp/3jgCEfOLx58H/1jnP79H03/kchVBCGPLCKjs
 E+pgHpb2922EGeYmEUoxfq627SiCODap/jo6JFVpsd+JnmHLZiMzmEzGpDce6fn9
 /YY5u3WNtmnKtyPvl0kzspK0ujaeCuiRyarULXBiHveL2ZQINKus4F9MiZphNnt4
 X4hgo866+esEf6LocuEkMoEGvgN7vk/Q9nDPgD/YoFrhCuwdvLJpBnE65CGbQyk5
 n3g0qlBR+yorDr1stdlSyVUDPkF5FQjhQTqkpi1oPAhsNPKgVPyZzRIEQEA+nI+N
 wTsQ0SMKfST4PNaRNdUuO1xwszYziqqlLZ2KwaaLIDHlElcbQR1S3GUKz6hddJc=
 =oesm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "ceph fscache page locking fix from Zheng, marked for stable"

* tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client:
  ceph: fix readpage from fscache
2017-09-01 12:46:30 -07:00
Christoph Hellwig f2285c148c xfs: use xfs_iext_get_extent in xfs_bmap_first_unused
Use the bmap abstraction instead of open-coding bmbt details here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig 50bb44c286 xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert
Use the helper instead of open coding it, to provide a better abstraction
for the scalable extent list work.  This also gets an additional assert
and trace point for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig 67e4e69cb2 xfs: add a xfs_iext_update_extent helper
This helper is used to update an extent record based on the extent index,
and can be used to provide a level of abstractions between callers that
want to modify in-core extent records and the details of the extent list
implementation.

Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...))
pattern to this new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig d522d569d6 xfs: consolidate the various page fault handlers
Add a new __xfs_filemap_fault helper that implements all four page fault
callouts, and make these methods themselves small stubs that set the
correct write_fault flag, and exit early for the non-DAX case for the
hugepage related ones.

Also remove the extra size checking in the pfn_fault path, which is now
handled in the core DAX code.

Life would be so much simpler if we only had one method for all this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig e7647fb491 iomap: return VM_FAULT_* codes from iomap_page_mkwrite
All callers will need the VM_FAULT_* flags, so convert in the helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 2dd3d709fc xfs: relog dirty buffers during swapext bmbt owner change
The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster a5814bceea xfs: disallow marking previously dirty buffers as ordered
Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone when not used in situations where it is
guaranteed a buffer has not been previously modified (such as new
metadata allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 6fb10d6d22 xfs: move bmbt owner change to last step of extent swap
The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 99c794c639 xfs: skip bmbt block ino validation during owner change
Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 8dc518dfa7 xfs: don't log dirty ranges for ordered buffers
Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 9684010d38 xfs: refactor buffer logging into buffer dirtying helper
xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer.  One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster e9385cc6fb xfs: ordered buffer log items are never formatted
Ordered buffers pass through the logging infrastructure without ever
being written to the log. The way this works is that the ordered
buffer status is transferred to the log vector at commit time via
the ->iop_size() callback. In xlog_cil_insert_format_items(),
ordered log vectors bypass ->iop_format() processing altogether.

Therefore it is unnecessary for xfs_buf_item_format() to handle
ordered buffers. Remove the unnecessary logic and assert that an
ordered buffer never reaches this point.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster 6453c65d35 xfs: remove unnecessary dirty bli format check for ordered bufs
xfs_buf_item_unlock() historically checked the dirty state of the
buffer by manually checking the buffer log formats for dirty
segments. The introduction of ordered buffers invalidated this check
because ordered buffers have dirty bli's but no dirty (logged)
segments. The check was updated to accommodate ordered buffers by
looking at the bli state first and considering the blf only if the
bli is clean.

This logic is safe but unnecessary. There is no valid case where the
bli is clean yet the blf has dirty segments. The bli is set dirty
whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).

Remove the conditional blf dirty checks and replace with an assert
that should catch any discrepencies between bli and blf dirty
states. Refactor the old blf dirty check into a helper function to
be used by the assert.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Brian Foster a4f6cf6b2b xfs: open-code xfs_buf_item_dirty()
It checks a single flag and has one caller. It probably isn't worth
its own function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig 8ad7c629b1 xfs: remove the ip argument to xfs_defer_finish
And instead require callers to explicitly join the inode using
xfs_defer_ijoin.  Also consolidate the defer error handling in
a few places using a goto label.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig 882d8785fb xfs: rename xfs_defer_join to xfs_defer_ijoin
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Christoph Hellwig 411350df14 xfs: refactor xfs_trans_roll
Split xfs_trans_roll into a low-level helper that just rolls the
actual transaction and a new higher level xfs_trans_roll_inode
that takes care of logging and rejoining the inode.  This gets
rid of the NULL inode case, and allows to simplify the special
cases in the deferred operation code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Omar Sandoval f2e9ad212d xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()
After xfs_ifree_cluster() finds an inode in the radix tree and verifies
that the inode number is what it expected, xfs_reclaim_inode() can swoop
in and free it. xfs_ifree_cluster() will then happily continue working
on the freed inode. Most importantly, it will mark the inode stale,
which will probably be overwritten when the inode slab object is
reallocated, but if it has already been reallocated then we can end up
with an inode spuriously marked stale.

In 8a17d7dded ("xfs: mark reclaimed inodes invalid earlier") we added
a second check to xfs_iflush_cluster() to detect this race, but the
similar RCU lookup in xfs_ifree_cluster() needs the same treatment.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01 10:55:30 -07:00
Darrick J. Wong 799ea9e9c5 xfs: evict all inodes involved with log redo item
When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery.  This also had the
effect of putting linked inodes on the lru instead of evicting them.

Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.

Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: viro@ZenIV.linux.org.uk
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-09-01 10:55:30 -07:00
Linus Torvalds 3e1d79c811 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Just a couple drivers fixes (Synaptics PS/2, Xpad)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - fix PowerA init quirk for some gamepad models
  Input: synaptics - fix device info appearing different on reconnect
2017-09-01 10:43:37 -07:00
Willem de Bruijn bbd9644e84 selftests: correct define in msg_zerocopy.c
The msg_zerocopy test defines SO_ZEROCOPY if necessary, but its value
is inconsistent with the one in asm-generic.h. Correct that.

Also convert one error to a warning. When the test is complete, report
throughput and close cleanly even if the process did not wait for all
completions.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 10:41:21 -07:00