Commit Graph

847547 Commits

Author SHA1 Message Date
Linus Torvalds c83b5d321b Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
 "Two kbuild enhancements by Masahiro Yamada"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Remove redundant 'clean-files += capflags.c'
  x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
2019-07-08 17:24:44 -07:00
Linus Torvalds a1aab6f3d2 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Most of the changes relate to Peter Zijlstra's cleanup of ptregs
  handling, in particular the i386 part is now much simplified and
  standardized - no more partial ptregs stack frames via the esp/ss
  oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump
  and kgdb.

  There's also a CR4 hardening enhancements by Kees Cook, to make the
  generic platform functions such as native_write_cr4() less useful as
  ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0
  against similar attacks.

  The rest is smaller cleanups/fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Add int3_emulate_call() selftest
  x86/stackframe/32: Allow int3_emulate_push()
  x86/stackframe/32: Provide consistent pt_regs
  x86/stackframe, x86/ftrace: Add pt_regs frame annotations
  x86/stackframe, x86/kprobes: Fix frame pointer annotations
  x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
  x86/entry/32: Clean up return from interrupt preemption path
  x86/asm: Pin sensitive CR0 bits
  x86/asm: Pin sensitive CR4 bits
  Documentation/x86: Fix path to entry_32.S
  x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
2019-07-08 16:59:34 -07:00
Ilya Maximets bf0bdd1343 xdp: fix race on generic receive path
Unlike driver mode, generic xdp receive could be triggered
by different threads on different CPU cores at the same time
leading to the fill and rx queue breakage. For example, this
could happen while sending packets from two processes to the
first interface of veth pair while the second part of it is
open with AF_XDP socket.

Need to take a lock for each generic receive to avoid race.

Fixes: c497176cb2 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-09 01:43:26 +02:00
Linus Torvalds dad1c12ed8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Remove the unused per rq load array and all its infrastructure, by
   Dietmar Eggemann.

 - Add utilization clamping support by Patrick Bellasi. This is a
   refinement of the energy aware scheduling framework with support for
   boosting of interactive and capping of background workloads: to make
   sure critical GUI threads get maximum frequency ASAP, and to make
   sure background processing doesn't unnecessarily move to cpufreq
   governor to higher frequencies and less energy efficient CPU modes.

 - Add the bare minimum of tracepoints required for LISA EAS regression
   testing, by Qais Yousef - which allows automated testing of various
   power management features, including energy aware scheduling.

 - Restructure the former tsk_nr_cpus_allowed() facility that the -rt
   kernel used to modify the scheduler's CPU affinity logic such as
   migrate_disable() - introduce the task->cpus_ptr value instead of
   taking the address of &task->cpus_allowed directly - by Sebastian
   Andrzej Siewior.

 - Misc optimizations, fixes, cleanups and small enhancements - see the
   Git log for details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  sched/uclamp: Add uclamp support to energy_compute()
  sched/uclamp: Add uclamp_util_with()
  sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
  sched/uclamp: Set default clamps for RT tasks
  sched/uclamp: Reset uclamp values on RESET_ON_FORK
  sched/uclamp: Extend sched_setattr() to support utilization clamping
  sched/core: Allow sched_setattr() to use the current policy
  sched/uclamp: Add system default clamps
  sched/uclamp: Enforce last task's UCLAMP_MAX
  sched/uclamp: Add bucket local max tracking
  sched/uclamp: Add CPU's clamp buckets refcounting
  sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
  sched/debug: Export the newly added tracepoints
  sched/debug: Add sched_overutilized tracepoint
  sched/debug: Add new tracepoint to track PELT at se level
  sched/debug: Add new tracepoints to track PELT at rq level
  sched/debug: Add a new sched_trace_*() helper functions
  sched/autogroup: Make autogroup_path() always available
  sched/wait: Deduplicate code with do-while
  sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
  ...
2019-07-08 16:39:53 -07:00
David S. Miller 7650b1a9bd Merge branch 'mp-inner-L3'
Stephen Suryaputra says:

====================
net: Multipath hashing on inner L3

This series extends commit 363887a2cd ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") to include support when the
outer L3 is IPv6 and to consider the case where the inner L3 is
different version from the outer L3, such as IPv6 tunneled by IPv4 GRE
or vice versa. It also includes kselftest scripts to test the use cases.

v2: Clarify the commit messages in the commits in this series to use the
    term tunneled by IPv4 GRE or by IPv6 GRE so that it's clear which
    one is the inner and which one is the outer (per David Miller).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:30 -07:00
Stephen Suryaputra 2800f24854 selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnel
Add selftest scripts for multipath hashing on inner IP pkts when there
is a single GRE tunnel but there are multiple underlay routes to reach
the other end of the tunnel.

Four cases are covered in these scripts:
    - IPv4 inner, IPv4 outer
    - IPv6 inner, IPv4 outer
    - IPv4 inner, IPv6 outer
    - IPv6 inner, IPv6 outer

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Stephen Suryaputra d8f74f0975 ipv6: Support multipath hashing on inner IP pkts
Make the same support as commit 363887a2cd ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing
considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Stephen Suryaputra 828b2b4421 ipv4: Multipath hashing on inner L3 needs to consider inner IPv6 pkts
Commit 363887a2cd ("ipv4: Support multipath hashing on inner IP pkts
for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner
Layer 3 if present, but it only considers inner IPv4. There is a use
case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on
inner IPv6 addresses.

Fixes: 363887a2cd ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel")
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:37:29 -07:00
Wen Yang faf5577f24 net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.

Fixes: 1dd2d06c04 ("net: Rework pasemi_mac driver to use of_mdio infrastructure")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:33:02 -07:00
Linus Torvalds 090bc5a2a9 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "Boris is on vacation so I'm sending the RAS bits this time. The main
  changes were:

   - Various RAS/CEC improvements and fixes by Borislav Petkov:
       - error insertion fixes
       - offlining latency fix
       - memory leak fix
       - additional sanity checks
       - cleanups
       - debug output improvements

   - More SMCA enhancements by Yazen Ghannam:
       - make banks truly per-CPU which they are in the hardware
       - don't over-cache certain registers
       - make the number of MCA banks per-CPU variable

     The long term goal with these changes is to support future
     heterogenous SMCA extensions.

   - Misc fixes and improvements"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Do not check return value of debugfs_create functions
  x86/MCE: Determine MCA banks' init state properly
  x86/MCE: Make the number of MCA banks a per-CPU variable
  x86/MCE/AMD: Don't cache block addresses on SMCA systems
  x86/MCE: Make mce_banks a per-CPU array
  x86/MCE: Make struct mce_banks[] static
  RAS/CEC: Add copyright
  RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there
  RAS/CEC: Dump the different array element sections
  RAS/CEC: Rename count_threshold to action_threshold
  RAS/CEC: Sanity-check array on every insertion
  RAS/CEC: Fix potential memory leak
  RAS/CEC: Do not set decay value on error
  RAS/CEC: Check count_threshold unconditionally
  RAS/CEC: Fix pfn insertion
2019-07-08 16:31:06 -07:00
Wen Yang ef86ea982b net: axienet: fix a potential double free in axienet_probe()
There is a possible use-after-free issue in the axienet_probe():

1701:	np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702:   if (np) {
...
1787:		of_node_put(np); ---> released here
1788:		lp->eth_irq = platform_get_irq(pdev, 0);
1789:	} else {
...
1801:	}
1802:	if (IS_ERR(lp->dma_regs)) {
...
1805:		of_node_put(np); ---> double released here
1806:		goto free_netdev;
1807:	}

We solve this problem by removing the unnecessary of_node_put().

Fixes: 28ef9ebdb6 ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Anirudha Sarangi <anirudh@xilinx.com>
Cc: John Linn <John.Linn@xilinx.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Robert Hancock <hancock@sedsystems.ca>
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:28:32 -07:00
Denis Efremov a57caf8c52 sunrpc/cache: remove the exporting of cache_seq_next
The function cache_seq_next is declared static and marked
EXPORT_SYMBOL_GPL, which is at best an odd combination. Because the
function is not used outside of the net/sunrpc/cache.c file it is
defined in, this commit removes the EXPORT_SYMBOL_GPL() marking.

Fixes: d48cf356a1 ("SUNRPC: Remove non-RCU protected lookup")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-08 19:23:17 -04:00
Linus Torvalds e192832869 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - rwsem scalability improvements, phase #2, by Waiman Long, which are
     rather impressive:

       "On a 2-socket 40-core 80-thread Skylake system with 40 reader
        and writer locking threads, the min/mean/max locking operations
        done in a 5-second testing window before the patchset were:

         40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
         40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

        After the patchset, they became:

         40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
         40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

     There's a lot of changes to the locking implementation that makes
     it similar to qrwlock, including owner handoff for more fair
     locking.

     Another microbenchmark shows how across the spectrum the
     improvements are:

       "With a locking microbenchmark running on 5.1 based kernel, the
        total locking rates (in kops/s) on a 2-socket Skylake system
        with equal numbers of readers and writers (mixed) before and
        after this patchset were:

        # of Threads   Before Patch      After Patch
        ------------   ------------      -----------
             2            2,618             4,193
             4            1,202             3,726
             8              802             3,622
            16              729             3,359
            32              319             2,826
            64              102             2,744"

     The changes are extensive and the patch-set has been through
     several iterations addressing various locking workloads. There
     might be more regressions, but unless they are pathological I
     believe we want to use this new implementation as the baseline
     going forward.

   - jump-label optimizations by Daniel Bristot de Oliveira: the primary
     motivation was to remove IPI disturbance of isolated RT-workload
     CPUs, which resulted in the implementation of batched jump-label
     updates. Beyond the improvement of the real-time characteristics
     kernel, in one test this patchset improved static key update
     overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
     as well.

   - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
     ~10 years of atomic64_t existence the various types used by the
     APIs only had to be self-consistent within each architecture -
     which means they became wildly inconsistent across architectures.
     Mark puts and end to this by reworking all the atomic64
     implementations to use 's64' as the base type for atomic64_t, and
     to ensure that this type is consistently used for parameters and
     return values in the API, avoiding further problems in this area.

   - A large set of small improvements to lockdep by Yuyang Du: type
     cleanups, output cleanups, function return type and othr cleanups
     all around the place.

   - A set of percpu ops cleanups and fixes by Peter Zijlstra.

   - Misc other changes - please see the Git log for more details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  locking/lockdep: increase size of counters for lockdep statistics
  locking/atomics: Use sed(1) instead of non-standard head(1) option
  locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
  x86/jump_label: Make tp_vec_nr static
  x86/percpu: Optimize raw_cpu_xchg()
  x86/percpu, sched/fair: Avoid local_clock()
  x86/percpu, x86/irq: Relax {set,get}_irq_regs()
  x86/percpu: Relax smp_processor_id()
  x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
  locking/rwsem: Guard against making count negative
  locking/rwsem: Adaptive disabling of reader optimistic spinning
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Make rwsem->owner an atomic_long_t
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Clarify usage of owner's nonspinaable bit
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: More optimal RT task handling of null owner
  locking/rwsem: Always release wait_lock before waking up tasks
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  ...
2019-07-08 16:12:03 -07:00
Ilya Leoshkevich bc2d8afecb selftests/bpf: fix test_reuseport_array on s390
Fix endianness issue: passing a pointer to 64-bit fd as a 32-bit key
does not work on big-endian architectures. So cast fd to 32-bits when
necessary.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-09 01:10:23 +02:00
Kweh Hock Leong d4117d63a3 net: stmmac: enable clause 45 mdio support
DWMAC4 is capable to support clause 45 mdio communication.
This patch enable the feature on stmmac_mdio_write() and
stmmac_mdio_read() by following phy_write_mmd() and
phy_read_mmd() mdiobus read write implementation format.

Reviewed-by: Li, Yifan <yifan2.li@intel.com>
Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 16:08:55 -07:00
Taehee Yoo 44e3725943 net: openvswitch: use netif_ovs_is_port() instead of opencode
Use netif_ovs_is_port() function instead of open code.
This patch doesn't change logic.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:53:25 -07:00
Jesper Dangaard Brouer f714ecc9cf MAINTAINERS: Add page_pool maintainer entry
In this release cycle the number of NIC drivers using page_pool
will likely reach 4 drivers.  It is about time to add a maintainer
entry.  Add myself and Ilias.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:51:00 -07:00
David S. Miller 11aef3c6da Merge branch 'mvpp2-cls-ether'
Maxime Chevallier says:

====================
net: mvpp2: Add classification based on the ETHER flow

This series adds support for classification of the ETHER flow in the
mvpp2 driver.

The first patch allows detecting when a user specifies a flow_type that
isn't supported by the driver, while the second adds support for this
flow_type by adding the mapping between the ETHER_FLOW enum value and
the relevant classifier flow entries.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
Maxime Chevallier f406324e50 net: mvpp2: cls: Add support for ETHER_FLOW
Users can specify classification actions based on the 'ether' flow type.
In that case, this will apply to all ethernet traffic, superseeding
flows such as 'udp4' or 'tcp6'.

Add support for this flow type in the PPv2 classifier, by mapping the
ETHER_FLOW value to the corresponding entries in the classifier.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
Maxime Chevallier f4f1ba1819 net: mvpp2: cls: Report an error for unsupported flow types
Add a missing check to detect flow types that we don't support, so that
user can be informed of this.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:50:06 -07:00
Linus Torvalds 46f1ec23a4 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The changes in this cycle are:

   - RCU flavor consolidation cleanups and optmizations

   - Documentation updates

   - Miscellaneous fixes

   - SRCU updates

   - RCU-sync flavor consolidation

   - Torture-test updates

   - Linux-kernel memory-consistency-model updates, most notably the
     addition of plain C-language accesses"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  tools/memory-model: Improve data-race detection
  tools/memory-model: Change definition of rcu-fence
  tools/memory-model: Expand definition of barrier
  tools/memory-model: Do not use "herd" to refer to "herd7"
  tools/memory-model: Fix comment in MP+poonceonces.litmus
  Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
  rcu: Don't return a value from rcu_assign_pointer()
  rcu: Force inlining of rcu_read_lock()
  rcu: Fix irritating whitespace error in rcu_assign_pointer()
  rcu: Upgrade sync_exp_work_done() to smp_mb()
  rcutorture: Upper case solves the case of the vanishing NULL pointer
  torture: Suppress propagating trace_printk() warning
  rcutorture: Dump trace buffer for callback pipe drain failures
  torture: Add --trust-make to suppress "make clean"
  torture: Make --cpus override idleness calculations
  torture: Run kernel build in source directory
  torture: Add function graph-tracing cheat sheet
  torture: Capture qemu output
  rcutorture: Tweak kvm options
  rcutorture: Add trivial RCU implementation
  ...
2019-07-08 15:45:14 -07:00
Frank de Brabander cecaa76b29 selftests: txring_overwrite: fix incorrect test of mmap() return value
If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1).
The current if-statement incorrectly tests if *ring is NULL.

Fixes: 358be65640 ("selftests/net: add txring_overwrite")
Signed-off-by: Frank de Brabander <debrabander@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:39:38 -07:00
David S. Miller 3f4957eb6c Merge branch 'vsock-virtio-fixes'
Stefano Garzarella says:

====================
vsock/virtio: several fixes in the .probe() and .remove()

During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.

This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
  'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
  be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
  use-after-free of 'vsock' object.

v3:
- Patch 1: use rcu_dereference_protected() to get the_virtio_vosck value in
           the virtio_vsock_probe() [Jason]

v2: https://patchwork.kernel.org/cover/11022343/

v1: https://patchwork.kernel.org/cover/10964733/

Before this series the guest crashes in a few second. After this series the
test runs (~12h) without issues.
Tested on an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
with these scripts to stress the .probe()/.remove() path:

- guest
  while true; do
      cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
      wait
  done

- host
  while true; do
      cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
      sleep 2
      echo "device_del v1" | nc 127.0.0.1 1234
      sleep 1
      echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
      sleep 1
  done
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella e226121fcc vsock/virtio: fix flush of works during the .remove()
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.

Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.

Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella b917507e5a vsock/virtio: stop workers during the .remove()
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().

This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
Stefano Garzarella 0deab087b1 vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.

To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.

We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.

We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:35:17 -07:00
David S. Miller 1a2d405c00 Merge branch 'b53-docs'
Benedikt Spranger says:

====================
Document the configuration of b53

this is the third round to document the configuration of a b53 supported
switch.

v3..v2:
- fix a typo
- improve b53 configuration in DSA_TAG_PROTO_NONE showcase.
- grade up from RFC to patch for mainline inclusion.

v1..v2:
- split out generic parts of the configuration.
- target comments by Andrew Lunn and Florian Fainelli.
- make changes visible to build system
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Benedikt Spranger ff2d339375 Documentation: net: dsa: b53: Describe b53 configuration
Document the different needs of documentation for the b53 driver.

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Benedikt Spranger 58dd7a8d9d Documentation: net: dsa: Describe DSA switch configuration
Document DSA tagged and VLAN based switch configuration by showcases.

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:30:13 -07:00
Wei Yongjun 31d166642c nfp: tls: fix error return code in nfp_net_tls_add()
Fix to return negative error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 1f35a56cf5 ("nfp: tls: add/delete TLS TX connections")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:27:33 -07:00
David S. Miller 107d3ce601 Merge branch 'bnxt_en-XDP_REDIRECT'
Michael Chan says:

====================
bnxt_en: Add XDP_REDIRECT support.

This patch series adds XDP_REDIRECT support by Andy Gospodarek.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:25 -07:00
Andy Gospodarek 322b87ca55 bnxt_en: add page_pool support
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver.  The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.

v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:25 -07:00
Andy Gospodarek f18c2b77b2 bnxt_en: optimized XDP_REDIRECT support
This adds basic support for XDP_REDIRECT in the bnxt_en driver.  Next
patch adds the more optimized page pool support.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
Michael Chan c1ba92a86d bnxt_en: Refactor __bnxt_xmit_xdp().
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.

Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
Andy Gospodarek 52c0609258 bnxt_en: rename some xdp functions
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT
support and reduce confusion/namespace collision.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
David S. Miller aa6be2b95d Merge branch 'cpsw-Add-XDP-support'
Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: Add XDP support

This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.

It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y

Link on previous v7:
https://lkml.org/lkml/2019/7/4/715

Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4

v8..v9:
- fix warnings on arm64 caused by typos in type casting

v7..v8:
- corrected dma calculation based on headroom instead of hard start
- minor comment changes

v6..v7:
- rolled back to v4 solution but with small modification
- picked up patch:
  https://www.spinics.net/lists/netdev/msg583145.html
- added changes related to netsec fix and cpsw

v5..v6:
- do changes that is rx_dev while redirect/flush cycle is kept the same
- dropped net: ethernet: ti: davinci_cpdma: return handler status
- other changes desc in patches

v4..v5:
- added two plreliminary patches:
  net: ethernet: ti: davinci_cpdma: allow desc split while down
  net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter

v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API

v2..v3:
- each rxq and ndev has its own page pool

v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
  net: page_pool: add helper function to retrieve dma addresses
  net: page_pool: add helper function to unmap dma addresses
  net: ethernet: ti: cpsw: use cpsw as drv data
  net: ethernet: ti: cpsw_ethtool: simplify slave loops
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk 9ed4050c0d net: ethernet: ti: cpsw: add XDP support
Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.

Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Also, each rx queue have own page pools, but common for both
netdevs.

XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.

In order to keep rx_dev while redirect, that can be somehow used in
future, do flush in rx_handler, that allows to keep rx dev the same
while redirect. It allows to conform with tracing rx_dev pointed
by Jesper.

Also, there is probability, that XDP generic code can be extended to
support multi ndev drivers like this one, using same rx queue for
several ndevs, based on switchdev for instance or else. In this case,
driver can be modified like exposed here:
https://lkml.org/lkml/2019/7/3/243

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk 608ef6202f net: ethernet: ti: cpsw_ethtool: allow res split while down
That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk 962fb61890 net: ethernet: ti: davinci_cpdma: allow desc split while down
That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk 6670acacd5 net: ethernet: ti: davinci_cpdma: add dma mapped submit
In case if dma mapped packet needs to be sent, like with XDP
page pool, the "mapped" submit can be used. This patch adds dma
mapped submit based on regular one.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk 1da4bbeffe net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.

This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.

This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.

This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.

This patch is a modified version of Ivan Khoronzhuk's original patch.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Mauro Carvalho Chehab 454f96f2b7 docs: automarkup.py: ignore exceptions when seeking for xrefs
When using the automarkup extension with:
	make pdfdocs

without passing an specific book, the code will raise an exception:

	  File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 86, in auto_markup
	    node.parent.replace(node, markup_funcs(name, app, node))
	  File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 59, in markup_funcs
	    'function', target, pxref, lit_text)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/domains/c.py", line 308, in resolve_xref
	    contnode, target)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/util/nodes.py", line 450, in make_refnode
	    '#' + targetid)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 159, in get_relative_uri
	    return self.get_target_uri(to, typ)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 152, in get_target_uri
	    raise NoUri
	sphinx.environment.NoUri

This happens because not all references will belong to a single
PDF/LaTeX document.

Better to just ignore those than breaking Sphinx build.

Fixes: d74b0d31dd ("Docs: An initial automarkup extension for sphinx")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[jc: Narrowed the "except" and tweaked the comment]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-08 14:35:47 -06:00
Matthew Wilcox (Oracle) 66f2a122c6 docs: Move binderfs to admin-guide
The documentation is more appropriate for the administrator than for
the internal kernel API section it is currently in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-08 14:15:36 -06:00
Yang Wei dd006fc434 nfc: fix potential illegal memory access
The frags_q is not properly initialized, it may result in illegal memory
access when conn_info is NULL.
The "goto free_exit" should be replaced by "goto exit".

Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:46:24 -07:00
Arnd Bergmann 49db9228b8 macb: fix build warning for !CONFIG_OF
When CONFIG_OF is disabled, we get a harmless warning about the
newly added variable:

drivers/net/ethernet/cadence/macb_main.c:48:39: error: 'mgmt' defined but not used [-Werror=unused-variable]
 static struct sifive_fu540_macb_mgmt *mgmt;

Move the variable closer to its use inside of the #ifdef.

Fixes: c218ad5590 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:43:07 -07:00
Arnd Bergmann 0287f9ed16 gve: fix unused variable/label warnings
On unusual page sizes, we get harmless warnings:

drivers/net/ethernet/google/gve/gve_rx.c:283:6: error: unused variable 'pagecount' [-Werror,-Wunused-variable]
drivers/net/ethernet/google/gve/gve_rx.c:336:1: error: unused label 'have_skb' [-Werror,-Wunused-label]

Change the preprocessor #if to regular if() to avoid this.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:40:58 -07:00
Jose Abreu 4993e5b37e net: stmmac: Re-work the queue selection for TSO packets
Ben Hutchings says:
	"This is the wrong place to change the queue mapping.
	stmmac_xmit() is called with a specific TX queue locked,
	and accessing a different TX queue results in a data race
	for all of that queue's state.

	I think this commit should be reverted upstream and in all
	stable branches.  Instead, the driver should implement the
	ndo_select_queue operation and override the queue mapping there."

Fixes: c5acdbee22 ("net: stmmac: Send TSO packets always from Queue 0")
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:36:06 -07:00
Linus Torvalds 223cea6a4f Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "The speculative paranoia departement delivers a few more plugs for
  possible (probably theoretical) spectre/mds leaks"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tls: Fix possible spectre-v1 in do_get_thread_area()
  x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
  x86/speculation/mds: Eliminate leaks by trace_hardirqs_on()
2019-07-08 12:23:00 -07:00
Martin Habets 05cfee98c8 sfc: Remove 'PCIE error reporting unavailable'
This is only at notice level but it was pointed out that no other driver
does this.
Also there is no action the user can take as it is really a property of
the server.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:16:48 -07:00
Linus Torvalds 2f0f6503e3 Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
 "A rather large series consolidating the HPET code, which was triggered
  by the attempt to bolt HPET NMI watchdog support on to the existing
  maze with the usual duct tape and super glue approach.

  This mainly removes two separate partially redundant storage layers
  and consolidates them into a single one which provides a consistent
  view of the different HPET channels and their usage and allows to
  integrate HPET NMI watchdog support (if it turns out to be feasible)
  in a non intrusive way"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  x86/hpet: Use channel for legacy clockevent storage
  x86/hpet: Use common init for legacy clockevent
  x86/hpet: Carve out shareable parts of init_one_hpet_msi_clockevent()
  x86/hpet: Consolidate clockevent functions
  x86/hpet: Wrap legacy clockevent in hpet_channel
  x86/hpet: Use cached info instead of extra flags
  x86/hpet: Move clockevents into channels
  x86/hpet: Rename variables to prepare for switching to channels
  x86/hpet: Add function to select a /dev/hpet channel
  x86/hpet: Add mode information to struct hpet_channel
  x86/hpet: Use cached channel data
  x86/hpet: Introduce struct hpet_base and struct hpet_channel
  x86/hpet: Coding style cleanup
  x86/hpet: Clean up comments
  x86/hpet: Make naming consistent
  x86/hpet: Remove not required includes
  x86/hpet: Decapitalize and rename EVT_TO_HPET_DEV
  x86/hpet: Simplify counter validation
  x86/hpet: Separate counter check out of clocksource register code
  x86/hpet: Shuffle code around for readability sake
  ...
2019-07-08 12:16:40 -07:00