Commit Graph

873460 Commits

Author SHA1 Message Date
Francesco Ruggeri 2adf81c0f7 net: icmp: use input address in traceroute
Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
primary address of the interface the packet was received on, even if
the path goes through a secondary address. In the example:

                    1.0.3.1/24
 ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
 |H1|--------------------------|R1|--------------------------|H2|
 ----            N1            ----            N2            ----

where 1.0.3.1/24 is R1's primary address on N1, traceroute from
H1 to H2 returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
 1  1.0.3.1 (1.0.3.1)  0.018 ms  0.006 ms  0.006 ms
 2  1.0.2.4 (1.0.2.4)  0.021 ms  0.007 ms  0.007 ms

After applying this patch, it returns:

traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
 1  1.0.1.1 (1.0.1.1)  0.033 ms  0.007 ms  0.006 ms
 2  1.0.2.4 (1.0.2.4)  0.011 ms  0.007 ms  0.007 ms

Original-patch-by: Bill Fenner <fenner@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:25:18 -08:00
David S. Miller c219a16622 Merge branch 'optimize-openvswitch-flow-looking-up'
Tonghao Zhang says:

====================
optimize openvswitch flow looking up

This series patch optimize openvswitch for performance or simplify
codes.

Patch 1, 2, 4: Port Pravin B Shelar patches to
linux upstream with little changes.

Patch 5, 6, 7: Optimize the flow looking up and
simplify the flow hash.

Patch 8, 9: are bugfix.

The performance test is on Intel Xeon E5-2630 v4.
The test topology is show as below:

+-----------------------------------+
|   +---------------------------+   |
|   | eth0   ovs-switch    eth1 |   | Host0
|   +---------------------------+   |
+-----------------------------------+
      ^                       |
      |                       |
      |                       |
      |                       |
      |                       v
+-----+----+             +----+-----+
| netperf  | Host1       | netserver| Host2
+----------+             +----------+

We use netperf send the 64B packets, and insert 255+ flow-mask:
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
...
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
$
$ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18

* Without series patch, throughput 8.28Mbps
* With series patch, throughput 46.05Mbps

v6:
some coding style fixes

v5:
rewrite patch 8, release flow-mask when freeing flow

v4:
access ma->count with READ_ONCE/WRITE_ONCE API. More information,
see patch 5 comments.

v3:
update ma point when realloc mask_array in patch 5

v2:
simplify codes. e.g. use kfree_rcu instead of call_rcu
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00
Tonghao Zhang eec62eadd1 net: openvswitch: simplify the ovs_dp_cmd_new
use the specified functions to init resource.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00
Tonghao Zhang 4c76bf696a net: openvswitch: don't unlock mutex when changing the user_features fails
Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.

Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00
Tonghao Zhang 50b0e61b32 net: openvswitch: fix possible memleak on destroy flow-table
When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 0a3e01371d net: openvswitch: add likely in flow_lookup
The most case *index < ma->max, and flow-mask is not NULL.
We add un/likely for performance.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 515b65a4b9 net: openvswitch: simplify the flow_hash
Simplify the code and remove the unnecessary BUILD_BUG_ON.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 57f7d7b916 net: openvswitch: optimize flow-mask looking up
The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.

One bad case, for example: M means flow-mask is valid and NULL
of flow-mask means deleted.

+-------------------------------------------+
| M | NULL | ...                  | NULL | M|
+-------------------------------------------+

In that case, without this patch, openvswitch will traverses all
mask array, because there will be one flow-mask in the tail. This
patch changes the way of flow-mask inserting and deleting, and the
mask array will be keep as below: there is not a NULL hole. In the
fast path, we can "break" "for" (not "continue") in flow_lookup
when we get a NULL flow-mask.

         "break"
            v
+-------------------------------------------+
| M | M |  NULL |...           | NULL | NULL|
+-------------------------------------------+

This patch don't optimize slow or control path, still using ma->max
to traverse. Slow path:
* tbl_mask_array_realloc
* ovs_flow_tbl_lookup_exact
* flow_mask_find

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang a7f35e78e7 net: openvswitch: optimize flow mask cache hash collision
Port the codes to linux upstream and with little changes.

Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.

Link: 0e6efbe271
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 1689754de6 net: openvswitch: shrink the mask array if necessary
When creating and inserting flow-mask, if there is no available
flow-mask, we realloc the mask array. When removing flow-mask,
if necessary, we shrink mask array.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 4bc63b1b53 net: openvswitch: convert mask list in mask array
Port the codes to linux upstream and with little changes.

Pravin B Shelar, says:
| mask caches index of mask in mask_list. On packet recv OVS
| need to traverse mask-list to get cached mask. Therefore array
| is better for retrieving cached mask. This also allows better
| cache replacement algorithm by directly checking mask's existence.

Link: d49fc3ff53
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
Tonghao Zhang 04b7d136d0 net: openvswitch: add flow-mask cache for performance
The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.

Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.

Link: 5604935e4e
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:03 -08:00
David S. Miller ae8a76fb8b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-11-02

The following pull-request contains BPF updates for your *net-next* tree.

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 41 files changed, 1864 insertions(+), 474 deletions(-).

The main changes are:

1) Fix long standing user vs kernel access issue by introducing
   bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel.

2) Accelerated xskmap lookup, from Björn and Maciej.

3) Support for automatic map pinning in libbpf, from Toke.

4) Cleanup of BTF-enabled raw tracepoints, from Alexei.

5) Various fixes to libbpf and selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02 15:29:58 -07:00
David S. Miller d31e95585c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.

The rest were (relatively) trivial in nature.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02 13:54:56 -07:00
Alexei Starovoitov 358fdb4562 Merge branch 'bpf_probe_read_user'
Daniel Borkmann says:

====================
This set adds probe_read_{user,kernel}(), probe_read_str_{user,kernel}()
helpers, fixes probe_write_user() helper and selftests. For details please
see individual patches.

Thanks!

v2 -> v3:
  - noticed two more things that are fixed in here:
   - bpf uapi helper description used 'int size' for *_str helpers, now u32
   - we need TASK_SIZE_MAX + guard page on x86-64 in patch 2 otherwise
     we'll trigger the 00c42373d3 warn as well, so full range covered now
v1 -> v2:
  - standardize unsafe_ptr terminology in uapi header comment (Andrii)
  - probe_read_{user,kernel}[_str] naming scheme (Andrii)
  - use global data in last test case, remove relaxed_maps (Andrii)
  - add strict non-pagefault kernel read funcs to avoid warning in
    kernel probe read helpers (Alexei)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-02 12:45:15 -07:00
Daniel Borkmann fa553d9b57 bpf, testing: Add selftest to read/write sockaddr from user space
Tested on x86-64 and Ilya was also kind enough to give it a spin on
s390x, both passing with probe_user:OK there. The test is using the
newly added bpf_probe_read_user() to dump sockaddr from connect call
into .bss BPF map and overrides the user buffer via bpf_probe_write_user():

  # ./test_progs
  [...]
  #17 pkt_md_access:OK
  #18 probe_user:OK
  #19 prog_run_xattr:OK
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/90f449d8af25354e05080e82fc6e2d3179da30ea.1572649915.git.daniel@iogearbox.net
2019-11-02 12:45:08 -07:00
Daniel Borkmann 50f9aa44ca bpf, testing: Convert prog tests to probe_read_{user, kernel}{, _str} helper
Use probe read *_{kernel,user}{,_str}() helpers instead of bpf_probe_read()
or bpf_probe_read_user_str() for program tests where appropriate.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/4a61d4b71ce3765587d8ef5cb93afa18515e5b3e.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:13 -07:00
Daniel Borkmann 251e2d337a bpf, samples: Use bpf_probe_read_user where appropriate
Use bpf_probe_read_user() helper instead of bpf_probe_read() for samples that
attach to kprobes probing on user addresses.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/5b0144b3f8e031ec5e2438bd7de8d7877e63bf2f.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:13 -07:00
Daniel Borkmann 6e07a63412 bpf: Switch BPF probe insns to bpf_probe_read_kernel
Commit 2a02759ef5 ("bpf: Add support for BTF pointers to interpreter")
explicitly states that the pointer to BTF object is a pointer to a kernel
object or NULL. Therefore we should also switch to using the strict kernel
probe helper which is restricted to kernel addresses only when architectures
have non-overlapping address spaces.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/d2b90827837685424a4b8008dfe0460558abfada.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Daniel Borkmann 6ae08ae3de bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers
The current bpf_probe_read() and bpf_probe_read_str() helpers are broken
in that they assume they can be used for probing memory access for kernel
space addresses /as well as/ user space addresses.

However, plain use of probe_kernel_read() for both cases will attempt to
always access kernel space address space given access is performed under
KERNEL_DS and some archs in-fact have overlapping address spaces where a
kernel pointer and user pointer would have the /same/ address value and
therefore accessing application memory via bpf_probe_read{,_str}() would
read garbage values.

Lets fix BPF side by making use of recently added 3d7081822f ("uaccess:
Add non-pagefault user-space read functions"). Unfortunately, the only way
to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}()
and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}()
helpers are kept as-is to retain their current behavior.

The two *_user() variants attempt the access always under USER_DS set, the
two *_kernel() variants will -EFAULT when accessing user memory if the
underlying architecture has non-overlapping address ranges, also avoiding
throwing the kernel warning via 00c42373d3 ("x86-64: add warning for
non-canonical user access address dereferences").

Fixes: a5e8c07059 ("bpf: add bpf_probe_read_str helper")
Fixes: 2541517c32 ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Daniel Borkmann eb1b668874 bpf: Make use of probe_user_write in probe write helper
Convert the bpf_probe_write_user() helper to probe_user_write() such that
writes are not attempted under KERNEL_DS anymore which is buggy as kernel
and user space pointers can have overlapping addresses. Also, given we have
the access_ok() check inside probe_user_write(), the helper doesn't need
to do it twice.

Fixes: 96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Daniel Borkmann 75a1a607bb uaccess: Add strict non-pagefault kernel-space read function
Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict()
helpers which by default alias to the __probe_kernel_read() and the
__strncpy_from_unsafe(), respectively, but can be overridden by archs
which have non-overlapping address ranges for kernel space and user
space in order to bail out with -EFAULT when attempting to probe user
memory including non-canonical user access addresses [0]:

  4-level page tables:
    user-space mem: 0x0000000000000000 - 0x00007fffffffffff
    non-canonical:  0x0000800000000000 - 0xffff7fffffffffff

  5-level page tables:
    user-space mem: 0x0000000000000000 - 0x00ffffffffffffff
    non-canonical:  0x0100000000000000 - 0xfeffffffffffffff

The idea is that these helpers are complementary to the probe_user_read()
and strncpy_from_unsafe_user() which probe user-only memory. Both added
helpers here do the same, but for kernel-only addresses.

Both set of helpers are going to be used for BPF tracing. They also
explicitly avoid throwing the splat for non-canonical user addresses from
00c42373d3 ("x86-64: add warning for non-canonical user access address
dereferences").

For compat, the current probe_kernel_read() and strncpy_from_unsafe() are
left as-is.

  [0] Documentation/x86/x86_64/mm.txt

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: x86@kernel.org
Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Daniel Borkmann 1d1585ca0f uaccess: Add non-pagefault user-space write function
Commit 3d7081822f ("uaccess: Add non-pagefault user-space read functions")
missed to add probe write function, therefore factor out a probe_write_common()
helper with most logic of probe_kernel_write() except setting KERNEL_DS, and
add a new probe_user_write() helper so it can be used from BPF side.

Again, on some archs, the user address space and kernel address space can
co-exist and be overlapping, so in such case, setting KERNEL_DS would mean
that the given address is treated as being in kernel address space.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net
2019-11-02 12:39:12 -07:00
Alexei Starovoitov e1cb7d2d60 Merge branch 'map-pinning'
Toke Høiland-Jørgensen says:

====================
This series adds support to libbpf for reading 'pinning' settings from BTF-based
map definitions. It introduces a new open option which can set the pinning path;
if no path is set, /sys/fs/bpf is used as the default. Callers can customise the
pinning between open and load by setting the pin path per map, and still get the
automatic reuse feature.

The semantics of the pinning is similar to the iproute2 "PIN_GLOBAL" setting,
and the eventual goal is to move the iproute2 implementation to be based on
libbpf and the functions introduced in this series.

Changelog:

v6:
  - Fix leak of struct bpf_object in selftest
  - Make struct bpf_map arg const in bpf_map__is_pinned() and bpf_map__get_pin_path()

v5:
  - Don't pin maps with pinning set, but with a value of LIBBPF_PIN_NONE
  - Add a few more selftests:
    - Should not pin map with pinning set, but value LIBBPF_PIN_NONE
    - Should fail to load a map with an invalid pinning value
    - Should fail to re-use maps with parameter mismatch
  - Alphabetise libbpf.map
  - Whitespace and typo fixes

v4:
  - Don't check key_type_id and value_type_id when checking for map reuse
    compatibility.
  - Move building of map->pin_path into init_user_btf_map()
  - Get rid of 'pinning' attribute in struct bpf_map
  - Make sure we also create parent directory on auto-pin (new patch 3).
  - Abort the selftest on error instead of attempting to continue.
  - Support unpinning all pinned maps with bpf_object__unpin_maps(obj, NULL)
  - Support pinning at map->pin_path with bpf_object__pin_maps(obj, NULL)
  - Make re-pinning a map at the same path a noop
  - Rename the open option to pin_root_path
  - Add a bunch more self-tests for pin_maps(NULL) and unpin_maps(NULL)
  - Fix a couple of smaller nits

v3:
  - Drop bpf_object__pin_maps_opts() and just use an open option to customise
    the pin path; also don't touch bpf_object__{un,}pin_maps()
  - Integrate pinning and reuse into bpf_object__create_maps() instead of having
    multiple loops though the map structure
  - Make errors in map reuse and pinning fatal to the load procedure
  - Add selftest to exercise pinning feature
  - Rebase series to latest bpf-next

v2:
  - Drop patch that adds mounting of bpffs
  - Only support a single value of the pinning attribute
  - Add patch to fixup error handling in reuse_fd()
  - Implement the full automatic pinning and map reuse logic on load
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-02 12:35:12 -07:00
Toke Høiland-Jørgensen 2f4a32cc83 selftests: Add tests for automatic map pinning
This adds a new BPF selftest to exercise the new automatic map pinning
code.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298209.394725.15420085139296213182.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen 57a00f4164 libbpf: Add auto-pinning of maps when loading BPF objects
This adds support to libbpf for setting map pinning information as part of
the BTF map declaration, to get automatic map pinning (and reuse) on load.
The pinning type currently only supports a single PIN_BY_NAME mode, where
each map will be pinned by its name in a path that can be overridden, but
defaults to /sys/fs/bpf.

Since auto-pinning only does something if any maps actually have a
'pinning' BTF attribute set, we default the new option to enabled, on the
assumption that seamless pinning is what most callers want.

When a map has a pin_path set at load time, libbpf will compare the map
pinned at that location (if any), and if the attributes match, will re-use
that map instead of creating a new one. If no existing map is found, the
newly created map will instead be pinned at the location.

Programs wanting to customise the pinning can override the pinning paths
using bpf_map__set_pin_path() before calling bpf_object__load() (including
setting it to NULL to disable pinning of a particular map).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298092.394725.3966306029218559681.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen 196f8487f5 libbpf: Move directory creation into _pin() functions
The existing pin_*() functions all try to create the parent directory
before pinning. Move this check into the per-object _pin() functions
instead. This ensures consistent behaviour when auto-pinning is
added (which doesn't go through the top-level pin_maps() function), at the
cost of a few more calls to mkdir().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297985.394725.5882630952992598610.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen 4580b25fce libbpf: Store map pin path and status in struct bpf_map
Support storing and setting a pin path in struct bpf_map, which can be used
for automatic pinning. Also store the pin status so we can avoid attempts
to re-pin a map that has already been pinned (or reused from a previous
pinning).

The behaviour of bpf_object__{un,}pin_maps() is changed so that if it is
called with a NULL path argument (which was previously illegal), it will
(un)pin only those maps that have a pin_path set.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297876.394725.14782206533681896279.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen d1b4574a4b libbpf: Fix error handling in bpf_map__reuse_fd()
bpf_map__reuse_fd() was calling close() in the error path before returning
an error value based on errno. However, close can change errno, so that can
lead to potentially misleading error messages. Instead, explicitly store
errno in the err variable before each goto.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297769.394725.12634985106772698611.stgit@toke.dk
2019-11-02 12:35:06 -07:00
Linus Torvalds 1204c70d9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

 2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

 4) Add an r8152 device ID, from Kazutoshi Noguchi.

 5) Missing include in ipv6's addrconf.c, from Ben Dooks.

 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

 7) Several netdevice nesting depth fixes from Taehee Yoo.

 8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

 9) Fix jumbo packet handling in RXRPC, from David Howells.

10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

11) Fix DMA synchronization in gve driver, from Yangchun Fu.

12) Several bpf offload fixes, from Jakub Kicinski.

13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfent Xiao.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  net: fix installing orphaned programs
  net: cls_bpf: fix NULL deref on offload filter removal
  selftests: bpf: Skip write only files in debugfs
  selftests: net: reuseport_dualstack: fix uninitalized parameter
  r8169: fix wrong PHY ID issue with RTL8168dp
  net: dsa: bcm_sf2: Fix IMP setup for port different than 8
  net: phylink: Fix phylink_dbg() macro
  gve: Fixes DMA synchronization.
  inet: stop leaking jiffies on the wire
  ixgbe: Remove duplicate clear_bit() call
  Documentation: networking: device drivers: Remove stray asterisks
  e1000: fix memory leaks
  i40e: Fix receive buffer starvation for AF_XDP
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  igb: Enable media autosense for the i350.
  igb/igc: Don't warn on fatal read failures when the device is removed
  tcp: increase tcp_max_syn_backlog max value
  net: increase SOMAXCONN to 4096
  netdevsim: Fix use-after-free during device dismantle
  ...
2019-11-01 17:48:11 -07:00
Linus Torvalds 372bf6c1c8 NFS Client Bugfixes for Linux 5.4-rc6
Stable bugfixes:
 - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
 
 Other fixes:
 - The TCP back channel mustn't disappear while requests are outstanding
 - The RDMA back channel mustn't disappear while requests are outstanding
 - Destroy the back channel when we destroy the host transport
 - Don't allow a cached open with a revoked delegation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl28mZ4ACgkQ18tUv7Cl
 QOvAAxAA5h15NJ8OTOpr/kyWQAHEYWIEWQyTNYT8Gi/TI/YCfauET6kxAKg7ETk+
 Bx6+A7Q5OlKSqFZXf2UMMIHPnvH7Ar06uIof6DLjWxA9N/q57SNW0YszQCUhvWjp
 4Ry7d9A2yNrCizEk/vTmY5zfFIo2S88AvwkQ6gLCEyfIn5REWQyQnS4SS3vL7MuS
 E/iWBsXFV2C9/yON+zzZnC0ewMIPbtWA3/CsVI2zDKLsHKvBYOx6n7TbfTtMtc/p
 mCdgLzdEEZlN0KOCzzMp7ziME8y1qUGX6eo17Jrr0ZBZIM7o5d7sYMA88Apu6dMg
 MQM5iEEAf//kmHLWzQ6yg8XZEZrXFMOiGKYu0kWt4pVlqi78gemEfE6T8dClsB5m
 P7jqn/4ntDlJPWPciImsVEzDOHlzlZBIXwZs1JjufeO/hIB570cs80PjeDQP0id8
 Cx3yR7Hr1ldKzwSwpWrtIaqEPljJJw0jhxQ7YYlvfbA1Ogd4ek4QU4RuvyW11CRr
 d/5Oaa7xvjGs63klu8HpZG4NHFby3PPmZT6t4xUcH75MR14S15YCGtaUvPJ7ORck
 C6tZJAkVtJqSoAjsGuBtTc5qiOdo+hMRghutW4Pd6UpnPYJKwQUtTqp3MbfM4+bA
 S19tGG6qn+7cl7bW4NJpNy8HfbC7BdS+m7maHpQfGIdMsxKVFoI=
 =ttcU
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "This contains two delegation fixes (with the RCU lock leak fix marked
  for stable), and three patches to fix destroying the the sunrpc back
  channel.

  Stable bugfixes:

   - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()

  Other fixes:

   - The TCP back channel mustn't disappear while requests are
     outstanding

   - The RDMA back channel mustn't disappear while requests are
     outstanding

   - Destroy the back channel when we destroy the host transport

   - Don't allow a cached open with a revoked delegation"

* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
  NFSv4: Don't allow a cached open with a revoked delegation
  SUNRPC: Destroy the back channel when we destroy the host transport
  SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
  SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
2019-11-01 17:37:44 -07:00
Linus Torvalds 0821de2896 for-linus-20191101
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl28lRQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjtbEADbrMXdsdPV2CApxSiZIaWO1mR78yy/btxp
 cHsJ+avaPGxhNukSsose2KWm656SriH/OfQspqtvzDpslbu40V41+vSqSknqGRPr
 8jW5efZIAy6dq0FjbtBnmIV6PhC5d4F/nAEQbsnVRn8RSr3OwQcm8/smpSFA8urI
 oHVU8jiyLsQiSbDvjf2KPhPYhWBHO0W5SyGo29HY8pSzQpsMzGkQ6TcHL4EzwPZP
 WPtGglr14v8rMyhNMxUdHZ9eHCMq7uufFPuyJXzesE/qyM+H8p2pwwxyfflHGZil
 w2vxLJRu8d4UIkHEkNbC0bydXJ+eCtRMBZON1ZGdrZwQ58L9AbBPBZmxKb0LkmHb
 4tc/yQm/0kSUUXwFtDoUoIBFjjy36Pl5BsLt4n5fofsl04myhm5CLqZ8oWxyU0vO
 sCinJwk1+eQO/tbQVDfven+MroNlYVPCnXhIe/12/wEba3EJ7Ab4X5p0lJoJ1oY7
 9dQyY6+BaHd4wV9p0domOP5y7dJnXM9k46EF0/5YoNjoqaH5MWPMq355VH2xNjdw
 5HzRcZfvOAlXASrnXuQAAQAdR2b+s/iFZaNKA7bTZxjNPvYE0zySCMeQeNXmfVKe
 CrDuwViWukwIzETDZHYqMWJxOV4nyOeL3jTo7rQp5A5TEWwBiJKQ4aGBif2eqc+L
 Mk41ziQGuQ==
 =+rar
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Two small nvme fixes, one is a fabrics connection fix, the other one
   a cleanup made possible by that fix (Anton, via Keith)

 - Fix requeue handling in umb ubd (Anton)

 - Fix spin_lock_irq() nesting in blk-iocost (Dan)

 - Three small io_uring fixes:
     - Install io_uring fd after done with ctx (me)
     - Clear ->result before every poll issue (me)
     - Fix leak of shadow request on error (Pavel)

* tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
  iocost: don't nest spin_lock_irq in ioc_weight_write()
  io_uring: ensure we clear io_kiocb->result before each issue
  um-ubd: Entrust re-queue to the upper layers
  nvme-multipath: remove unused groups_only mode in ana log
  nvme-multipath: fix possible io hang after ctrl reconnect
  io_uring: don't touch ctx in setup after ring fd install
  io_uring: Fix leaked shadow_req
2019-11-01 17:33:12 -07:00
Linus Torvalds e5897c7d2e RISC-V updates for v5.4-rc6
One fix for PCIe users:
 
 - Fix legacy PCI I/O port access emulation
 
 One set of cleanups:
 
 - Resolve most of the warnings generated by sparse across arch/riscv.
   No functional changes
 
 And one MAINTAINERS update:
 
 - Update Palmer's E-mail address
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl28yFoACgkQx4+xDQu9
 KkvNSQ/8DpNzyFysNQZQEwbh3s9VOl2mWpie2Ym/ND2KJHL4zavFURwX5wO7QY7I
 a6H1fYZ3yjw9JOmUsCj/0ia2gzvAlrAPpy8Vd3hO/MPTXvxqPYJ0CC325Spn4aG+
 qYJHJWH/kBKhaaWDVa4wtOok35fx2F9UD352BZjyJfmvjqLKK0H+iflVId8ZrKPl
 hM8q4uO75b9NJUFakXMYAfYSTAzDe2qleNQzfJMlBRNjfxyx6E39O6U0VYNmIYXp
 c+U8OxfFKvJ0aU3JMLRxvt1gZRhsVk3lpScqaRvSRhtbUBcP6ya/hvySIDg3T+wK
 b7k3x0uINMJqxlAu/akw2X3QPFfqrQUxpXP80qZW+duGSEPsOeICGFvzi0gOn80F
 vq+SjVv3pXLxt1psd4Y6Os1kofFki+/VpUmtaQQwnBIHM3U7RomFnfSgCF2H/mPY
 SdYrH5F4RHVzptvG3MBSFqrs+QycPE2NJUdMKz794VQ4qiDqlpV1HKAO1Fbbw7KA
 b0eUd4MatHp2KSRG7YyT3RpCAKSyz7Ar9yu5rGr/I/J40PiPzHtR0d4FK0mQfiZw
 GBxvwUlErswgFrrmD3JMnWaFTEa+kzGKiYDgunjazGzJ8W06i3la0wWVcHXVdbj+
 K6SWla67dTw+N45U3Cs1WvtdgVK9QpQkl14X5HLynqUviw546ng=
 =hkRB
 -----END PGP SIGNATURE-----

Merge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "One fix for PCIe users:

   - Fix legacy PCI I/O port access emulation

  One set of cleanups:

   - Resolve most of the warnings generated by sparse across arch/riscv.
     No functional changes

  And one MAINTAINERS update:

   - Update Palmer's E-mail address"

* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Change to my personal email address
  RISC-V: Add PCIe I/O BAR memory mapping
  riscv: for C functions called only from assembly, mark with __visible
  riscv: fp: add missing __user pointer annotations
  riscv: add missing header file includes
  riscv: mark some code and data as file-static
  riscv: init: merge split string literals in preprocessor directive
  riscv: add prototypes for assembly language functions from head.S
2019-11-01 17:20:53 -07:00
Daniel Borkmann 78db77fab1 Merge branch 'bpf-xskmap-perf-improvements'
Björn Töpel says:

====================
This set consists of three patches from Maciej and myself which are
optimizing the XSKMAP lookups.  In the first patch, the sockets are
moved to be stored at the tail of the struct xsk_map. The second
patch, Maciej implements map_gen_lookup() for XSKMAP. The third patch,
introduced in this revision, moves various XSKMAP functions, to permit
the compiler to do more aggressive inlining.

Based on the XDP program from tools/lib/bpf/xsk.c where
bpf_map_lookup_elem() is explicitly called, this work yields a 5%
improvement for xdpsock's rxdrop scenario. The last patch yields 2%
improvement.

Jonathan's Acked-by: for patch 1 and 2 was carried on. Note that the
overflow checks are done in the bpf_map_area_alloc() and
bpf_map_charge_init() functions, which was fixed in commit
ff1c08e1f7 ("bpf: Change size to u64 for bpf_map_{area_alloc,
charge_init}()").

  [1] https://patchwork.ozlabs.org/patch/1186170/

v1->v2: * Change size/cost to size_t and use {struct, array}_size
          where appropriate. (Jakub)
v2->v3: * Proper commit message for patch 2.
v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub)
        * Introduced patch 3.
v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct
          pointer-sized loads. (Daniel)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-02 00:38:59 +01:00
Björn Töpel d817991cc7 xsk: Restructure/inline XSKMAP lookup/redirect/flush
In this commit the XSKMAP entry lookup function used by the XDP
redirect code is moved from the xskmap.c file to the xdp_sock.h
header, so the lookup can be inlined from, e.g., the
bpf_xdp_redirect_map() function.

Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the
xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush()
functions.

Finally, all the XDP socket functions were moved from linux/bpf.h to
net/xdp_sock.h, where most of the XDP sockets functions are anyway.

This yields a ~2% performance boost for the xdpsock "rx_drop"
scenario.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Maciej Fijalkowski e65650f291 bpf: Implement map_gen_lookup() callback for XSKMAP
Inline the xsk_map_lookup_elem() via implementing the map_gen_lookup()
callback. This results in emitting the bpf instructions in place of
bpf_map_lookup_elem() helper call and better performance of bpf
programs.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Björn Töpel 64fe8c061d xsk: Store struct xdp_sock as a flexible array member of the XSKMAP
Prior this commit, the array storing XDP socket instances were stored
in a separate allocated array of the XSKMAP. Now, we store the sockets
as a flexible array member in a similar fashion as the arraymap. Doing
so, we do less pointer chasing in the lookup.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Linus Torvalds 31408fbe33 Merge branch 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
 "Fix a parisc kernel crash with ftrace functions when compiled without
  frame pointers"

* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix frame pointer in ftrace_regs_caller()
2019-11-01 15:16:25 -07:00
David S. Miller aeb1b85c34 Merge branch 'fix-BPF-offload-related-bugs'
Jakub Kicinski says:

====================
fix BPF offload related bugs

test_offload.py catches some recently added bugs.

First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.

Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski aefc3e723a net: fix installing orphaned programs
When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.

This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.

dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.

Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.

Fixes: c14a9f633d ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski 41aa29a58b net: cls_bpf: fix NULL deref on offload filter removal
Commit 4011921137 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.

Fixes: 4011921137 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski 8101e06941 selftests: bpf: Skip write only files in debugfs
DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.

Note that we can't use os.access() because the script requires
root.

Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Wei Wang d64479a3e3 selftests: net: reuseport_dualstack: fix uninitalized parameter
This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.

Fixes: d6a61f80b8 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:11:02 -07:00
Heiner Kallweit 62bdc8fd1c r8169: fix wrong PHY ID issue with RTL8168dp
As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.

[0] https://bbs.archlinux.org/viewtopic.php?id=246508

Fixes: 242cd9b586 ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:09:40 -07:00
Florian Fainelli 5fc0f21246 net: dsa: bcm_sf2: Fix IMP setup for port different than 8
Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.

Fixes: 9f91484f6f ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:08:21 -07:00
Florian Fainelli 9d68db5092 net: phylink: Fix phylink_dbg() macro
The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.

Fixes: 17091180b1 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:06:46 -07:00
Yangchun Fu 9cfeeb576d gve: Fixes DMA synchronization.
Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.

Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:00:05 -07:00
Eric Dumazet a904a0693c inet: stop leaking jiffies on the wire
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.

RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.

Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.

Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.

Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:57:52 -07:00
Roman Mashak c23fcbbc6a tc-testing: added tests with cookie for conntrack TC action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:53:41 -07:00
David S. Miller c8c2cd8102 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-11-01

This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.

Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine.  So change the read error into a
warn while the device is still present.

Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it.  So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.

I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.

Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.

Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.

Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.

v2: Fixed alignment issue in patch 3 of the series based on community
    feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:50:27 -07:00