Commit Graph

726439 Commits

Author SHA1 Message Date
David S. Miller 457740a903 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-01-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) A number of extensions to tcp-bpf, from Lawrence.
    - direct R or R/W access to many tcp_sock fields via bpf_sock_ops
    - passing up to 3 arguments to bpf_sock_ops functions
    - tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
    - optionally calling bpf_sock_ops program when RTO fires
    - optionally calling bpf_sock_ops program when packet is retransmitted
    - optionally calling bpf_sock_ops program when TCP state changes
    - access to tclass and sk_txhash
    - new selftest

2) div/mod exception handling, from Daniel.
    One of the ugly leftovers from the early eBPF days is that div/mod
    operations based on registers have a hard-coded src_reg == 0 test
    in the interpreter as well as in JIT code generators that would
    return from the BPF program with exit code 0. This was basically
    adopted from cBPF interpreter for historical reasons.
    There are multiple reasons why this is very suboptimal and prone
    to bugs. To name one: the return code mapping for such abnormal
    program exit of 0 does not always match with a suitable program
    type's exit code mapping. For example, '0' in tc means action 'ok'
    where the packet gets passed further up the stack, which is just
    undesirable for such cases (e.g. when implementing policy) and
    also does not match with other program types.
    After considering _four_ different ways to address the problem,
    we adapt the same behavior as on some major archs like ARMv8:
    X div 0 results in 0, and X mod 0 results in X. aarch64 and
    aarch32 ISA do not generate any traps or otherwise aborts
    of program execution for unsigned divides.
    Given the options, it seems the most suitable from
    all of them, also since major archs have similar schemes in
    place. Given this is all in the realm of undefined behavior,
    we still have the option to adapt if deemed necessary.

3) sockmap sample refactoring, from John.

4) lpm map get_next_key fixes, from Yonghong.

5) test cleanups, from Alexei and Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-28 21:22:46 -05:00
David S. Miller 6b2e2829c1 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-01-26

This series contains updates to ixgbe and ixgbevf.

Emil updates ixgbevf to match ixgbe functionality, starting with the
consolidating of functions that represent logical steps in the receive
process so we can later update them more easily.  Updated ixgbevf to
only synchronize the length of the frame, which will typically be the
MTU or smaller.  Updated the VF driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed, which saves on reads and we can save time on
initialization.  Added support for DMA_ATTR_SKIP_CPU_SYNC/WEAK_ORDERING
to help improve performance on some platforms.  Updated the VF driver to
do bulk updates of the page reference count instead of just incrementing
it by one reference at a time.  Updated the VF driver to only go through
the region of the receive ring that was designated to be cleaned up,
rather than process the entire ring.

Colin Ian King adds the use of ARRAY_SIZE() on various arrays.

Miroslav Lichvar fixes an issue where ethtool was reporting timestamping
filters unsupported for X550, which is incorrect.

Paul adds support for reporting 5G link speed for some devices.

Dan Carpenter fixes a typo where && was used when it should have been
||.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-28 10:19:48 -05:00
Leon Romanovsky 751c45bd82 net/rocker: Remove unreachable return instruction
The "return 0" instruction follows other return instruction
and it makes it impossible to execute, hence remove it.

Fixes: 00fc0c51e3 ("rocker: Change world_ops API and implementation to be switchdev independant")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-28 10:13:40 -05:00
Alexei Starovoitov 8223967fe0 Merge branch 'fix-lpm-map'
Yonghong Song says:

====================
A kernel page fault which happens in lpm map trie_get_next_key is reported
by syzbot and Eric. The issue was introduced by commit b471f2f1de
("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map").
Patch #1 fixed the issue in the kernel and patch #2 adds a multithreaded
test case in tools/testing/selftests/bpf/test_lpm_map.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 17:06:24 -08:00
Yonghong Song af32efeede tools/bpf: add a multithreaded stress test in bpf selftests test_lpm_map
The new test will spawn four threads, doing map update, delete, lookup
and get_next_key in parallel. It is able to reproduce the issue in the
previous commit found by syzbot and Eric Dumazet.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 17:06:22 -08:00
Yonghong Song 6dd1ec6c7a bpf: fix kernel page fault in lpm map trie_get_next_key
Commit b471f2f1de ("bpf: implement MAP_GET_NEXT_KEY command
for LPM_TRIE map") introduces a bug likes below:

    if (!rcu_dereference(trie->root))
        return -ENOENT;
    if (!key || key->prefixlen > trie->max_prefixlen) {
        root = &trie->root;
        goto find_leftmost;
    }
    ......
  find_leftmost:
    for (node = rcu_dereference(*root); node;) {

In the code after label find_leftmost, it is assumed
that *root should not be NULL, but it is not true as
it is possbile trie->root is changed to NULL by an
asynchronous delete operation.

The issue is reported by syzbot and Eric Dumazet with the
below error log:
  ......
  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 1 PID: 8033 Comm: syz-executor3 Not tainted 4.15.0-rc8+ #4
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:trie_get_next_key+0x3c2/0xf10 kernel/bpf/lpm_trie.c:682
  ......

This patch fixed the issue by use local rcu_dereferenced
pointer instead of *(&trie->root) later on.

Fixes: b471f2f1de ("bpf: implement MAP_GET_NEXT_KEY command or LPM_TRIE map")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 17:06:22 -08:00
Alexei Starovoitov 1651e39e4a Merge branch 'bpf-improvements-and-fixes'
Daniel Borkmann says:

====================
This set contains a small cleanup in cBPF prologue generation and
otherwise fixes an outstanding issue related to BPF to BPF calls
and exception handling. For details please see related patches.
Last but not least, BPF selftests is extended with several new
test cases.

Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:08 -08:00
Daniel Borkmann 21ccaf2149 bpf: add further test cases around div/mod and others
Update selftests to relfect recent changes and add various new
test cases.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:07 -08:00
Daniel Borkmann 73ae3c0426 bpf, arm: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from arm32 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:07 -08:00
Daniel Borkmann e472d5d8af bpf, mips64: remove unneeded zero check from div/mod with k
The verifier in both cBPF and eBPF reject div/mod by 0 imm,
so this can never load. Remove emitting such test and reject
it from being JITed instead (the latter is actually also not
needed, but given practice in sparc64, ppc64 today, so
doesn't hurt to add it here either).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Daney <david.daney@cavium.com>
Reviewed-by: David Daney <david.daney@cavium.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann 1fb5c9c622 bpf, mips64: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from mips64 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Daney <david.daney@cavium.com>
Reviewed-by: David Daney <david.daney@cavium.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann 740d52c646 bpf, sparc64: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from sparc64 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann 53fbf5719a bpf, ppc64: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from ppc64 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann a3212b8f15 bpf, s390x: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from s390x JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann 96a71005bd bpf, arm64: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from arm64 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann 3e5b1a39d7 bpf, x86_64: remove obsolete exception handling from div/mod
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from x86_64 JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:06 -08:00
Daniel Borkmann f6b1b3bf0d bpf: fix subprog verifier bypass by div/mod by 0 exception
One of the ugly leftovers from the early eBPF days is that div/mod
operations based on registers have a hard-coded src_reg == 0 test
in the interpreter as well as in JIT code generators that would
return from the BPF program with exit code 0. This was basically
adopted from cBPF interpreter for historical reasons.

There are multiple reasons why this is very suboptimal and prone
to bugs. To name one: the return code mapping for such abnormal
program exit of 0 does not always match with a suitable program
type's exit code mapping. For example, '0' in tc means action 'ok'
where the packet gets passed further up the stack, which is just
undesirable for such cases (e.g. when implementing policy) and
also does not match with other program types.

While trying to work out an exception handling scheme, I also
noticed that programs crafted like the following will currently
pass the verifier:

  0: (bf) r6 = r1
  1: (85) call pc+8
  caller:
   R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  callee:
   frame1: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_1
  10: (b4) (u32) r2 = (u32) 0
  11: (b4) (u32) r3 = (u32) 1
  12: (3c) (u32) r3 /= (u32) r2
  13: (61) r0 = *(u32 *)(r1 +76)
  14: (95) exit
  returning from callee:
   frame1: R0_w=pkt(id=0,off=0,r=0,imm=0)
           R1=ctx(id=0,off=0,imm=0) R2_w=inv0
           R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
           R10=fp0,call_1
  to caller at 2:
   R0_w=pkt(id=0,off=0,r=0,imm=0) R6=ctx(id=0,off=0,imm=0)
   R10=fp0,call_-1

  from 14 to 2: R0=pkt(id=0,off=0,r=0,imm=0)
                R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
  2: (bf) r1 = r6
  3: (61) r1 = *(u32 *)(r1 +80)
  4: (bf) r2 = r0
  5: (07) r2 += 8
  6: (2d) if r2 > r1 goto pc+1
   R0=pkt(id=0,off=0,r=8,imm=0) R1=pkt_end(id=0,off=0,imm=0)
   R2=pkt(id=0,off=8,r=8,imm=0) R6=ctx(id=0,off=0,imm=0)
   R10=fp0,call_-1
  7: (71) r0 = *(u8 *)(r0 +0)
  8: (b7) r0 = 1
  9: (95) exit

  from 6 to 8: safe
  processed 16 insns (limit 131072), stack depth 0+0

Basically what happens is that in the subprog we make use of a
div/mod by 0 exception and in the 'normal' subprog's exit path
we just return skb->data back to the main prog. This has the
implication that the verifier thinks we always get a pkt pointer
in R0 while we still have the implicit 'return 0' from the div
as an alternative unconditional return path earlier. Thus, R0
then contains 0, meaning back in the parent prog we get the
address range of [0x0, skb->data_end] as read and writeable.
Similar can be crafted with other pointer register types.

Since i) BPF_ABS/IND is not allowed in programs that contain
BPF to BPF calls (and generally it's also disadvised to use in
native eBPF context), ii) unknown opcodes don't return zero
anymore, iii) we don't return an exception code in dead branches,
the only last missing case affected and to fix is the div/mod
handling.

What we would really need is some infrastructure to propagate
exceptions all the way to the original prog unwinding the
current stack and returning that code to the caller of the
BPF program. In user space such exception handling for similar
runtimes is typically implemented with setjmp(3) and longjmp(3)
as one possibility which is not available in the kernel,
though (kgdb used to implement it in kernel long time ago). I
implemented a PoC exception handling mechanism into the BPF
interpreter with porting setjmp()/longjmp() into x86_64 and
adding a new internal BPF_ABRT opcode that can use a program
specific exception code for all exception cases we have (e.g.
div/mod by 0, unknown opcodes, etc). While this seems to work
in the constrained BPF environment (meaning, here, we don't
need to deal with state e.g. from memory allocations that we
would need to undo before going into exception state), it still
has various drawbacks: i) we would need to implement the
setjmp()/longjmp() for every arch supported in the kernel and
for x86_64, arm64, sparc64 JITs currently supporting calls,
ii) it has unconditional additional cost on main program
entry to store CPU register state in initial setjmp() call,
and we would need some way to pass the jmp_buf down into
___bpf_prog_run() for main prog and all subprogs, but also
storing on stack is not really nice (other option would be
per-cpu storage for this, but it also has the drawback that
we need to disable preemption for every BPF program types).
All in all this approach would add a lot of complexity.

Another poor-man's solution would be to have some sort of
additional shared register or scratch buffer to hold state
for exceptions, and test that after every call return to
chain returns and pass R0 all the way down to BPF prog caller.
This is also problematic in various ways: i) an additional
register doesn't map well into JITs, and some other scratch
space could only be on per-cpu storage, which, again has the
side-effect that this only works when we disable preemption,
or somewhere in the input context which is not available
everywhere either, and ii) this adds significant runtime
overhead by putting conditionals after each and every call,
as well as implementation complexity.

Yet another option is to teach verifier that div/mod can
return an integer, which however is also complex to implement
as verifier would need to walk such fake 'mov r0,<code>; exit;'
sequeuence and there would still be no guarantee for having
propagation of this further down to the BPF caller as proper
exception code. For parent prog, it is also is not distinguishable
from a normal return of a constant scalar value.

The approach taken here is a completely different one with
little complexity and no additional overhead involved in
that we make use of the fact that a div/mod by 0 is undefined
behavior. Instead of bailing out, we adapt the same behavior
as on some major archs like ARMv8 [0] into eBPF as well:
X div 0 results in 0, and X mod 0 results in X. aarch64 and
aarch32 ISA do not generate any traps or otherwise aborts
of program execution for unsigned divides. I verified this
also with a test program compiled by gcc and clang, and the
behavior matches with the spec. Going forward we adapt the
eBPF verifier to emit such rewrites once div/mod by register
was seen. cBPF is not touched and will keep existing 'return 0'
semantics. Given the options, it seems the most suitable from
all of them, also since major archs have similar schemes in
place. Given this is all in the realm of undefined behavior,
we still have the option to adapt if deemed necessary and
this way we would also have the option of more flexibility
from LLVM code generation side (which is then fully visible
to verifier). Thus, this patch i) fixes the panic seen in
above program and ii) doesn't bypass the verifier observations.

  [0] ARM Architecture Reference Manual, ARMv8 [ARM DDI 0487B.b]
      http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487b.b/DDI0487B_b_armv8_arm.pdf
      1) aarch64 instruction set: section C3.4.7 and C6.2.279 (UDIV)
         "A division by zero results in a zero being written to
          the destination register, without any indication that
          the division by zero occurred."
      2) aarch32 instruction set: section F1.4.8 and F5.1.263 (UDIV)
         "For the SDIV and UDIV instructions, division by zero
          always returns a zero result."

Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:05 -08:00
Daniel Borkmann 5e581dad4f bpf: make unknown opcode handling more robust
Recent findings by syzcaller fixed in 7891a87efc ("bpf: arsh is
not supported in 32 bit alu thus reject it") triggered a warning
in the interpreter due to unknown opcode not being rejected by
the verifier. The 'return 0' for an unknown opcode is really not
optimal, since with BPF to BPF calls, this would go untracked by
the verifier.

Do two things here to improve the situation: i) perform basic insn
sanity check early on in the verification phase and reject every
non-uapi insn right there. The bpf_opcode_in_insntable() table
reuses the same mapping as the jumptable in ___bpf_prog_run() sans
the non-public mappings. And ii) in ___bpf_prog_run() we do need
to BUG in the case where the verifier would ever create an unknown
opcode due to some rewrites.

Note that JITs do not have such issues since they would punt to
interpreter in these situations. Moreover, the BPF_JIT_ALWAYS_ON
would also help to avoid such unknown opcodes in the first place.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:05 -08:00
Daniel Borkmann 2a5418a13f bpf: improve dead code sanitizing
Given we recently had c131187db2 ("bpf: fix branch pruning
logic") and 95a762e2c8 ("bpf: fix incorrect sign extension in
check_alu_op()") in particular where before verifier skipped
verification of the wrongly assumed dead branch, we should not
just replace the dead code parts with nops (mov r0,r0). If there
is a bug such as fixed in 95a762e2c8 in future again, where
runtime could execute those insns, then one of the potential
issues with the current setting would be that given the nops
would be at the end of the program, we could execute out of
bounds at some point.

The best in such case would be to just exit the BPF program
altogether and return an exception code. However, given this
would require two instructions, and such a dead code gap could
just be a single insn long, we would need to place 'r0 = X; ret'
snippet at the very end after the user program or at the start
before the program (where we'd skip that region on prog entry),
and then place unconditional ja's into the dead code gap.

While more complex but possible, there's still another block
in the road that currently prevents from this, namely BPF to
BPF calls. The issue here is that such exception could be
returned from a callee, but the caller would not know that
it's an exception that needs to be propagated further down.
Alternative that has little complexity is to just use a ja-1
code for now which will trap the execution here instead of
silently doing bad things if we ever get there due to bugs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:05 -08:00
Daniel Borkmann 1d621674d9 bpf: xor of a/x in cbpf can be done in 32 bit alu
Very minor optimization; saves 1 byte per program in x86_64
JIT in cBPF prologue.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26 16:42:05 -08:00
Mickaël Salaün c25ef6a5e6 samples/bpf: Partially fixes the bpf.o build
Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory.  This avoid making a buggy bpf.o file (e.g. missing
symbols).

This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-26 23:57:10 +01:00
Lawrence Brakmo 771fc607e6 bpf: clean up from test_tcpbpf_kern.c
Removed commented lines from test_tcpbpf_kern.c

Fixes: d6d4f60c3a bpf: add selftest for tcpbpf
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-26 23:39:05 +01:00
Emil Tantilov 2bafa8fac1 ixgbe: don't set RXDCTL.RLPML for 82599
commit 2de6aa3a66 ("ixgbe: Add support for padding packet")

Uses RXDCTL.RLPML to limit the maximum frame size on Rx when using
build_skb. Unfortunately that register does not work on 82599.

Added an explicit check to avoid setting this register on 82599 MAC.

Extended the comment related to the setting of RXDCTL.RLPML to better
explain its purpose.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:35 -08:00
Dan Carpenter fd492228d4 ixgbe: Fix && vs || typo
"offset" can't be both 0x0 and 0xFFFF so presumably || was intended
instead of &&.  That matches with how this check is done in other
functions.

Fixes: 73834aec71 ("ixgbe: extend firmware version support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:32 -08:00
Paul Greenwalt 06e3f94947 ixgbe: add support for reporting 5G link speed
Since 5G link speed is supported by some devices, add reporting of 5G link
speed.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:28 -08:00
Miroslav Lichvar 838200482d ixgbe: Don't report unsupported timestamping filters for X550
The current code enables on X550 timestamping of all packets for any
filter, which means ethtool should not report any PTP-specific filters
as unsupported.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:23 -08:00
Colin Ian King f0e49dc3f9 ixgbe: use ARRAY_SIZE for array sizing calculation on array buf
Use the ARRAY_SIZE macro on array buf to determine size of the array.
Improvement suggested by coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:19 -08:00
Colin Ian King 4078ea3756 ixgbevf: use ARRAY_SIZE for various array sizing calculations
Use the ARRAY_SIZE macro on various arrays to determine
size of the arrays. Improvement suggested by coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:15 -08:00
Emil Tantilov 865a4d987b ixgbevf: don't bother clearing tx_buffer_info in ixgbevf_clean_tx_ring()
In the case of the Tx rings we need to only clear the Tx buffer_info when
we are resetting the rings.  Ideally we do this when we configure the ring
to bring it back up instead of when we are taking it down in order to avoid
dirtying pages we don't need to.

In addition we don't need to clear the Tx descriptor ring since we will
fully repopulate it when we begin transmitting frames and next_to_watch can
be cleared to prevent the ring from being cleaned beyond that point instead
of needing to touch anything in the Tx descriptor ring.

Finally with these changes we can avoid having to reset the skb member of
the Tx buffer_info structure in the cleanup path since the skb will always
be associated with the first buffer which has next_to_watch set.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 10:25:02 -08:00
David S. Miller 6bb46bc57c Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed'
Rahul Lakkireddy says:

====================
cxgb4: fix dump collection when firmware crashed

Patch 1 resets FW_OK flag, if firmware reports error.

Patch 2 fixes incorrect condition for using firmware LDST commands.

Patch 3 fixes dump collection logic to use backdoor register
access to collect dumps when firmware is crashed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:00:23 -05:00
Rahul Lakkireddy 770ca3477a cxgb4: use backdoor access to collect dumps when firmware crashed
Fallback to backdoor register access to collect dumps if firmware
is crashed.  Fixes TID, SGE Queue Context, and MPS TCAM dump collection.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:00:22 -05:00
Rahul Lakkireddy ebb5568fe2 cxgb4: fix incorrect condition for using firmware LDST commands
Only contact firmware if it's alive _AND_ if use_bd (use backdoor
access) is not set when issuing FW_LDST_CMD.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:00:22 -05:00
Rahul Lakkireddy 825b2b6fd9 cxgb4: reset FW_OK flag on firmware crash
If firmware reports error, reset FW_OK flag.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 11:00:22 -05:00
David S. Miller 2d0a527ed3 Merge branch 'hns3-next'
Peng Li says:

====================
net: hns3: add support ethtool_ops.{set|get}_coalesce for VF

This patch-set adds ethtool_ops.{get|set}_coalesce to VF and
fix one related bug.

HNS3 PF and VF driver use the common enet layer, as the
ethtool_ops.{get|set}_coalesce to PF have upstreamed,  just
need add the ops to hns3vf_ethtool_ops.

[Patch 1/2] fix a related bug for the VF ethtool_ops.{set|
get}_coalesce.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:58:30 -05:00
Fuyun Liang 79eee41085 net: hns3: add int_gl_idx setup for VF
Just like PF, if the int_gl_idx of VF does not be set, the default
interrupt coalesce index of VF is 0. But it should be GL1 for TX
queues and GL0 for RX queues.

This patch adds the int_gl_idx setup for VF.

Fixes: 200ecda42598 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:58:30 -05:00
Fuyun Liang ad31c73201 net: hns3: add get/set_coalesce support to VF
This patch adds ethtool_ops.get/set_coalesce support to VF.

Since PF and VF share the same get/set_coalesce interface,
we only need to set hns3_get/set_coalesce to the ethtool_ops
when supporting get/set_coalesce for VF.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:58:29 -05:00
David S. Miller e2d6e64bc3 linux-can-next-for-4.16-20180126
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlpq+ZkTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9mFcB/wPSu30a664/+wjUvXM7Zdw4ko/PRdS
 deSRnjGj3epkHRyGJkdGSuPx9iGg3pqR8poMCZZmFUG+kGBmEcGQX+eyaR41zIUz
 iyEgZSufYDjsW47eGBsNE01xQjoL1jcF9JM7NHmRrw4+2YF75cGE3BOGcmcV6Hjc
 O5HDIpLmbeMHI4NcujgD4UG/VPnZQw3+oN9eyYUEbY5Aa2XQyW76DIJ3SyKsHQz0
 K/s0uxAGo+Ap7xuoBUJpx6BBYoHYM171DTgXfH9pUB0MwqyDCq3hAyYGR+UEdIXb
 IDhIcN/l5wFU8VICjYmSKgKyjjHqlixgoki2snmJxVWu0KeVl5LJ1Edv
 =7jiC
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.16-20180126' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2018-01-26

this is a pull request for net-next/master consisting of 3 patches.

The first two patches target the CAN documentation. The first is by me
and fixes pointer to location of fsl,mpc5200-mscan node in the mpc5200
documentation. The second patch is by Robert Schwebel and it converts
the plain ASCII documentation to restructured text.

The third patch is by Fabrizio Castro add the r8a774[35] support to the
rcar_can dt-bindings documentation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:49:12 -05:00
Emil Tantilov 6f3554548e ixgbevf: improve performance and reduce size of ixgbevf_tx_map()
Based on commit ec718254cb
("ixgbe: Improve performance and reduce size of ixgbe_tx_map")

This change is meant to both improve the performance and reduce the size of
ixgbevf_tx_map().

Expand the work done in the main loop by pushing first into tx_buffer.
This allows us to pull in the dma_mapping_error check, the tx_buffer value
assignment, and the initial DMA value assignment to the Tx descriptor.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:51 -08:00
Emil Tantilov 40b8178bc9 ixgbevf: clear rx_buffer_info in configure instead of clean
Based on commit d2bead576e
("igb: Clear Rx buffer_info in configure instead of clean")

This change makes it so that instead of going through the entire ring on Rx
cleanup we only go through the region that was designated to be cleaned up
and stop when we reach the region where new allocations should start.

In addition we can avoid having to perform a memset on the Rx buffer_info
structures until we are about to start using the ring again.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:51 -08:00
Emil Tantilov 2a35efe582 ixgbevf: add counters for Rx page allocations
We already had placehloders for failed page and buffer allocations.
Added alloc_rx_page and made sure the stats are properly updated and
exposed in ethtool.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:51 -08:00
Emil Tantilov 35074d698d ixgbevf: update code to better handle incrementing page count
Based on commit bd4171a5d4
("igb: update code to better handle incrementing page count")

Update the driver code so that we do bulk updates of the page reference
count instead of just incrementing it by one reference at a time.  The
advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet.
In addition if we eventually move this over to using build_skb the gains
will be more noticeable.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:50 -08:00
Emil Tantilov 16b359498b ixgbevf: add support for DMA_ATTR_SKIP_CPU_SYNC/WEAK_ORDERING
Based on commit 5be5955425
("igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC")
and
commit 7bd1759282 ("igb: Add support for DMA_ATTR_WEAK_ORDERING")

Convert the calls to dma_map/unmap_page() to the attributes version
and add DMA_ATTR_SKIP_CPU_SYNC/WEAK_ORDERING which should help
improve performance on some platforms.

Move sync_for_cpu call before we perform a prefetch to avoid
invalidating the first 128 bytes of the packet on architectures where
that call may invalidate the cache.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:50 -08:00
Emil Tantilov 24bff091d7 ixgbevf: use length to determine if descriptor is done
Based on:
commit 7ec0116c91 ("igb: Use length to determine if descriptor is done")

This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.

In addition we only reset the Rx descriptor length for descriptor zero when
resetting a ring instead of having to do a memset with 0 over the entire
ring. By doing this we can save some time on initialization.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:50 -08:00
Emil Tantilov 68b6ff5825 ixgbevf: only DMA sync frame length
Based on commit 64f2525ca4 ("igb: Only DMA sync frame length")

On some architectures synching a buffer for DMA may be expensive.
Instead of the entire 2K receive buffer only synchronize the length of
the frame, which will typically be the MTU or smaller.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:50 -08:00
Emil Tantilov a355fd9a1b ixgbevf: add function for checking if we can reuse page
Introduce ixgbevf_can_reuse_page() similar to the change in ixgbe from
commit af43da0dba
("ixgbe: Add function for checking to see if we can reuse page")

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 07:46:50 -08:00
David S. Miller 72a231b7a7 Merge branch 'net-smc-fixes-2018-01-26'
Ursula Braun says:

====================
net/smc: fixes 2018-01-26

here are some more smc patches. The first 4 patches take care about
different aspects of smc socket closing, the 5th patch improves
coding style.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:57 -05:00
Gustavo A. R. Silva a8fbf8e7ec net/smc: return booleans instead of integers
Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 127f497058 net/smc: release clcsock from tcp_listen_worker
Closing a listen socket may hit the warning
WARN_ON(sock_owned_by_user(sk)) of tcp_close(), if the wake up of
the smc_tcp_listen_worker has not yet finished.
This patch introduces smc_close_wait_listen_clcsock() making sure
the listening internal clcsock has been closed in smc_tcp_listen_work(),
before the listening external SMC socket finishes closing.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 51f1de79ad net/smc: replace sock_put worker by socket refcounting
Proper socket refcounting makes the sock_put worker obsolete.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 8dce2786a2 net/smc: smc_poll improvements
Increase the socket refcount during poll wait.
Take the socket lock before checking socket state.
For a listening socket return a mask independent of state SMC_ACTIVE and
cover errors or closed state as well.
Get rid of the accept_q loop in smc_accept_poll().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00