OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
John Hurley	f3b975778c	nfp: flower: tidy tunnel related private data Recent additions to the flower app private data have grouped the variables of a given feature into a struct and added that struct to the main private data struct. In keeping with this, move all tunnel related private data to their own struct. This has no effect on functionality but improves readability and maintenance of the code. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
Pieter Jansen van Vuuren	467322e262	nfp: flower: support multiple memory units for filter offloads Adds support for multiple memory units which are used for filter offloads. Each filter is assigned a stats id, the MSBs of the id are used to determine which memory unit the filter should be offloaded to. The number of available memory units that could be used for filter offload is obtained from HW. A simple round robin technique is used to allocate and distribute the ids across memory units. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:14 -08:00
Fred Lotter	96439889b4	nfp: flower: increase cmesg reply timeout QA tests report occasional timeouts on REIFY message replies. Profiling of the two cmesg reply types under burst conditions, with a 12-core host under heavy cpu and io load (stress --cpu 12 --io 12), shows both PHY MTU change and REIFY replies can exceed the 10ms timeout. The maximum MTU reply wait under burst is 16ms, while the maximum REIFY wait under 40 VF burst is 12ms. Using a 4 VF REIFY burst results in an 8ms maximum wait. A larger VF burst does increase the delay, but not in a linear enough way to justify a scaled REIFY delay. The worst case values for MTU and REIFY appear close enough to justify a common timeout. Pick a conservative 40ms to make a safer future proof common reply timeout. The delay only affects the failure case. Change the REIFY timeout mechanism to use wait_event_timeout() instead of wait_event_interruptible_timeout(), to match the MTU code. In the current implementation, theoretically, a signal could interrupt the REIFY waiting period, with a return code of ERESTARTSYS. However, this is caught under the general timeout error code EIO. I cannot see the benefit of exposing the REIFY waiting period to signals with such a short delay (40ms), while the MTU mechanism does not use the same logic. In the absence of any reply (wakeup() call), both reply types will wake up the task after the timeout period. The REIFY timeout applies to the entire representor group being instantiated (e.g. VFs), while the MTU timeout applies to a single PHY MTU change. Signed-off-by: Fred Lotter <frederik.lotter@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:14 -08:00
Luis Chamberlain	750afb08ca	cross-tree: phase out dma_zalloc_coherent() We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superfluous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-01-08 07:58:37 -05:00
David S. Miller	339bbff2d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-21 The following pull-request contains BPF updates for your net-next tree. There is a merge conflict in test_verifier.c. Result looks as follows: [...] }, { "calls: cross frame pruning", .insns = { [...] .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "function calls to other bpf functions are allowed for root only", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, { "jset: functional", .insns = { [...] { "jset: unknown const compare not taken", .insns = { BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1), BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "!read_ok", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, [...] { "jset: range", .insns = { [...] }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .result_unpriv = ACCEPT, .result = ACCEPT, }, The main changes are: 1) Various BTF related improvements in order to get line info working. Meaning, verifier will now annotate the corresponding BPF C code to the error log, from Martin and Yonghong. 2) Implement support for raw BPF tracepoints in modules, from Matt. 3) Add several improvements to verifier state logic, namely speeding up stacksafe check, optimizations for stack state equivalence test and safety checks for liveness analysis, from Alexei. 4) Teach verifier to make use of BPF_JSET instruction, add several test cases to kselftests and remove nfp specific JSET optimization now that verifier has awareness, from Jakub. 5) Improve BPF verifier's slot_type marking logic in order to allow more stack slot sharing, from Jiong. 6) Add sk_msg->size member for context access and add set of fixes and improvements to make sock_map with kTLS usable with openssl based applications, from John. 7) Several cleanups and documentation updates in bpftool as well as auto-mount of tracefs for "bpftool prog tracelog" command, from Quentin. 8) Include sub-program tags from now on in bpf_prog_info in order to have a reliable way for user space to get all tags of the program e.g. needed for kallsyms correlation, from Song. 9) Add BTF annotations for cgroup_local_storage BPF maps and implement bpf fs pretty print support, from Roman. 10) Fix bpftool in order to allow for cross-compilation, from Ivan. 11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order to be compatible with libbfd and allow for Debian packaging, from Jakub. 12) Remove an obsolete prog->aux sanitation in dump and get rid of version check for prog load, from Daniel. 13) Fix a memory leak in libbpf's line info handling, from Prashant. 14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info does not get unaligned, from Jesper. 15) Fix test_progs kselftest to work with older compilers which are less smart in optimizing (and thus throwing build error), from Stanislav. 16) Cleanup and simplify AF_XDP socket teardown, from Björn. 17) Fix sk lookup in BPF kselftest's test_sock_addr with regards to netns_id argument, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 17:31:36 -08:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, but happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
Jakub Kicinski	4987eaccd2	nfp: bpf: optimize codegen for JSET with a constant The top word of the constant can only have bits set if sign extension set it to all-1, therefore we don't really have to mask the top half of the register. We can just OR it into the result as is. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-12-20 17:28:29 +01:00
Jakub Kicinski	6e774845b3	nfp: bpf: remove the trivial JSET optimization The verifier will now understand the JSET instruction, so don't mark the dead branch in the JIT as noop. We won't generate any code, anyway. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-12-20 17:28:28 +01:00
John Hurley	b12c97d45c	nfp: flower: fix cb_ident duplicate in indirect block register Previously the identifier used for indirect block callback registry and for block rule cb registry (when done via indirect blocks) was the pointer to the netdev we were interested in receiving updates on. This worked fine if a single app existed that registered one callback per netdev of interest. However, if multiple cards are in place and, in turn, multiple apps, then each app may register the same callback with the same identifier to both the netdev's indirect block cb list and to a block's cb list. This can lead to EEXIST errors and/or incorrect cb deletions. Prevent this conflict by using the app pointer as the identifier for netdev indirect block cb registry, allowing each app to register a unique callback per netdev. For block cb registry, the same app may register multiple cbs to the same block if using TC shared blocks. Instead of the app, use the pointer to the allocated cb_priv data as the identifier here. This means that there can be a unique block callback for each app/netdev combo. Fixes: `3166dd07a9` ("nfp: flower: offload tunnel decap rules via indirect TC blocks") Reported-by: Edward Cree <ecree@solarflare.com> Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-17 23:34:12 -08:00
Jakub Kicinski	036b9e7cae	nfp: abm: allow to opt-out of RED offload FW team asks to be able to not support RED even if NIC is capable of buffering for testing and experimentation. Add an opt-out flag. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-16 12:41:42 -08:00
David S. Miller	addb067983	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-11 The following pull-request contains BPF updates for your net-next tree. It has three minor merge conflicts, resolutions: 1) tools/testing/selftests/bpf/test_verifier.c Take first chunk with alignment_prevented_execution. 2) net/core/filter.c [...] case bpf_ctx_range_ptr(struct __sk_buff, flow_keys): case bpf_ctx_range(struct __sk_buff, wire_len): return false; [...] 3) include/uapi/linux/bpf.h Take the second chunk for the two cases each. The main changes are: 1) Add support for BPF line info via BTF and extend libbpf as well as bpftool's program dump to annotate output with BPF C code to facilitate debugging and introspection, from Martin. 2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter and all JIT backends, from Jiong. 3) Improve BPF test coverage on archs with no efficient unaligned access by adding an "any alignment" flag to the BPF program load to forcefully disable verifier alignment checks, from David. 4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz. 5) Extend tc BPF programs to use a new __sk_buff field called wire_len for more accurate accounting of packets going to wire, from Petar. 6) Improve bpftool to allow dumping the trace pipe from it and add several improvements in bash completion and map/prog dump, from Quentin. 7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for kernel addresses and add a dedicated BPF JIT backend allocator, from Ard. 8) Add a BPF helper function for IR remotes to report mouse movements, from Sean. 9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info member naming consistent with existing conventions, from Yonghong and Song. 10) Misc cleanups and improvements in allowing to pass interface name via cmdline for xdp1 BPF example, from Matteo. 11) Fix a potential segfault in BPF sample loader's kprobes handling, from Daniel T. 12) Fix SPDX license in libbpf's README.rst, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-10 18:00:43 -08:00
Pieter Jansen van Vuuren	290974d434	nfp: flower: ensure TCP flags can be placed in IPv6 frame Previously we did not ensure tcp flags have a place to be stored when using IPv6. We correct this by including IPv6 key layer when we match tcp flags and the IPv6 key layer has not been included already. Fixes: `07e1671cfc` ("nfp: flower: refactor shared ip header in match offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-10 17:45:41 -08:00
David S. Miller	4cc1feeb6f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while changing the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classification. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-09 21:43:31 -08:00
Jiong Wang	84708c1386	nfp: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_* BPF_X support needs indirect shift mode, please see code comments for details. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-12-07 13:30:48 -08:00
Yangtao Li	6f6c74fad8	nfp: convert to DEFINE_SHOW_ATTRIBUTE Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-03 17:33:38 -08:00
Jakub Kicinski	6db3a9dcf0	nfp: report more info when reconfiguration fails FW reconfiguration timeouts are a common indicator of FW trouble. To make debugging easier print requested update and control word when reconfiguration fails. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:45 -08:00
Jakub Kicinski	9571d98775	nfp: add offset to all TLV parsing errors When troubleshooting incorrect FW capabilities it's useful to know where the faulty TLV is located. Add offset to all error messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	51a6588e8c	nfp: add offloads on representors FW/HW can generally support the standard networking offloads on representors without any trouble. Add the ability for FW to advertise which features should be available on representors. Because representors are muxed on top of the vNIC we need to listen on feature changes of their lower devices, and update their features appropriately. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	71844fac1e	nfp: add locking around representor changes Up until now we never needed to keep any networking locks around representor accesses, we only accessed them when device was reconfigured (under nfp pf->lock) or on fast path (under RCU). Now we want to be able to iterate over all representors during notifications, so make sure representor assignment is done under RTNL lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	fbf60e377d	nfp: run don't require Qdiscs on representor netdevs Our representors are software devices built on top of the PF vNIC, the queuing should only happen at the vNIC netdevice. Allow representors to run qdisc-less. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	9db8bbcb9b	nfp: run representor TX locklessly Our representors are software devices built on top of the PF vNIC, the only state they have are per-cpu stats, so make the TX run locklessly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	d7cc825225	nfp: avoid oversized TSO headers with metadata prepend In preparation for TSO over representors make sure the port id prepend will always fit in the frame. The current max header length is 255, which is ample, so assume worst case scenario of 8 byte prepend and save ourselves the conditionals. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	b54ad0eaad	nfp: correct descriptor offsets in presence of metadata The TSO-related offsets in the descriptor should not include the length of the prepended metadata. Adjust them. Note that this could not have caused issues in the past as we don't support TSO with metadata prepend as of this patch. Signed-off-by: Michael Rapson <michael.rapson@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	8b5ddf1e51	nfp: move queue variable init nd_q is only used at the very end of nfp_net_tx(), there is no need to initialize it early. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	de31049a48	nfp: move temporary variables in nfp_net_tx_complete() Move temporary variables in scope of the loop in nfp_net_tx_complete(), and add a temp for txbuf software structure. This saves us 0.2% of CPU. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
Jakub Kicinski	9586274967	nfp: copy only the relevant part of the TX descriptor for frags Chained descriptors for fragments need to duplicate all the descriptor fields of the skb head, so we copy the descriptor and then modify the relevant fields. This is wasteful, because the top half of the descriptor will get overwritten entirely while the bottom half is not modified at all. Copy only the bottom half. This saves us 0.3% of CPU in a GSO test. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:30:44 -08:00
John Hurley	b5f0cf0834	nfp: flower: prevent offload if rhashtable insert fails For flow offload adds, if the rhash insert code fails, the flow will still have been offloaded but the reference to it in the driver freed. Re-order the offload setup calls to ensure that a flow will only be written to FW if a kernel reference is held and stored in the rhashtable. Remove this hashtable entry if the offload fails. Fixes: `c01d0efa51` ("nfp: flower: use rhashtable for flow caching") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:24:56 -08:00
John Hurley	1166494891	nfp: flower: release metadata on offload failure Calling nfp_compile_flow_metadata both assigns a stats context and increments a ref counter on (or allocates) a mask id table entry. These are released by the nfp_modify_flow_metadata call on flow deletion, however, if a flow add fails after metadata is set then the flow entry will be deleted but the metadata assignments leaked. Add an error path to the flow add offload function to ensure allocated metadata is released in the event of an offload fail. Fixes: `81f3ddf254` ("nfp: add control message passing capabilities to flower offloads") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-30 13:24:56 -08:00
David S. Miller	4afe60a97b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-11-26 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Extend BTF to support function call types and improve the BPF symbol handling with this info for kallsyms and bpftool program dump to make debugging easier, from Martin and Yonghong. 2) Optimize LPM lookups by making longest_prefix_match() handle multiple bytes at a time, from Eric. 3) Adds support for loading and attaching flow dissector BPF progs from bpftool, from Stanislav. 4) Extend the sk_lookup() helper to be supported from XDP, from Nitin. 5) Enable verifier to support narrow context loads with offset > 0 to adapt to LLVM code generation (currently only offset of 0 was supported). Add test cases as well, from Andrey. 6) Simplify passing device functions for offloaded BPF progs by adding callbacks to bpf_prog_offload_ops instead of ndo_bpf. Also convert nfp and netdevsim to make use of them, from Quentin. 7) Add support for sock_ops based BPF programs to send events to the perf ring-buffer through perf_event_output helper, from Sowmini and Daniel. 8) Add read / write support for skb->tstamp from tc BPF and cg BPF programs to allow for supporting rate-limiting in EDT qdiscs like fq from BPF side, from Vlad. 9) Extend libbpf API to support map in map types and add test cases for it as well to BPF kselftests, from Nikita. 10) Account the maximum packet offset accessed by a BPF program in the verifier and use it for optimizing nfp JIT, from Jiong. 11) Fix error handling regarding kprobe_events in BPF sample loader, from Daniel T. 12) Add support for queue and stack map type in bpftool, from David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-26 13:08:17 -08:00
Jakub Kicinski	340a4864d5	nfp: abm: add support for more threshold actions Original FW only allowed us to perform ECN marking. Newer releases also support plain old drop. Add the ability to configure drop policy. This is particularly useful in combination with GRED, because different bands can have different ECN marking setting. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	174ab544e3	nfp: abm: add cls_u32 offload for simple band classification Use offload of very simple u32 filters to direct packets to GRED bands based on the DSCP marking. No u32 hashing is supported, just plain simple filters matching on ToS or Priority with an appropriate mask the device can support. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	6a80240571	nfp: abm: add functions to update DSCP -> virtual queue map Learn how to set the DSCP map. FW uses a packed array whose geometry depends on the number of supported priorities and virtual queues. Write code to assemble this map and to communicate the setting to the FW via mailbox. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	14780c3429	nfp: abm: calculate PRIO map len and check mailbox size In preparation for PRIO offload calculate how long the prio map for FW will be and make sure the configuration can be performed via the vNIC mailbox. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	f3d6372064	nfp: abm: add GRED offload Add support for GRED offload. It behaves much like RED, but can apply different parameters to different bands. GRED operates pretty much exactly like our HW/FW with a single FIFO and different RED state instances. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	990b50a53a	nfp: abm: wrap RED parameters in bands Wrap RED parameters and stats into a structure, and a 1-element array. Upcoming GRED offload will add the support for more bands. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:46 -08:00
Jakub Kicinski	184ec856ca	nfp: abm: add up bands for sto/non-sto stats Add up stats for all bands for the extra ethtool statistics. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:45 -08:00
Jakub Kicinski	57f31bbaa9	nfp: abm: switch to extended stats for reading packet/byte counts In PRIO-enabled FW read the statistics from per-band symbol, rather than from the standard per-PCIe-queue counters. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:45 -08:00
Jakub Kicinski	68e9864221	nfp: abm: size threshold table to account for bands Make sure the threshold table is large enough to hold information for all bands. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:45 -08:00
Jakub Kicinski	5720769609	nfp: abm: pass band parameter to functions In preparation for per-band RED offload pass band parameter to functions. For now it will always be 0. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:45 -08:00
Jakub Kicinski	3a44820591	nfp: abm: map per-band symbols In preparation for multi-band RED offload if FW is capable map the extended symbols which will allow us to set per-band parameters and read stats. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-19 18:53:45 -08:00
Jakub Kicinski	bd3b5d462a	nfp: abm: restructure Qdisc handling In preparation of handling more Qdisc types switch to a different offload strategy. We have now recreated the Qdisc hierarchy in the driver. Every time the hierarchy changes parse it, and update the configuration of the HW accordingly. While at it drop the support of pretending that we can instantiate a single queue on a multi-queue device in HW/FW. MQ is now required, and each queue will have its own instance of RED. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:28 -08:00
Jakub Kicinski	52db4eaca5	nfp: abm: save RED's parameters Use the new driver Qdisc structure to keep track of parameters of RED Qdiscs. This way as the Qdisc moves around in the hierarchy we will be able to configure the HW appropriately. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:28 -08:00
Jakub Kicinski	6c5dbda0d4	nfp: abm: reset RED's child based on limit RED qdisc will replace its child Qdisc with a new FIFO queue if it is reconfigured and the limit parameter is not 0. This means that when it's created with limit of 0 it will have no FIFO, and all packets will be dropped. If it's changed and limit is specified it will lose its existing child (implicit graft). Make sure we mark RED Qdisc child as NFP_QDISC_UNTRACKED if it's not the expected FIFO. nfp_abm_qdisc_replace() will return 1 if Qdisc already existed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:28 -08:00
Jakub Kicinski	6b8417b7e6	nfp: abm: build full Qdisc hierarchy based on graft notifications Using graft notifications recreate in the driver the full Qdisc hierarchy. Keep track of how many times each Qdisc is attached to the hierarchy to make sure we don't offload Qdiscs which are attached multiple times (device queues can't be shared). For graft events of Qdiscs we don't know exist, mark the child as invalid/untracked. Note that MQ Qdisc doesn't send destruction events reliably when device is dismantled, so we need to manually clean out the children otherwise we'd think Qdiscs which are still in use are getting freed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:28 -08:00
Jakub Kicinski	aee7539c58	nfp: abm: allocate Qdisc child table To keep track of Qdisc hierarchy allocate a table for children for each Qdisc. RED Qdisc can only have one child. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:27 -08:00
Jakub Kicinski	1853125889	nfp: abm: remember which Qdisc is root Keep track of which Qdisc is currently root. We need to implement TC_SETUP_ROOT_QDISC handling, and for completeness also clear the root Qdisc pointer when it's freed. TC_SETUP_ROOT_QDISC isn't always sent when device is dismantled. Remembering the root Qdisc will allow us to build the entire hierarchy in following patches. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:27 -08:00
Jakub Kicinski	4f5681d088	nfp: abm: track all offload-enabled qdiscs Allocate an object corresponding to any offloaded qdisc we are informed about by the kernel. Not only the qdiscs we have a chance of offloading. The count of created objects will be used to decide whether the ethtool TC offload can be disabled, since otherwise we may miss destroy commands. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:27 -08:00
Jakub Kicinski	6666f545e9	nfp: abm: keep track of all RED thresholds Instead of writing the threshold out when Qdisc is configured and not remembering it move to a scheme where we remember all thresholds. When configuration changes parse the offloaded Qdiscs and set thresholds appropriately. This will help future extensions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:27 -08:00
Jakub Kicinski	08990494e5	nfp: abm: rename qdiscs -> red_qdiscs Rename qdiscs member to red_qdiscs. One of following patches will use the name qdiscs for tracking all qdisc types. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-14 08:51:27 -08:00
John Hurley	d4b69bad61	nfp: flower: remove unnecessary code in flow lookup Recent changes to NFP mean that stats updates from fw to driver no longer require a flow lookup and (because egdev offload has been removed) the ingress netdev for a lookup is now always known. Remove obsolete code in a flow lookup that matches on host context and that allows for a netdev to be NULL. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-11 09:54:53 -08:00
John Hurley	4f63fde3fc	nfp: flower: remove TC egdev offloads Previously, only tunnel decap rules required egdev registration for offload in NFP. These are now supported via indirect TC block callbacks. Remove the egdev code from NFP. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-11 09:54:53 -08:00
John Hurley	3166dd07a9	nfp: flower: offload tunnel decap rules via indirect TC blocks Previously, TC block tunnel decap rules were only offloaded when a callback was triggered through registration of the rules egress device. This meant that the driver had no access to the ingress netdev and so could not verify it was the same tunnel type that the rule implied. Register tunnel devices for indirect TC block offloads in NFP, giving access to new rules based on the ingress device rather than egress. Use this to verify the netdev type of VXLAN and Geneve based rules and offload the rules to HW if applicable. Tunnel registration is done via a netdev notifier. On notifier registration, this is triggered for already existing netdevs. This means that NFP can register for offloads from devices that exist before it is loaded (filter rules will be replayed from the TC core). Similarly, on notifier unregister, a call is triggered for each currently active netdev. This allows the driver to unregister any indirect block callbacks that may still be active. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-11 09:54:53 -08:00
John Hurley	65b7970edf	nfp: flower: increase scope of netdev checking functions Both the actions and tunnel_conf files contain local functions that check the type of an input netdev. In preparation for re-use with tunnel offload via indirect blocks, move these to static inline functions in a header file. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-11 09:54:53 -08:00
John Hurley	7885b4fc8d	nfp: flower: allow non repr netdev offload Previously the offload functions in NFP assumed that the ingress (or egress) netdev passed to them was an nfp repr. Modify the driver to permit the passing of non repr netdevs as the ingress device for an offload rule candidate. This may include devices such as tunnels. The driver should then base its offload decision on a combination of ingress device and egress port for a rule. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-11 09:54:53 -08:00
Quentin Monnet	16a8cb5cff	bpf: do not pass netdev to translate() and prepare() offload callbacks The kernel functions to prepare verifier and translate for offloaded program retrieve "offload" from "prog", and "netdev" from "offload". Then both "prog" and "netdev" are passed to the callbacks. Simplify this by letting the drivers retrieve the net device themselves from the offload object attached to prog - if they need it at all. There is currently no need to pass the netdev as an argument to those functions. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	a40a26322a	bpf: pass prog instead of env to bpf_prog_offload_verifier_prep() Function bpf_prog_offload_verifier_prep(), called from the kernel BPF verifier to run a driver-specific callback for preparing for the verification step for offloaded programs, takes a pointer to a struct bpf_verifier_env object. However, no driver callback needs the whole structure at this time: the two drivers supporting this, nfp and netdevsim, only need a pointer to the struct bpf_prog instance held by env. Update the callback accordingly, on kernel side and in these two drivers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	eb9119471e	bpf: pass destroy() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to program destruction to the struct and remove the subcommand that was used to call them through the NDO. Remove function __bpf_offload_ndo(), which is no longer used. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	b07ade27e9	bpf: pass translate() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to code translation to the struct and remove the subcommand that was used to call them through the NDO. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	00db12c3d1	bpf: call verifier_prep from its callback in struct bpf_offload_dev In a way similar to the change previously brought to the verify_insn hook and to the finalize callback, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to prepare driver verifiers. Since the dev_ops pointer in struct bpf_prog_offload is no longer used by any callback, we can now remove it from struct bpf_prog_offload. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	1385d755cf	bpf: pass a struct with offload callbacks to bpf_offload_dev_create() For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This commit effectively passes a pointer to the struct to bpf_offload_dev_create(). We temporarily have two struct bpf_prog_offload_ops instances, one under offdev->ops and one under offload->dev_ops. The next patches will make the transition towards the former, so that offload->dev_ops can be removed, and callbacks relying on ndo_bpf() added to offdev->ops as well. While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and similarly for netdevsim). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	1da6f57338	nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c We are about to add several new callbacks to the struct, all of them defined in offload.c. Move the struct bpf_prog_offload_ops object in that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no longer be static. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Jakub Kicinski	560f1ba4d8	nfp: use the new __netdev_tx_sent_queue() BQL optimisation __netdev_tx_sent_queue() was added in commit e59020abf0f ("net: bql: add __netdev_tx_sent_queue()") and allows for better GSO performance. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-09 19:49:00 -08:00
Jiong Wang	cf599f5031	nfp: bpf: relax prog rejection through max_pkt_offset NFP is refusing to offload programs whenever the MTU is set to a value larger than the max packet bytes that fits in NFP Cluster Target Memory (CTM). However, an eBPF program doesn't always need to access the whole packet data. Verifier has always calculated maximum direct packet access (DPA) offset, and kept it in max_pkt_offset inside prog auxiliary information. This patch relaxes prog rejection based on max_pkt_offset. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:16:32 +01:00
Jakub Kicinski	6e5a716f42	nfp: abm: refuse RED offload with harddrop set RED Qdisc will now inform the drivers about the state of the harddrop flag. Refuse to offload in case harddrop is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:01 -08:00
Jakub Kicinski	cae5f48e32	nfp: abm: don't set negative threshold Turns out the threshold value is used in signed compares in the FW, so we should avoid setting the top bit. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:01 -08:00
Jakub Kicinski	032748acf6	nfp: abm: provide more precise info about offload parameter validation Improve log messages printed when RED can't be offloaded because of Qdisc parameters. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:01 -08:00
Jakub Kicinski	83ec8857a0	nfp: parse vNIC TLV capabilities at alloc time In certain cases initialization logic which follows allocation of the vNIC structure may want to validate the capabilities of that vNIC. This is easy before vNIC is initialized for normal capabilities which are at fixed offsets in control memory, easy to locate and read, but poses a challenge if the capabilities are in form of TLVs. Parse the TLVs early on so other code can just access parsed info, instead of having to do the parsing by itself. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:00 -08:00
Jakub Kicinski	e38f5d11b9	nfp: pass ctrl_bar pointer to nfp_net_alloc Move setting ctrl_bar pointer to the nfp_net_alloc function, to make sure we can parse capabilities early in the following patch. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:00 -08:00
Jakub Kicinski	47330f9bdf	nfp: abm: split qdisc offload code into a separate file The Qdisc offload code is logically separate, and we will soon do significant surgery on it to support more Qdiscs, so move it to a separate file. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-08 20:48:00 -08:00
John Hurley	e963e1097a	nfp: flower: include geneve as supported offload tunnel type Offload of geneve decap rules is supported in NFP. Include geneve in the check for supported types. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 23:00:23 -08:00
John Hurley	83f27d027d	nfp: flower: use geneve and vxlan helpers Make use of the recently added VXLAN and geneve helper functions to determine the type of the netdev from its rtnl_link_ops. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 23:00:23 -08:00
Jakub Kicinski	0c665e2bf4	nfp: flower: use the common netdev notifier Use driver's common notifier for LAG and tunnel configuration. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:22 -08:00
Jakub Kicinski	3e33359040	nfp: register a notifier handler in a central location for the device Code interested in networking events registers its own notifier handlers. Create one device-wide notifier instance. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:22 -08:00
Jakub Kicinski	659bb404eb	nfp: flower: make nfp_fl_lag_changels_event() void nfp_fl_lag_changels_event() never fails, and therefore we would never return NOTIFY_BAD for NETDEV_CHANGELOWERSTATE. Make this clearer by changing nfp_fl_lag_changels_event()'s return type to void. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:22 -08:00
Jakub Kicinski	a558c982a8	nfp: flower: don't try to nack device unregister events Returning an error from a notifier means we want to veto the change. We shouldn't veto NETDEV_UNREGISTER just because we couldn't find the tracking info for given master. I can't seem to find a way to trigger this unless we have some other bug, so it's probably not fix-worthy. While at it move the checking if the netdev really is of interest into the handling functions, like we do for other events. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:22 -08:00
Jakub Kicinski	e50bfdf74d	nfp: flower: remove unnecessary iteration over devices For flower tunnel offloads FW has to be informed about MAC addresses of tunnel devices. We use a netdev notifier to keep track of these addresses. Remove unnecessary loop over netdevices after notifier is registered. The intention of the loop was to catch devices which already existed on the system before nfp driver got loaded, but netdev notifier will replay NETDEV_REGISTER events. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:22 -08:00
Pieter Jansen van Vuuren	4234d62c27	nfp: flower: add ipv6 set flow label and hop limit offload Add ipv6 set flow label and hop limit action offload. Since pedit sets headers per 4 byte word, we need to ensure that setting either version, priority, payload_len or nexthdr does not get offloaded. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:21 -08:00
Pieter Jansen van Vuuren	a3c6b063fe	nfp: flower: add ipv4 set ttl and tos offload Add ipv4 set ttl and tos action offload. Since pedit sets headers per 4 byte word, we need to ensure that setting either version, ihl, protocol, total length or checksum does not get offloaded. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-07 11:45:21 -08:00
David S. Miller	a19c59cc10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-21 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Implement two new kinds of BPF maps, that is, queue and stack map along with new peek, push and pop operations, from Mauricio. 2) Add support for MSG_PEEK flag when redirecting into an ingress psock sk_msg queue, and add a new helper bpf_msg_push_data() for inserting data into the message, from John. 3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use direct packet access for __skb_buff, from Song. 4) Use more lightweight barriers for walking perf ring buffer for libbpf and perf tool as well. Also, various fixes and improvements from verifier side, from Daniel. 5) Add per-symbol visibility for DSO in libbpf and hide by default global symbols such as netlink related functions, from Andrey. 6) Two improvements to nfp's BPF offload to check vNIC capabilities in case prog is shared with multiple vNICs and to protect against mis-initializing atomic counters, from Jakub. 7) Fix for bpftool to use 4 context mode for the nfp disassembler, also from Jakub. 8) Fix a return value comparison in test_libbpf.sh and add several bpftool improvements in bash completion, documentation of bpf fs restrictions and batch mode summary print, from Quentin. 9) Fix a file resource leak in BPF selftest's load_kallsyms() helper, from Peng. 10) Fix an unused variable warning in map_lookup_and_delete_elem(), from Alexei. 11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc, from Nicolas. 12) Add missing executables to .gitignore in BPF selftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-21 21:11:46 -07:00
David S. Miller	2e2d6f0342	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-19 11:03:06 -07:00
Ido Schimmel	5ff4ff4fe8	net: Add netif_is_vxlan() Add the ability to determine whether a netdev is a VxLAN netdev by calling the above mentioned function that checks the netdev's rtnl_link_ops. This will allow modules to identify netdev events involving a VxLAN netdev and act accordingly. For example, drivers capable of VxLAN offload will need to configure the underlying device when a VxLAN netdev is being enslaved to an offloaded bridge. Convert nfp to use the newly introduced helper. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-17 17:45:07 -07:00
Jakub Kicinski	44b6fed0c1	nfp: bpf: double check vNIC capabilities after object sharing Program translation stage checks that program can be offloaded to the netdev which was passed during the load (bpf_attr->prog_ifindex). After program sharing was introduced, however, the netdev on which program is loaded can theoretically be different, and therefore we should recheck the program size and max stack size at load time. This was found by code inspection, AFAIK today all vNICs have identical caps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-16 15:23:58 -07:00
Jakub Kicinski	527db74b71	nfp: bpf: protect against mis-initializing atomic counters Atomic operations on the NFP are currently always in big endian. The driver keeps track of regions of memory storing atomic values and byte swaps them accordingly. There are corner cases where the map values may be initialized before the driver knows they are used as atomic counters. This can happen either when the datapath is performing the update and the stack contents are unknown or when map is updated before the program which will use it for atomic values is loaded. To avoid situation where user initializes the value to 0 1 2 3 and then after loading a program which uses the word as an atomic counter starts reading 3 2 1 0 - only allow atomic counters to be initialized to endian-neutral values. For updates from the datapath the stack information may not be as precise, so just allow initializing such values to 0. Example code which would break: struct bpf_map_def SEC("maps") rxcnt = { .type = BPF_MAP_TYPE_HASH, .key_size = sizeof(__u32), .value_size = sizeof(__u64), .max_entries = 1, }; int xdp_prog1() { __u64 nonzeroval = 3; __u32 key = 0; __u64 *value; value = bpf_map_lookup_elem(&rxcnt, &key); if (!value) bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY); else __sync_fetch_and_add(value, 1); return XDP_PASS; } $ offload bpftool map dump key: 00 00 00 00 value: 00 00 00 03 00 00 00 00 should be: $ offload bpftool map dump key: 00 00 00 00 value: 03 00 00 00 00 00 00 00 Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-16 15:23:58 -07:00
Pieter Jansen van Vuuren	140b6abac2	nfp: flower: use offsets provided by pedit instead of index for ipv6 Previously when populating the set ipv6 address action, we incorrectly made use of pedit's key index to determine which 32bit word should be set. We now calculate which word has been selected based on the offset provided by the pedit action. Fixes: `354b82bb32` ("nfp: add set ipv6 source and destination address") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 23:17:25 -07:00
Pieter Jansen van Vuuren	d08c9e5893	nfp: flower: fix multiple keys per pedit action Previously we only allowed a single header key per pedit action to change the header. This used to result in the last header key in the pedit action to overwrite previous headers. We now keep track of them and allow multiple header keys per pedit action. Fixes: `c0b1bd9a8b` ("nfp: add set ipv4 header action flower offload") Fixes: `354b82bb32` ("nfp: add set ipv6 source and destination address") Fixes: `f8b7b0a6b1` ("nfp: add set tcp and udp header action flower offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 23:17:24 -07:00
Pieter Jansen van Vuuren	8913806f16	nfp: flower: fix pedit set actions for multiple partial masks Previously we did not correctly change headers when using multiple pedit actions with partial masks. We now take this into account and no longer just commit the last pedit action. Fixes: `c0b1bd9a8b` ("nfp: add set ipv4 header action flower offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 23:17:24 -07:00
Ryan C Goodfellow	5948185b97	nfp: devlink port split support for 1x100G CXP NIC This commit makes it possible to use devlink to split the 100G CXP Netronome into two 40G interfaces. Currently when you ask for 2 interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split calculates that you want 5 lanes per port because for some reason eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really want when asking for 2 breakout interfaces is 4 lanes per port. This commit makes that happen by calculating based on 8 lanes if 10 are present. Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Greg Weeks <greg.weeks@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-15 22:29:55 -07:00
Jakub Kicinski	96de25060d	nfp: replace long license headers with SPDX Replace the repeated license text with SPDX identifiers. While at it bump the Copyright dates for files we touched this year. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-11 12:16:21 -07:00
Pieter Jansen van Vuuren	12ecf61529	nfp: flower: use host context count provided by firmware Read the host context count symbols provided by firmware and use it to determine the number of allocated stats ids. Previously it wasn't possible to offload more than 2^17 filters even if FW was able to do so. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-10 22:32:44 -07:00
Pieter Jansen van Vuuren	7fade1077c	nfp: flower: use stats array instead of storing stats per flow Make use of a stats array instead of storing stats per flow which would require a hash lookup at critical times. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-10 22:32:44 -07:00
Pieter Jansen van Vuuren	c01d0efa51	nfp: flower: use rhashtable for flow caching Make use of relativistic hash tables for tracking flows instead of fixed sized hash tables. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-10 22:32:44 -07:00
David S. Miller	071a234ad7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which require neither lookups nor atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-08 23:42:44 -07:00
Quentin Monnet	7ff0ccde43	nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls Mark instructions that use pointers to areas in the stack outside of the current stack frame, and process them accordingly in mem_op_stack(). This way, we also support BPF-to-BPF calls where the caller passes a pointer to data in its own stack frame to the callee (typically, when the caller passes an address to one of its local variables located in the stack, as an argument). Thanks to Jakub and Jiong for figuring out how to deal with this case, I just had to turn their email discussion into this patch. Suggested-by: Jiong Wang <jiong.wang@netronome.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	4454962314	nfp: bpf: optimise save/restore for R6~R9 based on register usage When pre-processing the instructions, it is trivial to detect what subprograms are using R6, R7, R8 or R9 as destination registers. If a subprogram uses none of those, then we do not need to jump to the subroutines dedicated to saving and restoring callee-saved registers in its prologue and epilogue. This patch introduces detection of callee-saved registers in subprograms and prevents the JIT from adding calls to those subroutines whenever we can: we save some instructions in the translated program, and some time at runtime on BPF-to-BPF calls and returns. If no subprogram needs to save those registers, we can avoid appending the subroutines at the end of the program. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	2178f3f0dc	nfp: bpf: fix return address from register-saving subroutine to callee On performing a BPF-to-BPF call, we first jump to a subroutine that pushes callee-saved registers (R6~R9) to the stack, and from there we go to the start of the callee next. In order to do so, the caller must pass to the subroutine the address of the NFP instruction to jump to at the end of that subroutine. This cannot be reliably implemented when translating the caller, as we do not always know the start offset of the callee yet. This patch implements the required fixup step for passing the start offset in the callee via the register used by the subroutine to hold its return address. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	bdf4c66faf	nfp: bpf: update fixup function for BPF-to-BPF calls support Relocations for targets of BPF-to-BPF calls are required at the end of translation. Update the nfp_fixup_branches() function in that regard. When checking that the last instruction of each block is a branch, we must account for the length of the instructions required to pop the return address from the stack. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	fb19816541	nfp: bpf: account for additional stack usage when checking stack limit Offloaded programs using BPF-to-BPF calls use the stack to store the return address when calling into a subprogram. Callees also need some space to save eBPF registers R6 to R9. And contrary to the kernel verifier, we align stack frames on 64 bytes (and not 32). Account for all this when checking the stack size limit before JIT-ing the program. This means we have to recompute maximum stack usage for the program, we cannot get the value from the kernel. In addition to adapting the checks on stack usage, move them to the finalize() callback, now that we have it and because such checks are part of the verification step rather than translation. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	389f263b60	nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver This is the main patch for the logic of BPF-to-BPF calls in the nfp driver. The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were used to call helpers and exit from the program, respectively; make them usable for calling into, or returning from, a BPF subprogram as well. For all calls, push the return address as well as the callee-saved registers (R6 to R9) to the stack, and pop them upon returning from the calls. In order to limit the overhead in terms of instruction number, this is done through dedicated subroutines. Jumping to the callee actually consists in jumping to the subroutine, that "returns" to the callee: this will require some fixup for passing the address in a later patch. Similarly, returning consists in jumping to the subroutine, which pops registers and then returns directly to the caller (but no fixup is needed here). Return to the caller is performed with the RTN instruction newly added to the JIT. For the few steps where we need to know what subprogram an instruction belongs to, the struct nfp_insn_meta is extended with a new subprog_idx field. Note that checks on the available stack size, to take into account the additional requirements associated with BPF-to-BPF calls (storing R6-R9 and return addresses), are added in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	e3b49dc69b	nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will require additional processing before translation starts, in order to record and mark jump destinations. We also mark the instructions where each subprogram begins. This will be used in a following commit to determine where to add prologues for subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:13 +02:00
Quentin Monnet	bcfdfb7c96	nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier The checks related to eBPF helper calls are performed each time the nfp driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks are not relevant for BPF-to-BPF calls (same instruction code, different value in source register), so just skip the checks for such calls. While at it, rename the function that runs those checks to make it clear they apply to _helper_ calls only. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-08 10:24:12 +02:00

917 Commits