linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Tobin C. Harding	d95768d3cc	docs: networking: Add failover docs to index Currently we have rst format docs for the failover and net_failover modules however these docs are not linked to within the index. Add `failover` and `net_failover` to the networking documentation index. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:23:53 -07:00
David S. Miller	35edb56e94	Merge branch 'hns3-next' Salil Mehta says: ==================== Bug fixes and some minor changes to HNS3 driver This patch-set presents some fixes and minor changes to the HNS3 Ethernet Driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:45 -07:00
Fuyun Liang	5550aa4d47	net: hns3: Fix comments for hclge_get_ring_chain_from_mbx Actually, hclge_get_ring_chain_from_mbx is used to get ring type, tqp id, and int_gl index from mailbox message. So the comments is incorrect. This patch fixes it. Fixes: `dde1a86e93` ("net: hns3: Add mailbox support to PF driver") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:45 -07:00
Fuyun Liang	cf4103c699	net: hns3: Fix for using wrong mask and shift in hclge_get_ring_chain_from_mbx HCLGE_INT_GL_IDX_M and HCLGE_INT_GL_IDX_S are used to set fireware cmd. When getting int_gl value from mailbox message, we should use HNAE3_RING_GL_IDX_M and HNAE3_RING_GL_IDX_S. Fixes: `79eee41085` ("net: hns3: add int_gl_idx setup for VF") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:45 -07:00
Yunsheng Lin	82b5321460	net: hns3: Fix for reset_level default assignment probelm handle->reset_level is assigned to HNAE3_NONE_RESET when client is initialized, if a tx timeout happens right after initialization, then handle->reset_level is not resetted to HNAE3_FUNC_RESET in hclge_reset_event, which will cause reset event not properly handled problem. This patch fixes it by setting handle->reset_level properly when client is initialized. Fixes: `6d4c3981a8` ("net: hns3: Changes to make enet watchdog timeout func common for PF/VF") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	d62eccaed4	net: hns3: remove unnecessary ring configuration operation while resetting The configuration of the ring will be used to reinitialize the ring after the hardware reset is completed. So we should not release and reacquire this configuration during reset. Fixes: `bb6b94a896` ("net: hns3: Add reset interface implementation in client") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	6b1385cc25	net: hns3: Fix return value error in hns3_reset_notify_down_enet When doing reset, netdev has not been brought up is not an error, it means that we do not need do the stop operation, so just return zero. Fixes: `76ad4f0ee7` ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	9ca8d1a73c	net: hns3: Correct reset event status register According to hardware's description, driver should get reset event from VECTOR0_PF_OTHER_INT_ST(0x20800) instead of VECTOR0_PF_OTHER_INT_SRC(0x20700). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	9de0b86f64	net: hns3: Prevent to request reset frequently Netdevice reset should not be requested frequently, a new one must wait a moment since there may be some work not completed. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	6d4fab3953	net: hns3: Reset net device with rtnl_lock Since current locking was not covering certain code where netdev was being accessed or manipulated, this patch fixes it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
Huazhong Tan	1b3725781a	net: hns3: Modify the order of initializing command queue register According to hardware's description, the head pointer register should be written before the tail pointer register while doing command queue initialization. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 11:16:44 -07:00
David S. Miller	aea06eb276	Merge branch 'TLS-offload-rx-netdev-and-mlx5' Boris Pismenny says: ==================== TLS offload rx, netdev & mlx5 The following series provides TLS RX inline crypto offload. v5->v4: - Remove the Kconfig to mutually exclude both IPsec and TLS v4->v3: - Remove the iov revert for zero copy send flow v2->v3: - Fix typo - Adjust cover letter - Fix bug in zero copy flows - Use network byte order for the record number in resync - Adjust the sequence provided in resync v1->v2: - Fix bisectability problems due to variable name changes - Fix potential uninitialized return value This series completes the generic infrastructure to offload TLS crypto to a network devices. It enables the kernel TLS socket to skip decryption and authentication operations for SKBs marked as decrypted on the receive side of the data path. Leaving those computationally expensive operations to the NIC. This infrastructure doesn't require a TCP offload engine. Instead, the NIC decrypts a packet's payload if the packet contains the expected TCP sequence number. The TLS record authentication tag remains unmodified regardless of decryption. If the packet is decrypted successfully and it contains an authentication tag, then the authentication check has passed. Otherwise, if the authentication fails, then the packet is provided unmodified and the KTLS layer is responsible for handling it. Out-Of-Order TCP packets are provided unmodified. As a result, in the slow path some of the SKBs are decrypted while others remain as ciphertext. The GRO and TCP layers must not coalesce decrypted and non-decrypted SKBs. At the worst case a received TLS record consists of both plaintext and ciphertext packets. These partially decrypted records must be reencrypted, only to be decrypted. The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows: 1. Partial decryption - Software must handle the case of a TLS record that was only partially decrypted by HW. This can happen due to packet reordering. 2. Resynchronization - tls_read_size calls the device driver to resynchronize HW whenever it lost track of the TLS record framing in the TCP stream. The infrastructure should be extendable to support various NIC offload implementations. However it is currently written with the implementation below in mind: The NIC identifies packets that should be offloaded according to the 5-tuple and the TCP sequence number. If these match and the packet is decrypted and authenticated successfully, then a syndrome is provided to software. Otherwise, the packet is unmodified. Decrypted and non-decrypted packets aren't coalesced by the network stack, and the KTLS layer decrypts and authenticates partially decrypted records. The NIC provides an indication whenever a resync is required. The resync operation is triggered by the KTLS layer while parsing TLS record headers. Finally, we measure the performance obtained by running single stream iperf with two Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz machines connected back-to-back with Innova TLS (40Gbps) NICs. We compare TCP (upper bound) and KTLS-Offload running both in Tx and Rx. The results show that the performance of offload is comparable to TCP. \| Bandwidth (Gbps) \| CPU Tx (%) \| CPU rx (%) TCP \| 28.8 \| 5 \| 12 KTLS-Offload-Tx-Rx \| 28.6 \| 7 \| 14 Paper: https://netdevconf.org/2.2/papers/pismenny-tlscrypto-talk.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:40 -07:00
Boris Pismenny	b3ccf97813	net/mlx5e: IPsec, fix byte count in CQE This patch fixes the byte count indication in CQE for processed IPsec packets that contain a metadata header. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:28 -07:00
Boris Pismenny	10e71acca2	net/mlx5: Accel, add common metadata functions This patch adds common functions to handle mellanox metadata headers. These functions are used by IPsec and TLS to process FPGA metadata. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	790af90c00	net/mlx5e: TLS, build TLS netdev from capabilities This patch enables TLS Rx based on available HW capabilities. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	afd3baaa93	net/mlx5e: TLS, add software statistics This patch adds software statistics for TLS to count important events. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	00aebab27c	net/mlx5e: TLS, add Innova TLS rx data path Implement the TLS rx offload data path according to the requirements of the TLS generic NIC offload infrastructure. Special metadata ethertype is used to pass information to the hardware. When hardware loses synchronization a special resync request metadata message is used to request resync. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	ca942c78f3	net/mlx5e: TLS, add innova rx support Add the mlx5 implementation of the TLS Rx routines to add/del TLS contexts, also add the tls_dev_resync_rx routine to work with the TLS inline Rx crypto offload infrastructure. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	ab412e1dd7	net/mlx5: Accel, add TLS rx offload routines In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Complete the implementation for Innova TLS (FPGA-based) hardware by adding support for rx inline crypto offload. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	0aadb2fc09	net/mlx5e: TLS, refactor variable names For symmetry, we rename mlx5e_tls_offload_context to mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	4718799817	tls: Fix zerocopy_from_iter iov handling zerocopy_from_iter iterates over the message, but it doesn't revert the updates made by the iov iteration. This patch fixes it. Now, the iov can be used after calling zerocopy_from_iter. Fixes: `3c4d75591` ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	4799ac81e5	tls: Add rx inline crypto offload This patch completes the generic infrastructure to offload TLS crypto to a network device. It enables the kernel to skip decryption and authentication of some skbs marked as decrypted by the NIC. In the fast path, all packets received are decrypted by the NIC and the performance is comparable to plain TCP. This infrastructure doesn't require a TCP offload engine. Instead, the NIC only decrypts packets that contain the expected TCP sequence number. Out-Of-Order TCP packets are provided unmodified. As a result, at the worst case a received TLS record consists of both plaintext and ciphertext packets. These partially decrypted records must be reencrypted, only to be decrypted. The notable differences between SW KTLS Rx and this offload are as follows: 1. Partial decryption - Software must handle the case of a TLS record that was only partially decrypted by HW. This can happen due to packet reordering. 2. Resynchronization - tls_read_size calls the device driver to resynchronize HW after HW lost track of TLS record framing in the TCP stream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	b190a587c6	tls: Fill software context without allocation This patch allows tls_set_sw_offload to fill the context in case it was already allocated previously. We will use it in TLS_DEVICE to fill the RX software context. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	39f56e1a78	tls: Split tls_sw_release_resources_rx This patch splits tls_sw_release_resources_rx into two functions one which releases all inner software tls structures and another that also frees the containing structure. In TLS_DEVICE we will need to release the software structures without freeeing the containing structure, which contains other information. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	dafb67f3bb	tls: Split decrypt_skb to two functions Previously, decrypt_skb also updated the TLS context. Now, decrypt_skb only decrypts the payload using the current context, while decrypt_skb_update also updates the state. Later, in the tls_device Rx flow, we will use decrypt_skb directly. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:10 -07:00
Boris Pismenny	d80a1b9d18	tls: Refactor tls_offload variable names For symmetry, we rename tls_offload_context to tls_offload_context_tx before we add tls_offload_context_rx. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
Boris Pismenny	41ed9c04aa	tcp: Don't coalesce decrypted and encrypted SKBs Prevent coalescing of decrypted and encrypted SKBs in GRO and TCP layer. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
Boris Pismenny	16e4edc297	net: Add TLS rx resync NDO Add new netdev tls op for resynchronizing HW tls context Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
Ilya Lesokhin	14136564c8	net: Add TLS RX offload feature This patch adds a netdev feature to configure TLS RX inline crypto offload. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
Boris Pismenny	784abe24c9	net: Add decrypted field to skb The decrypted bit is propogated to cloned/copied skbs. This will be used later by the inline crypto receive side offload of tls. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
David S. Miller	cc98419a57	Merge branch 'mvpp2-add-debugfs-interface' Maxime Chevallier says: ==================== net: mvpp2: add debugfs interface The PPv2 Header Parser and Classifier are not straightforward to debug, having easy access to some of the many lookup tables configuration is helpful during development and debug. This series adds a basic debugfs interface, allowing to read data from the Header Parser and some of the Classifier tables. For now, the interface is read-only, and contains only some basic info. This was actually used during RSS development, and might be useful to troubleshoot some issues we might find. The first patch of the series converts the mvpp2 files to SPDX, which eases adding the new debugfs dedicated file. The second patch adds the interface, and exposes basic Header Parser data. The 3rd patch adds a hit counter for the Header Parser TCAM. The 4th patch exposes classifier info. The 5th patch adds some hit counters for some of the classifier engines. Changes since V1: - Rebased on the lastest net-next - Made cls_flow_get non static so that it can be used in mvpp2_debugfs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:01 -07:00
Maxime Chevallier	f9d30d5bd5	net: mvpp2: debugfs: add classifier hit counters The classification operations that are used for RSS make use of several lookup tables. Having hit counters for these tables is really helpful to determine what flows were matched by ingress traffic, and see the path of packets among all the classifier tables. This commit adds hit counters for the 3 tables used at the moment : - The decoding table (also called lookup_id table), that links flows identified by the Header Parser to the flow table. There's one entry per flow, located at : .../mvpp2/<controller>/flows/XX/dec_hits Note that there are 21 flows in the decoding table, whereas there are 52 flows in the Header Parser. That's because there are several kind of traffic that will match a given flow. Reading the hit counter from one sub-flow will clear all hit counter that have the same flow_id. This also applies to the flow_hits. - The flow table, that contains all the different lookups to be performed by the classifier for each packet of a given flow. The match is done on the first entry of the flow sequence. - The C2 engine entries, that are used to assign the default rx queue, and enable or disable RSS for a given port. There's one entry per flow, located at: .../mvpp2/<controller>/flows/XX/flow_hits There is one C2 entry per port, so the c2 hit counter is located at : .../mvpp2/<controller>/ethX/c2_hits All hit counter values are 16-bits clear-on-read values. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:01 -07:00
Maxime Chevallier	dba1d918da	net: mvpp2: debugfs: add entries for classifier flows The classifier configuration for RSS is quite complex, with several lookup tables being used. This commit adds useful info in debugfs to see how the different tables are configured : Added 2 new entries in the per-port directory : - .../eth0/default_rxq : The default rx queue on that port - .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry Added the 'flows' directory : It contains one entry per sub-flow. a 'sub-flow' is a unique path from Header Parser to the flow table. Multiple sub-flows can point to the same 'flow' (each flow has an id from 8 to 29, which is its index in the Lookup Id table) : - .../flows/00/... /01/... ... /51/id : The flow id. There are 21 unique flows. There's one flow per combination of the following parameters : - L4 protocol (TCP, UDP, none) - L3 protocol (IPv4, IPv6) - L3 parameters (Fragmented or not) - L2 parameters (Vlan tag presence or not) .../type : The flow type. This is an even higher level flow, that we manipulate with ethtool. It can be : "udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other". .../eth0/... .../eth1/engine : The hash generation engine used for this flow on the given port .../hash_opts : The hash generation options indicating on what data we base the hash (vlan tag, src IP, src port, etc.) Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:01 -07:00
Maxime Chevallier	1203341cc9	net: mvpp2: debugfs: add hit counter stats for Header Parser entries One helpful feature to help debug the Header Parser TCAM filter in PPv2 is to be able to see if the entries did match something when a packet comes in. This can be done by using the built-in hit counter for TCAM entries. This commit implements reading the counter, and exposing its value on debugfs for each filter entry. The counter is a 16-bits clear-on-read value, located at: .../mvpp2/<controller>/parser/XXX/hits Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:01 -07:00
Maxime Chevallier	21da57a231	net: mvpp2: add a debugfs interface for the Header Parser Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not trivial to configure and debug. Being able to dump TCAM entries from userspace can be really helpful to help development of new features and debug existing ones. This commit adds a basic debugfs interface for the PPv2 driver, focusing on TCAM related features. <mnt>/mvpp2/ --- f2000000.ethernet \- f4000000.ethernet --- parser --- 000 ... \| \- 001 \| \- ... \| \- 255 --- ai \| \- header_data \| \- lookup_id \| \- sram \| \- valid \- eth1 ... \- eth2 --- mac_filter \- parser_entries \- vid_filter There's one directory per PPv2 instance, named after pdev->name to make sure names are uniques. In each of these directories, there's : - one directory per interface on the controller, each containing : - "mac_filter", which lists all filtered addresses for this port (based on TCAM, not on the kernel's uc / mc lists) - "parser_entries", which lists the indices of all valid TCAM entries that have this port in their port map - "vid_filter", which lists the vids allowed on this port, based on TCAM - one "parser" directory (the parser is common to all ports), containing : - one directory per TCAM entry (256 of them, from 0 to 255), each containing : - "ai" : Contains the 1 byte Additional Info field from TCAM, and - "header_data" : Contains the 8 bytes Header Data extracted from the packet - "lookup_id" : Contains the 4 bits LU_ID - "sram" : contains the raw SRAM data, which is the result of the TCAM lookup. This readonly at the moment. - "valid" : Indicates if the entry is valid of not. All entries are read-only, and everything is output in hex form. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:00 -07:00
Antoine Tenart	f1e37e3101	net: mvpp2: switch to SPDX identifiers Use the appropriate SPDX license identifiers and drop the license text. This patch is only cosmetic. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:10:00 -07:00
David S. Miller	2aa4a3378a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-07-15 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Various different arm32 JIT improvements in order to optimize code emission and make the JIT code itself more robust, from Russell. 2) Support simultaneous driver and offloaded XDP in order to allow for advanced use-cases where some work is offloaded to the NIC and some to the host. Also add ability for bpftool to load programs and maps beyond just the cgroup case, from Jakub. 3) Add BPF JIT support in nfp for multiplication as well as division. For the latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong. 4) Add BTF pretty print functionality to bpftool in plain and JSON output format, from Okash. 5) Add build and installation to the BPF helper man page into bpftool, from Quentin. 6) Add a TCP BPF callback for listening sockets which is triggered right after the socket transitions to TCP_LISTEN state, from Andrey. 7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup tree and prints all attached programs, from Roman. 8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged packets, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 18:47:44 -07:00
Daniel Borkmann	13f7432bdd	Merge branch 'bpf-tcp-listen-cb' Andrey Ignatov says: ==================== This patchset adds TCP-BPF callback for listening sockets. Patch 0001 provides more details and is the main patch in the set. Patch 0006 adds selftest for the new callback. Other patches are bug fixes and improvements in TCP-BPF selftest to make it easier to extend in 0006. ==================== Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:09:22 +02:00
Andrey Ignatov	78d8e26d46	selftests/bpf: Test case for BPF_SOCK_OPS_TCP_LISTEN_CB Cover new TCP-BPF callback in test_tcpbpf: when listen() is called on socket, set BPF_SOCK_OPS_STATE_CB_FLAG so that BPF_SOCK_OPS_STATE_CB callback can be called on future state transition, and when such a transition happens (TCP_LISTEN -> TCP_CLOSE), track it in the map and verify it in user space later. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
Andrey Ignatov	2044e4ef0b	selftests/bpf: Better verification in test_tcpbpf Reduce amount of copy/paste for debug info when result is verified in the test and keep that info together with values being checked so that they won't get out of sync. It also improves debug experience: instead of checking manually what doesn't match in debug output for all fields, only unexpected field is printed. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
Andrey Ignatov	c65267e5ff	selftests/bpf: Switch test_tcpbpf_user to cgroup_helpers Switch to cgroup_helpers to simplify the code and fix cgroup cleanup: before cgroup was not cleaned up after the test. It also removes SYSTEM macro, that only printed error, but didn't terminate the test. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
Andrey Ignatov	04c1341115	selftests/bpf: Fix const'ness in cgroup_helpers Lack of const in cgroup helpers signatures forces to write ugly client code. Fix it. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
Andrey Ignatov	060a7fccd3	bpf: Sync bpf.h to tools/ Sync BPF_SOCK_OPS_TCP_LISTEN_CB related UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
Andrey Ignatov	f333ee0cdb	bpf: Add BPF_SOCK_OPS_TCP_LISTEN_CB Add new TCP-BPF callback that is called on listen(2) right after socket transition to TCP_LISTEN state. It fills the gap for listening sockets in TCP-BPF. For example BPF program can set BPF_SOCK_OPS_STATE_CB_FLAG when socket becomes listening and track later transition from TCP_LISTEN to TCP_CLOSE with BPF_SOCK_OPS_STATE_CB callback. Before there was no way to do it with TCP-BPF and other options were much harder to work with. E.g. socket state tracking can be done with tracepoints (either raw or regular) but they can't be attached to cgroup and their lifetime has to be managed separately. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-07-15 00:08:41 +02:00
David S. Miller	f5c64e566c	Merge branch 'mlxsw-VRRP' Ido Schimmel says: ==================== mlxsw: Add VRRP support When a router that is acting as the default gateway of a host stops functioning, the host will encounter packet loss until the router starts functioning again. To increase the reliability of the default gateway without performing reconfiguration on the host, a host can use a Virtual Router Redundancy Protocol (VRRP) Router. This virtual router is composed from several routers where only one is actually forwarding packets from the host (the master router) while the other routers act as backup routers. The election of the master router is determined by the VRRP protocol [1]. Packets addressed to the virtual router are always sent to the virtual router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX). Such packets can only be accepted by the master router and must be discarded by the backup routers. In Linux, VRRP is usually implemented by configuring a macvlan with the virtual router MAC on top of the router interface that is connected to the host / LAN. The macvlan on the master router is assigned the virtual IP (VIP) that the host uses as its gateway. In order to support VRRP in mlxsw, we first need to enable macvlan upper devices on top of mlxsw netdevs and their uppers. This is done by the first patch, which also takes care of sanitizing macvlan configurations that are not currently supported by the driver. The second patch directs packets with destination MAC addresses as the macvlans to the router so that they will undergo an L3 lookup. This is consistent with the kernel's behavior where the macvlan's Rx handler will re-inject such packets to the Rx path so that they will be picked up by the IPvX protocol handlers and undergo an L3 lookup. Note that the driver prevents the macvlans from being enslaved to other devices, to ensure the packets will be picked up by the protocol handler and not by another Rx handler. The third patch adds packet traps for VRRP control packets for both IPv4 and IPv6. Finally, the last patch optimizes the reception of VRRP MACs by potentially skipping one L2 lookup for them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:26 -07:00
Ido Schimmel	c3a495409a	mlxsw: spectrum_router: Optimize processing of VRRP MACs Hosts using a VRRP router send their packets with a destination MAC of the VRRP router which is of the following form [1]: IPv4 - 00-00-5E-00-01-{VRID} IPv6 - 00-00-5E-00-02-{VRID} Where VRID is the ID of the virtual router. Such packets are directed to the router block in the ASIC by an FDB entry that was added in the previous patch. However, in certain cases it is possible to skip this FDB lookup and send such packets directly to the router. This is accomplished by adding these special MAC addresses to the RIF cache. If the cache is hit, the packet will skip the L2 lookup and ingress the router with the RIF specified in the cache entry. 1. https://tools.ietf.org/html/rfc5798#section-7.3 Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:26 -07:00
Ido Schimmel	11566d34f8	mlxsw: spectrum: Add VRRP traps Virtual Router Redundancy Protocol packets are used to communicate the state of the Master router associated with the virtual router ID (VRID). These are link-local multicast packets sent with IP protocol 112 that are trapped in the router block in the ASIC. Add a trap for these packets and mark the trapped packets to prevent them from potentially being re-flooded by the bridge driver. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:26 -07:00
Ido Schimmel	2db9937804	mlxsw: spectrum_router: Direct macvlans' MACs to router An IP packet received on a netdev with a macvlan upper whose MAC matches the packet's destination MAC will be re-injected to the Rx path as if it was received by the macvlan, and perform an L3 lookup. Reflect this functionality to the ASIC by programming FDB entries that will direct MACs of macvlan uppers to the router. In a similar fashion to router interfaces (RIFs) that are programmed upon the addition of the first IP address on an interface and destroyed upon the removal of the last IP address, the FDB entries for the macvlan are added and destroyed based on the addition of the first and removal of the last IP address on the macvlan. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:26 -07:00
Ido Schimmel	c55161852f	mlxsw: spectrum: Enable macvlan upper devices In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to be directed to the router we need to enable macvlan uppers on top of mlxsw netdevs. Allow macvlan upper devices on top of mlxsw netdevs and sanitize configurations that can't work. For example, a macvlan can't be enslaved to a bridge as without ACLs the device doesn't take the destination MAC into account when classifying a packet to a bridge instance (i.e., a FID). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:25 -07:00
Yafang Shao	ff0432e5a8	tcp: remove redundant rcv_nxt update tcp_rcv_nxt_update() is already executed in tcp_data_queue(). This line is redundant. See bellow, tcp_queue_rcv tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:21:40 -07:00

1 2 3 4 5 ...

768008 Commits All Branches Search

768008 Commits

All Branches