OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Julian Wiedmann	072f794000	s390/qeth: serialize cmd reply with concurrent timeout Callbacks for a cmd reply run outside the protection of card->lock, to allow for additional cmds to be issued & enqueued in parallel. When qeth_send_control_data() bails out for a cmd without having received a reply (eg. due to timeout), its callback may concurrently be processing a reply that just arrived. In this case, the callback potentially accesses a stale reply->reply_param area that eg. was on-stack and has already been released. To avoid this race, add some locking so that qeth_send_control_data() can (1) wait for a concurrently running callback, and (2) zap any pending callback that still wants to run. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:26:47 -07:00
Xin Long	a1794de8b9	sctp: fix the transport error_count check As the annotation says in sctp_do_8_2_transport_strike(): "If the transport error count is greater than the pf_retrans threshold, and less than pathmaxrtx ..." It should be transport->error_count checked with pathmaxrxt, instead of asoc->pf_retrans. Fixes: `5aa93bcf66` ("sctp: Implement quick failover draft from tsvwg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:08:20 -07:00
Dirk Morris	656c8e9cc1	netfilter: conntrack: Use consistent ct id hash calculation Change ct id hash calculation to only use invariants. Currently the ct id hash calculation is based on some fields that can change in the lifetime on a conntrack entry in some corner cases. The current hash uses the whole tuple which contains an hlist pointer which will change when the conntrack is placed on the dying list resulting in a ct id change. This patch also removes the reply-side tuple and extension pointer from the hash calculation so that the ct id will will not change from initialization until confirmation. Fixes: `3c79107631` ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Signed-off-by: Dirk Morris <dmorris@metaloft.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-13 18:03:11 +02:00
André Draszik	bb0ce4c151	net: phy: at803x: stop switching phy delay config needlessly This driver does a funny dance disabling and re-enabling RX and/or TX delays. In any of the RGMII-ID modes, it first disables the delays, just to re-enable them again right away. This looks like a needless exercise. Just enable the respective delays when in any of the relevant 'id' modes, and disable them otherwise. Also, remove comments which don't add anything that can't be seen by looking at the code. Signed-off-by: André Draszik <git@andred.net> CC: Andrew Lunn <andrew@lunn.ch> CC: Florian Fainelli <f.fainelli@gmail.com> CC: Heiner Kallweit <hkallweit1@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-12 14:02:29 -07:00
Balakrishna Godavarthi	12072a6896	Bluetooth: btqca: Reset download type to default This patch will reset the download flag to default value before retrieving the download mode type. Fixes: `32646db8cc` ("Bluetooth: btqca: inject command complete event during fw download") Signed-off-by: Balakrishna Godavarthi <bgodavar@codeaurora.org> Tested-by: Claire Chang <tientzu@chromium.org> Reviewed-by: Claire Chang <tientzu@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 19:07:15 +02:00
Claire Chang	c7c5ae2902	Bluetooth: btqca: release_firmware after qca_inject_cmd_complete_event commit `32646db8cc` ("Bluetooth: btqca: inject command complete event during fw download") added qca_inject_cmd_complete_event() for certain qualcomm chips. However, qca_download_firmware() will return without calling release_firmware() in this case. This leads to a memory leak like the following found by kmemleak: unreferenced object 0xfffffff3868a5880 (size 128): comm "kworker/u17:5", pid 347, jiffies 4294676481 (age 312.157s) hex dump (first 32 bytes): ac fd 00 00 00 00 00 00 00 d0 7e 17 80 ff ff ff ..........~..... 00 00 00 00 00 00 00 00 00 59 8a 86 f3 ff ff ff .........Y...... backtrace: [<00000000978ce31d>] kmem_cache_alloc_trace+0x194/0x298 [<000000006ea0398c>] _request_firmware+0x74/0x4e4 [<000000004da31ca0>] request_firmware+0x44/0x64 [<0000000094572996>] qca_download_firmware+0x74/0x6e4 [btqca] [<00000000b24d615a>] qca_uart_setup+0xc0/0x2b0 [btqca] [<00000000364a6d5a>] qca_setup+0x204/0x570 [hci_uart] [<000000006be1a544>] hci_uart_setup+0xa8/0x148 [hci_uart] [<00000000d64c0f4f>] hci_dev_do_open+0x144/0x530 [bluetooth] [<00000000f69f5110>] hci_power_on+0x84/0x288 [bluetooth] [<00000000d4151583>] process_one_work+0x210/0x420 [<000000003cf3dcfb>] worker_thread+0x2c4/0x3e4 [<000000007ccaf055>] kthread+0x124/0x134 [<00000000bef1f723>] ret_from_fork+0x10/0x18 [<00000000c36ee3dd>] 0xffffffffffffffff unreferenced object 0xfffffff37b16de00 (size 128): comm "kworker/u17:5", pid 347, jiffies 4294676873 (age 311.766s) hex dump (first 32 bytes): da 07 00 00 00 00 00 00 00 50 ff 0b 80 ff ff ff .........P...... 00 00 00 00 00 00 00 00 00 dd 16 7b f3 ff ff ff ...........{.... backtrace: [<00000000978ce31d>] kmem_cache_alloc_trace+0x194/0x298 [<000000006ea0398c>] _request_firmware+0x74/0x4e4 [<000000004da31ca0>] request_firmware+0x44/0x64 [<0000000094572996>] qca_download_firmware+0x74/0x6e4 [btqca] [<000000000cde20a9>] qca_uart_setup+0x144/0x2b0 [btqca] [<00000000364a6d5a>] qca_setup+0x204/0x570 [hci_uart] [<000000006be1a544>] hci_uart_setup+0xa8/0x148 [hci_uart] [<00000000d64c0f4f>] hci_dev_do_open+0x144/0x530 [bluetooth] [<00000000f69f5110>] hci_power_on+0x84/0x288 [bluetooth] [<00000000d4151583>] process_one_work+0x210/0x420 [<000000003cf3dcfb>] worker_thread+0x2c4/0x3e4 [<000000007ccaf055>] kthread+0x124/0x134 [<00000000bef1f723>] ret_from_fork+0x10/0x18 [<00000000c36ee3dd>] 0xffffffffffffffff Make sure release_firmware() is called aftre qca_inject_cmd_complete_event() to avoid the memory leak. Fixes: `32646db8cc` ("Bluetooth: btqca: inject command complete event during fw download") Signed-off-by: Claire Chang <tientzu@chromium.org> Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:36:09 +02:00
Fabian Henneke	48d9cc9d85	Bluetooth: hidp: Let hidp_send_message return number of queued bytes Let hidp_send_message return the number of successfully queued bytes instead of an unconditional 0. With the return value fixed to 0, other drivers relying on hidp, such as hidraw, can not return meaningful values from their respective implementations of write(). In particular, with the current behavior, a hidraw device's write() will have different return values depending on whether the device is connected via USB or Bluetooth, which makes it harder to abstract away the transport layer. Signed-off-by: Fabian Henneke <fabian.henneke@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:50 +02:00
Harish Bandi	a2780889e2	Bluetooth: hci_qca: Send VS pre shutdown command. WCN399x chips are coex chips, it needs a VS pre shutdown command while turning off the BT. So that chip can inform BT is OFF to other active clients. Signed-off-by: Harish Bandi <c-hbandi@codeaurora.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:50 +02:00
Matthias Kaehlcke	2fde6afb8c	Bluetooth: btqca: Use correct byte format for opcode of injected command The opcode of the command injected by commit `32646db8cc` ("Bluetooth: btqca: inject command complete event during fw download") uses the CPU byte format, however it should always be little endian. In practice it shouldn't really matter, since all we need is an opcode != 0, but still let's do things correctly and keep sparse happy. Fixes: `32646db8cc` ("Bluetooth: btqca: inject command complete event during fw download") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:50 +02:00
Wei Yongjun	4974c839d4	Bluetooth: hci_qca: Use kfree_skb() instead of kfree() Use kfree_skb() instead of kfree() to free sk_buff. Fixes: `2faa3f15fa` ("Bluetooth: hci_qca: wcn3990: Drop baudrate change vendor event") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:49 +02:00
Matthias Kaehlcke	8059ba0bd0	Bluetooth: btqca: Add a short delay before downloading the NVM On WCN3990 downloading the NVM sometimes fails with a "TLV response size mismatch" error: [ 174.949955] Bluetooth: btqca.c:qca_download_firmware() hci0: QCA Downloading qca/crnv21.bin [ 174.958718] Bluetooth: btqca.c:qca_tlv_send_segment() hci0: QCA TLV response size mismatch It seems the controller needs a short time after downloading the firmware before it is ready for the NVM. A delay as short as 1 ms seems sufficient, make it 10 ms just in case. No event is received during the delay, hence we don't just silently drop an extra event. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:49 +02:00
Wei Yongjun	5ee6310fb1	Bluetooth: btusb: Fix error return code in btusb_mtk_setup_firmware() Fix to return error code -EINVAL from the error handling case instead of 0, as done elsewhere in this function. Fixes: `a1c49c434e` ("Bluetooth: btusb: Add protocol support for MediaTek MT7668U USB devices") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2019-08-12 18:23:49 +02:00
Nathan Chancellor	125b7e0949	net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx clang warns: drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^ ~~~~~~~~~~~~ drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a bitwise operation if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^~ & drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to silence this warning if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ~^~~~~~~~~~~~~~~ 1 warning generated. Explicitly check that NET_IP_ALIGN is not zero, which matches how this is checked in other parts of the tree. Because NET_IP_ALIGN is a build time constant, this check will be constant folded away during optimization. Fixes: `82a9928db5` ("tc35815: Enable StripCRC feature") Link: https://github.com/ClangBuiltLinux/linux/issues/608 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:41:48 -07:00
Chris Packham	8874ecae29	tipc: initialise addr_trail_end when setting node addresses We set the field 'addr_trial_end' to 'jiffies', instead of the current value 0, at the moment the node address is initialized. This guarantees we don't inadvertently enter an address trial period when the node address is explicitly set by the user. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:40:04 -07:00
Chen-Yu Tsai	58799865be	net: dsa: Check existence of .port_mdb_add callback before calling it The dsa framework has optional .port_mdb_{prepare,add,del} callback fields for drivers to handle multicast database entries. When adding an entry, the framework goes through a prepare phase, then a commit phase. Drivers not providing these callbacks should be detected in the prepare phase. DSA core may still bypass the bridge layer and call the dsa_port_mdb_add function directly with no prepare phase or no switchdev trans object, and the framework ends up calling an undefined .port_mdb_add callback. This results in a NULL pointer dereference, as shown in the log below. The other functions seem to be properly guarded. Do the same for .port_mdb_add in dsa_switch_mdb_add_bitmap() as well. 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = (ptrval) [00000000] *pgd=00000000 Internal error: Oops: 80000005 [#1] SMP ARM Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211 CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1 Hardware name: Allwinner sun7i (A20) Family Workqueue: events switchdev_deferred_process_work PC is at 0x0 LR is at dsa_switch_event+0x570/0x620 pc : [<00000000>] lr : [<c08533ec>] psr: 80070013 sp : ee871db8 ip : 00000000 fp : ee98d0a4 r10: 0000000c r9 : 00000008 r8 : ee89f710 r7 : ee98d040 r6 : ee98d088 r5 : c0f04c48 r4 : ee98d04c r3 : 00000000 r2 : ee89f710 r1 : 00000008 r0 : ee98d040 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6deb406a DAC: 00000051 Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval)) Stack: (0xee871db8 to 0xee872000) 1da0: ee871e14 103ace2d 1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000 1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0 1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000 1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff 1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4 1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500 1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000 1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8 1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122 1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec 1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc 1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00 1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000 1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4 1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000 1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20) [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74) [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4) [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14) [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4) [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68) [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8) [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104) [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c) [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104) [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14) [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c) [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc) [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150) [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xee871fb0 to 0xee871ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: bad PC value ---[ end trace 1292c61abd17b130 ]--- [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) corresponds to $ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec linux/net/dsa/switch.c:156 linux/net/dsa/switch.c:178 linux/net/dsa/switch.c:328 Fixes: `e6db98db8a` ("net: dsa: add switch mdb bitmap functions") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:36:51 -07:00
Petr Machata	8028ccda39	mlxsw: spectrum_ptp: Keep unmatched entries in a linked list To identify timestamps for matching with their packets, Spectrum-1 uses a five-tuple of (port, direction, domain number, message type, sequence ID). If there are several clients from the same domain behind a single port sending Delay_Req's, the only thing differentiating these packets, as far as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between individual clients be similar, conflicts may arise. That is not a problem to hardware, which will simply deliver timestamps on a first comes, first served basis. However the driver uses a simple hash table to store the unmatched pieces. When a new conflicting piece arrives, it pushes out the previously stored one, which if it is a packet, is delivered without timestamp. Later on as the corresponding timestamps arrive, the first one is mismatched to the second packet, and the second one is never matched and eventually is GCd. To correct this issue, instead of using a simple rhashtable, use rhltable to keep the unmatched entries. Previously, a found unmatched entry would always be removed from the hash table. That is not the case anymore--an incompatible entry is left in the hash table. Therefore removal from the hash table cannot be used to confirm the validity of the looked-up pointer, instead the lookup would simply need to be redone. Therefore move it inside the critical section. This simplifies a lot of the code. Fixes: `8748642751` ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:35:39 -07:00
Jonathan Neuschäfer	d81f41411c	net: nps_enet: Fix function names in doc comments Adjust the function names in two doc comments to match the corresponding functions. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:32:33 -07:00
David Howells	68553f1a6f	rxrpc: Fix local refcounting Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called on an unbound socket on which rx->local is not yet set. The following reproduced (includes omitted): int main(void) { socket(AF_RXRPC, SOCK_DGRAM, AF_INET); return 0; } causes the following oops to occur: BUG: kernel NULL pointer dereference, address: 0000000000000010 ... RIP: 0010:rxrpc_unuse_local+0x8/0x1b ... Call Trace: rxrpc_release+0x2b5/0x338 __sock_release+0x37/0xa1 sock_close+0x14/0x17 __fput+0x115/0x1e9 task_work_run+0x72/0x98 do_exit+0x51b/0xa7a ? __context_tracking_exit+0x4e/0x10e do_group_exit+0xab/0xab __x64_sys_exit_group+0x14/0x17 do_syscall_64+0x89/0x1d4 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com Fixes: `730c5fd42c` ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:28:29 -07:00
David Ahern	59c84b9fcf	netdevsim: Restore per-network namespace accounting for fib entries Prior to the commit in the fixes tag, the resource controller in netdevsim tracked fib entries and rules per network namespace. Restore that behavior. Fixes: `5fc494225c` ("netdevsim: create devlink instance per netdevsim instance") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 20:59:19 -07:00
David S. Miller	9481382b36	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-08-11 The following pull-request contains BPF updates for your net tree. The main changes are: 1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei. 2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii. 3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak. 4) Fix bpftool to dump an error if pinning fails, from Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 14:49:34 -07:00
Jakub Kicinski	57c722e932	net/tls: swap sk_write_space on close Now that we swap the original proto and clear the ULP pointer on close we have to make sure no callback will try to access the freed state. sk_write_space is not part of sk_prot, remember to swap it. Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com Fixes: `95fa145479` ("bpf: sockmap/tls, close can race with map free") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 19:55:22 -07:00
Dexuan Cui	6d0d779dca	hv_netvsc: Fix a warning of suspicious RCU usage This fixes a warning of "suspicious rcu_dereference_check() usage" when nload runs. Fixes: `776e726bfb` ("netvsc: fix RCU warning in get_stats") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:42:19 -07:00
David S. Miller	9566e650bf	mlx5-fixes-2019-08-08 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1Mf5AACgkQSD+KveBX +j7ztQgAiaCmQxJ6wV82KFVbmGA04oC3S0vz2lafTB1L3i2TqX6ZiRW07qGIPfml HtylFu1Qta6sXDlJC5iQK48TnQaOr4HJqTxigofrWA3Pt7sYI1FNVG73BR1p8M5D 11ZZM6zCGkCd8cD5hTMzhvEr2d5NSMIy5wzC8DiiEph9JwAEh3ojeteP9KuwYdzX /8W6swmwgMg/VKGlOwPhQcxSot0PcBm/a2sNjfGJ/NqyAb1yMQ5Bttt30A8346YN kIhUBWIvDkbWDxxokRFGbpx5VswqJpMDKUEACWoJT44JyfFBMpMXPNe6wLvd1IIf noxmAqe+CSTbr5UqJZdjqSrId5GhBA== =X/dE -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-08-08 This series introduces some fixes to mlx5 driver. Highlights: 1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs. 2) From Aya, Fixes to mlx5 tx devlink health reporter. 3) From Maxim, aRFs parsing to use flow dissector to avoid relying on invalid skb fields. Please pull and let me know if there is any problem. For -stable v4.3 ('net/mlx5e: Only support tx/rx pause setting for port owner') For -stable v4.9 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS') For -stable v5.1 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery') ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter') ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off') Note: when merged with net-next this minor conflict will pop up: ++<<<<<<< (net-next) + if (is_eswitch_flow) { + flow->esw_attr->match_level = match_level; + flow->esw_attr->tunnel_match_level = tunnel_match_level; ++======= + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { + flow->esw_attr->inner_match_level = inner_match_level; + flow->esw_attr->outer_match_level = outer_match_level; ++>>>>>>> (net) To resolve, use hunks from net (2nd) and replace: if (flow->flags & MLX5E_TC_FLOW_ESWITCH) with if (is_eswitch_flow) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:23:33 -07:00
Taehee Yoo	8b6381600d	ixgbe: fix possible deadlock in ixgbe_service_task() ixgbe_service_task() calls unregister_netdev() under rtnl_lock(). But unregister_netdev() internally calls rtnl_lock(). So deadlock would occur. Fixes: `59dd45d550` ("ixgbe: firmware recovery mode") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:17:00 -07:00
David S. Miller	703acf6259	Merge branch 'Fix-collisions-in-socket-cookie-generation' Daniel Borkmann says: ==================== Fix collisions in socket cookie generation This change makes the socket cookie generator as a global counter instead of per netns in order to fix cookie collisions for BPF use cases we ran into. See main patch #1 for more details. Given the change is small/trivial and fixes an issue we're seeing my preference would be net tree (though it cleanly applies to net-next as well). Went for net tree instead of bpf tree here given the main change is in net/core/sock_diag.c, but either way would be fine with me. v1 -> v2: - Fix up commit description in patch #1, thanks Eric! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
Daniel Borkmann	609a2ca57a	bpf: sync bpf.h to tools infrastructure Pull in updates in BPF helper function description. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
Daniel Borkmann	cd48bdda4f	sock: make cookie generation global instead of per netns Generating and retrieving socket cookies are a useful feature that is exposed to BPF for various program types through bpf_get_socket_cookie() helper. The fact that the cookie counter is per netns is quite a limitation for BPF in practice in particular for programs in host namespace that use socket cookies as part of a map lookup key since they will be causing socket cookie collisions e.g. when attached to BPF cgroup hooks or cls_bpf on tc egress in host namespace handling container traffic from veth or ipvlan devices with peer in different netns. Change the counter to be global instead. Socket cookie consumers must assume the value as opqaue in any case. Not every socket must have a cookie generated and knowledge of the counter value itself does not provide much value either way hence conversion to global is fine. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Martynas Pumputis <m@lambda.lt> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
David S. Miller	7bac762d8d	RxRPC fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1NmIAACgkQ+7dXa6fL C2vroRAAioYEcgShBOH6uqjuGiWZjRsGX6Ppw8QKg7LTvvAes5oT74t3jL0a8as9 Zn3Y5bj7jueUH1vakFGFdycWNO8wgxD6+63Ki0M070Yy15lV9uH1VyAI8BJhMxHe M3v6Yjy5OzILNqqs5+9i3om8ocBLkwfiGlocHXxQv15MA6lMgDFl23d1J3L4KK0o sItorzvg90wIWNWR2jctbjqyAdS17LeXdUL41Oc3SXJi9wLHony3LAH3al4Rb9k7 FgWHyD9colDEezl4cWAehjZvm060EfWmuQsDT7+iDhNftokZTeDlRdnizDzAISYQ i3wleKXegxFpXJ6FfaJeL8BDMUjTVG/MWUWSCb4gKYOr1XC6ioVhvo2DH7mgcb+2 E4dP76u0S7hoYI0Vn3X8091Vc4rXv6tCgw71HYA5wWKnZIfxFo+conOFoNujRh3V 3DZXj/hr/3UzIA8tsqWadfGfE9O08bObnhqfqFCzSTy3UhlWGkKCKEIyJljhh2wb ztvDpASl2L62NPtd0irHG2yHxbVFH0lWcF1buL/wbndLo3HaQ7bTxQ/s8RDOpKTH 40Rv9Leih4QuT8WLrEMhRkgxE1heP9dGC34ZD2r1eZX6Lb9A5w21GjYd3/hl0dyG llb75sqyBfAbfYveMij58hLuk4GW7lcaoQ0rnDwvS6yJ1LTHxLI= =u6EZ -----END PGP SIGNATURE----- Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here's a couple of fixes for rxrpc: (1) Fix refcounting of the local endpoint. (2) Don't calculate or report packet skew information. This has been obsolete since AFS 3.1 and so is a waste of resources. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 11:27:17 -07:00
Daniel Borkmann	4f7aafd78a	Merge branch 'bpf-bpftool-pinning-error-msg' Jakub Kicinski says: ==================== First make sure we don't use "prog" in error messages because the pinning operation could be performed on a map. Second add back missing error message if pin syscall failed. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
Jakub Kicinski	3c7be384fe	tools: bpftool: add error message on pin failure No error message is currently printed if the pin syscall itself fails. It got lost in the loadall refactoring. Fixes: `77380998d9` ("bpftool: add loadall command") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
Jakub Kicinski	b3e78adcbf	tools: bpftool: fix error message (prog -> object) Change an error message to work for any object being pinned not just programs. Fixes: `71bb428fe2` ("tools: bpf: add bpftool") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
David Howells	e8c3af6bb3	rxrpc: Don't bother generating maxSkew in the ACK packet Don't bother generating maxSkew in the ACK packet as it has been obsolete since AFS 3.1. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>	2019-08-09 15:24:00 +01:00
David Howells	730c5fd42c	rxrpc: Fix local endpoint refcounting The object lifetime management on the rxrpc_local struct is broken in that the rxrpc_local_processor() function is expected to clean up and remove an object - but it may get requeued by packets coming in on the backing UDP socket once it starts running. This may result in the assertion in rxrpc_local_rcu() firing because the memory has been scheduled for RCU destruction whilst still queued: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:468! Note that if the processor comes around before the RCU free function, it will just do nothing because ->dead is true. Fix this by adding a separate refcount to count active users of the endpoint that causes the endpoint to be destroyed when it reaches 0. The original refcount can then be used to refcount objects through the work processor and cause the memory to be rcu freed when that reaches 0. Fixes: `4f95dd78a7` ("rxrpc: Rework local endpoint management") Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>	2019-08-09 15:21:19 +01:00
Pablo Neira Ayuso	1e5b2471bc	netfilter: nf_flow_table: teardown flow timeout race Flows that are in teardown state (due to RST / FIN TCP packet) still have their offload flag set on. Hence, the conntrack garbage collector may race to undo the timeout adjustment that the fixup routine performs, leaving the conntrack entry in place with the internal offload timeout (one day). Update teardown flow state to ESTABLISHED and set tracking to liberal, then once the offload bit is cleared, adjust timeout if it is more than the default fixup timeout (conntrack might already have set a lower timeout from the packet path). Fixes: `da5984e510` ("netfilter: nf_flow_table: add support for sending flows back to the slow path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:21 +02:00
Pablo Neira Ayuso	3e68db2f64	netfilter: nf_flow_table: conntrack picks up expired flows Update conntrack entry to pick up expired flows, otherwise the conntrack entry gets stuck with the internal offload timeout (one day). The TCP state also needs to be adjusted to ESTABLISHED state and tracking is set to liberal mode in order to give conntrack a chance to pick up the expired flow. Fixes: `ac2a66665e` ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:20 +02:00
Pablo Neira Ayuso	6a0a8d10a3	netfilter: nf_tables: use-after-free in failing rule with bound set If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: `f6ac858589` ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-09 14:41:13 +02:00
Fuqian Huang	8c25d0887a	net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context As spin_unlock_irq will enable interrupts. Function tsi108_stat_carry is called from interrupt handler tsi108_irq. Interrupts are enabled in interrupt handler. Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in IRQ context to avoid this. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:43:34 -07:00
YueHaibing	227f2f030e	team: Add vlan tx offload to hw_enc_features We should also enable team's vlan tx offload in hw_enc_features, pass the vlan packets to the slave devices with vlan tci, let the slave handle vlan tunneling offload implementation. Fixes: `3268e5cb49` ("team: Advertise tunneling offload features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:41:41 -07:00
Jakub Kicinski	414776621d	net/tls: prevent skb_orphan() from leaking TLS plain text with offload sk_validate_xmit_skb() and drivers depend on the sk member of struct sk_buff to identify segments requiring encryption. Any operation which removes or does not preserve the original TLS socket such as skb_orphan() or skb_clone() will cause clear text leaks. Make the TCP socket underlying an offloaded TLS connection mark all skbs as decrypted, if TLS TX is in offload mode. Then in sk_validate_xmit_skb() catch skbs which have no socket (or a socket with no validation) and decrypted flag set. Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and sk->sk_validate_xmit_skb are slightly interchangeable right now, they all imply TLS offload. The new checks are guarded by CONFIG_TLS_DEVICE because that's the option guarding the sk_buff->decrypted member. Second, smaller issue with orphaning is that it breaks the guarantee that packets will be delivered to device queues in-order. All TLS offload drivers depend on that scheduling property. This means skb_orphan_partial()'s trick of preserving partial socket references will cause issues in the drivers. We need a full orphan, and as a result netem delay/throttling will cause all TLS offload skbs to be dropped. Reusing the sk_buff->decrypted flag also protects from leaking clear text when incoming, decrypted skb is redirected (e.g. by TC). See commit `0608c69c9a` ("bpf: sk_msg, sock{map\|hash} redirect through ULP") for justification why the internal flag is safe. The only location which could leak the flag in is tcp_bpf_sendmsg(), which is taken care of by clearing the previously unused bit. v2: - remove superfluous decrypted mark copy (Willem); - remove the stale doc entry (Boris); - rely entirely on EOR marking to prevent coalescing (Boris); - use an internal sendpages flag instead of marking the socket (Boris). v3 (Willem): - reorganize the can_skb_orphan_partial() condition; - fix the flag leak-in through tcp_bpf_sendmsg. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:39:35 -07:00
David S. Miller	0de94de180	Merge branch 'skbedit-batch-fixes' Roman Mashak says: ==================== Fix batched event generation for skbedit action When adding or deleting a batch of entries, the kernel sends up to TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user space. However it does not consider that the action sizes may vary and require different skb sizes. For example, consider the following script adding 32 entries with all supported skbedit parameters and cookie (in order to maximize netlink messages size): % cat tc-batch.sh TC="sudo /mnt/iproute2.git/tc/tc" $TC actions flush action skbedit for i in `seq 1 $1`; do cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \ ptype host inheritdsfield \ index $i cookie aabbccddeeff112233445566778800a1 " args=$args$cmd done $TC actions add $args % % ./tc-batch.sh 32 Error: Failed to fill netlink attributes while adding TC action. We have an error talking to the kernel % patch 1 adds callback in tc_action_ops of skbedit action, which calculates the action size, and passes size to tcf_add_notify()/tcf_del_notify(). patch 2 updates the TDC test suite with relevant skbedit test cases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
Roman Mashak	7bc161846d	tc-testing: updated skbedit action tests with batch create/delete Update TDC tests with cases varifying ability of TC to install or delete batches of skbedit actions. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
Roman Mashak	e1fea322fc	net sched: update skbedit action for batched events operations Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: `ca9b0e27e` ("pkt_action: add new action skbedit") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
YueHaibing	e3e3af9aa2	net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid' Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump: drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning: variable tx_vid set but not used [-Wunused-but-set-variable] drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning: variable rx_vid set but not used [-Wunused-but-set-variable] They are not used since commit `6d7c7d948a` ("net: dsa: sja1105: Fix broken learning with vlan_filtering disabled") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:29:02 -07:00
YueHaibing	d595b03de2	bonding: Add vlan tx offload to hw_enc_features As commit `30d8177e8a` ("bonding: Always enable vlan tx offload") said, we should always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Now if encapsulation protocols like VXLAN is used, skb->encapsulation may be set, then the packet is passed to vlan device which based on bonding device. However in netif_skb_features(), the check of hw_enc_features: if (skb->encapsulation) features &= dev->hw_enc_features; clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results in same issue in commit `30d8177e8a` like this: vlan_dev_hard_start_xmit -->dev_queue_xmit -->validate_xmit_skb -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared -->validate_xmit_vlan -->__vlan_hwaccel_push_inside //skb->tci is cleared ... --> bond_start_xmit --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34 --> __skb_flow_dissect // nhoff point to IP header --> case htons(ETH_P_8021Q) // skb_vlan_tag_present is false, so vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), //vlan point to ip header wrongly Fixes: `b2a103e6d0` ("bonding: convert to ndo_fix_features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:01:44 -07:00
Ivan Khoronzhuk	51650d33b2	net: sched: sch_taprio: fix memleak in error path for sched list parse In error case, all entries should be freed from the sched list before deleting it. For simplicity use rcu way. Fixes: `5a781ccbd1` ("tc: Add support for configuring the taprio scheduler") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:00:24 -07:00
Stephen Hemminger	fe90689fed	net: docs: replace IPX in tuntap documentation IPX is no longer supported, but the example in the documentation might useful. Replace it with IPv6. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:06:53 -07:00
Stephen Hemminger	7e7c076e12	docs: admin-guide: remove references to IPX and token-ring Both IPX and TR have not been supported for a while now. Remove them from the /proc/sys/net documentation. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:06:53 -07:00
Ross Lagerwall	3a0233ddec	xen/netback: Reset nr_frags before freeing skb At this point nr_frags has been incremented but the frag does not yet have a page assigned so freeing the skb results in a crash. Reset nr_frags before freeing the skb to prevent this. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:01:24 -07:00
Guillaume Nault	891584f48a	inet: frags: re-introduce skb coalescing for local delivery Before commit `d4289fcc9b` ("net: IP6 defrag: use rbtrees for IPv6 defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus generating many fragments) and running over an IPsec tunnel, reported more than 6Gbps throughput. After that patch, the same test gets only 9Mbps when receiving on a be2net nic (driver can make a big difference here, for example, ixgbe doesn't seem to be affected). By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing (IPv4 fragment coalescing was dropped by commit `14fe22e334` ("Revert "ipv4: use skb coalescing in defragmentation"")). Without fragment coalescing, be2net runs out of Rx ring entries and starts to drop frames (ethtool reports rx_drops_no_frags errors). Since the netperf traffic is only composed of UDP fragments, any lost packet prevents reassembly of the full datagram. Therefore, fragments which have no possibility to ever get reassembled pile up in the reassembly queue, until the memory accounting exeeds the threshold. At that point no fragment is accepted anymore, which effectively discards all netperf traffic. When reassembly timeout expires, some stale fragments are removed from the reassembly queue, so a few packets can be received, reassembled and delivered to the netperf receiver. But the nic still drops frames and soon the reassembly queue gets filled again with stale fragments. These long time frames where no datagram can be received explain why the performance drop is so significant. Re-introducing fragment coalescing is enough to get the initial performances again (6.6Gbps with be2net): driver doesn't drop frames anymore (no more rx_drops_no_frags errors) and the reassembly engine works at full speed. This patch is quite conservative and only coalesces skbs for local IPv4 and IPv6 delivery (in order to avoid changing skb geometry when forwarding). Coalescing could be extended in the future if need be, as more scenarios would probably benefit from it. [0]: Test configuration Sender: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow netserver -D -L fc00:2::1 Receiver: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6 Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 15:55:10 -07:00
Aya Levin	a4e508cab6	net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter Remove check of recovery bit, in the beginning of the CQE recovery function. This test is already performed right before the reporter is invoked, when CQE error is detected. Fixes: `de8650a820` ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-08 13:01:20 -07:00

1 2 3 4 5 ...

856418 Commits All Branches Search

856418 Commits

All Branches