OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Lijun Pan	a0faaa27c7	ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset did not complete after freeing sub crqs. Check for NULL before dereferencing them. Snippet of call trace: ibmvnic 30000006 env6: Releasing sub-CRQ ibmvnic 30000006 env6: Releasing CRQ ... ibmvnic 30000006 env6: Got Control IP offload Response ibmvnic 30000006 env6: Re-setting tx_scrq[0] BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc008000003dea7cc Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun bridge stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmvnic ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod CPU: 80 PID: 1856 Comm: kworker/80:2 Tainted: G W 5.8.0+ #4 Workqueue: events __ibmvnic_reset [ibmvnic] NIP: c008000003dea7cc LR: c008000003dea7bc CTR: 0000000000000000 REGS: c0000007ef7db860 TRAP: 0380 Tainted: G W (5.8.0+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28002422 XER: 0000000d CFAR: c000000000bd9520 IRQMASK: 0 GPR00: c008000003dea7bc c0000007ef7dbaf0 c008000003df7400 c0000007fa26ec00 GPR04: c0000007fcd0d008 c0000007fcd96350 0000000000000027 c0000007fcd0d010 GPR08: 0000000000000023 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000002000 c00000001ec18e00 c0000000001982f8 c0000007bad6e840 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 fffffffffffffef7 GPR24: 0000000000000402 c0000007fa26f3a8 0000000000000003 c00000016f8ec048 GPR28: 0000000000000000 0000000000000000 0000000000000000 c0000007fa26ec00 NIP [c008000003dea7cc] ibmvnic_reset_init+0x15c/0x258 [ibmvnic] LR [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] Call Trace: [c0000007ef7dbaf0] [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] (unreliable) [c0000007ef7dbb80] [c008000003de8860] __ibmvnic_reset+0x408/0x970 [ibmvnic] [c0000007ef7dbc50] [c00000000018b7cc] process_one_work+0x2cc/0x800 [c0000007ef7dbd20] [c00000000018bd78] worker_thread+0x78/0x520 [c0000007ef7dbdb0] [c0000000001984c4] kthread+0x1d4/0x1e0 [c0000007ef7dbe20] [c00000000000cea8] ret_from_kernel_thread+0x5c/0x74 Fixes: `57a49436f4` ("ibmvnic: Reset sub-crqs during driver reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:28:34 -08:00
Jakub Kicinski	5fc145f155	Merge branch 'fixes-for-ena-driver' Shay Agroskin says: ==================== Fixes for ENA driver - fix wrong data offset on machines that support rx offset - work-around Intel iommu issue - fix out of bound access when request id is wrong ==================== Link: https://lore.kernel.org/r/20201123190859.21298-1-shayagr@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:07:17 -08:00
Shay Agroskin	1396d3148b	net: ena: fix packet's addresses for rx_offset feature This patch fixes two lines in which the rx_offset received by the device wasn't taken into account: - prefetch function: In our driver the copied data would reside in rx_info->page + rx_headroom + rx_offset so the prefetch function is changed accordingly. - setting page_offset to zero for descriptors > 1: for every descriptor but the first, the rx_offset is zero. Hence the page_offset value should be set to rx_headroom. The previous implementation changed the value of rx_info after the descriptor was added to the SKB (essentially providing wrong page offset). Fixes: `68f236df93` ("net: ena: add support for the rx offset feature") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:07:13 -08:00
Shay Agroskin	09323b3bca	net: ena: set initial DMA width to avoid intel iommu issue The ENA driver uses the readless mechanism, which uses DMA, to find out what the DMA mask is supposed to be. If DMA is used without setting the dma_mask first, it causes the Intel IOMMU driver to think that ENA is a 32-bit device and therefore disables IOMMU passthrough permanently. This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48 before readless initialization in ena_device_init()->ena_com_mmio_reg_read_request_init(), which is large enough to workaround the intel_iommu issue. DMA mask is set again to the correct value after it's received from the device after readless is initialized. The patch also changes the driver to use dma_set_mask_and_coherent() function instead of the two pci_set_dma_mask() and pci_set_consistent_dma_mask() ones. Both methods achieve the same effect. Fixes: `1738cd3ed3` ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Mike Cui <mikecui@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:07:13 -08:00
Shay Agroskin	5b7022cf1d	net: ena: handle bad request id in ena_netdev After request id is checked in validate_rx_req_id() its value is still used in the line rx_ring->free_ids[next_to_clean] = rx_ring->ena_bufs[i].req_id; even if it was found to be out-of-bound for the array free_ids. The patch moves the request id to an earlier stage in the napi routine and makes sure its value isn't used if it's found out-of-bounds. Fixes: `30623e1ed1` ("net: ena: avoid memory access violation by validating req_id properly") Signed-off-by: Ido Segev <idose@amazon.com> Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 16:07:13 -08:00
Krzysztof Kozlowski	d8f0a86795	nfc: s3fwrn5: use signed integer for parsing GPIO numbers GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are signed integers, where negative number indicates error. The return value of of_get_named_gpio() should not be assigned to an unsigned int because in case of !CONFIG_GPIOLIB such number would be a valid GPIO. Fixes: `c04c674fad` ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20201123162351.209100-1-krzk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 15:00:53 -08:00
Ezequiel Garcia	078eb55cdf	dpaa2-eth: Fix compile error due to missing devlink support The dpaa2 driver depends on devlink, so it should select NET_DEVLINK in order to fix compile errors, such as: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err': dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report' drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get': dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put' Fixes: `ceeb03ad8e` ("dpaa2-eth: add basic devlink support") Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201123163553.1666476-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 14:57:55 -08:00
Jesper Dangaard Brouer	bc40a3691f	MAINTAINERS: Update page pool entry Add some file F: matches that is related to page_pool. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/160613894639.2826716.14635284017814375894.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 14:45:21 -08:00
Alexander Duyck	407c85c7dd	tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN When a BPF program is used to select between a type of TCP congestion control algorithm that uses either ECN or not there is a case where the synack for the frame was coming up without the ECT0 bit set. A bit of research found that this was due to the final socket being configured to dctcp while the listener socket was staying in cubic. To reproduce it all that is needed is to monitor TCP traffic while running the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed, assuming tcp_dctcp module is loaded or compiled in and the traffic matches the rules in the sample file, is that for all frames with the exception of the synack the ECT0 bit is set. To address that it is necessary to make one additional call to tcp_bpf_ca_needs_ecn using the request socket and then use the output of that to set the ECT0 bit for the tos/tclass of the packet. Fixes: `91b5b21c7c` ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 14:12:55 -08:00
Moshe Shemesh	5204bb683c	devlink: Fix reload stats structure Fix reload stats structure exposed to the user. Change stats structure hierarchy to have the reload action as a parent of the stat entry and then stat entry includes value per limit. This will also help to avoid string concatenation on iproute2 output. Reload stats structure before this fix: "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } After this fix: "stats": { "reload": { "driver_reinit": { "unspecified": 2 }, "fw_activate": { "unspecified": 1, "no_reset": 0 } } Fixes: `a254c26426` ("devlink: Add reload stats") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 13:04:04 -08:00
Lincoln Ramsay	9bd2702d29	aquantia: Remove the build_skb path When performing IPv6 forwarding, there is an expectation that SKBs will have some headroom. When forwarding a packet from the aquantia driver, this does not always happen, triggering a kernel warning. aq_ring.c has this code (edited slightly for brevity): if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) { skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX); } else { skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE); There is a significant difference between the SKB produced by these 2 code paths. When napi_alloc_skb creates an SKB, there is a certain amount of headroom reserved. However, this is not done in the build_skb codepath. As the hardware buffer that build_skb is built around does not handle the presence of the SKB header, this code path is being removed and the napi_alloc_skb path will always be used. This code path does have to copy the packet header into the SKB, but it adds the packet data as a frag. Fixes: `018423e90b` ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com> Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-24 10:59:17 -08:00
Eyal Birger	d549699048	net/packet: fix packet receive on L3 devices without visible hard header In the patchset merged by commit `b9fcf0a0d8` ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: `b9fcf0a0d8` ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 17:29:36 -08:00
Sylwester Dziedziuch	2980cbd4dc	i40e: Fix removing driver while bare-metal VFs pass traffic Prevent VFs from resetting when PF driver is being unloaded: - introduce new pf state: __I40E_VF_RESETS_DISABLED; - check if pf state has __I40E_VF_RESETS_DISABLED state set, if so, disable any further VFLR event notifications; - when i40e_remove (rmmod i40e) is called, disable any resets on the VFs; Previously if there were bare-metal VFs passing traffic and PF driver was removed, there was a possibility of VFs triggering a Tx timeout right before iavf_remove. This was causing iavf_close to not be called because there is a check in the beginning of iavf_remove that bails out early if adapter->state < IAVF_DOWN_PENDING. This makes it so some resources do not get cleaned up. Fixes: `6a9ddb36ee` ("i40e: disable IOV before freeing resources") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20201120180640.3654474-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:52:59 -08:00
Stefano Garzarella	3fe356d58e	vsock/virtio: discard packets only when socket is really closed Starting from commit `8692cefc43` ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt"), we discard packets in virtio_transport_recv_pkt() if the socket has been released. When the socket is connected, we schedule a delayed work to wait the RST packet from the other peer, also if SHUTDOWN_MASK is set in sk->sk_shutdown. This is done to complete the virtio-vsock shutdown algorithm, releasing the port assigned to the socket definitively only when the other peer has consumed all the packets. If we discard the RST packet received, the socket will be closed only when the VSOCK_CLOSE_TIMEOUT is reached. Sergio discovered the issue while running ab(1) HTTP benchmark using libkrun [1] and observing a latency increase with that commit. To avoid this issue, we discard packet only if the socket is really closed (SOCK_DONE flag is set). We also set SOCK_DONE in virtio_transport_release() when we don't need to wait any packets from the other peer (we didn't schedule the delayed work). In this case we remove the socket from the vsock lists, releasing the port assigned. [1] https://github.com/containers/libkrun Fixes: `8692cefc43` ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt") Cc: justin.he@arm.com Reported-by: Sergio Lopez <slp@redhat.com> Tested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20201120104736.73749-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:36:29 -08:00
Ricardo Dias	01770a1661	tcp: fix race condition when creating child sockets from syncookies When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 16:32:33 -08:00
Jakub Kicinski	1eae77bfad	wireless-drivers fixes for v5.10 First set of fixes for v5.10. One fix for iwlwifi kernel panic, others less notable. rtw88 * fix a bogus test found by clang iwlwifi * fix long memory reads causing soft lockup warnings * fix kernel panic during Channel Switch Announcement (CSA) * other smaller fixes MAINTAINERS * email address updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJfu93zAAoJEG4XJFUm622bc3wH+wVAoqqf/lM2e99uq+Z7KgRc HNX1ApQcjidObFBUIiXcuVPj1V8w0HXqE7r4EXznWhKu/01ch2CNaNmN5Ttwp9dA aIhjAx5kiCp7z/kJkTRSDh1vg6WwLDQ29il8nBiveXzD+VVLfcsPlKIF+dsXFOpz FbISvCp4j/4TNME4u6iOafdurCchdOeuHjoZGEWAWkl1wLiryow6WlTdaYou5vTt EDNZa2px4QXb7DS1qAr0SWHBkBH6YfdIW7zDQ/zGgrCmCE3IQNdIKQGw4/K1pn1a 9KbURnI7JgO28f/8wQjzfh6vJw4XA4P/v6mHMDIcFnS0t4YHCTfkfT0YFlBtUiI= =870O -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.10 First set of fixes for v5.10. One fix for iwlwifi kernel panic, others less notable. rtw88 * fix a bogus test found by clang iwlwifi * fix long memory reads causing soft lockup warnings * fix kernel panic during Channel Switch Announcement (CSA) * other smaller fixes MAINTAINERS * email address updates * tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: iwlwifi: mvm: fix kernel panic in case of assert during CSA iwlwifi: pcie: set LTR to avoid completion timeout iwlwifi: mvm: write queue_sync_state only for sync iwlwifi: mvm: properly cancel a session protection for P2P iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROC iwlwifi: sta: set max HE max A-MPDU according to HE capa MAINTAINERS: update maintainers list for Cypress MAINTAINERS: update Yan-Hsuan's email address iwlwifi: pcie: limit memory read spin time rtw88: fix fw_fifo_addr check ==================== Link: https://lore.kernel.org/r/20201123161037.C11D1C43460@smtp.codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-23 14:59:40 -08:00
Jakub Kicinski	f9b0365321	Merge branch 'ibmvnic-fixes-in-reset-path' Lijun Pan says: ==================== ibmvnic: fixes in reset path Patch 1/3 and 2/3 notify peers in failover and migration reset. Patch 3/3 skips timeout reset if it is already resetting. ==================== Link: https://lore.kernel.org/r/20201120224013.46891-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 15:30:50 -08:00
Lijun Pan	855a631a4c	ibmvnic: skip tx timeout reset while in resetting Sometimes it takes longer than 5 seconds (watchdog timeout) to complete failover, migration, and other resets. In stead of scheduling another timeout reset, we wait for the current one to complete. Suggested-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 15:30:47 -08:00
Lijun Pan	98025bce3a	ibmvnic: notify peers when failover and migration happen Commit `61d3e1d9bc` ("ibmvnic: Remove netdev notify for failover resets") excluded the failover case for notify call because it said netdev_notify_peers() can cause network traffic to stall or halt. Current testing does not show network traffic stall or halt because of the notify call for failover event. netdev_notify_peers may be used when a device wants to inform the rest of the network about some sort of a reconfiguration such as failover or migration. It is unnecessary to call that in other events like FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets since in those scenarios the hardware does not change. If the driver must do a hard reset, it is necessary to notify peers. Fixes: `61d3e1d9bc` ("ibmvnic: Remove netdev notify for failover resets") Suggested-by: Brian King <brking@linux.vnet.ibm.com> Suggested-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 15:30:47 -08:00
Lijun Pan	8393597579	ibmvnic: fix call_netdevice_notifiers in do_reset When netdev_notify_peers was substituted in commit `986103e792` ("net/ibmvnic: Fix RTNL deadlock during device reset"), call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed. Fix it now. Fixes: `986103e792` ("net/ibmvnic: Fix RTNL deadlock during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 15:30:47 -08:00
Jens Axboe	5aac0390a6	tun: honor IOCB_NOWAIT flag tun only checks the file O_NONBLOCK flag, but it should also be checking the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check both, otherwise it breaks users that correctly expect O_NONBLOCK semantics if IOCB_NOWAIT is set. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/e9451860-96cc-c7c7-47b8-fe42cadd5f4c@kernel.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 15:19:08 -08:00
Julian Wiedmann	c5dab0941f	net/af_iucv: set correct sk_protocol for child sockets Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*), instead of the PF_IUCV protocol that the parent was created with in iucv_sock_create(). We're currently not using sk->sk_protocol ourselves, so this shouldn't have much impact (except eg. getting the output in skb_dump() right). Fixes: `eac3731bd0` ("[S390]: Add AF_IUCV socket support") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:43:45 -08:00
Yves-Alexis Perez	f33d9e2b48	usbnet: ipheth: fix connectivity with iOS 14 Starting with iOS 14 released in September 2020, connectivity using the personal hotspot USB tethering function of iOS devices is broken. Communication between the host and the device (for example ICMP traffic or DNS resolution using the DNS service running in the device itself) works fine, but communication to endpoints further away doesn't work. Investigation on the matter shows that no UDP and ICMP traffic from the tethered host is reaching the Internet at all. For TCP traffic there are exchanges between tethered host and server but packets are modified in transit leading to impossible communication. After some trials Matti Vuorela discovered that reducing the URB buffer size by two bytes restored the previous behavior. While a better solution might exist to fix the issue, since the protocol is not publicly documented and considering the small size of the fix, let's do that. Tested-by: Matti Vuorela <matti.vuorela@bitfactor.fi> Signed-off-by: Yves-Alexis Perez <corsac@corsac.net> Link: https://lore.kernel.org/linux-usb/CAAn0qaXmysJ9vx3ZEMkViv_B19ju-_ExN8Yn_uSefxpjS6g4Lw@mail.gmail.com/ Link: https://github.com/libimobiledevice/libimobiledevice/issues/1038 Link: https://lore.kernel.org/r/20201119172439.94988-1-corsac@corsac.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:01:34 -08:00
Tom Seewald	659fbdcf2f	cxgb4: Fix build failure when CONFIG_TLS=m After commit `9d2e5e9eeb` ("cxgb4/ch_ktls: decrypted bit is not enough") whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build failure occurs: ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function `cxgb_select_queue': cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb' Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will also be compiled as a module. As otherwise the cxgb4 driver will not be able to access TLS' symbols. Fixes: `9d2e5e9eeb` ("cxgb4/ch_ktls: decrypted bit is not enough") Signed-off-by: Tom Seewald <tseewald@gmail.com> Link: https://lore.kernel.org/r/20201120192528.615-1-tseewald@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 13:10:18 -08:00
Jamie Iles	b9ad3e9f5a	bonding: wait for sysfs kobject destruction before freeing struct slave syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: `07699f9a7c` ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: `a068aab422` ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 13:07:00 -08:00
Jakub Kicinski	207d0bfc08	Merge branch 's390-qeth-fixes-2020-11-20' Julian Wiedmann says: ==================== s390/qeth: fixes 2020-11-20 This brings several fixes for qeth's af_iucv-specific code paths. Also one fix by Alexandra for the recently added BR_LEARNING_SYNC support. We want to trust the feature indication bit, so that HW can mask it out if there's any issues on their end. ==================== Link: https://lore.kernel.org/r/20201120090939.101406-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:59:50 -08:00
Julian Wiedmann	7ed10e16e5	s390/qeth: fix tear down of async TX buffers When qeth_iqd_tx_complete() detects that a TX buffer requires additional async completion via QAOB, it might fail to replace the queue entry's metadata (and ends up triggering recovery). Assume now that the device gets torn down, overruling the recovery. If the QAOB notification then arrives before the tear down has sufficiently progressed, the buffer state is changed to QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob(). The tear down code calls qeth_drain_output_queue(), where qeth_cleanup_handled_pending() will then attempt to replace such a buffer _again_. If it succeeds this time, the buffer ends up dangling in its replacement's ->next_pending list ... where it will never be freed, since there's no further call to qeth_cleanup_handled_pending(). But the second attempt isn't actually needed, we can simply leave the buffer on the queue and re-use it after a potential recovery has completed. The qeth_clear_output_buffer() in qeth_drain_output_queue() will ensure that it's in a clean state again. Fixes: `72861ae792` ("qeth: recovery through asynchronous delivery") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:59:48 -08:00
Julian Wiedmann	8908f36d20	s390/qeth: fix af_iucv notification race The two expected notification sequences are 1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_, when our TX completion code first observed the pending TX and the QAOB then completes at a later time; or 2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB completion before our TX completion code even noticed that the TX was pending. But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run concurrently, we may end up with a race that results in a sequence of TX_NOTIFY_DELAYED_ followed by TX_NOTIFY_PENDING. Which would confuse the af_iucv code in its tracking of pending transmits. Rework the notification code, so that qeth_qdio_handle_aob() defers its notification if the TX completion code is still active. Fixes: `b333293058` ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:59:48 -08:00
Julian Wiedmann	34c7f50f7d	s390/qeth: make af_iucv TX notification call more robust Calling into socket code is ugly already, at least check whether we are dealing with the expected sk_family. Only looking at skb->protocol is bound to cause troubles (consider eg. af_packet). Fixes: `b333293058` ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:59:48 -08:00
Alexandra Winter	0d0e2b538c	s390/qeth: Remove pnso workaround Remove workaround that supported early hardware implementations of PNSO OC3. Rely on the 'enarf' feature bit instead. Fixes: `fa115adff2` ("s390/qeth: Detect PNSO OC3 capability") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> [jwi: use logical instead of bit-wise AND] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:59:47 -08:00
Jakub Kicinski	e10823c719	Merge branch 'tcp-address-issues-with-ect0-not-being-set-in-dctcp-packets' Alexander Duyck says: ==================== tcp: Address issues with ECT0 not being set in DCTCP packets This patch set is meant to address issues seen with SYN/ACK packets not containing the ECT0 bit when DCTCP is configured as the congestion control algorithm for a TCP socket. A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the issue obvious. Looking at the packets will result in the SYN/ACK packet with an ECT0 bit that does not match the other packets for the flow when the congestion control agorithm is switch from the default. So for example going from non-DCTCP to a DCTCP congestion control algorithm we will see the SYN/ACK IPV6 header will not have ECT0 set while the other packets in the flow will. Likewise if we switch from a default of DCTCP to cubic we will see the ECT0 bit set in the SYN/ACK while the other packets in the flow will not. ==================== Link: https://lore.kernel.org/r/160582070138.66684.11785214534154816097.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:10:47 -08:00
Alexander Duyck	55472017a4	tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control When setting congestion control via a BPF program it is seen that the SYN/ACK for packets within a given flow will not include the ECT0 flag. A bit of simple printk debugging shows that when this is configured without BPF we will see the value INET_ECN_xmit value initialized in tcp_assign_congestion_control however when we configure this via BPF the socket is in the closed state and as such it isn't configured, and I do not see it being initialized when we transition the socket into the listen state. The result of this is that the ECT0 bit is configured based on whatever the default state is for the socket. Any easy way to reproduce this is to monitor the following with tcpdump: tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca Without this patch the SYN/ACK will follow whatever the default is. If dctcp all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0 will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK bit matches the value seen on the other packets in the given stream. Fixes: `91b5b21c7c` ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:09:47 -08:00
Alexander Duyck	861602b577	tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header An issue was recently found where DCTCP SYN/ACK packets did not have the ECT bit set in the L3 header. A bit of code review found that the recent change referenced below had gone though and added a mask that prevented the ECN bits from being populated in the L3 header. This patch addresses that by rolling back the mask so that it is only applied to the flags coming from the incoming TCP request instead of applying it to the socket tos/tclass field. Doing this the ECT bits were restored in the SYN/ACK packets in my testing. One thing that is not addressed by this patch set is the fact that tcp_reflect_tos appears to be incompatible with ECN based congestion avoidance algorithms. At a minimum the feature should likely be documented which it currently isn't. Fixes: `ac8f1710c1` ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 18:09:47 -08:00
Ioana Ciornei	d2624e70a2	dpaa2-eth: select XGMAC_MDIO for MDIO bus support Explicitly enable the FSL_XGMAC_MDIO Kconfig option in order to have MDIO access to internal and external PHYs. Fixes: `7194792308` ("dpaa2-eth: add MAC/PHY support through phylink") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201119145106.712761-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 15:21:54 -08:00
Raju Rangoju	bff453921a	cxgb4: fix the panic caused by non smac rewrite SMT entry is allocated only when loopback Source MAC rewriting is requested. Accessing SMT entry for non smac rewrite cases results in kernel panic. Fix the panic caused by non smac rewrite Fixes: `937d842058` ("cxgb4: set up filter action after rewrites") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Link: https://lore.kernel.org/r/20201118143213.13319-1-rajur@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 11:03:37 -08:00
Vadim Fedorenko	20ffc7adf5	net/tls: missing received data after fast remote close In case when tcp socket received FIN after some data and the parser haven't started before reading data caller will receive an empty buffer. This behavior differs from plain TCP socket and leads to special treating in user-space. The flow that triggers the race is simple. Server sends small amount of data right after the connection is configured to use TLS and closes the connection. In this case receiver sees TLS Handshake data, configures TLS socket right after Change Cipher Spec record. While the configuration is in process, TCP socket receives small Application Data record, Encrypted Alert record and FIN packet. So the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with SK_DONE bit set. The received data is not parsed upon arrival and is never sent to user-space. Patch unpauses parser directly if we have unparsed data in tcp receive queue. Fixes: `fcf4793e27` ("tls: check RCV_SHUTDOWN in tls_wait_data") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:25:26 -08:00
Michael Chan	c54bc3ced5	bnxt_en: Release PCI regions when DMA mask setup fails during probe. Jump to init_err_release to cleanup. bnxt_unmap_bars() will also be called but it will do nothing if the BARs are not mapped yet. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1605858271-8209-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:09:09 -08:00
Anmol Karn	3b3fd068c5	rose: Fix Null pointer dereference in rose_send_frame() rose_send_frame() dereferences `neigh->dev` when called from rose_transmit_clear_request(), and the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh`, and it is initialized in rose_add_loopback_neigh() as NULL. i.e when `rose_loopback_neigh` used in rose_loopback_timer() its `->dev` was still NULL and rose_loopback_timer() was calling rose_rx_call_request() without checking for NULL. - net/rose/rose_link.c This bug seems to get triggered in this line: rose_call = (ax25_address *)neigh->dev->dev_addr; Fix it by adding NULL checking for `rose_loopback_neigh->dev` in rose_loopback_timer(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reported-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Tested-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=9d2a7ca8c7f2e4b682c97578dfa3f236258300b3 Signed-off-by: Anmol Karn <anmol.karan123@gmail.com> Link: https://lore.kernel.org/r/20201119191043.28813-1-anmol.karan123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:04:58 -08:00
Martin Habets	f46e79aa1a	MAINTAINERS: Change Solarflare maintainers Email from solarflare.com will stop working. Update the maintainers. A replacement for linux-net-drivers@solarflare.com is not working yet, for now remove it. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Link: https://lore.kernel.org/r/20201120113207.GA1605547@mh-desktop Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 09:19:36 -08:00
Zhang Changzhong	3383176efc	bnxt_en: fix error return code in bnxt_init_board() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Link: https://lore.kernel.org/r/1605792621-6268-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:49:01 -08:00
Zhang Changzhong	b5f796b62c	bnxt_en: fix error return code in bnxt_init_one() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `c213eae8d3` ("bnxt_en: Improve VF/PF link change logic.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Link: https://lore.kernel.org/r/1605701851-20270-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:46:30 -08:00
Linus Torvalds	4d02da974e	Networking fixes for 5.10-rc5, including fixes from the WiFi (mac80211), can and bpf (including the strncpy_from_user fix). Current release - regressions: - mac80211: fix memory leak of filtered powersave frames - mac80211: free sta in sta_info_insert_finish() on errors to avoid sleeping in atomic context - netlabel: fix an uninitialized variable warning added in -rc4 Previous release - regressions: - vsock: forward all packets to the host when no H2G is registered, un-breaking AWS Nitro Enclaves - net: Exempt multicast addresses from five-second neighbor lifetime requirement, decreasing the chances neighbor tables fill up - net/tls: fix corrupted data in recvmsg - qed: fix ILT configuration of SRC block - can: m_can: process interrupt only when not runtime suspended Previous release - always broken: - page_frag: Recover from memory pressure by not recycling pages allocating from the reserves - strncpy_from_user: Mask out bytes after NUL terminator - ip_tunnels: Set tunnel option flag only when tunnel metadata is present, always setting it confuses Open vSwitch - bpf, sockmap: - Fix partial copy_page_to_iter so progress can still be made - Fix socket memory accounting and obeying SO_RCVBUF - net: Have netpoll bring-up DSA management interface - net: bridge: add missing counters to ndo_get_stats64 callback - tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt - enetc: Workaround MDIO register access HW bug - net/ncsi: move netlink family registration to a subsystem init, instead of tying it to driver probe - net: ftgmac100: unregister NC-SI when removing driver to avoid crash - lan743x: prevent interrupt storm on open - lan743x: fix freeing skbs in the wrong context - net/mlx5e: Fix socket refcount leak on kTLS RX resync - net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097 - fix 21 unset return codes and other mistakes on error paths, mostly detected by the Hulk Robot Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+226AACgkQMUZtbf5S IruE1w/+JX3CqJwGIqyzyhwVshNaKxmX9gAOMJzkckjEohn8932zPaNq7kbmNYqt 5QsJoou3cXjFeoIEAkQA5fqR4stTZpZMnLO+7JnxxQ0vb2YBN+tIGQRNCnmd1Q0h u9gb5+5AdORdlmk3E7oC8v50dzQRfboJXLEEZTo2uGJwUgLlEAiqTSV2w4YDHMhL JtgtWA/fraL0CUc2WMoxuimg9NirbRuMijsU6+d/yExzznDpdoho/qsxL+Odu1NF hSdaKirA8B8ml0pOd/b4mj+fm4+lKyXZBfSyLx4Ki1TqluEMLzDp7gQPRnU6yyJm AOu4zsKxx6qitOX2qLQCNlEpkQp6LA2N2Zb1orliUV3Bsq2DJRhU35FgLcghtdRP GTRSdKHr2BvMScOZ7dQo8l4TqVc3e/khSZDRGdvpsM275Dt0JyS/l7yAWxunPqMb +/483/s75OuBRO57ULLJ/hR02TG37g/Jv5sI0sG/7oDpGfnulinQX+fxy9izyTEM KYl0mAPSqhb6RcjE0YXWG0rhJN6FSvc/lwPQHjq8wPSkwEdD/FTb6/eYqbXDi1ld UTYhFpkh1PQrwct14eSScMeJqTsNKvG0VV39/uZLZCzcqa3yOY5+oTzwaCFlMsy3 a5yGGxqoh7/FTM8t1ml21is9uZ31LAQEnNTMPv69pZPwAv5G5yE= =SRwI -----END PGP SIGNATURE----- Merge tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc5, including fixes from the WiFi (mac80211), can and bpf (including the strncpy_from_user fix). Current release - regressions: - mac80211: fix memory leak of filtered powersave frames - mac80211: free sta in sta_info_insert_finish() on errors to avoid sleeping in atomic context - netlabel: fix an uninitialized variable warning added in -rc4 Previous release - regressions: - vsock: forward all packets to the host when no H2G is registered, un-breaking AWS Nitro Enclaves - net: Exempt multicast addresses from five-second neighbor lifetime requirement, decreasing the chances neighbor tables fill up - net/tls: fix corrupted data in recvmsg - qed: fix ILT configuration of SRC block - can: m_can: process interrupt only when not runtime suspended Previous release - always broken: - page_frag: Recover from memory pressure by not recycling pages allocating from the reserves - strncpy_from_user: Mask out bytes after NUL terminator - ip_tunnels: Set tunnel option flag only when tunnel metadata is present, always setting it confuses Open vSwitch - bpf, sockmap: - Fix partial copy_page_to_iter so progress can still be made - Fix socket memory accounting and obeying SO_RCVBUF - net: Have netpoll bring-up DSA management interface - net: bridge: add missing counters to ndo_get_stats64 callback - tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt - enetc: Workaround MDIO register access HW bug - net/ncsi: move netlink family registration to a subsystem init, instead of tying it to driver probe - net: ftgmac100: unregister NC-SI when removing driver to avoid crash - lan743x: - prevent interrupt storm on open - fix freeing skbs in the wrong context - net/mlx5e: Fix socket refcount leak on kTLS RX resync - net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097 - fix 21 unset return codes and other mistakes on error paths, mostly detected by the Hulk Robot" * tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (115 commits) fail_function: Remove a redundant mutex unlock selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL lib/strncpy_from_user.c: Mask out bytes after NUL terminator. net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid() net/smc: fix matching of existing link groups ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module libbpf: Fix VERSIONED_SYM_COUNT number parsing net/mlx4_core: Fix init_hca fields offset atm: nicstar: Unmap DMA on send error page_frag: Recover from memory pressure net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset mlxsw: core: Use variable timeout for EMAD retries mlxsw: Fix firmware flashing net: Have netpoll bring-up DSA management interface atl1e: fix error return code in atl1e_probe() atl1c: fix error return code in atl1c_probe() ah6: fix error return code in ah6_input() net: usb: qmi_wwan: Set DTR quirk for MR400 can: m_can: process interrupt only when not runtime suspended can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery ...	2020-11-19 13:33:16 -08:00
Linus Torvalds	3be28e93cd	RDMA 5.10 third rc pull request A collection of error case bug fixes - Improper nesting of spinlock types in cm - Missing error codes and kfree() - Ensure dma_virt_ops users have the right kconfig symbols to work properly - Compilation failure of tools/testing -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+2vPwACgkQOG33FX4g mxppSg//Z8+BoRgv26dqbug8lm9lvIeo3LprpZi+UXrR4ns2503bGN0TB/cgLjml 5eMrFN/Wgg8ntB15nOTqcWx+F3vCG+Jli4pKxfZM45VDauB4MYUS/npQRaxUgJuJ vzidl0j4atsa4OwYFn8xEKPrtp4/e/c+uIXtHgnwI+HYGhCTSdMPJbrd7Fg6R0ls wyGXXO5X5naLoYUq8NXbGmmEm/EFKqePGKqbdZSUkIY/sPRyvdTqt6n5k3rt8/i+ 38QohnijagXzc9mOv3jvUnBL6K9nRmmMIetkZdYiCymRjLh/kYEOBlcwqAjroE88 1xDOk4O+zpw5FAlJH2UQItTWApUBhiUMQSY67/tbVCzgUzyQrDCtS11za8PexcIM B6RcXBdlyQU1Q+qwzppP9kGY0MD9XVez0ZSGHXuLty/BKpeEfGcr16p/eYOT/DHi 0sogMoRL10a9ppOsUziIT4BYigW1INw1KUCfz2fUAEnvytoxhnhbAb4UuCoEpx1a SWRdfpsdNOGqcFL6VGJ9GBUi4Qh5MSzcKcGg9AEQC/0L/aH3QVM53/Qaqy1xH1W8 ghHOr2k0xWuZ6KDDz8t/VdTIYg0hqUosNjEwiLqcSVE6eUzDrje7cO/fgQHo98ak fcgH+lDT6Nee8dtXYPkqXsKIiLYKowKAFwxL66HDJr2Q6lTRIDY= =oIS7 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "The last two weeks have been quiet here, just the usual smattering of long standing bug fixes. A collection of error case bug fixes: - Improper nesting of spinlock types in cm - Missing error codes and kfree() - Ensure dma_virt_ops users have the right kconfig symbols to work properly - Compilation failure of tools/testing" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: tools/testing/scatterlist: Fix test to compile and run IB/hfi1: Fix error return code in hfi1_init_dd() RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device() RDMA/cm: Make the local_id_table xarray non-irq	2020-11-19 13:01:53 -08:00
Jakub Kicinski	e6ea60bac1	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== 1) libbpf should not attempt to load unused subprogs, from Andrii. 2) Make strncpy_from_user() mask out bytes after NUL terminator, from Daniel. 3) Relax return code check for subprograms in the BPF verifier, from Dmitrii. 4) Fix several sockmap issues, from John. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: fail_function: Remove a redundant mutex unlock selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL lib/strncpy_from_user.c: Mask out bytes after NUL terminator. libbpf: Fix VERSIONED_SYM_COUNT number parsing bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self bpf, sockmap: Use truesize with sk_rmem_schedule() bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made selftests/bpf: Fix error return code in run_getsockopt_test() bpf: Relax return code check for subprograms tools, bpftool: Add missing close before bpftool net attach exit MAINTAINERS/bpf: Update Andrii's entry. selftests/bpf: Fix unused attribute usage in subprogs_unused test bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare libbpf: Don't attempt to load unused subprog as an entry-point BPF program ==================== Link: https://lore.kernel.org/r/20201119200721.288-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 12:26:10 -08:00
Luo Meng	2801a5da5b	fail_function: Remove a redundant mutex unlock Fix a mutex_unlock() issue where before copy_from_user() is not called mutex_locked. Fixes: `4b1a29a7f5` ("error-injection: Support fault injection framework") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/160570737118.263807.8358435412898356284.stgit@devnote2	2020-11-19 11:58:16 -08:00
Alexei Starovoitov	14d6d86c21	Merge branch 'Fix bpf_probe_read_user_str() overcopying' Daniel Xu says: ==================== `6ae08ae3de` ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers") introduced a subtle bug where bpf_probe_read_user_str() would potentially copy a few extra bytes after the NUL terminator. This issue is particularly nefarious when strings are used as map keys, as seemingly identical strings can occupy multiple entries in a map. This patchset fixes the issue and introduces a selftest to prevent future regressions. v6 -> v7: * Add comments v5 -> v6: * zero-pad up to sizeof(unsigned long) after NUL v4 -> v5: * don't read potentially uninitialized memory v3 -> v4: * directly pass userspace pointer to prog * test more strings of different length v2 -> v3: * set pid filter before attaching prog in selftest * use long instead of int as bpf_probe_read_user_str() retval * style changes v1 -> v2: * add Fixes: tag * add selftest ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-11-19 11:58:15 -08:00
Daniel Xu	c8a36aedf3	selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL Previously, bpf_probe_read_user_str() could potentially overcopy the trailing bytes after the NUL due to how do_strncpy_from_user() does the copy in long-sized strides. The issue has been fixed in the previous commit. This commit adds a selftest that ensures we don't regress bpf_probe_read_user_str() again. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/4d977508fab4ec5b7b574b85bdf8b398868b6ee9.1605642949.git.dxu@dxuuu.xyz	2020-11-19 11:58:15 -08:00
Daniel Xu	6fa6d28051	lib/strncpy_from_user.c: Mask out bytes after NUL terminator. do_strncpy_from_user() may copy some extra bytes after the NUL terminator into the destination buffer. This usually does not matter for normal string operations. However, when BPF programs key BPF maps with strings, this matters a lot. A BPF program may read strings from user memory by calling the bpf_probe_read_user_str() helper which eventually calls do_strncpy_from_user(). The program can then key a map with the destination buffer. BPF map keys are fixed-width and string-agnostic, meaning that map keys are treated as a set of bytes. The issue is when do_strncpy_from_user() overcopies bytes after the NUL terminator, it can result in seemingly identical strings occupying multiple slots in a BPF map. This behavior is subtle and totally unexpected by the user. This commit masks out the bytes following the NUL while preserving long-sized stride in the fast path. Fixes: `6ae08ae3de` ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz	2020-11-19 11:56:16 -08:00
Linus Torvalds	dda3f4252e	powerpc fixes for CVE-2020-4788 From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl+2aqETHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG+hD/4njSFct2amqWfqDYR9b2OykWmnMQXn geookk5SbItQF7vh1q2SVA6r43s5ZAxgD5fezx4LgG6p3QU39+Tr0RhzUUHWMPDV UNGZK6x/N/GSYeq0bqvMHmVwS0FDjPE8nOtA8Hn2T9mUUsu9G0okpgYPLnEu6rb1 gIyS35zlLBh9obi3MfJzyln/AmCE7hdonKRtLAxvGiERJAyfAG757lrdjrwavyHy mwz+XPl5PF88jfO5cbcZT9gNHmZZPzVsOVwNcstCh2FcwuePv9dWe1pxsBxxKqP5 UXceXPcKM7VlRNmehimq7q/hfbget4RJGGKYPNXeKHOo6yfy7lJPiQV4h+5z2pSs SPP2fQQPq0aubmcO23CXFtZl4WRHQ4pax6opepnpIfC2vZ0HLXJtPrhMKcbFJNTo qPis6HWQPpIuI6l4MJfs+YO9ETxCR31Yd28qFAfPFoHlnQZTfx6NPhw8HKxTbSh2 Svr4X6Y14j3UsQgLTCArCXWAG/hlfRwxDZJ4AvR9EU0HJGDyZ45Y+LTD1N8bbsny zcYfPqWGPIanLcNPNFYIQwDZo7ff08KdmngUvf/Q9om60mP1hsPJMHf6VhPXj4fC 2TZ11fORssSlBSNtIkFkbjEG+aiWtWnz3fN3uSyT50rgGwtDHJzVzLiUWHlZKcxW X73YdxuT8fqQwg== =Yibq -----END PGP SIGNATURE----- Merge tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for CVE-2020-4788. From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups" * tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: rename pnv\|pseries_setup_rfi_flush to _setup_security_mitigations selftests/powerpc: refactor entry and rfi_flush tests selftests/powerpc: entry flush test powerpc: Only include kup-radix.h for 64-bit Book3S powerpc/64s: flush L1D after user accesses powerpc/64s: flush L1D on kernel entry selftests/powerpc: rfi_flush: disable entry flush if present	2020-11-19 11:32:31 -08:00
Linus Torvalds	3494d58865	Xtensa fixes for v5.10: - fix placement of cache alias remapping area - disable preemption around cache alias management calls - add missing __user annotation to strncpy_from_user argument -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAl+2VswTHGpjbXZia2Jj QGdtYWlsLmNvbQAKCRBR+cyR+D+gRF8kEACGXrZHeIOaw9AXsBznUcXP3yx2o5jO Sff2fe2qR20pxfw9dfS2+r8MlVvyD8fwuTd3LCV8RscM2WGestFFOkqGl+LZH1Za 0Px7cEzZvMWWd8qwrvshEzlAb2swYObmnXUj81A59FXSihbyuB+C1kcehXIQfM8Q KTEswWkKf+8Zkk39clkz5TYsrlg3l2H+x7CkGps6+bhh8l9CJEePZs7yKsIKA1sK zvIqg3RrLi2DOIAwwfThqI/RGN6qjJbyGo2JZm9Bcg2AYWjNNDvE41nJiY8aQNc9 EYGNSPTznJBEpWtIAAxThRwVKVe4BTzwZZq6Ri19lJBKPqJi165o/vLqeoo4HfZp UYQ+OhX/ga6KUZybR/srHU/Cr3aLecqnYIyHput1tZW/ZNX9hdThG6z+D4vScMir LwTIqxBGMJ8P+jklfR/miiR4l927aM76IJJOtAdsBRMQVCGVs/f9CO1W09YaNa6z VD6Pd5n0Y8W/c/Sr/tz9JuAHVOvyVhwZdy8jBlDn+WHKuNByzS/XVInbdYf6ojXd MgYIC4VcIEEkB9EcWZqsKvo+ZNT4e/PiyaVC8nX4IEh0OB3wmUDHm4dlrLIwmjGj WCrgizQm3h2DKWIntGSJOnBWAsQI/c289Xm8xVbvZ3eb7tq3PXuve1+XwPANIlWk TnGD1jSA+ISmBQ== =T/hd -----END PGP SIGNATURE----- Merge tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa Pull xtensa fixes from Max Filippov: - fix placement of cache alias remapping area - disable preemption around cache alias management calls - add missing __user annotation to strncpy_from_user argument * tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: uaccess: Add missing __user to strncpy_from_user() prototype xtensa: disable preemption around cache alias management calls xtensa: fix TLBTEMP area placement	2020-11-19 11:22:33 -08:00

1 2 3 4 5 ...

967694 Commits All Branches Search

967694 Commits

All Branches