linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Subash Abhinov Kasiviswanathan	9deb441c11	net: ipv6: Generate random IID for addresses on RAWIP devices RAWIP devices such as rmnet do not have a hardware address and instead require the kernel to generate a random IID for the IPv6 addresses. Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 10:16:25 -04:00
Yousuk Seung	f4c9f85f3b	tcp: refactor tcp_ecn_check_ce to remove sk type cast Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock* instead of tcp_sock* to clean up type casts. This is a pure refactor patch. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 10:09:27 -04:00
David Ahern	f7225172f2	net/ipv6: prevent use after free in ip6_route_mpath_notify syzbot reported a use-after-free: BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180 Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555 CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... Allocated by task 4555: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 dst_alloc+0xbb/0x1d0 net/core/dst.c:104 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... Freed by task 4555: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 dst_destroy+0x267/0x3c0 net/core/dst.c:140 dst_release_immediate+0x71/0x9e net/core/dst.c:205 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391 ... The problem is that rt_last can point to a deleted route if the insert fails. One reproducer is to insert a route and then add a multipath route that has a duplicate nexthop.e.g,: $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2 $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2 Fix by not setting rt_last until the it is verified the insert succeeded. Fixes: `3b1137fe74` ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH") Cc: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 09:56:04 -04:00
Kun Yi	69e2ecccd0	net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default. BCM54612E have 4 multi-functional LED pins that can be configured through register setting; the LED4 pin can be configured to a 125MHz reference clock output by setting the spare register. Since the dedicated CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4 pin is the only pin to provide such function in this package, and therefore it is beneficial to just enable the reference clock by default. Signed-off-by: Kun Yi <kunyi@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 09:43:09 -04:00
Guillaume Nault	3d609342cc	l2tp: fix refcount leakage on PPPoL2TP sockets Commit `d02ba2a611` ("l2tp: fix race in pppol2tp_release with session object destroy") tried to fix a race condition where a PPPoL2TP socket would disappear while the L2TP session was still using it. However, it missed the root issue which is that an L2TP session may accept to be reconnected if its associated socket has entered the release process. The tentative fix makes the session hold the socket it is connected to. That saves the kernel from crashing, but introduces refcount leakage, preventing the socket from completing the release process. Once stalled, everything the socket depends on can't be released anymore, including the L2TP session and the l2tp_ppp module. The root issue is that, when releasing a connected PPPoL2TP socket, the session's ->sk pointer (RCU-protected) is reset to NULL and we have to wait for a grace period before destroying the socket. The socket drops the session in its ->sk_destruct callback function, so the session will exist until the last reference on the socket is dropped. Therefore, there is a time frame where pppol2tp_connect() may accept reconnecting a session, as it only checks ->sk to figure out if the session is connected. This time frame is shortened by the fact that pppol2tp_release() calls l2tp_session_delete(), making the session unreachable before resetting ->sk. However, pppol2tp_connect() may grab the session before it gets unhashed by l2tp_session_delete(), but it may test ->sk after the later got reset. The race is not so hard to trigger and syzbot found a pretty reliable reproducer: https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf Before `d02ba2a611`, another race could let pppol2tp_release() overwrite the ->__sk pointer of an L2TP session, thus tricking pppol2tp_put_sk() into calling sock_put() on a socket that is different than the one for which pppol2tp_release() was originally called. To get there, we had to trigger the race described above, therefore having one PPPoL2TP socket being released, while the session it is connected to is reconnecting to a different PPPoL2TP socket. When releasing this new socket fast enough, pppol2tp_release() overwrites the session's ->__sk pointer with the address of the new socket, before the first pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call invoked by the original socket will sock_put() the new socket, potentially dropping its last reference. When the second pppol2tp_put_sk() finally runs, its socket has already been freed. With `d02ba2a611`, the session takes a reference on both sockets. Furthermore, the session's ->sk pointer is reset in the pppol2tp_session_close() callback function rather than in pppol2tp_release(). Therefore, ->__sk can't be overwritten and pppol2tp_put_sk() is called only once (l2tp_session_delete() will only run pppol2tp_session_close() once, to protect the session against concurrent deletion requests). Now pppol2tp_put_sk() will properly sock_put() the original socket, but the new socket will remain, as l2tp_session_delete() prevented the release process from completing. Here, we don't depend on the ->__sk race to trigger the bug. Getting into the pppol2tp_connect() race is enough to leak the reference, no matter when new socket is released. So it all boils down to pppol2tp_connect() failing to realise that the session has already been connected. This patch drops the unneeded extra reference counting (mostly reverting `d02ba2a611`) and checks that neither ->sk nor ->__sk is set before allowing a session to be connected. Fixes: `d02ba2a611` ("l2tp: fix race in pppol2tp_release with session object destroy") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 09:40:18 -04:00
David S. Miller	7a723099be	Merge branch 'net-phy-improve-PM-handling-of-PHY-MDIO' Heiner Kallweit says: ==================== net: phy: improve PM handling of PHY/MDIO Current implementation of MDIO bus PM ops doesn't actually implement bus-specific PM ops but just calls PM ops defined on a device level what doesn't seem to be fully in line with the core PM model. When looking e.g. at __device_suspend() the PM core looks for PM ops of a device in a specific order: 1. device PM domain 2. device type 3. device class 4. device bus I think it has good reason that there's no PM ops on device level. The situation can be improved by modeling PHY's as device type of a MDIO device. If for some other type of MDIO device PM ops are needed, it could be modeled as struct device_type as well. ==================== Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 08:50:18 -04:00
Heiner Kallweit	9107c05e2e	net: phy: remove PM ops from MDIO bus Current implementation of MDIO bus PM ops doesn't actually implement bus-specific PM ops but just calls PM ops defined on a device level what doesn't seem to be fully in line with the core PM model. When looking e.g. at __device_suspend() the PM core looks for PM ops of a device in a specific order: 1. device PM domain 2. device type 3. device class 4. device bus I think it has good reason that there's no PM ops on device level. Now that a device type representation of PHY's as special type of MDIO devices was added (only user of MDIO bus PM ops), the MDIO bus PM ops can be removed including member pm of struct mdio_device. If for some other type of MDIO device PM ops are needed, it should be modeled as struct device_type as well. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 08:50:16 -04:00
Heiner Kallweit	7f4828ff27	net: phy: add struct device_type representation of a PHY A PHY is a type of MDIO device, so let's model it as struct device_type and place PM ops, attribute groups and release callback on device type level. For this the attribute definitions have to be moved. This change allows us to get rid of the PM ops on a bus level in a second step. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-05 08:50:16 -04:00
David S. Miller	7d840a6065	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-06-04 22:23:35 -04:00
David S. Miller	d67b66b45a	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-06-04 This series contains a smorgasbord of updates to documentation, e1000e, igb, ixgbe, ixgbevf and i40e. Benjamin Poirier fixes a potential kernel crash due to NULL pointer dereference in e1000e. Jeff updates the kernel documentation for e100 and e1000 to correct default values and URLs which were incorrect in the documentation. Also took the time to update these to the new reStructured text format for kernel documentation. Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that TSICR gets cleared when ICR is cleared. Sergey updates igb to reset all the transmit queues at one time so that we only have to wait once for all the queues to be reset. Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist with XDP. Emil and Tony extend the RTNL lock to ensure we get the most up-to-date values for the bits and avoid a possible race condition when going down. YueHaibing from Huawei introduces a helper function in ixgbe for operation reads to simplify the code a bit more. Daniel Borkmann adds support for XDP meta data when using build SKB for i40e. Shannon Nelson provides twp fixes for the IPSec code in ixgbe, first is to make sure we do not try to offload the decryption of any incoming packet that is destined for the management engine. The other fix is to resolve a cast problem introduced by a sparse cleanup patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:35:35 -04:00
Xi Wang	f0b964e5e4	net: hns: Fix the process of adding broadcast addresses to tcam If the multicast mask value in device tree is configured not all 0xff, the broadcast mac will be lost from tcam table after the execution of command 'ifconfig up'. The address is appended by hns_ae_start, but will be clear later by hns_nic_set_rx_mode called in dev_open process. This patch fixed it by not use the multicast mask when add a broadcast address. Fixes: `b5996f11ea` ("net: add Hisilicon Network Subsystem basic ethernet support") Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:32:56 -04:00
Vlad Buslov	0e3990356d	net: sched: return error code when tcf proto is not found If requested tcf proto is not found, get and del filter netlink protocol handlers output error message to extack, but do not return actual error code. Add check to return ENOENT when result of tp find function is NULL pointer. Fixes: `c431f89b18` ("net: sched: split tc_ctl_tfilter into three handlers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:31:44 -04:00
Dan Carpenter	25ea66544b	team: use netdev_features_t instead of u32 This code was introduced in 2011 around the same time that we made netdev_features_t a u64 type. These days a u32 is not big enough to hold all the potential features. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:30:04 -04:00
Dan Carpenter	a746407af1	net_failover: Use netdev_features_t instead of u32 The features mask needs to be a netdev_features_t (u64) because a u32 is not big enough. Fixes: `cfc80d9a11` ("net: Introduce net_failover driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:30:04 -04:00
YueHaibing	ff2e351e19	qed: use dma_zalloc_coherent instead of allocator/memset Use dma_zalloc_coherent instead of dma_alloc_coherent followed by memset 0. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:28:43 -04:00
YueHaibing	1f55c2865c	wan/fsl_ucc_hdlc: use dma_zalloc_coherent instead of allocator/memset Use dma_zalloc_coherent instead of dma_alloc_coherent followed by memset 0. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:27:59 -04:00
David S. Miller	828da43224	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-06-04 Here's one last bluetooth-next pull request for the 4.18 kernel: - New USB device IDs for Realtek 8822BE and 8723DE - reset/resume fix for Dell Inspiron 5565 - Fix HCI_UART_INIT_PENDING flag behavior - Fix patching behavior for some ATH3012 models - A few other minor cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:22:17 -04:00
Olivier Gayot	bb38ccce88	docs: networking: fix minor typos in various documentation files This patch fixes some typos/misspelling errors in the Documentation/networking files. Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:21:28 -04:00
Maciej Żenczykowski	f396922d86	net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets It is not safe to do so because such sockets are already in the hash tables and changing these options can result in invalidating the tb->fastreuse(port) caching. This can have later far reaching consequences wrt. bind conflict checks which rely on these caches (for optimization purposes). Not to mention that you can currently end up with two identical non-reuseport listening sockets bound to the same local ip:port by clearing reuseport on them after they've already both been bound. There is unfortunately no EISBOUND error or anything similar, and EISCONN seems to be misleading for a bound-but-not-connected socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT is the closest you can get to meaning 'socket in bad state'. (although perhaps EINVAL wouldn't be a bad choice either?) This does unfortunately run the risk of breaking buggy userspace programs... Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:14:22 -04:00
Maciej Żenczykowski	79e9fed460	net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean to an integer. It now takes the values 0, 1 and 2, where 0 and 1 behave as before, while 2 enables timewait socket reuse only for sockets that we can prove are loopback connections: ie. bound to 'lo' interface or where one of source or destination IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1. This enables quicker reuse of ephemeral ports for loopback connections - where tcp_tw_reuse is 100% safe from a protocol perspective (this assumes no artificially induced packet loss on 'lo'). This also makes estblishing many loopback connections much faster (allocating ports out of the first half of the ephemeral port range is significantly faster, then allocating from the second half) Without this change in a 32K ephemeral port space my sample program (it just establishes and closes [::1]:ephemeral -> [::1]:server_port connections in a tight loop) fails after 32765 connections in 24 seconds. With it enabled 50000 connections only take 4.7 seconds. This is particularly problematic for IPv6 where we only have one local address and cannot play tricks with varying source IP from 127.0.0.0/8 pool. Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Wei Wang <weiwan@google.com> Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:13:35 -04:00
Yuval Bason	39dbc646fd	qed: Add srq core support for RoCE and iWARP This patch adds support for configuring SRQ and provides the necessary APIs for rdma upper layer driver (qedr) to enable the SRQ feature. Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: Yuval Bason <yuval.bason@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:09:54 -04:00
David S. Miller	7a9ee41b83	Merge branch 'bnx2-warnings' Varsha Rao says: ==================== net: bnx2: Fix checkpatch and clang warnings This patchset fixes NULL comparison and extra parentheses, checkpatch and clang warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:07:28 -04:00
Varsha Rao	b8aac410b7	net: ethernet: bnx2: Replace NULL comparison This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL with !x, by using the following coccinelle script: @disable is_null@ expression e; @@ -e==NULL +!e Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:07:27 -04:00
Varsha Rao	6dc5aa2123	net: ethernet: bnx2: Remove extra parentheses The following coccinelle script removes extra parentheses to fix the clang warning of extraneous parentheses. @disable paren@ identifier i; expression e; statement s; @@ if ( -(i == e) +i == e ) s Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Varsha Rao <rvarsha016@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:07:27 -04:00
YueHaibing	13ce3bc9c1	net: gemini: fix spelling mistake: "it" -> "is" Trivial fix to spelling mistake in gemini dev_warn message Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:05:39 -04:00
Paul Blakey	f6521c587a	cls_flower: Fix comparing of old filter mask with new filter We incorrectly compare the mask and the result is that we can't modify an already existing rule. Fix that by comparing correctly. Fixes: `05cd271fd6` ("cls_flower: Support multiple masks per priority") Reported-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:03:37 -04:00
Paul Blakey	de9dc650f0	cls_flower: Fix missing free of rhashtable When destroying the instance, destroy the head rhashtable. Fixes: `05cd271fd6` ("cls_flower: Support multiple masks per priority") Reported-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:03:37 -04:00
Randy Dunlap	1f4c741321	net: skbuff.h: drop unneeded <linux/slab.h> <linux/skbuff.h> does not use nor need <linux/slab.h>, so drop this header file from skbuff.h. <linux/skbuff.h> is currently #included in around 1200 C source and header files, making it the 31st most-used header file. Build tested [allmodconfig] on 20 arch-es. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 17:02:06 -04:00
YueHaibing	40434a670f	net: chelsio: Use zeroing memory allocator instead of allocator/memset Use dma_zalloc_coherent for allocating zeroed memory and remove unnecessary memset function. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 16:07:30 -04:00
David Howells	1a025028d4	rxrpc: Fix handling of call quietly cancelled out on server Sometimes an in-progress call will stop responding on the fileserver when the fileserver quietly cancels the call with an internally marked abort (RX_CALL_DEAD), without sending an ABORT to the client. This causes the client's call to eventually expire from lack of incoming packets directed its way, which currently leads to it being cancelled locally with ETIME. Note that it's not currently clear as to why this happens as it's really hard to reproduce. The rotation policy implement by kAFS, however, doesn't differentiate between ETIME meaning we didn't get any response from the server and ETIME meaning the call got cancelled mid-flow. The latter leads to an oops when fetching data as the rotation partially resets the afs_read descriptor, which can result in a cleared page pointer being dereferenced because that page has already been filled. Handle this by the following means: (1) Set a flag on a call when we receive a packet for it. (2) Store the highest packet serial number so far received for a call (bearing in mind this may wrap). (3) If, when the "not received anything recently" timeout expires on a call, we've received at least one packet for a call and the connection as a whole has received packets more recently than that call, then cancel the call locally with ECONNRESET rather than ETIME. This indicates that the call was definitely in progress on the server. (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME, don't try the next server, but rather abort the call. This avoids the oops as we don't try to reuse the afs_read struct. Rather, as-yet ungotten pages will be reread at a later data. Also: (5) Add an rxrpc tracepoint to log detection of the call being reset. Without this, I occasionally see an oops like the following: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:_copy_to_iter+0x204/0x310 RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206 RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560 RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000 RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400 R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958 R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560 FS: 00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0 Call Trace: skb_copy_datagram_iter+0x14e/0x289 rxrpc_recvmsg_data.isra.0+0x6f3/0xf68 ? trace_buffer_unlock_commit_regs+0x4f/0x89 rxrpc_kernel_recv_data+0x149/0x421 afs_extract_data+0x1e0/0x798 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_deliver_fs_fetch_data+0x33a/0x5ab afs_deliver_to_call+0x1ee/0x5e0 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_wait_for_call_to_complete+0x12b/0x52e ? wake_up_q+0x54/0x54 afs_make_call+0x287/0x462 ? afs_fs_fetch_data+0x3e6/0x3ed ? rcu_read_lock_sched_held+0x5d/0x63 afs_fs_fetch_data+0x3e6/0x3ed afs_fetch_data+0xbb/0x14a afs_readpages+0x317/0x40d __do_page_cache_readahead+0x203/0x2ba ? ondemand_readahead+0x3a7/0x3c1 ondemand_readahead+0x3a7/0x3c1 generic_file_buffered_read+0x18b/0x62f __vfs_read+0xdb/0xfe vfs_read+0xb2/0x137 ksys_read+0x50/0x8c do_syscall_64+0x7d/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Note the weird value in RDI which is a result of trying to kmap() a NULL page pointer. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 16:06:26 -04:00
Chas Williams	4e24f2dd51	Allow ethtool to change tun link settings Let user space set whatever it would like to advertise for the tun interface. Preserve the existing defaults. Signed-off-by: Chas Williams <3chas3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 16:05:16 -04:00
David S. Miller	4cd328f839	Merge branch 'sh_eth-fix-and-clean-up-sh_eth_soft_swap' Sergei Shtylyov says: ==================== sh_eth: fix & clean up sh_eth_soft_swap() Here's a set of 3 patches against DaveM's 'net-next.git' repo. First one fixes an old buffer endiannes issue (luckily, the ARM SoCs are smart enough to not actually care) plus couple clean ups around sh_eth_soft_swap()... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 15:23:26 -04:00
Sergei Shtylyov	1100149a23	sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap() When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs to be rounded up -- that's just asking for DIV_ROUND_UP()! Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 15:23:25 -04:00
Sergei Shtylyov	bb2fa4e847	sh_eth: uninline sh_eth_soft_swap() sh_eth_tsu_soft_swap() is called twice by the driver, remove inline and move that function from the header to the driver itself to let gcc decide whether to expand it inline or not... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 15:23:25 -04:00
Sergei Shtylyov	232b6743e4	sh_eth: make sh_eth_soft_swap() work on ARM Browsing thru the driver disassembly, I noticed that ARM gcc generated no code whatsoever for sh_eth_soft_swap() while building a little-endian kernel -- apparently __LITTLE_ENDIAN__ was not being #define'd, however it got implicitly #define'd when building with the SH gcc (I could only find the explicit #define __LITTLE_ENDIAN that was #include'd when building a little-endian kernel). Luckily, the Ether controller only doing big- endian DMA is encountered on the early SH771x SoCs only and all ARM SoCs implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we need to fix the #ifdef inside sh_eth_soft_swap() to something that would work on all architectures... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 15:23:24 -04:00
Shannon Nelson	9a75fa5c16	ixgbe: fix broken ipsec Rx with proper cast on spi Fix up a cast problem introduced by a sparse cleanup patch. This fixes a problem where the encrypted packets were not recognized on Rx and subsequently dropped. Fixes: `9cfbfa701b` ("ixgbe: cleanup sparse warnings") Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:31:22 -07:00
Shannon Nelson	2a8a15526d	ixgbe: check ipsec ip addr against mgmt filters Make sure we don't try to offload the decryption of an incoming packet that should get delivered to the management engine. This is a corner case that will likely be very seldom seen, but could really confuse someone if they were to hit it. Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:29:32 -07:00
David S. Miller	20677108c5	Merge branch 'mlxsw-Fixes-in-offloading-of-mirror-to-gretap' Ido Schimmel says: ==================== mlxsw: Fixes in offloading of mirror-to-gretap Petr says: These two patches fix issues in offloading of mirror-to-gretap when bridge is present in the underlay. In patch #1, reconsideration of SPAN configuration is not done right at the point that SWITCHDEV_OBJ_ID_PORT_VLAN deletion notification is distributed, but is postponed, because the notifications are actually distributed before the relevant change is implemented in the bridge. In patch #2, a problem in configuring VLAN tagging in situations when a VLAN device is on top of an 802.1Q bridge whose egress port is marked as "egress untagged". In that case, mlxsw would neglect to suppress the tagging implicitly assumed after the VLAN device was seen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:27:58 -04:00
Petr Machata	1fc68bb7c3	mlxsw: spectrum_span: Suppress VLAN on BRIDGE_VLAN_INFO_UNTAGGED When offloading mirroring to gretap or ip6gretap netdevices, an 802.1q bridge is one of the soft devices permissible in the underlay when resolving the packet path. After the packet path is resolved to a particular bridge egress device, flags on packet VLAN determine whether the egressed packet should be tagged. The current logic however only ever sets the VLAN tag, never suppresses it. Thus if there's a VLAN netdevice above the bridge that determines the packet VLAN, that VLAN is never unset, and mirroring is configured with VLAN tagging. Fix by setting the packet VLAN on both branches: set to zero (for unset) when BRIDGE_VLAN_INFO_UNTAGGED, copy the resolved VLAN (e.g. from bridge PVID) otherwise. Fixes: `946a11e740` ("mlxsw: spectrum_span: Allow bridge for gretap mirror") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:27:57 -04:00
Petr Machata	f07ff01406	mlxsw: spectrum_switchdev: Postpone respin on object deletion VLAN deletion notifications are emitted before the relevant change is projected to bridge configuration. Thus, like with VLAN addition, schedule SPAN respin for later. Fixes: `c520bc6986` ("mlxsw: Respin SPAN on switchdev events") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:27:56 -04:00
Tony Nguyen	88adce4ea8	ixgbe: fix possible race in reset subtask Similar to ixgbevf, the same possibility for race exists. Extend the RTNL lock in ixgbe_reset_subtask() to protect the state bits; this is to make sure that we get the most up-to-date values for the bits and avoid a possible race when going down. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:25:15 -07:00
Daniel Borkmann	cc5b114dcf	bpf, i40e: add meta data support Add support for XDP meta data when using build skb variant of the i40e driver. Implementation is analogous to the existing ixgbe and ixgbevf support for meta data from `366a88fe2f` ("bpf, ixgbe: add meta data support") and `be8333322e` ("ixgbevf: Add support for meta data"). With the build skb variant we get 192 bytes of extra headroom which can be used for encaps or meta data. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:23:58 -07:00
Michal Kubecek	fa1be7e01e	ipv6: omit traffic class when calculating flow hash Some of the code paths calculating flow hash for IPv6 use flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If traffic class changes within a TCP connection (as e.g. ssh does), ECMP route can switch between path. It's also inconsistent with other code paths where ip6_flowlabel() (returning only flow label) is used to feed the key. Use only flow label everywhere, including one place where hash key is set using ip6_flowinfo(). Fixes: `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)") Fixes: `f70ea018da` ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:21:18 -04:00
YueHaibing	e9c721837d	ixgbe: introduce a helper to simplify code ixgbe_dbg_reg_ops_read and ixgbe_dbg_netdev_ops_read copy-pasting the same code except for ixgbe_dbg_netdev_ops_buf/ixgbe_dbg_reg_ops_buf, so introduce a helper ixgbe_dbg_common_ops_read to remove redundant code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:21:13 -07:00
David S. Miller	a925ab48da	Revert "ipv6: omit traffic class when calculating flow hash" This reverts commit `87ae68c8b4`. Applied the wrong version of this fix, correct version coming up. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:20:38 -04:00
Emil Tantilov	7d6446db1b	ixgbevf: fix possible race in the reset subtask Extend the RTNL lock in ixgbevf_reset_subtask() to protect the state bits check in addition to the call to ixgbevf_reinit_locked(). This is to make sure that we get the most up-to-date values for the bits and avoid a possible race when going down. Suggested-by: Zhiping du <zhipingdu@tencent.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:19:32 -07:00
Michal Kubecek	87ae68c8b4	ipv6: omit traffic class when calculating flow hash Some of the code paths calculating flow hash for IPv6 use flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If traffic class changes within a TCP connection (as e.g. ssh does), ECMP route can switch between path. It's also incosistent with other code paths where ip6_flowlabel() (returning only flow label) is used to feed the key. Use only flow label everywhere, including one place where hash key is set using ip6_flowinfo(). Fixes: `51ebd31815` ("ipv6: add support of equal cost multipath (ECMP)") Fixes: `f70ea018da` ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-04 13:18:35 -04:00
Alexander Duyck	4be87727d4	ixgbevf: Fix coexistence of malicious driver detection with XDP In the case of the VF driver it is supposed to provide a context descriptor that allows us to provide information about the header offsets inside of the frame. However in the case of XDP we don't really have any of that information since the data is minimally processed. As a result we were seeing malicious driver detection (MDD) events being triggered when the PF had that functionality enabled. To address this I have added a bit of new code that will "prime" the XDP ring by providing one context descriptor that assumes the minimal setup of an Ethernet frame which is an L2 header length of 14. With just that we can provide enough information to make the hardware happy so that we don't trigger MDD events. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:17:55 -07:00
Sergey Nemov	06140c793d	igb: Wait 10ms just once after TX queues reset Move 10ms sleep out of function resetting TX queue. Reset all the TX queues in one turn and wait for all of them just once. Use usleep_range() instead of mdelay() in order not to affect transmission on other interfaces. Signed-off-by: Sergey Nemov <sergey.nemov@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 10:11:14 -07:00
Joanna Yurdal	1ec2297c5c	igb: Clear TSICR interrupts together with ICR Issuing "ip link set up/down" can block TSICR interrupts, what results in missing PTP Tx timestamp and no PPS pulse generation. Problem happens when the link is set up with the TSICR interrupts pending. ICR is cleared before enabling interrupts, while TSICR is not. When all TSICR interrupts are pending at this moment, time_sync interrupt will never be generated. TSICR should be cleared as well. In order to reproduce the issue: 1. Setup linux with IEEE 1588 grandmaster and PPS output enabled 2. Continue setting link up/down with random intervals between commands 3. Wait until PPS is not generated ( only one pulse is generated and PPS dies), and ptp4l complains constantly about Tx timeout. Signed-off-by: Joanna Yurdal <jyu@trackman.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-06-04 09:55:36 -07:00

1 2 3 4 5 ...

756148 Commits All Branches Search

756148 Commits

All Branches