OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sowmini Varadhan	e33d4f13d2	xfrm: Fix unaligned access to stats in copy_to_user_state() On sparc, deleting established SAs (e.g., by restarting ipsec) results in unaligned access messages via xfrm_del_sa -> km_state_notify -> xfrm_send_state_notify(). Even though struct xfrm_usersa_info is aligned on 8-byte boundaries, netlink attributes are fundamentally only 4 byte aligned, and this cannot be changed for nla_data() that is passed up to userspace. As a result, the put_unaligned() macro needs to be used to set up potentially unaligned fields such as the xfrm_stats in copy_to_user_state() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-10-23 06:49:29 +02:00
Steffen Klassert	c386578f1c	xfrm: Let the flowcache handle its size by default. The xfrm flowcache size is limited by the flowcache limit (4096 * number of online cpus) and the xfrm garbage collector threshold (2 * 32768), whatever is reached first. This means that we can hit the garbage collector limit only on systems with more than 16 cpus. On such systems we simply refuse new allocations if we reach the limit, so new flows are dropped. On syslems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. The xfrm garbage collector threshold can still be set below the flowcache limit to reduce the memory usage of the flowcache. Tested-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-09-29 11:44:16 +02:00
Jesper Dangaard Brouer	8a4683a5e0	net: help compiler generate better code in eth_get_headlen Noticed that the compiler (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) generated suboptimal assembler code in eth_get_headlen(). This early return coding style is usually not an issue, on super scalar CPUs, but the compiler choose to put the return statement after this very unlikely branch, thus creating larger jump down to the likely code path. Performance wise, I could measure slightly less L1-icache-load-misses and less branch-misses, and an improvement of 1 nanosec with an IP-forwarding use-case with 257 bytes packets with ixgbe (CPU i7-4790K @ 4.00GHz). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:51:15 -07:00
Bendik Rønning Opstad	d2e1339f40	tcp: Fix CWV being too strict on thin streams Application limited streams such as thin streams, that transmit small amounts of payload in relatively few packets per RTT, can be prevented from growing the CWND when in congestion avoidance. This leads to increased sojourn times for data segments in streams that often transmit time-dependent data. Currently, a connection is considered CWND limited only after having successfully transmitted at least one packet with new data, while at the same time failing to transmit some unsent data from the output queue because the CWND is full. Applications that produce small amounts of data may be left in a state where it is never considered to be CWND limited, because all unsent data is successfully transmitted each time an incoming ACK opens up for more data to be transmitted in the send window. Fix by always testing whether the CWND is fully used after successful packet transmissions, such that a connection is considered CWND limited whenever the CWND has been filled. This is the correct behavior as specified in RFC2861 (section 3.1). Cc: Andreas Petlund <apetlund@simula.no> Cc: Carsten Griwodz <griff@simula.no> Cc: Jonas Markussen <jonassm@ifi.uio.no> Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Cc: Mads Johannessen <madsjoh@ifi.uio.no> Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:36:30 -07:00
Hariprasad Shenai	5e2a5ebc3f	cxgb4: Add HW timesptamp support for RX Adds support for ethtool get time stamp ioctl, which is used by tcpdump to get the supported time stamp types eg: tcpdump -i eth5 -J Time stamp types for eth5 (use option -j to set): host (Host) adapter_unsynced (Adapter, not synced with system time) Adds support for adapter unsynced mode, by adding SIOCSHWTSTAMP support in driver. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:35:29 -07:00
huangdaode	e4600d69ff	net: Fix Hisilicon Network Subsystem Support Compilation This patch fixes the compilation error with arm allmodconfig, this error generated due to unavailability of readq() on 32-bit platform which was found during net-next daily compilation. In the same time, fix all the hns drivers compilation warnings. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: zhaungyuzeng <Yisen.zhuang@huawei.com> Signed-off-by: kenneth Lee <liguozhu@hisilicon.com> Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:34:23 -07:00
Robert Jarzmik	1273bc573a	net: irda: pxaficp_ir: dmaengine conversion Convert pxaficp_ir to dmaengine. As pxa architecture is shifting from raw DMA registers access to pxa_dma dmaengine driver, convert this driver to dmaengine. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Tested-by: Petr Cvek <petr.cvek@tul.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:32:48 -07:00
Robert Jarzmik	89fa57244a	net: irda: pxaficp_ir: convert to readl and writel Convert the pxa IRDA driver to readl and writel primitives, and remove another set of direct registers access. This leaves only the DMA registers access, which will be dealt with dmaengine conversion. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Tested-by: Petr Cvek <petr.cvek@tul.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:32:48 -07:00
Robert Jarzmik	be01891e46	net: irda: pxaficp_ir: use sched_clock() for time management Instead of using directly the OS timer through direct register access, use the standard sched_clock(), which will end up in OSCR reading anyway. This is a first step for direct access register removal and machine specific code removal from this driver. This commit changes the behavior, as previously the minimum turnaround time was counted in 76ns steps, while with this patch it is counted in microsecond steps. The strictly equal formula would have been : while ((sched_clock() - si->last_clk) * 76 < mtt) Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:32:48 -07:00
Fabio Estevam	5b40f709a1	net: fec: Remove unneeded FEATURES_NEED_QUIESCE definition There is no need to have FEATURES_NEED_QUIESCE defined as we can simply use NETIF_F_RXCSUM instead as done in other parts of the driver. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:31:12 -07:00
David Ahern	17fb0b2b90	net: Remove redundant oif checks in rt6_device_match The oif has already been checked that it is non-zero; the 2 additional checks on oif within that if (oif) {...} block are redundant. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:30:24 -07:00
Woojung.Huh@microchip.com	49d28b5642	lan78xx: Return 0 when lan78xx_suspend() has no error. lan78xx_suspend() may return non-zero from lan78xx_write_reg() in some scenario. Fix to return 0 when lan78xx_suspend() has no error. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:28:53 -07:00
David S. Miller	366f02d873	Merge branch 'mlx5-next' Or Gerlitz says: ==================== Mellanox mlx5 driver update Bunch of changes from the team, while warming engines for the upcoming SRIOV support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:56 -07:00
Eli Cohen	171bb2c560	net/mlx5_core: Update health syndromes Update new health monitored syndromes and their descriptions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:50 -07:00
Eli Cohen	78ccb25861	net/mlx5_core: Fix wrong name in struct The name refers to syndrome so uset ext_synd instread of ext_sync. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:50 -07:00
Majd Dibbiny	a31208b1e1	net/mlx5_core: New init and exit flow for mlx5_core In the new flow, we separate the pci initialization and teardown from the initialization and teardown of the other resources. init_one calls mlx5_pci_init that handles the pci resources initialization. It then calls mlx5_load_one to initialize the remainder of the resources. When removing a device, remove_one is invoked. However, now remove_one calls mlx5_unload_one to free all the resources except the pci resources. When mlx5_unload_one returns, mlx5_pci_close is called to free the pci resources. The above separation will allow us to implement the pci error handlers and suspend and resume callbacks. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:50 -07:00
Eli Cohen	a8ffe63e60	net/mlx5_core: Fix notification of page supplement error Some errors did not result with notifying firmware that the page request could not be fulfilled. Fix this and put the notification logic into a separate function. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:49 -07:00
Eli Cohen	be87544de8	net/mlx5_core: Fix async commands return code In case of async command completion, the error code returned should take into account the command completion status. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:49 -07:00
Achiad Shochat	6c3dbd2d72	net/mlx5_core: Remove redundant "err" variable usage Cosmetic change. Do not use the an err variable just to assign and return it. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:49 -07:00
Saeed Mahameed	97909302f9	net/mlx5_core: Fix struct type in the DESTROY_TIR/TIS device commands Used the output mailbox format for input mailbox. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:49 -07:00
Achiad Shochat	343b29f308	net/mlx5e: Priv state flag not rolled-back upon netdev open error The private mlx5 state flag that indicates that the netdev is opened is set at the beginning of the netdev open flow. In case an error occured later in the mlx5 netdev open flow, this flag was not cleared, remaining set although the actual set is closed. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:19:49 -07:00
Andrzej Hajda	4de61ba234	tools: bpf_jit_disasm: make get_last_jit_image return unsigned The function returns always non-negative values. The problem has been detected using proposed semantic patch scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:18:01 -07:00
Eric Dumazet	7c85af8810	tcp: avoid reorders for TFO passive connections We found that a TCP Fast Open passive connection was vulnerable to reorders, as the exchange might look like [1] C -> S S <FO ...> <request> [2] S -> C S. ack request <options> [3] S -> C . <answer> packets [2] and [3] can be generated at almost the same time. If C receives the 3rd packet before the 2nd, it will drop it as the socket is in SYN_SENT state and expects a SYNACK. S will have to retransmit the answer. Current OOO avoidance in linux is defeated because SYNACK packets are attached to the LISTEN socket, while DATA packets are attached to the children. They might be sent by different cpus, and different TX queues might be selected. It turns out that for TFO, we created a child, which is a full blown socket in TCP_SYN_RECV state, and we simply can attach the SYNACK packet to this socket. This means that at the time tcp_sendmsg() pushes DATA packet, skb->ooo_okay will be set iff the SYNACK packet had been sent and TX completed. This removes the reorder source at the host level. We also removed the export of tcp_try_fastopen(), as it is no longer called from IPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:11:19 -07:00
David S. Miller	eae93fe4ff	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-09-28 This series contains updates to i40e, i40evf and igb to resolve issues seen and reported by Red Hat. Kiran moves i40e_get_head() in preparation for the refactor of the Tx timeout logic, so that it can be used in other areas of the driver. Refactored the driver timeout logic by issuing a writeback request via a software interrupt to the hardware the first time the driver detects a hang. This was due to the driver being too aggressive in resetting a hung queue. Shannon adds the GRE protocol to the transmit checksum encoding. Anjali fixes an issue of forcing writeback too often, which caused us to not benefit from NAPI. We now disable force writeback in the clean routine for X710 and XL710 adapters. The X722 adapters do not enable interrupt to force a writeback and benefit from WB_ON_ITR and so force WB is left enabled for those adapters. Fixed a possible deadlock issue where sync_vsi_filters() can be called directly under RTNL or through the timer subtask without RTNL. So update the flow to see if we are already under RTNL before trying to grab it. Stefan Assmann provides a fix for igb where SR-IOV was not getting enabled properly and we ran into a NULL pointer if the max_vfs module parameter is specified. This is prevented by setting the IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs(). v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch to the series, as it resolves another issues seen and reported by Red Hat. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 20:56:02 -07:00
Stefan Assmann	cbfe360a15	igb: assume MSI-X interrupts during initialization In igb_sw_init() the sequence of calls was changed from igb_init_queue_configuration() igb_init_interrupt_scheme() igb_probe_vfs() to igb_probe_vfs() igb_init_queue_configuration() igb_init_interrupt_scheme() This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not get enabled properly and we run into a NULL pointer if the max_vfs module parameter is specified (adapter->vf_data does not get allocated, crash on accessing the structure). [ 7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb] [ 7.419370] PGD 0 [ 7.419373] Oops: 0002 [#1] SMP [ 7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio [ 7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153 [ 7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013 [...] [ 7.419431] Call Trace: [ 7.419442] [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb] [ 7.419447] [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0 Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs(). The real interrupt capabilities will be checked during igb_init_interrupt_scheme() so this is safe to do. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:48:34 -07:00
Anjali Singhai	30e2561b95	i40e: Fix for recursive RTNL lock during PROMISC change The sync_vsi_filters function can be called directly under RTNL or through the timer subtask without one. This was causing a deadlock. If sync_vsi_filters is called from a thread which held the lock, and in another thread the PROMISC setting got changed we would be executing the PROMISC change in the thread which already held the lock alongside the other filter update. The PROMISC change requires a reset if we are on a VEB, which requires it to be called under RTNL. Earlier the driver would call reset for PROMISC change without checking if we were already under RTNL and would try to grab it causing a deadlock. This patch changes the flow to see if we are already under RTNL before trying to grab it. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:43:23 -07:00
Anjali Singhai	5804474311	i40e: Fix RS bit update in Tx path and disable force WB workaround This patch fixes the issue of forcing WB too often causing us to not benefit from NAPI. Without this patch we were forcing WB/arming interrupt too often taking away the benefits of NAPI and causing a performance impact. With this patch we disable force WB in the clean routine for X710 and XL710 adapters. X722 adapters do not enable interrupt to force a WB and benefit from WB_ON_ITR and hence force WB is left enabled for those adapters. For XL710 and X710 adapters if we have less than 4 packets pending a software Interrupt triggered from service task will force a WB. This patch also changes the conditions for setting RS bit as described in code comments. This optimizes when the HW does a tail bump and when it does a WB. It also optimizes when we do a wmb. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:38:28 -07:00
Shannon Nelson	c1d1791dc8	i40e: add GRE tunnel type to csum encoding Make sure the Tx checksum encoder knows about GRE protocol and sets the descriptor flag appropriately. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:38:27 -07:00
Kiran Patil	b03a8c1f4c	i40e/i40evf: refactor tx timeout logic This patch modifies the driver timeout logic by issuing a writeback request via a software interrupt to the hardware the first time the driver detects a hang. The driver was too aggressive in resetting a hung queue, so back that off by removing logic to down the netdevice after too many hangs, and move the function to the service task. Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:38:27 -07:00
Kiran Patil	1e6d6f8c1b	i40e: Move i40e_get_head into header file i40e_get_head needs to be called in multiple files in a further patch, prepare by moving the function into a header file. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-09-28 17:38:27 -07:00
Ian Wilson	34c2d9fb04	bridge: Allow forward delay to be cfgd when STP enabled Allow bridge forward delay to be configured when Spanning Tree is enabled. Signed-off-by: Ian Wilson <iwilson@brocade.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-27 19:09:38 -07:00
David S. Miller	8f35043729	Merge branch 'vxlan-ipv4-ipv6' Jiri Benc says: ==================== vxlan: support both IPv4 and IPv6 sockets Note: this needs net merged into net-next in order to apply. It's currently not easy enough to work with metadata based vxlan tunnels. In particular, it's necessary to create separate network interfaces for IPv4 and IPv6 tunneling. Assigning an IPv6 address to an IPv4 interface is allowed yet won't do what's expected. With route based tunneling, one has to pay attention to use the vxlan interface opened with the correct family. Other users of this (openvswitch) would need to always create two vxlan interfaces. Furthermore, there's no sane API for creating an IPv6 vxlan metadata based interface. This patchset simplifies this by opening both IPv4 and IPv6 socket if the vxlan interface has the metadata flag (IFLA_VXLAN_COLLECT_METADATA) set. Assignment of addresses etc. works as expected after this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:40:56 -07:00
Jiri Benc	b1be00a6c3	vxlan: support both IPv4 and IPv6 sockets in a single vxlan device For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is much more user friendly: it's not necessary to create two vxlan interfaces and pay attention to using the right one in routing rules. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:40:55 -07:00
Jiri Benc	205f356d16	vxlan: make vxlan_sock_add and vxlan_sock_release complementary Make vxlan_sock_add both alloc the socket and attach it to vxlan_dev. Let vxlan_sock_release accept vxlan_dev as its parameter instead of vxlan_sock. This makes vxlan_sock_add and vxlan_sock release complementary. It reduces code duplication in the next patch. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:40:55 -07:00
David Woodhouse	8b7a704822	8139cp: Fix GSO MSS handling When fixing the TSO support I noticed we just mask ->gso_size with the MSSMask value and don't care about the consequences. Provide a .ndo_features_check() method which drops the NETIF_F_TSO feature for any skb which would exceed the maximum, and thus forces it to be segmented by software. Then we can stop the masking in cp_start_xmit(), and just WARN if the maximum is exceeded, which should now never happen. Finally, Francois Romieu noticed that we didn't even have the right value for MSSMask anyway; it should be 0x7ff (11 bits) not 0xfff. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:38:34 -07:00
David Woodhouse	5a58f22779	8139cp: Enable offload features by default I fixed TSO. Hardware checksum and scatter/gather also appear to be working correctly both on real hardware and in QEMU's emulation. Let's enable them by default and see if anyone screams... Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:37:12 -07:00
David S. Miller	4963ed48f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 16:08:27 -07:00
Linus Torvalds	518a7cb698	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs instead of cloning them. From Daniel Borkmann. 2) When converting classical BPF into eBPF, fix the setting of the source reg to BPF_REG_X. From Tycho Andersen. 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from Linus Lussing. 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau. 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from Roopa Prabhu. 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai. 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from Florian Westphal. 8) Check DMA mapping errors in bna driver, from Ivan Vecera. 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context, misclearing of interrupt status in TX timeout handler, etc.) from David Woodhouse. 10) In tipc, reset SKB header pointer after skb_linearize(), from Erik Hugne. 11) Fix autobind races et al. in netlink code, from Herbert Xu with help from Tejun Heo and others. 12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan. 13) Fix various races in timewait timer and reqsk_queue_hadh_req, from Eric Dumazet. 14) Fix array overruns in mac80211, from Johannes Berg and Dan Carpenter. 15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov. 16) Fix race between poll_one_napi and napi_disable, from Neil Horman. 17) Fix byte order in geneve tunnel port config, from John W Linville. 18) Fix handling of ARP replies over lightweight tunnels, from Jiri Benc. 19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson Kok and Roopa Prabhu. 20) Several reference count handling bug fixes in the PHY/MDIO layer from Russel King. 21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault. 22) Fix crash in icmp_route_lookup(), from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: Fix panic in icmp_route_lookup net: update docbook comment for __mdiobus_register() ppp: fix lockdep splat in ppp_dev_uninit() net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected phy: marvell: add link partner advertised modes net: fix net_device refcounting phy: add phy_device_remove() phy: fixed-phy: properly validate phy in fixed_phy_update_state() net: fix phy refcounting in a bunch of drivers of_mdio: fix MDIO phy device refcounting phy: add proper phy struct device refcounting phy: fix mdiobus module safety net: dsa: fix of_mdio_find_bus() device refcount leak phy: fix of_mdio_find_bus() device refcount leak ip6_tunnel: Reduce log level in ip6_tnl_err() to debug ip6_gre: Reduce log level in ip6gre_err() to debug fib_rules: fix fib rule dumps across multiple skbs bnx2x: byte swap rss_key to comply to Toeplitz specs net: revert "net_sched: move tp->root allocation into fw_init()" lwtunnel: remove source and destination UDP port config option ...	2015-09-26 06:01:33 -04:00
David Ahern	bdb06cbf77	net: Fix panic in icmp_route_lookup Andrey reported a panic: [ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4 [ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320 [ 7249.865598] pdpt = 0000000030f7f001 pde = 0000000000000000 [ 7249.865637] Oops: 0000 [#1] ... [ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-999-generic #201509220155 [ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014 08/02/2006 [ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000 [ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0 [ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320 [ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00 [ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974 [ 7249.867077] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0 [ 7249.867141] Stack: [ 7249.867165] 320310ee 00000000 00000042 320310ee 00000000 c1aeca00 f3920240 f0c69180 [ 7249.867268] f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000 e659266c f483ba54 [ 7249.867361] 8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0 c16b0e00 00000064 [ 7249.867448] Call Trace: [ 7249.867494] [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e] [ 7249.867534] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack] [ 7249.867576] [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack] [ 7249.867615] [<c16b0e00>] ? icmp_send+0xa0/0x380 [ 7249.867648] [<c16b102f>] icmp_send+0x2cf/0x380 [ 7249.867681] [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4] [ 7249.867714] [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT] [ 7249.867746] [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables] [ 7249.867780] [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0 [nf_conntrack] [ 7249.867838] [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack] [ 7249.867889] [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter] [ 7249.867933] [<c16776a1>] nf_iterate+0x71/0x80 [ 7249.867970] [<c1677715>] nf_hook_slow+0x65/0xc0 [ 7249.868002] [<c1681811>] __ip_local_out_sk+0xc1/0xd0 [ 7249.868034] [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0 [ 7249.868066] [<c1681836>] ip_local_out_sk+0x16/0x30 [ 7249.868097] [<c1684054>] ip_send_skb+0x14/0x80 [ 7249.868129] [<c16840f4>] ip_push_pending_frames+0x34/0x40 [ 7249.868163] [<c16844a2>] ip_send_unicast_reply+0x282/0x310 [ 7249.868196] [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380 [ 7249.868227] [<c16a1b63>] tcp_v4_rcv+0x323/0x990 [ 7249.868257] [<c16776a1>] ? nf_iterate+0x71/0x80 [ 7249.868289] [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230 [ 7249.868322] [<c167df4c>] ip_local_deliver+0x4c/0xa0 [ 7249.868353] [<c167dba0>] ? ip_rcv_finish+0x390/0x390 [ 7249.868384] [<c167d88c>] ip_rcv_finish+0x7c/0x390 [ 7249.868415] [<c167e280>] ip_rcv+0x2e0/0x420 ... Prior to the VRF change the oif was not set in the flow struct, so the VRF support should really have only added the vrf_master_ifindex lookup. Fixes: `613d09b30f` ("net: Use VRF device index for lookups on TX") Cc: Andrey Melnikov <temnota.am@gmail.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 21:44:02 -07:00
Russell King	59f069789c	net: update docbook comment for __mdiobus_register() Update the docbook comment for __mdiobus_register() to include the new module owner argument. This resolves a warning found by the 0-day builder. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 21:37:19 -07:00
Linus Torvalds	d4a748a10e	Merge branch 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull another cgroup fix from Tejun Heo: "The cgroup writeback support got inadvertently enabled for traditional hierarchies revealing two regressions which are currently being worked on. It shouldn't have been enabled on traditional hierarchies, so disable it on them. This is enough to make the regressions go away for people who aren't experimenting with cgroup" * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup, writeback: don't enable cgroup writeback on traditional hierarchies	2015-09-25 16:20:55 -07:00
David S. Miller	4d54d86546	Merge branch 'listener-sock-const' Eric Dumazet says: ==================== dccp/tcp: constify listener sock Another patch bomb to prepare lockless TCP/DCCP LISTEN handling. SYNACK retransmits are built and sent without listener socket being locked. Soon, initial SYNACK packets will have same property. This series makes sure we did not something wrong with this model, by adding a const qualifier in all the paths taken from synack building and transmit, for IPv4/IPv6 and TCP/dccp. The only potential problem was the rewrite of ecn bits for connections with DCTCP as congestion module, but this was a very minor one. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:40 -07:00
Eric Dumazet	1b70e977ce	inet: constify inet_rtx_syn_ack() sock argument SYNACK packets are sent on behalf on unlocked listeners or fastopen sockets. Mark socket as const to catch future changes that might break the assumption. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	ea3bea3a1d	tcp/dccp: constify rtx_synack() and friends This is done to make sure we do not change listener socket while sending SYNACK packets while socket lock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	802885fc04	dccp: constify dccp_make_response() socket argument Like tcp_make_synack() the only time we might change the socket is when calling sock_wmalloc(), which is using atomic operation to update sk->sk_wmem_alloc Also use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	0f935dbedc	tcp: constify tcp_v{4\|6}_send_synack() socket argument This documents fact that listener lock might not be held at the time SYNACK are sent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	1c1e9d2b67	ipv6: constify ip6_xmit() sock argument This is to document that socket lock might not be held at this point. skb_set_owner_w() and ipv6_local_error() are using proper atomic ops or spinlocks, so we promote the socket to non const when calling them. netfilter hooks should never assume socket lock is held, we also promote the socket to non const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	5d062de7f8	tcp: constify tcp_make_synack() socket argument listener socket is not locked when tcp_make_synack() is called. We better make sure no field is written. There is one exception : Since SYNACK packets are attached to the listener at this moment (or SYN_RECV child in case of Fast Open), sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using atomic operations so this is safe. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	6ac705b180	tcp: remove tcp_ecn_make_synack() socket argument SYNACK packets might be sent without holding socket lock. For DCTCP/ECN sake, we should call INET_ECN_xmit() while socket lock is owned, and only when we init/change congestion control. This also fixies a bug if congestion module is changed from dctcp to another one on a listener : we now clear ECN bits properly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	37bfbdda0b	tcp: remove tcp_synack_options() socket argument We do not use the socket in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00

1 2 3 4 5 ...

547489 Commits All Branches Search

547489 Commits

All Branches