OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	fe3edf4579	ipv4: Remove all RTCF_DIRECTSRC handliing. The last and final kernel user, ICMP address replies, has been removed. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:22:20 -07:00
David S. Miller	838942a594	ipv4: Really ignore ICMP address requests/replies. Alexey removed kernel side support for requests, and the only thing we do for replies is log a message if something doesn't look right. As Alexey's comment indicates, this belongs in userspace (if anywhere), and thus we can safely just get rid of this code. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:20:26 -07:00
David S. Miller	8acfaa9484	decnet: Don't set RTCF_DIRECTSRC. It's an ipv4 defined route flag, and only ipv4 uses it. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:16:59 -07:00
Saurabh	e7d4b18cbe	net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not specificy __rcu in ip_vti.c Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:00:54 -07:00
Lin Ming	8fe5cb873b	ipv4: Remove redundant assignment It is redundant to set no_addr and accept_local to 0 and then set them with other values just after that. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:00:54 -07:00
Weiping Pan	06b6a1cf6e	rds: set correct msg_namelen Jay Fenlason (fenlason@redhat.com) found a bug, that recvfrom() on an RDS socket can return the contents of random kernel memory to userspace if it was called with a address length larger than sizeof(struct sockaddr_in). rds_recvmsg() also fails to set the addr_len paramater properly before returning, but that's just a bug. There are also a number of cases wher recvfrom() can return an entirely bogus address. Anything in rds_recvmsg() that returns a non-negative value but does not go through the "sin = (struct sockaddr_in )msg->msg_name;" code path at the end of the while(1) loop will return up to 128 bytes of kernel memory to userspace. And I write two test programs to reproduce this bug, you will see that in rds_server, fromAddr will be overwritten and the following sock_fd will be destroyed. Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is better to make the kernel copy the real length of address to user space in such case. How to run the test programs ? I test them on 32bit x86 system, 3.5.0-rc7. 1 compile gcc -o rds_client rds_client.c gcc -o rds_server rds_server.c 2 run ./rds_server on one console 3 run ./rds_client on another console 4 you will see something like: server is waiting to receive data... old socket fd=3 server received data from client:data from client msg.msg_namelen=32 new socket fd=-1067277685 sendmsg() : Bad file descriptor /************** rds_client.c *****************/ int main(void) { int sock_fd; struct sockaddr_in serverAddr; struct sockaddr_in toAddr; char recvBuffer[128] = "data from client"; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if (sock_fd < 0) { perror("create socket error\n"); exit(1); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4001); if (bind(sock_fd, (struct sockaddr)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind() error\n"); close(sock_fd); exit(1); } memset(&toAddr, 0, sizeof(toAddr)); toAddr.sin_family = AF_INET; toAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); toAddr.sin_port = htons(4000); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = strlen(recvBuffer) + 1; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendto() error\n"); close(sock_fd); exit(1); } printf("client send data:%s\n", recvBuffer); memset(recvBuffer, '\0', 128); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("receive data from server:%s\n", recvBuffer); close(sock_fd); return 0; } /*************** rds_server.c *****************/ int main(void) { struct sockaddr_in fromAddr; int sock_fd; struct sockaddr_in serverAddr; unsigned int addrLen; char recvBuffer[128]; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if(sock_fd < 0) { perror("create socket error\n"); exit(0); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4000); if (bind(sock_fd, (struct sockaddr)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind error\n"); close(sock_fd); exit(1); } printf("server is waiting to receive data...\n"); msg.msg_name = &fromAddr; /* * I add 16 to sizeof(fromAddr), ie 32, * and pay attention to the definition of fromAddr, * recvmsg() will overwrite sock_fd, * since kernel will copy 32 bytes to userspace. * * If you just use sizeof(fromAddr), it works fine. * / msg.msg_namelen = sizeof(fromAddr) + 16; / msg.msg_namelen = sizeof(fromAddr); */ msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; while (1) { printf("old socket fd=%d\n", sock_fd); if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("server received data from client:%s\n", recvBuffer); printf("msg.msg_namelen=%d\n", msg.msg_namelen); printf("new socket fd=%d\n", sock_fd); strcat(recvBuffer, "--data from server"); if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendmsg()\n"); close(sock_fd); exit(1); } } close(sock_fd); return 0; } Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 01:01:44 -07:00
Dan Carpenter	5b3e7e6cb5	openvswitch: potential NULL deref in sample() If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and it leads to a NULL dereference when we call nla_len(acts_list). This is a static checker fix, not something I have seen in testing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 00:59:54 -07:00
Eric Dumazet	563d34d057	tcp: dont drop MTU reduction indications ICMP messages generated in output path if frame length is bigger than mtu are actually lost because socket is owned by user (doing the xmit) One example is the ipgre_tunnel_xmit() calling icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); We had a similar case fixed in commit `a34a101e1e` (ipv6: disable GSO on sockets hitting dst_allfrag). Problem of such fix is that it relied on retransmit timers, so short tcp sessions paid a too big latency increase price. This patch uses the tcp_release_cb() infrastructure so that MTU reduction messages (ICMP messages) are not lost, and no extra delay is added in TCP transmits. Reported-by: Maciej Żenczykowski <maze@google.com> Diagnosed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 00:58:46 -07:00
Yuval Mintz	c3def943c7	bnx2x: Add new 57840 device IDs The 57840 boards come in two flavours: 2 x 20G and 4 x 10G. To better differentiate between the two flavours, a separate device ID was assigned to each. The silicon default value is still the currently supported 57840 device ID (0x168d), and since a user can damage the nvram (e.g., 'ethtool -E') the driver will still support this device ID to allow the user to amend the nvram back into a supported configuration. Notice this patch contains lines longer than 80 characters (strings). Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 00:58:21 -07:00
Julian Anastasov	9a0a9502cb	tcp: avoid oops in tcp_metrics and reset tcpm_stamp In tcp_tw_remember_stamp we incorrectly checked tw instead of tm, it can lead to oops if the cached entry is not found. tcpm_stamp was not updated in tcpm_check_stamp when tcpm_suck_dst was called, move the update into tcpm_suck_dst, so that we do not call it infinitely on every next cache hit after TCP_METRICS_TIMEOUT. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 00:57:12 -07:00
Shuah Khan	9b70749e64	niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value to be consistent with the rest of the checks after niu_rbr_add_page() calls in this file. Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 23:31:07 -07:00
Shuah Khan	ec2deec1f3	niu: Fix to check for dma mapping errors. Fix Neptune ethernet driver to check dma mapping error after map_page() interface returns. Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 23:31:06 -07:00
Jesper Juhl	818810472b	net: Fix references to out-of-scope variables in put_cmsg_compat() In net/compat.c::put_cmsg_compat() we may assign 'data' the address of either the 'ctv' or 'cts' local variables inside the 'if (!COMPAT_USE_64BIT_TIME)' branch. Those variables go out of scope at the end of the 'if' statement, so when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what it may be refering to - not good. Fix the problem by simply giving 'ctv' and 'cts' function scope. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 17:50:49 -07:00
David S. Miller	5e9965c15b	Merge branch 'kill_rtcache' The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks. The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entitites. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~%10. The general flow of this patch series is that first the routing cache is removed. We build a completely new rtable entry every lookup request. Next we make some simplifications due to the fact that removing the routing cache causes several members of struct rtable to become no longer necessary. Then we need to make some amends such that we can legally cache pre-constructed routes in the FIB nexthops. Firstly, we need to invalidate routes which are hit with nexthop exceptions. Secondly we have to change the semantics of rt->rt_gateway such that zero means that the destination is on-link and non-zero otherwise. Now that the preparations are ready, we start caching precomputed routes in the FIB nexthops. Output and input routes need different kinds of care when determining if we can legally do such caching or not. The details are in the commit log messages for those changes. The patch series then winds down with some more struct rtable simplifications and other tidy ups that remove unnecessary overhead. On a SPARC-T3 output route lookups are ~876 cycles. Input route lookups are ~1169 cycles with rpfilter disabled, and about ~1468 cycles with rpfilter enabled. These measurements were taken with the kbench_mod test module in the net_test_tools GIT tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git That GIT tree also includes a udpflood tester tool and stresses route lookups on packet output. For example, on the same SPARC-T3 system we can run: time ./udpflood -l 10000000 10.2.2.11 with routing cache: real 1m21.955s user 0m6.530s sys 1m15.390s without routing cache: real 1m31.678s user 0m6.520s sys 1m25.140s Performance undoubtedly can easily be improved further. For example fib_table_lookup() performs a lot of excessive computations with all the masking and shifting, some of it conditionalized to deal with edge cases. Also, Eric's no-ref optimization for input route lookups can be re-instated for the FIB nexthop caching code path. I would be really pleased if someone would work on that. In fact anyone suitable motivated can just fire up perf on the loading of the test net_test_tools benchmark kernel module. I spend much of my time going: bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2 bash# perf report Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben Hutchings, and others. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 17:04:15 -07:00
Mark A. Greer	3ba9738134	net: ethernet: davinci_emac: add pm_runtime support Add pm_runtime support to the TI Davinci EMAC driver. CC: Sekhar Nori <nsekhar@ti.com> CC: Kevin Hilman <khilman@ti.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:46:42 -07:00
Mark A. Greer	c6f0b4ea64	net: ethernet: davinci_emac: Remove unnecessary #include The '#include <mach/mux.h>' line in davinci_emac.c causes a compile error because that header file isn't found. It turns out that the #include isn't needed because the driver isn't (and shoudn't be) touching the mux anyway, so remove it. CC: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:46:42 -07:00
Rick Jones	f8b72d36d2	net-next: minor cleanups for bonding documentation The section titled "Configuring Bonding for Maximum Throughput" is actually section twelve not thirteen, and there are a couple of words spelled incorrectly. Signed-off-by: Rick Jones <rick.jones2@hp.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:44:01 -07:00
John Fastabend	406a3c638c	net: netprio_cgroup: rework update socket logic Instead of updating the sk_cgrp_prioidx struct field on every send this only updates the field when a task is moved via cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed. For example in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set but as soon as data is sent the kernel worker thread isssues a send and sk_cgrp_prioidx is updated with the kernel worker threads value which is the default case. It seems more correct to only update the field when the user explicitly sets it via control group infrastructure. This allows the users to manage sockets that may be used with other threads. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:44:01 -07:00
Michael S. Tsirkin	0690899b4d	tun: experimental zero copy tx support Let vhost-net utilize zero copy tx when used with tun. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	dcc0fb782b	skbuff: export skb_copy_ubufs Export skb_copy_ubufs so that modules can orphan frags. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	1080e512d4	net: orphan frags on receive zero copy packets are normally sent to the outside network, but bridging, tun etc might loop them back to host networking stack. If this happens destructors will never be called, so orphan the frags immediately on receive. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	868eefeb17	tun: orphan frags on xmit tun xmit is actually receive of the internal tun socket. Orphan the frags same as we do for normal rx path. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	70008aa50e	skbuff: convert to skb_orphan_frags Reduce code duplication a bit using the new helper. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	a353e0ce0f	skbuff: add an api to orphan frags Many places do if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)) skb_copy_ubufs(skb, gfp_mask); to copy and invoke frag destructors if necessary. Add an inline helper for this. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
David S. Miller	d47e12d63e	ixgbe: Fix build with PCI_IOV enabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:36:41 -07:00
Richard Cochran	7491302dc7	forcedeth: advertise transmit time stamping This driver now offers software transmit time stamping, so it should advertise that fact via ethtool. Compile tested only. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:33:32 -07:00
Richard Cochran	7b1115e017	e1000e: advertise transmit time stamping This driver now offers software transmit time stamping, so it should advertise that fact via ethtool. Compile tested only. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:33:32 -07:00
Richard Cochran	e10df2c63b	e1000: advertise transmit time stamping This driver now offers software transmit time stamping, so it should advertise that fact via ethtool. Compile tested only. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:33:32 -07:00
Richard Cochran	be53ce1edd	bnx2x: advertise transmit time stamping This driver now offers software transmit time stamping, so it should advertise that fact via ethtool. Compile tested only. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:33:32 -07:00
Mark A. Greer	1d69c2b343	rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference Commit `76ff5cc919` (rtnl: allow to specify number of rx and tx queues on device creation) added a reference to the net_device structure's 'num_rx_queues' member in net/core/rtnetlink.c:rtnl_fill_ifinfo() However, the definition for 'num_rx_queues' is surrounded by an '#ifdef CONFIG_RPS' while the new reference to it is not. This causes a compile error when CONFIG_RPS is not defined. Fix the compile error by surrounding the new reference to 'num_rx_queues' by an '#ifdef CONFIG_RPS'. CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:28:11 -07:00
David S. Miller	fd183f6a73	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: -------------------- This series contains updates to ixgbe and ixgbevf. ... Akeem G. Abodunrin (1): igb: reset PHY in the link_up process to recover PHY setting after power down. Alexander Duyck (8): ixgbe: Drop probe_vf and merge functionality into ixgbe_enable_sriov ixgbe: Change how we check for pre-existing and assigned VFs ixgbevf: Add lock around mailbox ops to prevent simultaneous access ixgbevf: Add support for PCI error handling ixgbe: Fix handling of FDIR_HASH flag ixgbe: Reduce Rx header size to what is actually used ixgbe: Use num_tcs.pg_tcs as upper limit for TC when checking based on UP ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy interrupts Don Skidmore (1): ixgbe: add support for new 82599 device Greg Rose (1): ixgbevf: Fix namespace issue with ixgbe_write_eitr John Fastabend (2): ixgbe: fix RAR entry counting for generic and fdb_add() ixgbe: remove extra unused queues in DCB + FCoE case ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:23:18 -07:00
David S. Miller	5dc7c77967	Merge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2012-07-22 12:19:24 -07:00
Randy Dunlap	2a304bb8fa	wimax: fix printk format warnings Fix printk format warnings in drivers/net/wimax/i2400m: drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat] drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'ssize_t' [-Wformat] drivers/net/wimax/i2400m/usb-fw.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat] I don't see these warnings on x86. The warnings that are quoted above are from Geert's kernel build reports. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Cc: linux-wimax@intel.com Cc: wimax@linuxwimax.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:18:39 -07:00
Neil Horman	5aa93bcf66	sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:13:46 -07:00
Kevin Groeneveld	e3906486f6	net: fix race condition in several drivers when reading stats Fix race condition in several network drivers when reading stats on 32bit UP architectures. These drivers update their stats in a BH context and therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the stats. Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:12:32 -07:00
Eric Dumazet	0980e56e50	ipv4: tcp: set unicast_sock uc_ttl to -1 Set unicast_sock uc_ttl to -1 so that we select the right ttl, instead of sending packets with a 0 ttl. Bug added in commit `be9f4a44e7` (ipv4: tcp: remove per net tcp_sock) Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:06:21 -07:00
Akeem G. Abodunrin	7688659692	igb: reset PHY in the link_up process to recover PHY setting after power down. There was a previous patch to resolve issue with 82576 losing PHY setting after PHY power down. However that previous implementation triggered speed mismatch and occasional link lost. Now, this patch resolves both initial PHY setting and speed mismatch issues. Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:07:37 -07:00
Alexander Duyck	b724e9f250	ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy interrupts This change makes it so that we can use 1TC DCB in the case of MSI and legacy interrupts. The advantage to this is that it allows us to fully support FCoE w/ DCB instead of having to drop to link flow control only when using these interrupt modes. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:07:13 -07:00
Don Skidmore	b6dfd939fd	ixgbe: add support for new 82599 device This patch adds support for a new 82599 device that supports WoL. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:06:46 -07:00
John Fastabend	3f4a6f009f	ixgbe: remove extra unused queues in DCB + FCoE case With DCB and FCoE configured extra queues may be allocated and never used. After this patch we calculate the max correctly. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:06:21 -07:00
John Fastabend	95447461fa	ixgbe: fix RAR entry counting for generic and fdb_add() Do RAR entry accounting correctly so that errors are reported and promisc mode is set correctly when the number of entries exceeds the hardware limits. This can happen with many macvlan devices attached to the PF or by adding many fdb entries in SR-IOV modes. Also this includes a small refactor to fdb_add() to avoid having so many nested if/else statements after adding a check for the number or RAR entries. The max entries for the PF is currently 16 we allow 15 additional entries to account for the defined MAC. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:05:59 -07:00
Alexander Duyck	b92ad72dea	ixgbe: Use num_tcs.pg_tcs as upper limit for TC when checking based on UP This change makes it so the function ixgbe_dcb_get_tc_from_up will use the num_tcs.pg_tcs to determine the starting value for determining a traffic class based on a user priority. The main motivation for this change is to address possible bad configurations in which more TCs worth of data are populated then there are actual TCs. By limiting this value we can at least make certain we are not providing a map with values that are out of range. As a result any user priorities that are setup in the configuration with a traffic class mapping higher than what the hardware supports will be reported as being on TC 0. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:05:28 -07:00
Alexander Duyck	252562c207	ixgbe: Reduce Rx header size to what is actually used The recent changes to netdev_alloc_skb actually make it so that the size of the buffer now actually has a more direct input on the truesize. So in order to make best use of the piece of a page we are allocated I am reducing the IXGBE_RX_HDR_SIZE to 256 so that our truesize will be reduced by 256 bytes as well. This should result in performance improvements since the number of uses per page should increase from 4 to 6 in the case of a 4K page. In addition we should see socket performance improvements due to the truesize dropping to less than 1K for buffers less than 256 bytes. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:04:51 -07:00
Greg Rose	ce42260669	ixgbevf: Fix namespace issue with ixgbe_write_eitr Make the function static to cleanup namespace. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:04:30 -07:00
Alexander Duyck	39cb681b3b	ixgbe: Fix handling of FDIR_HASH flag This change makes it so that we can use the atr_sample_rate to determine if we are capable of supporting ATR. The advantage to this approach is that it allows us to now determine the setting of the IXGBE_FLAG_FDIR_HASH_CAPABLE based on the queueing scheme, instead of the queueing scheme being based on the flag. Using this approach there are essentially 5 conditions that must be checked prior to trying to enable ATR: 1. Is SR-IOV disabled? 2. Are the number of TCs <= 1? 3. Is RSS queueing limit greater than 1? 4. Is atr_sample_rate set? 5. Is Flow Director perfect filtering disabled? If any of these conditions are enabled they should disable ATR filtering. Note that in the case of conditions 1 through 4 being met we will set things up for ATR queueing, however if test 5 fails we will still leave the queues allocated for use by perfect filters. The reason for this is to allow for us to switch back and forth between ntuple and ATR without needing to reallocate the descriptor rings. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:04:10 -07:00
Alexander Duyck	9f19f31dd4	ixgbevf: Add support for PCI error handling This change adds support for handling IO errors and slot resets. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:03:47 -07:00
Alexander Duyck	1c55ed768b	ixgbevf: Add lock around mailbox ops to prevent simultaneous access This change adds a spinlock around the mailbox accesses to prevent simultaneous access to the mailboxes. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:03:25 -07:00
Alexander Duyck	9297127b9c	ixgbe: Change how we check for pre-existing and assigned VFs This patch does two things. First it drops the unnecessary work of searching for enabled VFs when we first bring up the adapter and instead just uses pci_num_vf to determine how many VFs are enabled on the adapter. The second thing it does is drop the use of vfdev from the vf_data_storage structure. Instead we just search the entire system for a VF that has us as it's PF, and then if that VF is assigned we indicate that the VFs are assigned. This allows us to still check for assigned VFs even if the vfinfo allocation has failed, or vfinfo has been freed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:02:56 -07:00
Alexander Duyck	99d744875d	ixgbe: Drop probe_vf and merge functionality into ixgbe_enable_sriov This is meant to fix a bug in which we were not checking for pre-existing VFs if we were not setting the max_vfs value at driver load. What happens now is that we always call ixgbe_enable_sriov and this checks for pre-existing VFs ore requested VFs prior to deciding on no SR-IOV. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2012-07-21 16:02:09 -07:00
Stefan Hajnoczi	163049aefd	vhost: make vhost work queue visible The vhost work queue allows processing to be done in vhost worker thread context, which uses the owner process mm. Access to the vring and guest memory is typically only possible from vhost worker context so it is useful to allow work to be queued directly by users. Currently vhost_net only uses the poll wrappers which do not expose the work queue functions. However, for tcm_vhost (vhost_scsi) it will be necessary to queue custom work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-07-22 01:22:23 +03:00

1 2 3 4 5 ...

314020 Commits All Branches Search

314020 Commits

All Branches