OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
James Chapman	5de7aee541	l2tp: fix locking of 64-bit counters for smp L2TP uses 64-bit counters but since these are not updated atomically, we need to make them safe for smp. This patch addresses that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:30:54 -04:00
David S. Miller	b6d151bb82	Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux	2012-04-30 21:42:30 -04:00
Eric Dumazet	1d0c0b328a	net: makes skb_splice_bits() aware of skb->head_frag __skb_splice_bits() can check if skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on underlying page. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:50 -04:00
Eric Dumazet	329033f645	tcp: makes tcp_try_coalesce aware of skb->head_frag TCP coalesce can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We had to disable coalescing in this case, for performance reasons. We 'upgrade' skb->head as a fragment in itself. This reduces number of cache misses when user makes its copies, since a less sk_buff are fetched. This makes receive and ofo queues shorter and thus reduce cache line misses in TCP stack. This is a followup of patch "net: allow skb->head to be a page fragment" Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce" counter being incremented. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:49 -04:00
Eric Dumazet	d7e8883cfc	net: make GRO aware of skb->head_frag GRO can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself This avoids the frag_list fallback, and permits to build true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces number of cache misses when user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment" Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:49 -04:00
Eric Dumazet	d3836f21b0	net: allow skb->head to be a page fragment skb->head is currently allocated from kmalloc(). This is convenient but has the drawback the data cannot be converted to a page fragment if needed. We have three spots were it hurts : 1) GRO aggregation When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to network stack aren't enabling full GRO power. 2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats splice() purpose (zero copy claim) 3) TCP coalescing. Recently introduced, this permits to group several contiguous segments into a single skb. This shortens queue lengths and save kernel memory, and greatly reduce probabilities of TCP collapses. This coalescing doesnt work on linear skbs (or we would need to copy data, this would be too slow) Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non zero value, set to the fragment size. Then, on situations we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (thats 512 or 1024 bytes saved per skb). This also makes bpf/netfilter faster since the 'first frag' will be part of skb linear part, no need to copy data. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 21:35:11 -04:00
Paul Gortmaker	617d3c7a50	tipc: compress out gratuitous extra carriage returns Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-30 15:53:56 -04:00
Herbert Xu	bb63f1f8a0	bridge: Fix fatal typo in setup of multicast_querier_expired Unfortunately it seems that I didn't properly test the case of an expired external querier in the recent multicast bridge series. The setup of the timer in that case is completely broken and leads to a NULL-pointer dereference. This patch fixes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 13:30:56 -04:00
David S. Miller	d499bd2ee9	l2tp: Add missing net/net/ip6_checksum.h include. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-30 13:21:28 -04:00
Benjamin LaHaise	d2cf336167	net/l2tp: add support for L2TP over IPv6 UDP Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6. Support has been tested with and without hardware offloading. This version fixes the L2TP over localhost issue with incorrect checksums being reported. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-28 22:21:51 -04:00
Benjamin LaHaise	d7f3f62167	net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 Now that the sematics of udpv6_queue_rcv_skb() match IPv4's udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-28 22:21:51 -04:00
Benjamin LaHaise	cb80ef463d	net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb() In order to make sure that when the encap_rcv() hook is introduced it is not called with the socket lock held, move socket locking from callers into udpv6_queue_rcv_skb(), matching what happens in IPv4. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-28 22:21:51 -04:00
Benjamin LaHaise	f7ad74fef3	net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb This is the first step in reworking the IPv6 UDP code to be structured more like the IPv4 UDP code. This patch creates __udpv6_queue_rcv_skb() with the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the backlog_rcv method. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-28 22:21:50 -04:00
Jeffrin Jose	cb75a36c8a	net: Fixed a coding style issue related to spaces. Fixed a coding style issue relating to spaces in net/core/sock.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-28 21:45:00 -04:00
Allan Stephens	aad585473f	tipc: Reject payload messages with invalid message type Adds check to ensure TIPC sockets reject incoming payload messages that have an unrecognized message type. Remove the old open question about whether TIPC_ERR_NO_PORT is the proper return value. It is appropriate here since there are valid instances where another node can make use of the reply, and at this point in time the host is already broadcasting TIPC data, so there are no real security concerns. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-27 10:08:00 -04:00
Eric Dumazet	8298193012	net: cleanups in sock_setsockopt() Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to match !!sock_flag() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 02:14:21 -04:00
hartleys	feb50ac19e	crush: include header for global symbols Include the header to pickup the definitions of the global symbols. Quiets the following sparse warnings: warning: symbol 'crush_find_rule' was not declared. Should it be static? warning: symbol 'crush_do_rule' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:03:34 -04:00
Eric Dumazet	6746960140	ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing Quoting Tore Anderson from : https://bugzilla.kernel.org/show_bug.cgi?id=42572 When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment size does not take into account the size of the IPv6 Fragmentation header that needs to be included in outbound packets, causing every transmitted TCP segment to be fragmented across two IPv6 packets, the latter of which will only contain 8 bytes of actual payload. RTAX_FEATURE_ALLFRAG is typically set on a route in response to receving a ICMPv6 Packet Too Big message indicating a Path MTU of less than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6 PTBs with MTU < 1280 are still valid, in particular when an IPv6 packet is sent to an IPv4 destination through a stateless translator. Any ICMPv4 Need To Fragment packets originated from the IPv4 part of the path will be translated to ICMPv6 PTB which may then indicate an MTU of less than 1280. The Linux kernel refuses to reduce the effective MTU to anything below 1280 bytes, instead it sets it to exactly 1280 bytes, and RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header), instead of 1232 (additionally taking into account the 8 bytes required by the IPv6 Fragmentation extension header). This in turn results in rather inefficient transmission, as every transmitted TCP segment now is split in two fragments containing 1232+8 bytes of payload. After this patch, all the outgoing packets that includes a Fragmentation header all are "atomic" or "non-fragmented" fragments, i.e., they both have Offset=0 and More Fragments=0. With help from David S. Miller Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:03:34 -04:00
Allan Stephens	8f17789693	tipc: Enhance error checking of published names Consolidates validation of scope and name sequence range values into a single routine where it applies both to local name publications and to name publications issued by other nodes in the network. This change means that the scope value for non-local publications is now validated and the name sequence range for local publications is now validated only once. Additionally, a publication attempt that fails validation now creates an entry in the system log file only if debugging capabilities have been enabled; this prevents the system log from being cluttered up with messages caused by a defective application or network node. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 18:15:48 -04:00
Allan Stephens	f7fb9d20ad	tipc: Create helper routine to delete unused name sequence structure Replaces two identical chunks of code that delete an unused name sequence structure from TIPC's name table with calls to a new routine that performs this operation. This change is cosmetic and doesn't impact the operation of TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 18:15:47 -04:00
Allan Stephens	bbe6a295d0	tipc: remove redundant memset and stale comment from subscr.c Eliminate code to zero-out the main topology service structure, which is already zeroed-out. Get rid of a comment documenting a field of the main topology service structure that no longer exists. Both are cosmetic changes with no impact on runtime behaviour. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 18:15:46 -04:00
Allan Stephens	2d98abb9fe	tipc: Optimize initialization of network topology service Initialization now occurs in the calling thread of control, rather than being deferred to the TIPC tasklet. With the current codebase, the deferral is no longer necessary. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 18:15:45 -04:00
Allan Stephens	eb3865a99d	tipc: Enhance re-initialization of network topology service Streamlines the job of re-initializing TIPC's network topology service when a node's network address is first assigned. Rather than destroying the topology server port and breaking its connections to existing subscribers, TIPC now simply lets the service continue running (since the change to the port identifier of each port used by the topology service no longer impacts the flow of messages between the service and its subscribers). This enhancement means that applications that utilize the topology service prior to the assignment of TIPC's network address no longer need to re-establish their subscriptions when the address is finally assigned. However, it is worth noting that any subsequent events for existing subscriptions report the new port identifier of the publishing port, rather than the original port identifier. (For example, a name that was previously reported as being published by <0.0.0:ref> may be subsequently withdrawn by <Z.C.N:ref>.) This doesn't impact any of the existing known userspace in tipc-utils, since (a) TIPC continues to treat references to the original port ID correctly and (b) normal use cases assign an address before active use. However if there does happen to be some rare/custom application out there that was relying on this, they can simply bypass the enhancement by issuing a subscription to {0,0} and break its connection to the topology service, if an associated withdrawal event occurs. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 18:15:40 -04:00
Allan Stephens	eb323b075a	tipc: Optimize termination of configuration service Termination no longer tests to see if the configuration service port was successfully created or not. In the unlikely event that the port was not created, attempting to delete the non-existent port is detected gracefully and causes no harm. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 17:19:43 -04:00
Allan Stephens	861d3a0e5b	tipc: Optimize initialization of configuration service Initialization now occurs in the calling thread of control, rather than being deferred to the TIPC tasklet. With the current codebase, the deferral is no longer necessary. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 17:19:42 -04:00
Allan Stephens	a2cfd45b52	tipc: Optimize re-initialization of configuration service Streamlines the job of re-initializing TIPC's configuration service when a node's network address is first assigned. Rather than destroying the configuration server port and then recreating it, TIPC now simply withdraws the existing {0,<0.0.0>} name publication and creates a new {0,<Z.C.N>} name publication that identifies the node's network address to interested subscribers. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-04-26 17:19:07 -04:00
David S. Miller	a85c9bb895	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next	2012-04-26 15:54:45 -04:00
John W. Linville	d9b8ae6bd8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/iwl-testmode.c	2012-04-26 15:03:48 -04:00
Pavel Emelyanov	de248a75c3	tcp repair: Fix unaligned access when repairing options (v2) Don't pick __u8/__u16 values directly from raw pointers, but instead use an array of structures of code:value pairs. This is OK, since the buffer we take options from is not an skb memory, but a user-to-kernel one. For those options which don't require any value now, require this to be zero (for potential future extension of this API). v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-26 06:13:51 -04:00
alex.bluesman.smirnov@gmail.com	2d319508a3	6lowpan: duplicate definition of IEEE802154_ALEN The same macros is defined in 'include/net/af_ieee802154.h' and is called IEEE802154_ADDR_LEN. No need another one, so remove it. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-26 06:01:09 -04:00
alex.bluesman.smirnov@gmail.com	c2e94d73ea	6lowpan: move frame allocation code to a separate function Separate frame allocation routine from data processing function. This makes code more human readable and easier for understanding. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-26 06:01:09 -04:00
Shan Wei	8dcf01fc00	net: sock_diag_handler structs can be const read only, so change it to const. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-25 20:46:59 -04:00
Eric Dumazet	808db80a7e	ipv6: call consume_skb() in frag/reassembly Some kfree_skb() calls should be replaced by consume_skb() to avoid drop_monitor/dropwatch false positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-25 20:39:46 -04:00
John Fastabend	081579840b	net: dcb: add CEE notify calls This adds code to trigger CEE events when an APP change or setall command is made from user space. This simplifies user space code significantly by creating a single interface to listen on that works with both firmware and userland agents. And if we end up with multiple agents this keeps every thing in sync userland agents, firmware agents, and kernel notifier consumers. For an example agent that listens for these events see: https://github.com/jrfastab/cgdcbxd cgdcbxd is a daemon used to monitor DCB netlink events and manage the net_prio control group sub-system. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-25 19:47:17 -04:00
Andrei Emeltchenko	94c514fe24	mac80211: Adds clean sdata helper Adds hepler to clean sdata ieee80211_clean_sdata similar way as ieee80211_setup_sdata is implemented. The function will be used by other interfaces later. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-04-24 14:56:10 -04:00
Felix Fietkau	7e3ed02c6e	mac80211: fix num_mcast_sta counting issues Moving a STA to an AP VLAN prevents num_mcast_sta from being decremented once the STA leaves, because sta->sdata changes. Fix this by checking for AP VLANs as well. Also exclude 4-addr VLAN stations from num_mcast_sta - remote 4-addr stations ignore 3-address multicast frames anyway. In a typical bridge configuration they receive the same packets as 4-address unicast. This patch also fixes clearing the sdata->u.vlan.sta pointer when the STA is removed from a 4-addr VLAN. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-04-24 14:54:28 -04:00
Felix Fietkau	030ef8f8a5	mac80211: rename AP variable num_sta_authorized to num_mcast_sta It is only used to test for BSS multicast receivers. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-04-24 14:54:28 -04:00
Wey-Yi Guy	be6bcabc79	mac80211: check for non-managed interface Average beacon signal only keep tracked by managed interface, give warning and return 0 for the others. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-04-24 14:54:27 -04:00
Paul Gortmaker	872f24dbc6	tipc: remove inline instances from C source files. Untie gcc's hands and let it do what it wants within the individual source files. There are two files, node.c and port.c -- only the latter effectively changes (gcc-4.5.2). Objdump shows gcc deciding to not inline port_peernode(). Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-24 00:41:03 -04:00
Eric Dumazet	bfb253c9b2	af_netlink: drop_monitor/dropwatch friendly Need to consume_skb() instead of kfree_skb() in netlink_dump() and netlink_unicast_kernel() to avoid false dropwatch positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-24 00:35:14 -04:00
Eric Dumazet	658cb354ed	af_netlink: cleanups netlink_destroy_callback() move to avoid forward reference CodingStyle cleanups Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-24 00:35:14 -04:00
Eric Dumazet	38ba0a65fa	net: skb_can_coalesce returns a boolean Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-24 00:18:02 -04:00
Eric Dumazet	783c175f90	tcp: tcp_try_coalesce returns a boolean This clarifies code intention, as suggested by David. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:36:58 -04:00
Eric Dumazet	d7ccf7c0a0	net: make spd_fill_page() linear argument a bool Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:35:04 -04:00
David S. Miller	f24001941c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Fix merge between commit `3adadc08cc` ("net ax25: Reorder ax25_exit to remove races") and commit `0ca7a4c87d` ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the later simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:15:17 -04:00
David S. Miller	a108d5f35a	net: Use bool and remove inline in skb_splice_bits() code. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:06:11 -04:00
Eric Dumazet	41c73a0d44	net: speedup skb_splice_bits() Commit `35f3d14db` (pipe: add support for shrinking and growing pipes) added a slowdown for splice(socket -> pipe), as we might grow the spd used in skb_splice_bits() for each skb we process in splice() syscall. Its not needed since skb lengths are capped. The default on-stack arrays are more than enough. Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable limit per skb. Add coalescing support to help splicing of GRO skbs built from linear skbs (linked into frag_list) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:01:35 -04:00
Eric Dumazet	1402d36601	tcp: introduce tcp_try_coalesce commit `c8628155ec` (tcp: reduce out_of_order memory use) took care of coalescing tcp segments provided by legacy devices (linear skbs) We extend this idea to fragged skbs, as their truesize can be heavy. ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment. Use this coalescing strategy for receive queue too. This contributes to reduce number of tcp collapses, at minimal cost, and reduces memory overhead and packets drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 22:42:49 -04:00
Eric Dumazet	da882c1f2e	tcp: sk_add_backlog() is too agressive for TCP While investigating TCP performance problems on 10Gb+ links, we found a tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends one ACK every two MSS segments. A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default value (87380), allowing a too small backlog. A TCP ACK, even being small, can consume nearly same truesize space than outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and is fast to compute. Performance results on netperf, single flow, receiver with disabled GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop increments at sender. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 22:28:28 -04:00
Eric Dumazet	f545a38f74	net: add a limit parameter to sk_add_backlog() sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 22:28:28 -04:00

1 2 3 4 5 ...

22885 Commits