OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Li RongQing	b66c66dc5c	Documentation: fix neigh/default/gc_thresh1 default value. The default value is 128, not 256 #grep gc_thresh1 net/ -rI net/decnet/dn_neigh.c: .gc_thresh1 = 128, net/ipv6/ndisc.c: .gc_thresh1 = 128, net/ipv4/arp.c: .gc_thresh1 = 128, Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-15 09:12:23 -04:00
Nandita Dukkipati	6ba8a3b19e	tcp: Tail loss probe (TLP) This patch series implement the Tail loss probe (TLP) algorithm described in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The first patch implements the basic algorithm. TLP's goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occuring due to tail losses (losses at end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly RTO. In the absence of loss, there is no change in the connection state. PTO stands for probe timeout. It is a timer event indicating that an ACK is overdue and triggers a loss probe packet. The PTO value is set to max(2SRTT, 10ms) and is adjusted to account for delayed ACK timer when there is only one oustanding packet. TLP Algorithm On transmission of new data in Open state: -> packets_out > 1: schedule PTO in max(2SRTT, 10ms). -> packets_out == 1: schedule PTO in max(2RTT, 1.5RTT + 200ms) -> PTO = min(PTO, RTO) Conditions for scheduling PTO: -> Connection is in Open state. -> Connection is either cwnd limited or no new data to send. -> Number of probes per tail loss episode is limited to one. -> Connection is SACK enabled. When PTO fires: new_segment_exists: -> transmit new segment. -> packets_out++. cwnd remains same. no_new_packet: -> retransmit the last segment. Its ACK triggers FACK or early retransmit based recovery. ACK path: -> rearm RTO at start of ACK processing. -> reschedule PTO if need be. In addition, the patch includes a small variation to the Early Retransmit (ER) algorithm, such that ER and TLP together can in principle recover any N-degree of tail loss through fast recovery. TLP is controlled by the same sysctl as ER, tcp_early_retrans sysctl. tcp_early_retrans==0; disables TLP and ER. ==1; enables RFC5827 ER. ==2; delayed ER. ==3; TLP and delayed ER. [DEFAULT] ==4; TLP only. The TLP patch series have been extensively tested on Google Web servers. It is most effective for short Web trasactions, where it reduced RTOs by 15% and improved HTTP response time (average by 6%, 99th percentile by 10%). The transmitted probes account for <0.5% of the overall transmissions. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:30:34 -04:00
Stephen Hemminger	ca2eb5679f	tcp: remove Appropriate Byte Count support TCP Appropriate Byte Count was added by me, but later disabled. There is no point in maintaining it since it is a potential source of bugs and Linux already implements other better window protection heuristics. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-05 14:51:16 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2724680bce	neigh: Keep neighbour cache entries if number of them is small enough. Since we have removed NCE (Neighbour Cache Entry) reference from routing entries, the only refcnt holders of an NCE are its timer (if running) and its owner table, in usual cases. As a result, neigh_periodic_work() purges NCEs over and over again even for gateways. It does not make sense to purge entries, if number of them is very small, so keep them. The minimum number of entries to keep is specified by gc_thresh1. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-22 14:25:28 -05:00
David S. Miller	4b87f92259	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/networking/ip-sysctl.txt drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c Both conflicts were simply overlapping context. A build fix for qlcnic is in here too, simply removing the added devinit annotations which no longer exist. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-15 15:05:59 -05:00
Vijay Subramanian	3d55b32370	doc: Clarify behavior when sysctl tcp_ecn = 1 Recent commit (commit `7e3a2dc529` doc: make the description of how tcp_ecn works more explicit and clear ) clarified the behavior of tcp_ecn sysctl variable but description is inconsistent. When requested by incoming conections, ECN is enabled with not just tcp_ecn = 2 but also with tcp_ecn = 1. This patch makes it clear that with tcp_ecn = 1, ECN is enabled when requested by incoming connections. Also fix spelling of 'incoming'. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:21:16 -08:00
stephen hemminger	3b09adcb20	ip-sysctl: fix spelling errors Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 15:12:34 -08:00
Hannes Frederic Sowa	db2b620aa0	ipv6: document ndisc_notify in networking/ip-sysctl.txt I slipped in a new sysctl without proper documentation. I would like to make up for this now. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:35:38 -08:00
Rick Jones	d825da2ede	doc: Tighten-up and clarify description of tcp_fin_timeout The description for tcp_fin_timeout should be tigher and more clear. In addition to being tighter, we should make the spelling of the state name consistent with what utilities report, remove the now dated reference to 2.2 and put the default in the consistent place. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-10 17:14:28 -05:00
Shan Wei	5d248c491b	net: doc : use more suitable word 'unexpected' to replace 'secluded' 'secluded' is used to describe places, not suitable here. Suggested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 14:31:07 -05:00
Shan Wei	cc86802805	net: doc: add default value for neighbour parameters Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-05 16:01:28 -05:00
Rick Jones	7e3a2dc529	doc: make the description of how tcp_ecn works more explicit and clear Make the description of how tcp_ecn works a bit more explicit and clear. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-29 13:14:58 -05:00
Neil Horman	3c68198e75	sctp: Make hmac algorithm selection for cookie generation dynamic Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to generate cookie values when establishing new connections via two build time config options. Theres no real reason to make this a static selection. We can add a sysctl that allows for the dynamic selection of these algorithms at run time, with the default value determined by the corresponding crypto library availability. This comes in handy when, for example running a system in FIPS mode, where use of md5 is disallowed, but SHA1 is permitted. Note: This new sysctl has no corresponding socket option to select the cookie hmac algorithm. I chose not to implement that intentionally, as RFC 6458 contains no option for this value, and I opted not to pollute the socket option namespace. Change notes: v2) * Updated subject to have the proper sctp prefix as per Dave M. * Replaced deafult selection options with new options that allow developers to explicitly select available hmac algs at build time as per suggestion by Vlad Y. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 02:22:18 -04:00
Jerry Chu	1046716368	tcp: TCP Fast Open Server - header & support functions This patch adds all the necessary data structure and support functions to implement TFO server side. It also documents a number of flags for the sysctl_tcp_fastopen knob, and adds a few Linux extension MIBs. In addition, it includes the following: 1. a new TCP_FASTOPEN socket option an application must call to supply a max backlog allowed in order to enable TFO on its listener. 2. A number of key data structures: "fastopen_rsk" in tcp_sock - for a big socket to access its request_sock for retransmission and ack processing purpose. It is non-NULL iff 3WHS not completed. "fastopenq" in request_sock_queue - points to a per Fast Open listener data structure "fastopen_queue" to keep track of qlen (# of outstanding Fast Open requests) and max_qlen, among other things. "listener" in tcp_request_sock - to point to the original listener for book-keeping purpose, i.e., to maintain qlen against max_qlen as part of defense against IP spoofing attack. 3. various data structure and functions, many in tcp_fastopen.c, to support server side Fast Open cookie operations, including /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 20:02:18 -04:00
Alex Bergmann	6c9ff979d1	tcp: Increase timeout for SYN segments Commit `9ad7c049` ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") changed the initRTO from 3secs to 1sec in accordance to RFC6298 (former RFC2988bis). This reduced the time till the last SYN retransmission packet gets sent from 93secs to 31secs. RFC1122 is stating that the retransmission should be done for at least 3 minutes, but this seems to be quite high. "However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course." This patch increases the value of TCP_SYN_RETRIES to the value of 6, providing a retransmission window of 63secs. The comments for SYN and SYNACK retries have also been updated to describe the current settings. The same goes for the documentation file "Documentation/networking/ip-sysctl.txt". Signed-off-by: Alexander Bergmann <alex@linlab.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 15:42:10 -04:00
Eric Dumazet	0c7462a235	ipv4: remove rt_cache_rebuild_count After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:22 -07:00
Neil Horman	5aa93bcf66	sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:13:46 -07:00
Yuchung Cheng	67da22d23f	net-tcp: Fast Open client - cookie-less mode In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 11:02:03 -07:00
Yuchung Cheng	cf60af03ca	net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 11:02:03 -07:00
Eric Dumazet	282f23c6ee	tcp: implement RFC 5961 3.2 Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s \| grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 01:36:20 -07:00
Eric Dumazet	46d3ceabd8	tcp: TCP Small Queues This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-11 18:12:59 -07:00
David S. Miller	c801e3cc19	ipv4: Clarify in docs that accept_local requires rp_filter. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-30 22:39:27 -07:00
Thomas Graf	d0daebc3d6	ipv4: Add interface option to enable routing of 127.0.0.0/8 Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 15:25:46 -07:00
Pablo Neira Ayuso	4981682cc1	netfilter: bridge: optionally set indev to vlan if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge netfilter removes the vlan header temporarily and then feeds the packet to ip(6)tables. When the new "bridge-nf-pass-vlan-input-device" sysctl is on (default off), then bridge netfilter will also set the in-interface to the vlan interface; if such an interface exists. This is needed to make iptables REDIRECT target work with "vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to match the vlan device name. Also update Documentation with current brnf default settings. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-08 19:36:47 +02:00
David S. Miller	0d6c4a2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 23:35:40 -04:00
Eric Dumazet	b49960a05e	tcp: change tcp_adv_win_scale and tcp_rmem[2] tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 21:08:58 -04:00
Yuchung Cheng	eed530b6c6	tcp: early retransmit This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 20:56:10 -04:00
Shan Wei	c60f6aa8ac	net: doc: merge /proc/sys/net/core/* documents into one place All parameter descriptions in /proc/sys/net/core/* now is separated two places. So, merge them into Documentation/sysctl/net.txt. Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:09:26 -04:00
Fernando Luis Vazquez Cao	5d6bd8619d	TCP: update ip_local_port_range documentation The explanation of ip_local_port_range in Documentation/networking/ip-sysctl.txt contains several factual errors: - The default value of ip_local_port_range does not depend on the amount of memory available in the system. - tcp_tw_recycle is not enabled by default. - 1024-4999 is not the default value. - Etc. Clean up the mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-03 17:38:55 -04:00
David S. Miller	959327c784	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-06 21:10:05 -05:00
Peter Pan(潘卫平)	99b53bdd81	ipv4:correct description for tcp_max_syn_backlog Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value is 128, and it will increase in proportion to the memory of machine. The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog are out of date. Changelog: V2: update description for sysctl_max_syn_backlog Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Shan Wei <shanwei88@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-06 13:02:28 -05:00
Eric Dumazet	d8a6e65f8b	tcp: inherit listener congestion control for passive cnx Rick Jones reported that TCP_CONGESTION sockopt performed on a listener was ignored for its children sockets : right after accept() the congestion control for new socket is the system default one. This seems an oversight of the initial design (quoted from Stephen) Based on prior investigation and patch from Rick. Reported-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Yuchung Cheng <ycheng@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-30 16:55:26 -05:00
Eric Dumazet	8b5c171bb3	neigh: new unresolved queue limits Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit : > From: David Miller <davem@davemloft.net> > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST) > > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Wed, 09 Nov 2011 12:14:09 +0100 > > > >> unres_qlen is the number of frames we are able to queue per unresolved > >> neighbour. Its default value (3) was never changed and is responsible > >> for strange drops, especially if IP fragments are used, or multiple > >> sessions start in parallel. Even a single tcp flow can hit this limit. > > ... > > > > Ok, I've applied this, let's see what happens :-) > > Early answer, build fails. > > Please test build this patch with DECNET enabled and resubmit. The > decnet neigh layer still refers to the removed ->queue_len member. > > Thanks. Ouch, this was fixed on one machine yesterday, but not the other one I used this morning, sorry. [PATCH V5 net-next] neigh: new unresolved queue limits unres_qlen is the number of frames we are able to queue per unresolved neighbour. Its default value (3) was never changed and is responsible for strange drops, especially if IP fragments are used, or multiple sessions start in parallel. Even a single tcp flow can hit this limit. $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108 PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data. 8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-14 00:47:54 -05:00
Eric Dumazet	20db93c340	net: min_pmtu default is 552 Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-08 14:21:44 -05:00
David S. Miller	88c5100c28	Merge branch 'master' of github.com:davem330/net Conflicts: net/batman-adv/soft-interface.c	2011-10-07 13:38:43 -04:00
Roy.Li	605b91c8f6	net: Documentation: Fix type of variables Signed-off-by: Roy.Li <rongqing.li@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-29 14:57:19 -04:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Tore Anderson	026359bc6e	ipv6: Send ICMPv6 RSes only when RAs are accepted This patch improves the logic determining when to send ICMPv6 Router Solicitations, so that they are 1) always sent when the kernel is accepting Router Advertisements, and 2) never sent when the kernel is not accepting RAs. In other words, the operational setting of the "accept_ra" sysctl is used. The change also makes the special "Hybrid Router" forwarding mode ("forwarding" sysctl set to 2) operate exactly the same as the standard Router mode (forwarding=1). The only difference between the two was that RSes was being sent in the Hybrid Router mode only. The sysctl documentation describing the special Hybrid Router mode has therefore been removed. Rationale for the change: Currently, the value of forwarding sysctl is the only thing determining whether or not to send RSes. If it has the value 0 or 2, they are sent, otherwise they are not. This leads to inconsistent behaviour in the following cases: * accept_ra=0, forwarding=0 * accept_ra=0, forwarding=2 * accept_ra=1, forwarding=2 * accept_ra=2, forwarding=1 In the first three cases, the kernel will send RSes, even though it will not accept any RAs received in reply. In the last case, it will not send any RSes, even though it will accept and process any RAs received. (Most routers will send unsolicited RAs periodically, so suppressing RSes in the last case will merely delay auto-configuration, not prevent it.) Also, it is my opinion that having the forwarding sysctl control RS sending behaviour (completely independent of whether RAs are being accepted or not) is simply not what most users would intuitively expect to be the case. Signed-off-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-16 19:14:41 -04:00
Geoffrey Thomas	d5c073caf0	net: Documentation: RFC 2553bis is now RFC 3493 Signed-off-by: Geoffrey Thomas <geofft@mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-22 11:28:57 -07:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
David S. Miller	06b8fc5d30	net: Fix default in docs for tcp_orphan_retries. Default should be listed at 8 instead of 7. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-08 09:31:31 -07:00
Max Matveev	6539fefd9b	Update documented default values for various TCP/UDP tunables tcp_rmem and tcp_wmem use 1 page as default value for the minimum amount of memory to be used, same as udp_wmem_min and udp_rmem_min. Pages are different size on different architectures - use the right units when describing the defaults. Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Max Matveev <makc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-04 19:14:09 -07:00
Max Matveev	a6e1204b82	Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables sctp does not use second and third ("default" and "max") values of sctp_rmem tunable. The format is the same as tcp_rmem but the meaning is different so make the documentation explicit to avoid confusion. sctp_wmem is not used at all. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Max Matveev <makc@redhat.com> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-04 19:14:09 -07:00
Eric Dumazet	4b9d9be839	inetpeer: remove unused list Andi Kleen and Tim Chen reported huge contention on inetpeer unused_peers.lock, on memcached workload on a 40 core machine, with disabled route cache. It appears we constantly flip peers refcnt between 0 and 1 values, and we must insert/remove peers from unused_peers.list, holding a contended spinlock. Remove this list completely and perform a garbage collection on-the-fly, at lookup time, using the expired nodes we met during the tree traversal. This removes a lot of code, makes locking more standard, and obsoletes two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also removes two pointers in inet_peer structure. There is still a false sharing effect because refcnt is in first cache line of object [were the links and keys used by lookups are located], we might move it at the end of inet_peer structure to let this first cache line mostly read by cpus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-08 17:05:30 -07:00
Ilpo Järvinen	81146ec1b8	tcp: document tcp_max_ssthresh (Limited Slow-Start) Base on Ilpo's patch about documenting tcp_max_ssthresh. (see http://marc.info/?l=linux-netdev&m=117950581307310&w=2) According to errata of RFC3742, fix the number of segments increased during RTT time. Just to state the occasion to use this parameter, But about how to set parameter value, maybe some others can do it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-20 11:10:14 -08:00
Peter Chubb	34a6ef381d	tcp_ecn is an integer not a boolean There was some confusion at LCA as to why the sysctl tcp_ecn took one of three values when it was documented as a Boolean. This patch fixes the documentation. Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-02 15:39:58 -08:00
Eric Dumazet	cc6f02dd49	net: change ip_default_ttl documentation Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-13 12:50:49 -08:00
David S. Miller	fe6c791570	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c	2010-12-08 13:47:38 -08:00
Alexey Dobriyan	0147fc058d	tcp: restrict net.ipv4.tcp_adv_win_scale (#20312 ) tcp_win_from_space() does the following: if (sysctl_tcp_adv_win_scale <= 0) return space >> (-sysctl_tcp_adv_win_scale); else return space - (space >> sysctl_tcp_adv_win_scale); "space" is int. As per C99 6.5.7 (3) shifting int for 32 or more bits is undefined behaviour. Indeed, if sysctl_tcp_adv_win_scale is exactly 32, space >> 32 equals space and function returns 0. Which means we busyloop in tcp_fixup_rcvbuf(). Restrict net.ipv4.tcp_adv_win_scale to [-31, 31]. Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312 Steps to reproduce: echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale wget www.kernel.org [softlockup] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:39:45 -08:00
Jeremy Eder	d67ef35fff	clarify documentation for net.ipv4.igmp_max_memberships This patch helps clarify documentation for net.ipv4.igmp_max_memberships by providing a formula for calculating the maximum number of multicast groups that can be subscribed to, plus defining the theoretical limit. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jeremy Eder <jeder@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 11:22:06 -08:00
Ben Greear	cbaf087a9f	docs: Add neigh/gc_thresh3 and route/max_size documentation. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 14:03:20 -08:00
Thomas Graf	ae8abfa00e	ipv6: Update ip-sysctl.txt documentation for recent changes to accept_ra and forwarding Documentation for recent changes to the tunables accept_ra and forwarding. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-03 09:43:15 -07:00
Ian Campbell	3f8dc2362f	arp_notify: document that a gratuitous ARP request is sent when this option is enabled This option causes a gratuitous ARP request, not a reply as the documentation currently suggests. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-31 00:24:55 -07:00
Amerigo Wang	e3826f1e94	net: reserve ports for applications using fixed port numbers (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-15 23:28:40 -07:00
David S. Miller	0448873480	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-02-25 23:22:42 -08:00
Brian Haley	e79dc48431	IPv6: better document max_addresses parameter Andrew Morton wrote: >> >From ip-sysctl.txt file in kernel documentation I can see following description >> for max_addresses: >> max_addresses - INTEGER >> Number of maximum addresses per interface. 0 disables limitation. >> It is recommended not set too large value (or 0) because it would >> be too easy way to crash kernel to allow to create too much of >> autoconfigured addresses. ^^^^^^^^^^^^^^ >> If this parameter applies only for auto-configured IP addressed, please state >> it more clearly in docs or rename the parameter to show that it refers to >> auto-configuration. It did mention autoconfigured in the text, but the below makes it more obvious. More clearly document IPv6 max_addresses parameter. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-23 01:25:00 -08:00
Andreas Petlund	7e38017557	net: TCP thin dupack This patch enables fast retransmissions after one dupACK for TCP if the stream is identified as thin. This will reduce latencies for thin streams that are not able to trigger fast retransmissions due to high packet interarrival time. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 15:43:09 -08:00
Andreas Petlund	36e31b0af5	net: TCP thin linear timeouts This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin stream suffer because of exponential backoff. This mechanism is only active if enabled by iocontrol or syscontrol and the stream is identified as thin. A maximum of 6 linear timeouts is tried before exponential backoff is resumed. Signed-off-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-18 15:43:08 -08:00
Octavian Purdila	6d955180b2	ipv4: allow warming up the ARP cache with request type gratuitous ARP If the per device ARP_ACCEPT option is enable, currently we only allow creating new ARP cache entries for response type gratuitous ARP. Allowing gratuitous ARP to create new ARP entries (not only to update existing ones) is useful when we want to avoid unnecessary delays for the first packet of a stream. This patch allows request type gratuitous ARP to create new ARP cache entries as well. This is useful when we want to populate the ARP cache entries for a large number of hosts on the same LAN. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-19 02:12:34 -08:00
Jesper Dangaard Brouer	65324144b5	net: RFC3069, private VLAN proxy arp support This is to be used together with switch technologies, like RFC3069, that where the individual ports are not allowed to communicate with each other, but they are allowed to talk to the upstream router. As described in RFC 3069, it is possible to allow these hosts to communicate through the upstream router by proxy_arp'ing. This patch basically allow proxy arp replies back to the same interface (from which the ARP request/solicitation was received). Tunable per device via proc "proxy_arp_pvlan": /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan This switch technology is known by different vendor names: - In RFC 3069 it is called VLAN Aggregation. - Cisco and Allied Telesyn call it Private VLAN. - Hewlett-Packard call it Source-Port filtering or port-isolation. - Ericsson call it MAC-Forced Forwarding (RFC Draft). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-07 00:59:09 -08:00
Patrick McHardy	8153a10c08	ipv4 05/05: add sysctl to accept packets with local source addresses commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 ipv4: add sysctl to accept packets with local source addresses Change fib_validate_source() to accept packets with a local source address when the "accept_local" sysctl is set for the incoming inet device. Combined with the previous patches, this allows to communicate between multiple local interfaces over the wire. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-03 12:14:38 -08:00
William Allen Simpson	519855c508	TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS Define sysctl (tcp_cookie_size) to turn on and off the cookie option default globally, instead of a compiled configuration option. Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant data values, retrieving variable cookie values, and other facilities. Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h, near its corresponding struct tcp_options_received (prior to changes). This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 These functions will also be used in subsequent patches that implement additional features. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-02 22:07:24 -08:00
David S. Miller	e00484023e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-12-02 22:00:34 -08:00
Shan Wei	1f5865e73f	ip: update the description of rp_filter in ip-sysctl.txt The commit `27fed4175a` (ip: fix logic of reverse path filter sysctl) has changed the logic of rp_filter. The document about rp_filter is out of date. Now, setting conf/all/rp_filte with 0 can also enable source validation. Update the document according to the commit. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-02 15:39:04 -08:00
Octavian Purdila	f7734fdf61	make TLLAO option for NA packets configurable On Friday 02 October 2009 20:53:51 you wrote: > This is good although I would have shortened the name. Ah, I knew I forgot something :) Here is v4. tavi >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001 From: Octavian Purdila <opurdila@ixiacom.com> Date: Fri, 2 Oct 2009 00:51:15 +0300 Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs Neighbor advertisements responding to unicast neighbor solicitations did not include the target link-layer address option. This patch adds a new sysctl option (disabled by default) which controls whether this option should be sent even with unicast NAs. The need for this arose because certain routers expect the TLLAO in some situations even as a response to unicast NS packets. Moreover, RFC 2461 recommends sending this to avoid a race condition (section 4.4, Target link-layer address) Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-07 01:10:45 -07:00
Bhaskar Dutta	723884339f	sctp: Sysctl configuration for IPv4 Address Scoping This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-09-04 18:21:01 -04:00
Damian Lukowski	5d7892298a	RTO connection timeout: sysctl documentation update This patch updates the sysctl documentation concerning the interpretation of tcp_retries{1,2} and tcp_orphan_retries. Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 17:40:50 -07:00
Brian Haley	56d417b12e	IPv6: Add 'autoconf' and 'disable_ipv6' module parameters Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module. The first controls if IPv6 addresses are autoconfigured from prefixes received in Router Advertisements. The IPv6 loopback (::1) and link-local addresses are still configured. The second controls if IPv6 addresses are desired at all. No IPv6 addresses will be added to any interfaces. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-01 03:07:33 -07:00
David S. Miller	bb803cfbec	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c	2009-05-18 21:08:20 -07:00
Wang Tinggong	705efc3b03	Doc: fixed descriptions on /proc/sys/net/core/* and /proc/sys/net/unix/* Signed-off-by: Wang Tinggong <wangtinggong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-17 21:19:31 -07:00
Ilpo Järvinen	255cac91c3	tcp: extend ECN sysctl to allow server-side only ECN This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-04 11:07:36 -07:00
Brian Haley	9bdd8d40c8	ipv6: Fix incorrect disable_ipv6 behavior Fix the behavior of allowing both sysctl and addrconf_dad_failure() to set the disable_ipv6 parameter without any bad side-effects. If DAD fails and accept_dad > 1, we will still set disable_ipv6=1, but then instead of allowing an RA to add an address then immediately fail DAD, we simply don't allow the address to be added in the first place. This also lets the user set this flag and disable all IPv6 addresses on the interface, or on the entire system. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-18 18:22:48 -07:00
Jesper Dangaard Brouer	e18f5feb0c	Doc: Cleanup whitespaces in ip-sysctl.txt Fix up whitespaces while going though ip-sysctl.txt anyway. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-24 03:47:41 -08:00
Jesper Dangaard Brouer	bf869c3062	Doc: Fix typos in ip-sysctl.txt about rp_filter. First fix a typo in Stephens patch ;-) Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-24 03:47:41 -08:00
Stephen Hemminger	c1cf8422f0	ip: add loose reverse path filtering Extend existing reverse path filter option to allow strict or loose filtering. (See http://en.wikipedia.org/wiki/Reverse_path_filtering). For compatibility with existing usage, the value 1 is chosen for strict mode and 2 for loose mode. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-22 19:54:45 -08:00
Stephen Hemminger	eefef1cf76	net: add ARP notify option for devices This adds another inet device option to enable gratuitous ARP when device is brought up or address change. This is handy for clusters or virtualization. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:04:33 -08:00
Neil Horman	1080d709fb	net: implement emergency route cache rebulds when gc_elasticity is exceeded This is a patch to provide on demand route cache rebuilding. Currently, our route cache is rebulid periodically regardless of need. This introduced unneeded periodic latency. This patch offers a better approach. Using code provided by Eric Dumazet, we compute the standard deviation of the average hash bucket chain length while running rt_check_expire. Should any given chain length grow to larger that average plus 4 standard deviations, we trigger an emergency hash table rebuild for that net namespace. This allows for the common case in which chains are well behaved and do not grow unevenly to not incur any latency at all, while those systems (which may be being maliciously attacked), only rebuild when the attack is detected. This patch take 2 other factors into account: 1) chains with multiple entries that differ by attributes that do not affect the hash value are only counted once, so as not to unduly bias system to rebuilding if features like QOS are heavily used 2) if rebuilding crosses a certain threshold (which is adjustable via the added sysctl in this patch), route caching is disabled entirely for that net namespace, since constant rebuilding is less efficient that no caching at all Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-27 17:06:14 -07:00
David S. Miller	2aec609fb4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/netfilter/nf_conntrack_proto_tcp.c	2008-07-14 20:23:54 -07:00
Stephen Hemminger	4edc2f3416	ip: sysctl documentation cleanup Reduced version of the spelling cleanup patch. Take out the confusing language in tcp_frto, and organize the undocumented values. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-10 16:50:26 -07:00
J. Bruce Fields	53025f5efd	Documentation: clarify tcp_{r,w}mem sysctl docs Fix some of the defaults and attempt to clarify some language. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-10 16:47:41 -07:00
Vlad Yasevich	32e8d4948b	sctp: Add documentation for sctp sysctl variable Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:43:29 -07:00
David S. Miller	ea2aca084b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c	2008-07-05 23:08:07 -07:00
YOSHIFUJI Hideaki	1b34be74cb	ipv6 addrconf: add accept_dad sysctl to control DAD operation. - If 0, disable DAD. - If 1, perform DAD (default). - If >1, perform DAD and disable IPv6 operation if DAD for MAC-based link-local address has been failed (RFC4862 5.4.5). We do not follow RFC4862 by default. Refer to the netdev thread entitled "Linux IPv6 DAD not full conform to RFC 4862 ?" http://www.spinics.net/lists/netdev/msg52027.html Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki	778d80be52	ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-07-03 17:51:55 +09:00
Stephen Hemminger	6dbf4bcac9	icmp: fix units for ratelimit Convert the sysctl values for icmp ratelimit to use milliseconds instead of jiffies which is based on kernel configured HZ. Internal kernel jiffies are not a proper unit for any userspace API. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:29:07 -07:00
Stephen Hemminger	77a538d5aa	ipv4: fix sysctl documentation of time related values These sysctl values are time related and all use the same routine (proc_dointvec_jiffies) that internally converts from seconds to jiffies. The code is fine, the documentation is just wrong. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 17:22:48 -07:00
Hideo Aoki	95766fff6b	[UDP]: Add memory accounting. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:19 -08:00
Ryousei Takano	564262c1f0	[TCP]: Fix inconsistency of terms. Fix inconsistency of terms: 1) D-SACK 2) F-RTO Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-25 23:03:52 -07:00
Simon Arlott	0f035b8e84	spelling fixes: Documentation/ Spelling fixes in Documentation/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:30:25 +02:00
Ilpo Järvinen	cd99889c61	[TCP] FRTO: Update sysctl documentation Since the SACK enhanced FRTO was added, the code has been under test numerous times so remove "experimental" claim from the documentation. Also be a bit more verbose about the usage. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:11 -07:00
Linus Torvalds	e030dbf91a	Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop * 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits) ioatdma: add the unisys "i/oat" pci vendor/device id ARM: Add drivers/dma to arch/arm/Kconfig iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver iop13xx: surface the iop13xx adma units to the iop-adma driver dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines md: remove raid5 compute_block and compute_parity5 md: handle_stripe5 - request io processing in raid5_run_ops md: handle_stripe5 - add request/completion logic for async expand ops md: handle_stripe5 - add request/completion logic for async read ops md: handle_stripe5 - add request/completion logic for async check ops md: handle_stripe5 - add request/completion logic for async compute ops md: handle_stripe5 - add request/completion logic for async write ops md: common infrastructure for running operations with raid5_run_ops md: raid5_run_ops - run stripe operations outside sh->lock raid5: replace custom debug PRINTKs with standard pr_debug raid5: refactor handle_stripe5 and handle_stripe6 (v3) async_tx: add the async_tx api xor: make 'xor_blocks' a library routine for use with async_tx dmaengine: make clients responsible for managing channels dmaengine: refactor dmaengine around dma_async_tx_descriptor ...	2007-07-13 10:52:27 -07:00
Chris Leech	72d0b7a81d	I/OAT: Add documentation for the tcp_dma_copybreak sysctl Signed-off-by: Chris Leech <christopher.leech@intel.com>	2007-07-11 15:39:05 -07:00
YOSHIFUJI Hideaki	bb4dbf9e61	[IPV6]: Do not send RH0 anymore. Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:55:49 -07:00
Michael Milner	516299d2f5	[NETFILTER]: bridge-nf: filter bridged IPv4/IPv6 encapsulated in pppoe traffic The attached patch by Michael Milner adds support for using iptables and ip6tables on bridged traffic encapsulated in ppoe frames, similar to what's already supported for vlan. Signed-off-by: Michael Milner <milner@blissisland.ca> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:57 -07:00
Ilpo Järvinen	89808060b7	[TCP] Sysctl documentation: tcp_frto_response In addition, fixed minor things in tcp_frto sysctl. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:24 -07:00
Ilpo Järvinen	127af0c44f	[TCP] FRTO: Sysctl documentation for SACK enhanced version The description is overly verbose to avoid ambiguity between "SACK enabled" and "SACK enhanced FRTO" Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:17 -07:00
YOSHIFUJI Hideaki	0bcbc92629	[IPV6]: Disallow RH0 by default. A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-24 14:58:30 -07:00
John Heffner	71599cd1c3	[TCP]: Document several sysctls. This adds documentation for tcp_moderate_rcvbuf, tcp_no_metrics_save, tcp_base_mss, and tcp_mtu_probing. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:42:15 -08:00
Stephen Hemminger	ef56e622c6	[NET] ip-sysctl.txt: Alphabetize. Rearrange TCP entries in alpha order. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:51 -08:00
Stephen Hemminger	ce7bc3bf15	[TCP]: Restrict congestion control choices. Allow normal users to only choose among a restricted set of congestion control choices. The default is reno and what ever has been configured as default. But the policy can be changed by administrator at any time. For example, to allow any choice: cp /proc/sys/net/ipv4/tcp_available_congestion_control \ /proc/sys/net/ipv4/tcp_allowed_congestion_control Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:49 -08:00
Stephen Hemminger	3ff825b28d	[TCP]: Add tcp_available_congestion_control sysctl. Create /proc/sys/net/ipv4/tcp_available_congestion_control that reflects currently available TCP choices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:48 -08:00
Matt LaPlante	d6bc8ac9e1	Fix typos in Documentation/: 'Q'-'R' This patch fixes typos in various Documentation txts. The patch addresses some words starting with the letters 'Q'-'R'. Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 22:54:15 +02:00
Matt LaPlante	2fe0ae78c6	Fix typos in Documentation/: 'H'-'M' This patch fixes typos in various Documentation txts. The patch addresses some words starting with the letters 'H'-'M'. Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 22:50:39 +02:00
YOSHIFUJI Hideaki	fbea49e1e2	[IPV6] NDISC: Add proxy_ndp sysctl. We do not always need proxy NDP functionality even we enable forwarding. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:20:25 -07:00
Paul Moore	8802f616f6	[NetLabel]: documentation Documentation for the NetLabel system, this includes a basic overview of how NetLabel works, how LSM developers can integrate it into their favorite LSM, as well as documentation on the CIPSO related sysctl variables. Also, due to the difficulty of finding expired IETF drafts, I am including the IETF CIPSO draft that is the basis of the NetLabel CIPSO implementation. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:31 -07:00
Stephen Hemminger	b3a8a40da5	[TCP]: Turn ABC off. Turn Appropriate Byte Count off by default because it unfairly penalizes applications that do small writes. Add better documentation to describe what it is so users will understand why they might want to turn it on. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-17 23:21:02 -07:00
Jan "Yenya" Kasprzak	fb33f82568	[NET]: Terminology in ip-sysctl.txt this minor patch fixes the description of net.ipv4.tcp_mem sysctl in ip-sysctl.txt - the headline names the values "min, pressure, max", while the description uses the "low, pressure, high" values. Both tcp_rmem and tcp_wmem descriptions use the "min, pressure, max" values, so I have changed the tcp_mem to match this and not vice versa. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-17 16:29:50 -07:00
David S. Miller	35089bb203	[TCP]: Add tcp_slow_start_after_idle sysctl. A lot of people have asked for a way to disable tcp_cwnd_restart(), and it seems reasonable to add a sysctl to do that. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:30:53 -07:00
Rick Jones	15d99e02ba	[TCP]: sysctl to allow TCP window > 32767 sans wscale Back in the dark ages, we had to be conservative and only allow 15-bit window fields if the window scale option was not negotiated. Some ancient stacks used a signed 16-bit quantity for the window field of the TCP header and would get confused. Those days are long gone, so we can use the full 16-bits by default now. There is a sysctl added so that we can still interact with such old stacks Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:40:29 -08:00
Neil Horman	c1b1bce852	[IPV4] ARP: Documentation for new arp_accept sysctl variable. As John pointed out, I had not added documentation to describe the arp_accpet sysctl that I posted in my last patch. This patch adds that documentation. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:40:03 -08:00
YOSHIFUJI Hideaki	09c884d4c3	[IPV6]: ROUTE: Add accept_ra_rt_info_max_plen sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:07:03 -08:00
YOSHIFUJI Hideaki	52e1635631	[IPV6]: ROUTE: Add router_probe_interval sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:05:47 -08:00
YOSHIFUJI Hideaki	930d6ff2e2	[IPV6]: ROUTE: Add accept_ra_rtr_pref sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:05:30 -08:00
YOSHIFUJI Hideaki	c4fd30eb18	[IPV6]: ADDRCONF: Add accept_ra_pinfo sysctl. This controls whether we accept Prefix Information in RAs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 16:55:26 -08:00
YOSHIFUJI Hideaki	65f5c7c114	[IPV6]: ROUTE: Add accept_ra_defrtr sysctl. This controls whether we accept default router information in RAs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 16:55:08 -08:00
Horms	95f7daf1c0	[IPV4]: Document icmp_errors_use_inbound_ifaddr sysctl Taken largely from the commit of the patch that added this feature: `1c2fb7f93c` I'm not sure about the ordering of the options in sysctl.txt, so I took a wild guess about where it fits. Signed-Off-By: Horms <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-02 17:02:25 -08:00
Herbert Xu	89cee8b1cb	[IPV4]: Safer reassembly Another spin of Herbert Xu's "safer ip reassembly" patch for 2.6.16. (The original patch is here: http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2 and my only contribution is to have tested it.) This patch (optionally) does additional checks before accepting IP fragments, which can greatly reduce the possibility of reassembling fragments which originated from different IP datagrams. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:31 -08:00
Stephen Hemminger	9772efb970	[TCP]: Appropriate Byte Count support This is an updated version of the RFC3465 ABC patch originally for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting bytes ack'd rather than packets when updating congestion control. The orignal ABC described in the RFC applied to a Reno style algorithm. For advanced congestion control there is little change after leaving slow start. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:09:53 -08:00
Akinobu Mita	e83b860539	[TCP]: fix tcp_tso_win_divisor documentation The default value for tcp_tso_win_divisor is 3. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-29 02:16:31 -02:00
David S. Miller	7ce312467e	[IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by default It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 16:07:30 -07:00
Stephen Hemminger	9d7bcfc6b8	[TCP]: Update sysctl and congestion control documentation. Update the documentation to remove the old sysctl values and include the new congestion control infrastructure. Includes changes to tcp.txt by Ian McDonald. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:22:36 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

1 2 3 4 5

222 Commits