OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Mike Stroyan	18955cfcb2	[IPV4] tcp/route: Another look at hash table sizes The tcp_ehash hash table gets too big on systems with really big memory. It is worse on systems with pages larger than 4KB. It wastes memory that could be better used. It also makes the netstat command slow because reading /proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table. The default value should not be larger for larger page sizes. It seems that the effect of page size is an unintended error dating back a long time. I also wonder if the default value really should be a larger fraction of memory for systems with more memory. While systems with really big ram can afford more space for hash tables, it is not clear to me that they benefit from increasing the allocation ratio for this table. The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and mm/page_alloc.c:alloc_large_system_hash. tcp_init calls alloc_large_system_hash passing parameters- bucketsize=sizeof(struct tcp_ehash_bucket) numentries=thash_entries scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT) limit=0 On i386, PAGE_SHIFT is 12 for a page size of 4K On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K The num_physpages test above makes the allocation take a larger fraction of the total memory on systems with larger memory. The threshold size for a i386 system is 512MB. For an ia64 system with 16KB pages the threshold is 2GB. For smaller memory systems- On i386, scale = (27 - 12) = 15 On ia64, scale = (27 - 14) = 13 For larger memory systems- On i386, scale = (25 - 12) = 13 On ia64, scale = (25 - 14) = 11 For the rest of this discussion, I'll just track the larger memory case. The default behavior has numentries=thash_entries=0, so the allocated size is determined by either scale or by the default limit of 1/16 of total memory. In alloc_large_system_hash- \| numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages; \| numentries += (1UL << (20 - PAGE_SHIFT)) - 1; \| numentries >>= 20 - PAGE_SHIFT; \| numentries <<= 20 - PAGE_SHIFT; At this point, numentries is pages for all of memory, rounded up to the nearest megabyte boundary. \| /* limit to 1 bucket per 2^scale bytes of low memory */ \| if (scale > PAGE_SHIFT) \| numentries >>= (scale - PAGE_SHIFT); \| else \| numentries <<= (PAGE_SHIFT - scale); On i386, numentries >>= (13 - 12), so numentries is 1/8196 of bytes of total memory. On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of bytes of total memory. \| log2qty = long_log2(numentries); \| \| do { \| size = bucketsize << log2qty; bucketsize is 16, so size is 16 times numentries, rounded down to a power of two. On i386, size is 1/512 of bytes of total memory. On ia64, size is 1/128 of bytes of total memory. For smaller systems the results are On i386, size is 1/2048 of bytes of total memory. On ia64, size is 1/512 of bytes of total memory. The large page effect can be removed by just replacing the use of PAGE_SHIFT with a constant of 12 in the calls to alloc_large_system_hash. That makes them more like the other uses of that function from fs/inode.c and fs/dcache.c Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:12:55 -08:00
Herbert Xu	e5ed639913	[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:35:55 -07:00
Julian Anastasov	ce723d8e04	[IPV4]: Fix refcount damaging in net/ipv4/route.c One such place that can damage the dst refcnts is route.c with CONFIG_IP_ROUTE_MULTIPATH_CACHED enabled, i don't see the user's .config. In this new code i see that rt_intern_hash is called before dst->refcnt is set to 1, dst is the 2nd arg to rt_intern_hash. Arg 2 of rt_intern_hash must come with refcnt 1 as it is added to table or dropped depending on error/add/update. One such example is ip_mkroute_input where __mkroute_input return rth with refcnt 0 which is provided to rt_intern_hash. ip_mkroute_output looks like a 2nd such place. Appending untested patch for comments and review. The idea is to put previous reference as we are going to return next result/error. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 13:34:47 -07:00
Arnaldo Carvalho de Melo	d8c97a9451	[NET]: Export symbols needed by the current DCCP code Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:49:35 -07:00
Adrian Bunk	0742fd53a3	[IPV4]: possible cleanups This patch contains the following possible cleanups: - make needlessly global code static - #if 0 the following unused global function: - xfrm4_state.c: xfrm4_state_fini - remove the following unneeded EXPORT_SYMBOL's: - ip_output.c: ip_finish_output - ip_output.c: sysctl_ip_default_ttl - fib_frontend.c: ip_dev_find - inetpeer.c: inet_peer_idlock - ip_options.c: ip_options_compile - ip_options.c: ip_options_undo - net/core/request_sock.c: sysctl_max_syn_backlog Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:33:20 -07:00
Olaf Kirch	0b7f22aab4	[IPV4]: Prevent oops when printing martian source In some cases, we may be generating packets with a source address that qualifies as martian. This can happen when we're in the middle of setting up the network, and netfilter decides to reject a packet with an RST. The IPv4 routing code would try to print a warning and oops, because locally generated packets do not have a valid skb->mac.raw pointer at this point. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:01:42 -07:00
Eric Dumazet	bb1d23b026	[IPV4]: Bug fix in rt_check_expire() - rt_check_expire() fixes (an overflow occured if size of the hash was >= 65536) reminder of the bugfix: The rt_check_expire() has a serious problem on machines with large route caches, and a standard HZ value of 1000. With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ; the loop count : for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; overflows (t is a 31 bit value) as soon rt_hash_log is >= 16 (65536 slots in route cache hash table). In this case, rt_check_expire() does nothing at all Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:00:32 -07:00
Eric Dumazet	424c4b70cc	[IPV4]: Use the fancy alloc_large_system_hash() function for route hash table - rt hash table allocated using alloc_large_system_hash() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:58:19 -07:00
Eric Dumazet	22c047ccbc	[NET]: Hashed spinlocks in net/ipv4/route.c - Locking abstraction - Spinlocks moved out of rt hash table : Less memory (50%) used by rt hash table. it's a win even on UP. - Sizing of spinlocks table depends on NR_CPUS Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:55:24 -07:00
Dietmar Eggemann	2c2910a401	[IPV4]: Snmpv2 Mib IP counter ipInAddrErrors support I followed Thomas' proposal to see every martian destination as a case where the ipInAddrErrors counter has to be incremented. There are two advantages by doing so: (1) The relation between the ipInReceive counter and all the other ipInXXX counters is more accurate in the case the RTN_UNICAST code check fails and (2) it makes the code in ip_route_input_slow easier. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:06:23 -07:00
Chuck Short	7abaa27c1c	[IPV4]: Fix route.c gcc4 warnings Signed-off by: Chuck Short <zulcss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:10:23 -07:00
Jamal Hadi Salim	b6544c0b4c	[NETLINK]: Correctly set NLM_F_MULTI without checking the pid This patch rectifies some rtnetlink message builders that derive the flags from the pid. It is now explicit like the other cases which get it right. Also fixes half a dozen dumpers which did not set NLM_F_MULTI at all. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:54:12 -07:00
Jesper Juhl	02c30a84e6	[PATCH] update Ross Biro bouncing email address Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-05-05 16:36:49 -07:00
Olaf Rempel	5bec0039f4	[NET]: /proc/net/stat/* header cleanup Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-04-28 12:16:08 -07:00
Dave Jones	7e3e0360b7	[IPV4]: Incorrect permissions on route flush sysctl This has been brought up before.. http://lkml.org/lkml/2000/1/21/116 but didnt seem to get resolved. This morning I got someone file a bugzilla about it breaking sysctl(8). Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-04-28 12:11:03 -07:00
Stephen Hemminger	9c2b3328f7	[NET]: skbuff: remove old NET_CALLER macro Here is a revised alternative that uses BUG_ON/WARN_ON (as suggested by Herbert Xu) to eliminate NET_CALLER. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-04-19 22:39:42 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

... 2 3 4 5 6

267 Commits