linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	e97e386b12	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: pack objects denser slub: Calculate min_objects based on number of processors. slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS slub: Simplify any_slab_object checks slub: Make the order configurable for each slab cache slub: Drop fallback to page allocator method slub: Fallback to minimal order during slab page allocation slub: Update statistics handling for variable order slabs slub: Add kmem_cache_order_objects struct slub: for_each_object must be passed the number of objects in a slab slub: Store max number of objects in the page struct. slub: Dump list of objects not freed on kmem_cache_close() slub: free_list() cleanup slub: improve kmem_cache_destroy() error message slob: fix bug - when slob allocates "struct kmem_cache", it does not force alignment.	2008-04-28 14:08:56 -07:00
Pekka Enberg	1b27d05b6e	mm: move cache_line_size() to <linux/cache.h> Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Mel Gorman	dd1a239f6f	mm: have zonelist contains structs with both a zone pointer and zone_idx Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a substraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	54a6eb5c47	mm: use two zonelist that are filtered by GFP mask Currently a node has two sets of zonelists, one for each zone type in the system and a second set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per-node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro is introduced called for_each_zone_zonelist() that interates through each zone allowed by the GFP flags in the selected zonelist. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	0e88460da6	mm: introduce node_zonelist() for accessing the zonelist for a GFP mask Introduce a node_zonelist() helper function. It is used to lookup the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems. Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Christoph Lameter	c124f5b54f	slub: pack objects denser Since we now have more orders available use a denser packing. Increase slab order if more than 1/16th of a slab would be wasted. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:40 +03:00
Christoph Lameter	9b2cd506e5	slub: Calculate min_objects based on number of processors. The mininum objects per slab is calculated based on the number of processors that may come online. Processors min_objects --------------------------- 1 8 2 12 4 16 8 20 16 24 32 28 64 32 1024 48 4096 56 The higher the number of processors the large the order sizes used for various slab caches will become. This has been shown to address the performance issues in hackbench on 16p etc. The calculation is only performed if slub_min_objects is zero (default). If one specifies a slub_min_objects on boot then that setting is taken. As suggested by Zhang Yanmin's performance tests on 16-core Tigerton, use the formula '4 * (fls(nr_cpu_ids) + 1)': ./hackbench 100 process 2000: 1) 2.6.25-rc6slab: 23.5 seconds 2) 2.6.25-rc7SLUB+slub_min_objects=20: 31 seconds 3) 2.6.25-rc7SLUB+slub_min_objects=24: 23.5 seconds Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:40 +03:00
Christoph Lameter	114e9e89e6	slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS We can now fallback to order 0 slabs. So set the slub_max_order to PAGE_CACHE_ORDER_COSTLY but keep the slub_min_objects at 4. This will mostly preserve the orders used in 2.6.25. F.e. The 2k kmalloc slab will use order 1 allocs and the 4k kmalloc slab order 2. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:39 +03:00
Christoph Lameter	31d33baf36	slub: Simplify any_slab_object checks Since we now have total_objects counter per node use that to check for the presence of any objects. The loop over all cpu slabs is not that useful since any cpu slab would require an object allocation first. So drop that. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:18 +03:00
Christoph Lameter	06b285dc3d	slub: Make the order configurable for each slab cache Makes /sys/kernel/slab/<slabname>/order writable. The allocation order of a slab cache can then be changed dynamically during runtime. This can be used to override the objects per slabs value establisheed with the slub_min_objects setting that was manually specified or calculated on bootup. The changes of the slab order can occur while allocate_slab() runs. Allocate slab needs the order and the number of slab objects that are both changed by the change of order. Both are put into a single word (struct kmem_cache_order_objects). They can then be atomically updated and retrieved. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:18 +03:00
Christoph Lameter	319d1e2406	slub: Drop fallback to page allocator method There is now a generic method of falling back to a slab page of minimal order. No need anymore for the fallback to kmalloc_large(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:18 +03:00
Christoph Lameter	65c3376aac	slub: Fallback to minimal order during slab page allocation If any higher order allocation fails then fall back the smallest order necessary to contain at least one object. This enables fallback for all allocations to order 0 pages. The fallback will waste more memory (objects will not fit neatly) and the fallback slabs will be not as efficient as larger slabs since they contain less objects. Note that SLAB also depends on order 1 allocations for some slabs that waste too much memory if forced into PAGE_SIZE'd page. SLUB now can now deal with failing order 1 allocs which SLAB cannot do. Add a new field min that will contain the objects for the smallest possible order for a slab cache. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:18 +03:00
Christoph Lameter	205ab99dd1	slub: Update statistics handling for variable order slabs Change the statistics to consider that slabs of the same slabcache can have different number of objects in them since they may be of different order. Provide a new sysfs field total_objects which shows the total objects that the allocated slabs of a slabcache could hold. Add a max field that holds the largest slab order that was ever used for a slab cache. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:17 +03:00
Christoph Lameter	834f3d1192	slub: Add kmem_cache_order_objects struct Pack the order and the number of objects into a single word. This saves some memory in the kmem_cache_structure and more importantly allows us to fetch both values atomically. Later the slab orders become runtime configurable and we need to fetch these two items together in order to properly allocate a slab and initialize its objects. Fix the race by fetching the order and the number of objects in one word. [penberg@cs.helsinki.fi: fix memset() page order in new_slab()] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:17 +03:00
Christoph Lameter	224a88be40	slub: for_each_object must be passed the number of objects in a slab Pass the number of objects to the for_each_object macro. Most of these are debug related. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:17 +03:00
Christoph Lameter	39b264641a	slub: Store max number of objects in the page struct. Split the inuse field up to be able to store the number of objects in this page in the page struct as well. Necessary if we want to have pages of various orders for a slab. Also avoids touching struct kmem_cache cachelines in __slab_alloc(). Update diagnostic code to check the number of objects and make sure that the number of objects always stays within the bounds of a 16 bit unsigned integer. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:16 +03:00
Christoph Lameter	33b12c3813	slub: Dump list of objects not freed on kmem_cache_close() Dump a list of unfreed objects if a slab cache is closed but objects still remain. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:27:37 +03:00
Christoph Lameter	599870b175	slub: free_list() cleanup free_list looked a bit screwy so here is an attempt to clean it up. free_list is is only used for freeing partial lists. We do not need to return a parameter if we decrement nr_partial within the function which allows a simplification of the whole thing. The current version modifies nr_partial outside of the list_lock which is technically not correct. It was only ok because we should be the only user of this slab cache at this point. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:26:18 +03:00
Pekka Enberg	d629d81957	slub: improve kmem_cache_destroy() error message As pointed out by Ingo, the SLUB warning of calling kmem_cache_destroy() with cache that still has objects triggers in practice. So turn this WARN_ON() into a nice SLUB specific error message to avoid people confusing it to a SLUB bug. Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:26:06 +03:00
Christoph Lameter	3dc5063786	slab_err: Pass parameters correctly to slab_bug Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-23 12:47:48 -07:00
Christoph Lameter	0f389ec630	slub: No need for per node slab counters if !SLUB_DEBUG The per node counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable counters for the number of slabs and the number of total slabs if !SLUB_DEBUG. Incrementing the per node counters is also accessing a potentially contended cacheline so this could actually be a performance benefit to embedded systems. SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which is on by default). Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab() if the system is not compiled with NUMA support. [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:53:02 +03:00
Christoph Lameter	49bd5221ce	slub: Move map/flag clearing to __free_slab __free_slab does some diagnostics. The resetting of mapcount etc in discard_slab() can interfere with debug processing. So move the reset immediately before the page is freed. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:52:18 +03:00
Christoph Lameter	50ef37b96c	slub: Fixes to per cpu stat output in sysfs Only output per cpu stats if the kernel is build for SMP. Use a capital "C" as a leading character for the processor number (same as the numa statistics that also use a capital letter "N"). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:52:05 +03:00
Christoph Lameter	5b06c853ad	slub: Deal with config variable dependencies count_partial() is used by both slabinfo and the sysfs proc support. Move the function directly before the beginning of the sysfs code so that it can be easily found. Rework the preprocessor conditional to take into account that slub sysfs support depends on CONFIG_SYSFS and CONFIG_SLUB_DEBUG. Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There is no point of keeping statistics if no one can restrive them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:51:34 +03:00
Christoph Lameter	4097d60175	slub: Reduce #ifdef ZONE_DMA by moving kmalloc_caches_dma near dma logic Move the definition of kmalloc_caches_dma() into a later #ifdef CONFIG_ZONE_DMA. This saves one #ifdef and leaves us with a total of two #ifdefs for dma slab support. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:51:18 +03:00
Pekka Enberg	62f75532b5	slub: Initialize per-cpu stats As spotted by kmemcheck, we need to initialize the per-CPU ->stat array before using it. [kmem_cache_cpu structures are usually allocated from arrays defined via DEFINE_PER_CPU that are zeroed so we have not noticed this so far --cl]. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-14 18:50:44 +03:00
Christoph Lameter	00460dd5f4	Fix undefined count_partial if !CONFIG_SLABINFO Small typo in the patch recently merged to avoid the unused symbol message for count_partial(). Discussion thread with confirmation of fix at http://marc.info/?t=120696854400001&r=1&w=2 Typo in the check if we need the count_partial function that was introduced by `53625b4204` Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-01 12:44:06 -07:00
Linus Torvalds	e72e9c23ee	Revert "SLUB: remove useless masking of GFP_ZERO" This reverts commit `3811dbf671`. The masking was not at all useless, and it was sensible. We handle GFP_ZERO in the caller, and passing it down to any page allocator logic is buggy and wrong. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-27 20:56:33 -07:00
Christoph Lameter	53625b4204	count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO is defined. This patch will be reversed when slab defrag is merged since slab defrag requires count_partial() to determine the fragmentation status of slab caches. Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-26 10:42:28 -07:00
Christoph Lameter	caeab084de	slub page alloc fallback: Enable interrupts for GFP_WAIT. The fallback path needs to enable interrupts like done for the other page allocator calls. This was not necessary with the alternate fast path since we handled irq enable/disable in the slow path. The regular fastpath handles irq enable/disable around calls to the slow path so we need to restore the proper status before calling the page allocator from the slowpath. Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-17 11:14:17 -07:00
Nick Piggin	b621038678	slub: Do not cross cacheline boundaries for very small objects SLUB should pack even small objects nicely into cachelines if that is what has been asked for. Use the same algorithm as SLAB for this. The effect of this patch for a system with a cacheline size of 64 bytes is that the 24 byte sized slab caches will now put exactly 2 objects into a cacheline instead of 3 with some overlap into the next cacheline. This reduces the object density in a 4k slab from 170 to 128 objects (same as SLAB). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-06 16:21:50 -08:00
Christoph Lameter	b773ad7369	slub statistics: Fix check for DEACTIVATE_REMOTE_FREES The remote frees are in the freelist of the page and not in the percpu freelist. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-06 16:21:49 -08:00
Cyrill Gorcunov	62e5c4b4d6	slub: fix possible NULL pointer dereference This patch fix possible NULL pointer dereference if kzalloc failed. To be able to return proper error code the function return type is changed to ssize_t (according to callees and sysfs definitions). Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:32 -08:00
Christoph Lameter	f619cfe1bd	slub: Add kmalloc_large_node() to support kmalloc_node fallback Slub is missing some NUMA support for large kmallocs. Provide that. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:32 -08:00
Pekka J Enberg	7693143481	slub: look up object from the freelist once We only need to look up object from c->page->freelist once in __slab_alloc(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:32 -08:00
Christoph Lameter	6446faa2ff	slub: Fix up comments Provide comments and fix up various spelling / style issues. Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:32 -08:00
Christoph Lameter	d8b42bf54b	slub: Rearrange #ifdef CONFIG_SLUB_DEBUG in calculate_sizes() Group SLUB_DEBUG code together to reduce the number of #ifdefs. Move some debug checks under the #ifdef. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:31 -08:00
Christoph Lameter	ae20bfda68	slub: Remove BUG_ON() from ksize and omit checks for !SLUB_DEBUG The BUG_ONs are useless since the pointer derefs will lead to NULL deref errors anyways. Some of the checks are not necessary if no debugging is possible. Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:31 -08:00
Christoph Lameter	27d9e4e948	slub: Use the objsize from the kmem_cache_cpu structure No need to access the kmem_cache structure. We have the same value in kmem_cache_cpu. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:31 -08:00
Christoph Lameter	d692ef6dcd	slub: Remove useless checks in alloc_debug_processing Alloc debug processing is never called with a NULL object pointer. No reason to check for NULL. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:31 -08:00
Christoph Lameter	e153362a50	slub: Remove objsize check in kmem_cache_flags() There is no page->offset anymore and also no associated limit on the number of objects. The page->offset field was removed for 2.6.24. So the check in kmem_cache_flags() is now also obsolete (should have been dropped earlier, somehow a hunk vanished). Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:30 -08:00
Christoph Lameter	d9acf4b7b6	slub: rename slab_objects to show_slab_objects The sysfs callback is better named show_slab_objects since it is always called from the xxx_show callbacks. We need the name for other purposes later. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:30 -08:00
Christoph Lameter	a973e9dd1e	Revert "unique end pointer" patch This only made sense for the alternate fastpath which was reverted last week. Mathieu is working on a new version that addresses the fastpath issues but that new code first needs to go through mm and it is not clear if we need the unique end pointers with his new scheme. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-03 12:22:30 -08:00
Linus Torvalds	00e962c540	Revert "SLUB: Alternate fast paths using cmpxchg_local" This reverts commit `1f84260c8c`, which is suspected to be the reason for some very occasional and hard-to-trigger crashes that usually look related to memory allocation (mostly reported in networking, but since that's generally the most common source of shortlived allocations - and allocations in interrupt contexts - that in itself is not a big clue). See for example http://bugzilla.kernel.org/show_bug.cgi?id=9973 http://lkml.org/lkml/2008/2/19/278 etc. One promising suspicion for what the root cause of bug is (which also explains why it's so hard to trigger in practice) came from Eric Dumazet: "I wonder how SLUB_FASTPATH is supposed to work, since it is affected by a classical ABA problem of lockless algo. cmpxchg_local(&c->freelist, object, object[c->offset]) can succeed, while an interrupt came (on this cpu), and several allocations were done, and one free was performed at the end of this interruption, so 'object' was recycled. c->freelist can then contain the previous value (object), but object[c->offset] was changed by IRQ. We then put back in freelist an already allocated object." but another reason for the revert is simply that everybody agrees that this code was the main suspect just by virtue of the pattern of oopses. Cc: Torsten Kaiser <just.for.lkml@googlemail.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-19 09:08:49 -08:00
Christoph Lameter	331dc558fa	slub: Support 4k kmallocs again to compensate for page allocator slowness Currently we hand off PAGE_SIZEd kmallocs to the page allocator in the mistaken belief that the page allocator can handle these allocations effectively. However, measurements indicate a minimum slowdown by the factor of 8 (and that is only SMP, NUMA is much worse) vs the slub fastpath which causes regressions in tbench. Increase the number of kmalloc caches by one so that we again handle 4k kmallocs directly from slub. 4k page buffering for the page allocator will be performed by slub like done by slab. At some point the page allocator fastpath should be fixed. A lot of the kernel would benefit from a faster ability to allocate a single page. If that is done then the 4k allocs may again be forwarded to the page allocator and this patch could be reverted. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:02 -08:00
Christoph Lameter	71c7a06ff0	slub: Fallback to kmalloc_large for failing higher order allocs Slub already has two ways of allocating an object. One is via its own logic and the other is via the call to kmalloc_large to hand off object allocation to the page allocator. kmalloc_large is typically used for objects >= PAGE_SIZE. We can use that handoff to avoid failing if a higher order kmalloc slab allocation cannot be satisfied by the page allocator. If we reach the out of memory path then simply try a kmalloc_large(). kfree() can already handle the case of an object that was allocated via the page allocator and so this will work just fine (apart from object accounting...). For any kmalloc slab that already requires higher order allocs (which makes it impossible to use the page allocator fastpath!) we just use PAGE_ALLOC_COSTLY_ORDER to get the largest number of objects in one go from the page allocator slowpath. On a 4k platform this patch will lead to the following use of higher order pages for the following kmalloc slabs: 8 ... 1024 order 0 2048 .. 4096 order 3 (4k slab only after the next patch) We may waste some space if fallback occurs on a 2k slab but we are always able to fallback to an order 0 alloc. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:01 -08:00
Christoph Lameter	b7a49f0d4c	slub: Determine gfpflags once and not every time a slab is allocated Currently we determine the gfp flags to pass to the page allocator each time a slab is being allocated. Determine the bits to be set at the time the slab is created. Store in a new allocflags field and add the flags in allocate_slab(). Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:01 -08:00
Adrian Bunk	dada123d99	make slub.c:slab_address() static slab_address() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:01 -08:00
Pekka Enberg	eada35efcb	slub: kmalloc page allocator pass-through cleanup This adds a proper function for kmalloc page allocator pass-through. While it simplifies any code that does slab tracing code a lot, I think it's a worthwhile cleanup in itself. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:01 -08:00
Ingo Molnar	3adbefee6f	SLUB: fix checkpatch warnings fix checkpatch --file mm/slub.c errors and warnings. $ q-code-quality-compare errors lines of code errors/KLOC mm/slub.c [before] 22 4204 5.2 mm/slub.c [after] 0 4210 0 no code changed: text data bss dec hex filename 22195 8634 136 30965 78f5 slub.o.before 22195 8634 136 30965 78f5 slub.o.after md5: 93cdfbec2d6450622163c590e1064358 slub.o.before.asm 93cdfbec2d6450622163c590e1064358 slub.o.after.asm [clameter: rediffed against Pekka's cleanup patch, omitted moves of the name of a function to the start of line] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-07 17:52:39 -08:00

1 2 3 4

177 Commits