linux-sg2042/mm
Christoph Lameter 6267276f3f [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone
This patchset follows up on the earlier work in Andrew's tree to reduce the
number of zones.  The patches allow to go to a minimum of 2 zones.  This one
allows also to make ZONE_DMA optional and therefore the number of zones can be
reduced to one.

ZONE_DMA is usually used for ISA DMA devices.  There are a number of reasons
why we would not want to have ZONE_DMA

1. Some arches do not need ZONE_DMA at all.

2. With the advent of IOMMUs DMA zones are no longer needed.
   The necessity of DMA zones may drastically be reduced
   in the future. This patchset allows a compilation of
   a kernel without that overhead.

3. Devices that require ISA DMA get rare these days. All
   my systems do not have any need for ISA DMA.

4. The presence of an additional zone unecessarily complicates
   VM operations because it must be scanned and balancing
   logic must operate on its.

5. With only ZONE_NORMAL one can reach the situation where
   we have only one zone. This will allow the unrolling of many
   loops in the VM and allows the optimization of varous
   code paths in the VM.

6. Having only a single zone in a NUMA system results in a
   1-1 correspondence between nodes and zones. Various additional
   optimizations to critical VM paths become possible.

Many systems today can operate just fine with a single zone.  If you look at
what is in ZONE_DMA then one usually sees that nothing uses it.  The DMA slabs
are empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL
will be empty instead).

On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty.  Why
constantly look at an empty zone in /proc/zoneinfo and empty slab in
/proc/slabinfo?  Non i386 also frequently have no need for ZONE_DMA and zones
stay empty.

The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA).

The RFC posted earlier (see
http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had lots
of #ifdefs in them.  An effort has been made to minize the number of #ifdefs
and make this as compact as possible.  The job was made much easier by the
ongoing efforts of others to extract common arch specific functionality.

I have been running this for awhile now on my desktop and finally Linux is
using all my available RAM instead of leaving the 16MB in ZONE_DMA untouched:

christoph@pentium940:~$ cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     4435
        min      1448
        low      1810
        high     2172
        active   241786
        inactive 210170
        scanned  0 (a: 0 i: 0)
        spanned  524224
        present  524224
    nr_anon_pages 61680
    nr_mapped    14271
    nr_file_pages 390264
    nr_slab_reclaimable 27564
    nr_slab_unreclaimable 1793
    nr_page_table_pages 449
    nr_dirty     39
    nr_writeback 0
    nr_unstable  0
    nr_bounce    0
    cpu: 0 pcp: 0
              count: 156
              high:  186
              batch: 31
    cpu: 0 pcp: 1
              count: 9
              high:  62
              batch: 15
  vm stats threshold: 20
    cpu: 1 pcp: 0
              count: 177
              high:  186
              batch: 31
    cpu: 1 pcp: 1
              count: 12
              high:  62
              batch: 15
  vm stats threshold: 20
  all_unreclaimable: 0
  prev_priority:     12
  temp_priority:     12
  start_pfn:         0

This patch:

In two places in the VM we use ZONE_DMA to refer to the first zone.  If
ZONE_DMA is optional then other zones may be first.  So simply replace
ZONE_DMA with zone 0.

This also fixes ZONETABLE_PGSHIFT.  If we have only a single zone then
ZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone
number related to a pgdat.  However, we still need a zonetable to index all
the zones for each node if this is a NUMA system.  Therefore define
ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags.

[apw@shadowen.org: fix mismerge]
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:18 -08:00
..
Kconfig Fix "can not" in Documentation and Kconfig 2006-10-03 22:53:09 +02:00
Makefile [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
allocpercpu.c [PATCH] Allow NULL pointers in percpu_free 2006-12-07 08:39:22 -08:00
backing-dev.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c [PATCH] blktrace: only add a bounce trace when we really bounce 2007-01-12 10:46:49 -08:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap.c [PATCH] mm: remove find_trylock_page 2007-02-09 08:06:14 -08:00
filemap.h Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
filemap_xip.c [PATCH] mm: mremap correct rmap accounting 2007-01-30 08:33:32 -08:00
fremap.c [PATCH] mm: more rmap debugging 2006-12-22 08:55:49 -08:00
highmem.c [PATCH] Use ZVC for free_pages 2007-02-11 10:51:17 -08:00
hugetlb.c [PATCH] hugetlb: preserve hugetlb pte dirty state 2007-02-09 09:25:46 -08:00
internal.h [PATCH] mm: VM_BUG_ON 2006-09-26 08:48:44 -07:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
memory.c [PATCH] page_mkwrite caller race fix 2007-02-11 10:51:17 -08:00
memory_hotplug.c [PATCH] Fix sparsemem on Cell 2007-01-11 18:18:20 -08:00
mempolicy.c [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone 2007-02-11 10:51:18 -08:00
mempool.c [PATCH] dm: work around mempool_alloc, bio_alloc_bioset deadlocks 2006-09-01 11:39:09 -07:00
migrate.c [PATCH] radix-tree: RCU lockless readside 2006-12-07 08:39:25 -08:00
mincore.c [PATCH] sys_mincore: s/max/min/ 2006-12-17 10:21:53 -08:00
mlock.c [PATCH] mlock cleanup 2006-12-07 08:39:22 -08:00
mmap.c [PATCH] Add install_special_mapping 2007-02-09 09:25:47 -08:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c [PATCH] paravirt: lazy mmu mode hooks.patch 2006-10-01 00:39:33 -07:00
mremap.c [PATCH] mm: mremap correct rmap accounting 2007-01-30 08:33:32 -08:00
msync.c [PATCH] mm: msync() cleanup 2006-09-26 08:48:45 -07:00
nommu.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
oom_kill.c [PATCH] fix OOM killing of swapoff 2007-01-05 23:55:29 -08:00
page-writeback.c Fix balance_dirty_page() calculations with CONFIG_HIGHMEM 2007-01-29 16:37:38 -08:00
page_alloc.c [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone 2007-02-11 10:51:18 -08:00
page_io.c [PATCH] swsusp: use block device offsets to identify swap locations 2006-12-07 08:39:27 -08:00
pdflush.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] Drop __get_zone_counts() 2007-02-11 10:51:18 -08:00
rmap.c [PATCH] page_mkclean_one(): fix call to set_pte_at() 2006-12-30 10:56:42 -08:00
shmem.c [PATCH] Fix for shmem_truncate_range() BUG_ON() 2006-12-22 08:55:47 -08:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c [PATCH] slab: use parameter passed to cache_reap to determine pointer to work structure 2007-02-11 10:51:17 -08:00
slob.c [PATCH] MM: SLOB is broken by recent cleanup of slab.h 2006-12-30 10:56:42 -08:00
sparse.c [PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int 2006-12-07 08:39:23 -08:00
swap.c [PATCH] hotplug CPU: clean up hotcpu_notifier() use 2006-12-07 08:39:39 -08:00
swap_state.c [PATCH] lockdep: locking init debugging improvement 2006-07-03 15:27:02 -07:00
swapfile.c [PATCH] swsusp: Do not fail if resume device is not set 2007-01-05 23:55:22 -08:00
thrash.c [PATCH] make mm/thrash.c:global_faults static 2006-12-07 08:39:22 -08:00
tiny-shmem.c [PATCH] struct path: convert mm 2006-12-08 08:28:47 -08:00
truncate.c [PATCH] MM: Remove [PATCH] invalidate_inode_pages2_range() debug 2007-01-26 13:51:00 -08:00
util.c [PATCH] slab: clean up leak tracking ifdefs a little bit 2006-10-04 07:55:13 -07:00
vmalloc.c [PATCH] Fix strange size check in __get_vm_area_node() 2006-11-16 11:43:38 -08:00
vmscan.c [PATCH] Use ZVC for inactive and active counts 2007-02-11 10:51:17 -08:00
vmstat.c [PATCH] Drop get_zone_counts() 2007-02-11 10:51:18 -08:00