Commit Graph

293 Commits

Author SHA1 Message Date
NeilBrown 41158c7eb2 [PATCH] md: optimise reconstruction when re-adding a recently failed drive.
When an array is degraded, bit in the intent-bitmap are never cleared.  So if
a recently failed drive is re-added, we only need to reconstruct the block
that are still reflected in the bitmap.

This patch adds support for this re-adding.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:46 -07:00
NeilBrown 191ea9b2c7 [PATCH] md: raid1 support for bitmap intent logging
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:46 -07:00
NeilBrown 77ad4bc706 [PATCH] md: enable the bitmap write-back daemon and wait for it.
Currently we don't wait for updates to the bitmap to be flushed to disk
properly.  The infrastructure all there, but it isn't being used....

A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each
page as we cannot get callbacks when a page write completes.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:45 -07:00
NeilBrown 32a7627cf3 [PATCH] md: optimised resync using Bitmap based intent logging
With this patch, the intent to write to some block in the array can be logged
to a bitmap file.  Each bit represents some number of sectors and is set
before any update happens, and only cleared when all writes relating to all
sectors are complete.

After an unclean shutdown, information in this bitmap can be used to optimise
resync - only sectors which could be out-of-sync need to be updated.

Also if a drive is removed and then added back into an array, the recovery can
make use of the bitmap to optimise reconstruction.  This is not implemented in
this patch.

Currently the bitmap is stored in a file which must (obviously) be stored on a
separate device.

The patch only provided infrastructure.  It does not update any personalities
to bitmap intent logging.

Md arrays can still be used with no bitmap file.  This patch has minimal
impact on such arrays.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:43 -07:00
NeilBrown 57afd89f98 [PATCH] md: improve the interface to sync_request
1/ change the return value (which is number-of-sectors synced)
 from 'int' to 'sector_t'.
 The number of sectors is usually easily small enough to fit
 in an int, but if resync needs to abort, it may want to return
 the total number of remaining sectors, which could be large.
 Also errors cannot be returned as negative numbers now, so use
 0 instead
2/ Add a 'skipped' return parameter to allow the array to report
 that it skipped the sectors.  This allows md to take this into account
 in the speed calculations.
 Currently there is no important skipping, but the bitmap-based-resync
 that is coming will use this.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:43 -07:00
NeilBrown 06d91a5fe0 [PATCH] md: improve locking on 'safemode' and move superblock writes
When md marks the superblock dirty before a write, it calls
generic_make_request (to write the superblock) from within
generic_make_request (to write the first dirty block), which could cause
problems later.

With this patch, the superblock write is always done by the helper thread, and
write request are delayed until that write completes.

Also, the locking around marking the array dirty and writing the superblock is
improved to avoid possible races.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:43 -07:00
James Simmons f1ab5dac25 [PATCH] fbdev: stack reduction
Shrink the stack when calling the drawing alignment functions.

Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@hotpop.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:41 -07:00
Jurriaan 303b86d991 [PATCH] New framebuffer fonts + updated 12x22 font available
Improve the fonts for use with the framebuffer.

I've added all the characters marked 'FIXME' in the sun12x22 font and
created a 10x18 font (based on the sun12x22 font) and a 7x14 font (based
on the vga8x16 font).

This patch is non-intrusive, no options are enabled by default so most
users won't notice a thing.

I am placing my changes under the GPL, however, I've not seen any copyright
notices on the sun12x22 font and the vga8x16 font which I derived my new
fonts from so I don't know what the copyright status is.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:41 -07:00
James Simmons d5881eb488 [PATCH] fbdev: new pci id for chipsfb
Patch adds pci ID for CT 69000 chipset.

Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:41 -07:00
Jaya Kumar 1154ea7dcd [PATCH] Framebuffer driver for Arc LCD board
Add support for the Arc monochrome LCD board.

The board uses KS108 controllers to drive individual 64x64 LCD matrices.
The board can be paneled in a variety of setups such as 2x1=128x64,
4x4=256x256 and so on.  The board/host interface is through GPIO.

Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: <linux-fbdev-devel@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:41 -07:00
James Simmons f5a9951c94 [PATCH] fbdev: iomove removal
Since no one is using the inbuf, outbuf of struct fb_pixmap I removed their
use in the framebuffer console.  The idea is instead move the pixmap
functionality below the accelerated functions intead of on top as the way
it is now.  If there is no objection please apply.  This is against Linus
latestr GIT tree.  Thank you.

Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:39 -07:00
Ian Kent 8a96619145 [PATCH] autofs4: subversion bump to identify these changes
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:36 -07:00
Paolo 'Blaisorblade' Giarrusso b77d6adc92 [PATCH] uml: make hw_controller_type->release exist only for archs needing it
With Chris Wedgwood <cw@f00f.org>

As suggested by Chris, we can make the "just added" method ->release
conditional to UML only (better: to archs requesting it, i.e.  only UML
currently), so that other archs don't get this unneeded crud, and if UML
won't need it any more we can kill this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Paolo 'Blaisorblade' Giarrusso dbce706e25 [PATCH] uml: add and use generic hw_controller_type->release
With Chris Wedgwood <cw@f00f.org>

Currently UML must explicitly call the UML-specific
free_irq_by_irq_and_dev() for each free_irq call it's done.

This is needed because ->shutdown and/or ->disable are only called when the
last "action" for that irq is removed.

Instead, for UML shared IRQs (UML IRQs are very often, if not always,
shared), for each dev_id some setup is done, which must be cleared on the
release of that fd.  For instance, for each open console a new instance
(i.e.  new dev_id) of the same IRQ is requested().

Exactly, a fd is stored in an array (pollfds), which is after read by a
host thread and passed to poll().  Each event registered by poll() triggers
an interrupt.  So, for each free_irq() we must remove the corresponding
host fd from the table, which we do via this -release() method.

In this patch we add an appropriate hook for this, and remove all uses of
it by pointing the hook to the said procedure; this is safe to do since the
said procedure.

Also some cosmetic improvements are included.

This is heavily based on some work by Chris Wedgwood, which however didn't
get the patch merged for something I'd call a "misunderstanding" (the need
for this patch wasn't cleanly explained, thus adding the generic hook was
felt as undesirable).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:32 -07:00
Brent Casavant d4c477ca54 [PATCH] ioc4: PCI bus speed detection
Several hardware features of SGI's IOC4 I/O controller chip require
timing-related driver calculations dependent upon the PCI bus speed.  This
patch enables the core IOC4 driver code to detect the actual bus speed and
store a value that can later be used by the IOC4 subdrivers as needed.

Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:32 -07:00
Brent Casavant 22329b511a [PATCH] ioc4: Core driver rewrite
This series of patches reworks the configuration and internal structure
of the SGI IOC4 I/O controller device drivers.

These changes are motivated by several factors:

- The IOC4 chip PCI resources are of mixed use between functions (i.e.
  multiple functions are handled in the same address range, sometimes
  within the same register), muddling resource ownership and initialization
  issues.  Centralizing this ownership in a core driver is desirable.

- The IOC4 chip implements multiple functions (serial, IDE, others not
  yet implemented in the mainline kernel) but is not a multifunction
  PCI device.  In order to properly handle device addition and removal
  as well as module insertion and deletion, an intermediary IOC4-specific
  driver layer is needed to handle these operations cleanly.

- All IOC4 drivers are currently enabled by a single CONFIG value.  As
  not all systems need all IOC4 functions, it is desireable to enable
  these drivers independently.

- The current IOC4 core driver will trigger loading of all function-level
  drivers, as it makes direct calls to them.  This situation should be
  reversed (i.e. function-level drivers cause loading of core driver)
  in order to maintain a clear and least-surprise driver loading model.

- IOC4 hardware design necessitates some driver-level dependency on
  the PCI bus clock speed.  Current code assumes a 66MHz bus, but the
  speed should be autodetected and appropriate compensation taken.

This patch series effects the above changes by a newly and better designed
IOC4 core driver with which the function-level drivers can register and
deregister themselves upon module insertion/removal.  By tracking these
modules, device addition/removal is also handled properly.  PCI resource
management and ownership issues are centralized in this core driver, and
IOC4-wide configuration actions such as bus speed detection are also
handled in this core driver.

This patch:

The SGI IOC4 I/O controller chip implements multiple functions, though it is
not a multi-function PCI device.  Additionally, various PCI resources of the
IOC4 are shared by multiple hardware functions, and thus resource ownership by
driver is not clearly delineated.  Due to the current driver design, all core
and subordinate drivers must be loaded, or none, which is undesirable if not
all IOC4 hardware features are being used.

This patch reorganizes the IOC4 drivers so that the core driver provides a
subdriver registration service.  Through appropriate callbacks the subdrivers
can now handle device addition and removal, as well as module insertion and
deletion (though the IOC4 IDE driver requires further work before module
deletion will work).  The core driver now takes care of allocating PCI
resources and data which must be shared between subdrivers, to clearly
delineate module ownership of these items.

Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com
Acked-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:32 -07:00
Kumar Gala 5b37b700f7 [PATCH] ppc32: Added support for new MPC8548 family of PowerQUICC III processors
Added descriptions of the new MPC8548 family processors, e500 core and
peripherals.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:23 -07:00
Jes Sorensen f14f75b811 [PATCH] ia64 uncached alloc
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc).  The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.

The generic allocator can be used by device driver to manage special memory
etc.  The generic allocator is based on the allocator from the sym53c8xx_2
driver.

Various users on ia64 needs uncached memory.  The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster.  The specific user for this is the XPC code.  Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose.  Performance
of doing this through uncached vs cached memory is pretty substantial.  This
is handled by the mspec driver which I will push out in a seperate patch.

Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please.  For instance to handle onboard
device memory.  It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).

On ia64 memory has various properties within a granule, ie.  it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode.  The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc.  The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool.  Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.

Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:18 -07:00
Christoph Lameter 4ae7c03943 [PATCH] Periodically drain non local pagesets
The pageset array can potentially acquire a huge amount of memory on large
NUMA systems.  F.e.  on a system with 512 processors and 256 nodes there
will be 256*512 pagesets.  If each pageset only holds 5 pages then we are
talking about 655360 pages.With a 16K page size on IA64 this results in
potentially 10 Gigabytes of memory being trapped in pagesets.  The typical
cases are much less for smaller systems but there is still the potential of
memory being trapped in off node pagesets.  Off node memory may be rarely
used if local memory is available and so we may potentially have memory in
seldom used pagesets without this patch.

The slab allocator flushes its per cpu caches every 2 seconds.  The
following patch flushes the off node pageset caches in the same way by
tying into the slab flush.

The patch also changes /proc/zoneinfo to include the number of pages
currently in each pageset.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:18 -07:00
Benjamin LaHaise c2f29ea111 [PATCH] __read_page_state(): pass unsigned long instead of unsigned
By making the offset argument of __read_page_state an unsigned long instead of
unsigned, we can avoid forcing the compiler to sign extend a usually constant
argument.  This saves 1 instruction on x86-64.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Benjamin LaHaise 83e5d8f725 [PATCH] __mod_page_state(): pass unsigned long instead of unsigned
By making the offset argument of __mod_page_state an unsigned long instead
of unsigned, we can avoid forcing the compiler to sign extend a usually
constant argument.  This saves 1 instruction on x86-64.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Darren Hart 1ad539b2bd [PATCH] vm: try_to_free_pages unused argument
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0.  The following patch removes the argument and updates all the
calls to try_to_free_pages.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Badari Pulavarty cbe37d0937 [PATCH] mm: remove PG_highmem
Remove PG_highmem, to save a page flag.  Use is_highmem() instead.  It'll
generate a little more code, but we don't use PageHigheMem() in many places.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:17 -07:00
Wolfgang Wander 1363c3cd86 [PATCH] Avoiding mmap fragmentation
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.

The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.

The problem is twofold:

  1) the free_area_cache is used to continue a search for memory where
     the last search ended.  Before the change new areas were always
     searched from the base address on.

     So now new small areas are cluttering holes of all sizes
     throughout the whole mmap-able region whereas before small holes
     tended to close holes near the base leaving holes far from the base
     large and available for larger requests.

  2) the free_area_cache also is set to the location of the last
     munmap-ed area so in scenarios where we allocate e.g.  five regions of
     1K each, then free regions 4 2 3 in this order the next request for 1K
     will be placed in the position of the old region 3, whereas before we
     appended it to the still active region 1, placing it at the location
     of the old region 2.  Before we had 1 free region of 2K, now we only
     get two free regions of 1K -> fragmentation.

The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache.  If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.

The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.

Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.

Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.

Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
Christoph Lameter e7c8d5c995 [PATCH] node local per-cpu-pages
This patch modifies the way pagesets in struct zone are managed.

Each zone has a per-cpu array of pagesets.  So any particular CPU has some
memory in each zone structure which belongs to itself.  Even if that CPU is
not local to that zone.

So the patch relocates the pagesets for each cpu to the node that is nearest
to the cpu instead of allocating the pagesets in the (possibly remote) target
zone.  This means that the operations to manage pages on remote zone can be
done with information available locally.

We play a macro trick so that non-NUMA pmachines avoid the additional
pointer chase on the page allocator fastpath.

AIM7 benchmark on a 32 CPU SGI Altix

w/o patches:
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      484.68  100       484.6769     12.01      1.97   Fri Mar 25 11:01:42 2005
  100    27140.46   89       271.4046     21.44    148.71   Fri Mar 25 11:02:04 2005
  200    30792.02   82       153.9601     37.80    296.72   Fri Mar 25 11:02:42 2005
  300    32209.27   81       107.3642     54.21    451.34   Fri Mar 25 11:03:37 2005
  400    34962.83   78        87.4071     66.59    588.97   Fri Mar 25 11:04:44 2005
  500    31676.92   75        63.3538     91.87    742.71   Fri Mar 25 11:06:16 2005
  600    36032.69   73        60.0545     96.91    885.44   Fri Mar 25 11:07:54 2005
  700    35540.43   77        50.7720    114.63   1024.28   Fri Mar 25 11:09:49 2005
  800    33906.70   74        42.3834    137.32   1181.65   Fri Mar 25 11:12:06 2005
  900    34120.67   73        37.9119    153.51   1325.26   Fri Mar 25 11:14:41 2005
 1000    34802.37   74        34.8024    167.23   1465.26   Fri Mar 25 11:17:28 2005

with slab API changes and pageset patch:

Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      485.00  100       485.0000     12.00      1.96   Fri Mar 25 11:46:18 2005
  100    28000.96   89       280.0096     20.79    150.45   Fri Mar 25 11:46:39 2005
  200    32285.80   79       161.4290     36.05    293.37   Fri Mar 25 11:47:16 2005
  300    40424.15   84       134.7472     43.19    438.42   Fri Mar 25 11:47:59 2005
  400    39155.01   79        97.8875     59.46    590.05   Fri Mar 25 11:48:59 2005
  500    37881.25   82        75.7625     76.82    730.19   Fri Mar 25 11:50:16 2005
  600    39083.14   78        65.1386     89.35    872.79   Fri Mar 25 11:51:46 2005
  700    38627.83   77        55.1826    105.47   1022.46   Fri Mar 25 11:53:32 2005
  800    39631.94   78        49.5399    117.48   1169.94   Fri Mar 25 11:55:30 2005
  900    36903.70   79        41.0041    141.94   1310.78   Fri Mar 25 11:57:53 2005
 1000    36201.23   77        36.2012    160.77   1458.31   Fri Mar 25 12:00:34 2005

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:16 -07:00
David Gibson 63551ae0fe [PATCH] Hugepage consolidation
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code.  It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.

Tested, at least a little, on ppc64, i386 and x86_64.

Notes:
	- this patch changes the meaning of set_huge_pte() to be more
	  analagous to set_pte()
	- does SH4 need s special huge_ptep_get_and_clear()??

Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:15 -07:00
Martin Hicks 1e7e5a9048 [PATCH] VM: rate limit early reclaim
When early zone reclaim is turned on the LRU is scanned more frequently when a
zone is low on memory.  This limits when the zone reclaim can be called by
skipping the scan if another thread (either via kswapd or sync reclaim) is
already reclaiming from the zone.

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Martin Hicks 0c35bbadc5 [PATCH] VM: add __GFP_NORECLAIM
When using the early zone reclaim, it was noticed that allocating new pages
that should be spread across the whole system caused eviction of local pages.

This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts.  The example that is implemented here is for page
cache pages.  We want page cache pages to be spread across the whole system,
and we don't want page cache pages to evict other pages to get local memory.

Signed-off-by:  Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Martin Hicks 753ee72896 [PATCH] VM: early zone reclaim
This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:14 -07:00
Ingo Molnar 39c715b717 [PATCH] smp_processor_id() cleanup
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:13 -07:00
Tony Luck 29516d75a0 Auto merge with /home/aegl/GIT/linus 2005-06-21 16:21:20 -07:00
Patrick McHardy 18b8afc771 [NETFILTER]: Kill nf_debug
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:57 -07:00
Patrick McHardy e45b1be8bc [NETFILTER]: Kill lockhelp.h
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:30 -07:00
Alexey Kuznetsov 18b504e25f [NETLINK]: netlink_callback structure needs 5 args not 4
net/ipv4/tcp_diag.c uses up to ->args[4]

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 12:38:48 -07:00
Linus Torvalds 1d345dac1f Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2005-06-20 16:00:33 -07:00
Maneesh Soni 988d186de5 [PATCH] sysfs-iattr: add sysfs_setattr
o This adds ->i_op->setattr VFS method for sysfs inodes. The changed
  attribues are saved in the persistent sysfs_dirent structure as a pointer
  to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
  for which default attributes are getting changed. Thanks to Jon Smirl for
  this suggestion.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:37 -07:00
Yani Ioannou 0a3e7eeabc [PATCH] I2C: add i2c sensor_device_attribute and macros
This patch creates a new header with a potential standard i2c sensor
attribute type (which simply includes an int representing the sensor
number/index) and the associated macros, SENSOR_DEVICE_ATTR to define
a static attribute and to_sensor_dev_attr to get a
sensor_device_attribute reference from an embedded device_attribute
reference.

Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
2005-06-20 15:15:36 -07:00
Yani Ioannou 54b6f35c99 [PATCH] Driver core: change device_attribute callbacks
This patch adds the device_attribute paramerter to the
device_attribute store and show sysfs callback functions, and passes a
reference to the attribute when the callbacks are called.

Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:31 -07:00
Arnd Bergmann acaefc25d2 [PATCH] libfs: add simple attribute files
Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.

Simple attribute files behave similarly to sequential files from
a kernel programmers perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file specific get() and set() functions.

These operations are defined as

void foo_set(void *data, u64 val); and
u64 foo_get(void *data);

where data is the inode->u.generic_ip pointer of the file and the
operations just need to make send of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.

A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.

This patch already contains the changes for debugfs to use attributes
for its internal file operations.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:30 -07:00
Keiichiro Tokunaga 4b45099b75 [PATCH] Driver core: unregister_node() for hotplug use
This adds a generic function 'unregister_node()'.
It is used to remove objects of a node going away
for hotplug.  All the devices on the node must be
unregistered before calling this function.

Signed-off-by: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -puN drivers/base/node.c~numa_hp_base drivers/base/node.c
2005-06-20 15:15:29 -07:00
Patrick Mochel 0d3e5a2e39 [PATCH] Driver Core: fix bk-driver-core kills ppc64
There's no check to see if the device is already bound to a driver, which
could do bad things.  The first thing to go wrong is that it will try to match
a driver with a device already bound to one.  In some cases (it appears with
USB with drivers/usb/core/usb.c::usb_match_id()), some drivers will match a
device based on the class type, so it would be common (especially for HID
devices) to match a device that is already bound.

The fun comes when ->probe() is called, it fails, then
driver_probe_device() does this:

	dev->driver = NULL;

Later on, that pointer could be be dereferenced without checking and cause
hell to break loose.

This problem could be nasty. It's very hardware dependent, since some
devices could have a different set of matching qualifiers than others.

Now, I don't quite see exactly where/how you were getting that crash.
You're dereferencing bad memory, but I'm not sure which pointer was bad
and where it came from, but it could have come from a couple of different
places.

The patch below will hopefully fix it all up for you. It's against
2.6.12-rc2-mm1, and does the following:

- Move logic to driver_probe_device() and comments uncommon returns:
  1 - If device is bound
  0 - If device not bound, and no error
  error - If there was an error.

- Move locking to caller of that function, since we want to lock a
  device for the entire time we're trying to bind it to a driver (to
  prevent against a driver being loaded at the same time).

- Update __device_attach() and __driver_attach() to do that locking.

- Check if device is already bound in __driver_attach()

- Update the converse device_release_driver() so it locks the device
  around all of the operations.

- Mark driver_probe_device() as static and remove export. It's an
  internal function, it should stay that way, and there are no other
  callers. If there is ever a need to export it, we can audit it as
  necessary.

Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-06-20 15:15:27 -07:00
mochel@digitalimplant.org 36239577cf [PATCH] Use a klist for device child lists.
- Use klist iterator in device_for_each_child(), making it safe to use for
  removing devices.
- Remove unused list_to_dev() function.
- Kills all usage of devices_subsys.rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:23 -07:00
mochel@digitalimplant.org 63c4f204ff [PATCH] Remove struct device::driver_list.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:18 -07:00
mochel@digitalimplant.org 7dc35cdf69 [PATCH] Remove struct device::bus_list.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:18 -07:00
mochel@digitalimplant.org 8b0c250be4 [PATCH] add klist_node_attached() to determine if a node is on a list or not.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20 15:15:17 -07:00
mochel@digitalimplant.org cb85b6f1cc [PATCH] Remove the unused device_find().
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:16 -07:00
mochel@digitalimplant.org 94e7b1c5ff [PATCH] Add a klist to struct device_driver for the devices bound to it.
- Use it in driver_for_each_device() instead of the regular list_head and stop using
  the bus's rwsem for protection.
- Use driver_for_each_device() in driver_detach() so we don't deadlock on the
  bus's rwsem.
- Remove ->devices.
- Move klist access and sysfs link access out from under device's semaphore, since
  they're synchronized through other means.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:16 -07:00
mochel@digitalimplant.org 38fdac3cdc [PATCH] Add a klist to struct bus_type for its drivers.
- Use it in bus_for_each_drv().
- Use the klist spinlock instead of the bus rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org 465c7a3a3a [PATCH] Add a klist to struct bus_type for its devices.
- Use it for bus_for_each_dev().
- Use the klist spinlock instead of the bus rwsem.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org 9a19fea436 [PATCH] Add initial implementation of klist helpers.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.

The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.

It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.

The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.

It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.

There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).

Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.

There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20 15:15:14 -07:00
mochel@digitalimplant.org fae3cd0025 [PATCH] Add driver_for_each_device().
Now there's an iterator for accessing each device bound to a driver.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Index: linux-2.6.12-rc2/drivers/base/driver.c
===================================================================
2005-06-20 15:15:13 -07:00
mochel@digitalimplant.org af70316af1 [PATCH] Add a semaphore to struct device to synchronize calls to its driver.
This adds a per-device semaphore that is taken before every call from the core to a
driver method. This prevents e.g. simultaneous calls to the ->suspend() or ->resume()
and ->probe() or ->release(), potentially saving a whole lot of headaches.

It also moves us a step closer to removing the bus rwsem, since it protects the fields
in struct device that are modified by the core.

Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:12 -07:00
gregkh@suse.de cd987d38cc [PATCH] class: remove class_simple code, as no one in the tree is using it anymore.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:11 -07:00
gregkh@suse.de 8561b10f6e [PATCH] USB: move the usb hcd code to use the new class code.
This moves a kref into the main hcd structure, which detaches it from
the class device structure.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:07 -07:00
gregkh@suse.de 1235686f6e [PATCH] INPUT: move to use the new class code, instead of class_simple
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:04 -07:00
gregkh@suse.de e9ba6365fd [PATCH] CLASS: move a "simple" class logic into the class core.
One step on improving the class api so that it can not be used incorrectly.
This also fixes the module owner issue with the dev files that happened when
the devt logic moved to the class core.

Based on a patch originally written by Kay Sievers <kay.sievers@vrfy.org>

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:04 -07:00
Dmitry Torokhov d48593bf20 [PATCH] Make attributes names const char *
sysfs: make attributes and attribute_group's names const char *

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:01 -07:00
Dmitry Torokhov 8d790d7408 [PATCH] make driver's name be const char *
Driver core:
  change driver's, bus's, class's and platform device's names
  to be const char * so one can use
            const char *drv_name = "asdfg";
  when initializing structures.
  Also kill couple of whitespaces.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:01 -07:00
Dmitry Torokhov 419cab3fc6 [PATCH] kset_hotplug_ops->name shoudl return const char *
kobject: change name() method in kset_hotplug_ops return const char *
	 since users shoudl not try to modify returned data.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:01 -07:00
Dmitry Torokhov f3b4f3c6de [PATCH] Make kobject's name be const char *
kobject: make kobject's name const char * since users should not
	 attempt to change it (except by calling kobject_rename).

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:00 -07:00
Dmitry Torokhov e3a15db241 [PATCH] sysfs_{create|remove}_link should take const char *
sysfs: make sysfs_{create|remove}_link to take const char * name.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:00 -07:00
Robert Olsson 246955fe4c [NETLINK]: fib_lookup() via netlink
Below is a more generic patch to do fib_lookup via netlink. For others 
we should say that we discussed this as a way to verify route selection.
It's also possible there are others uses for this.

In short the fist half of struct fib_result_nl is filled in by caller 
and netlink call fills in the other half and returns it.

In case anyone is interested there is a corresponding user app to compare 
the full routing table this was used to test implementation of the LC-trie. 

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:36:39 -07:00
Alexey Dobriyan f6e276ee67 [ATALK]: endian annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:32:05 -07:00
Herbert Xu dd87147eed [IPSEC]: Add XFRM_STATE_NOPMTUDISC flag
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states.  It is
similar to the nopmtudisc on IPIP/GRE tunnels.  It only has an effect
on IPv4 tunnel mode states.  For these states, it will ensure that the
DF flag is always cleared.

This is primarily useful to work around ICMP blackholes.

In future this flag could also allow a larger MTU to be set within the
tunnel just like IPIP/GRE tunnels.  This could be useful for short haul
tunnels where temporary fragmentation outside the tunnel is desired over
smaller fragments inside the tunnel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:21:43 -07:00
Tony Luck 8ba08378b4 Auto merge with /home/aegl/GIT/linus 2005-06-20 09:35:34 -07:00
Herbert Xu 0603eac0d6 [IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification
This patch changes the format of the XFRM_MSG_DELSA and
XFRM_MSG_DELPOLICY notification so that the main message
sent is of the same format as that received by the kernel
if the original message was via netlink.  This also means
that we won't lose the byid information carried in km_event.

Since this user interface is introduced by Jamal's patch
we can still afford to change it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:54:36 -07:00
Thomas Graf 1797754ea7 [NETLINK]: Introduce NLMSG_NEW macro to better handle netlink flags
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
a flags argument. NLMSG_PUT stays there for compatibility but now
calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
NLMSG_NEW_ANSWER which now also takes a flags argument.

Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
and fixes the two direct users of __nlmsg_put to either provide
the flags or use NLMSG_NEW(_ANSWER).

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:53:48 -07:00
Thomas Graf 8f48bcd4ef [RTNETLINK]: Add RTA_(PUT|GET) shortcuts for u8, u16, and flag
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:52:36 -07:00
Thomas Graf c52a3f89f8 [NETLINK]: Fix RTA_NEST_CANCEL().
Only skb_trim() if 'start' is non-NULL.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:51:26 -07:00
Thomas Graf c7fb64db00 [NETLINK]: Neighbour table configuration and statistics via rtnetlink
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.

Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.

To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:50:55 -07:00
Thomas Graf 0076824492 [NETLINK] Routing attribute related shortcuts
RTA_GET_U(32|64)(tlv)
   Assumes TLV is a u32/u64 field and returns its value.

 RTA_GET_[M]SECS(tlv)
   Assumes TLV is a u64 and transports jiffies converted
   to seconds or milliseconds and returns its value.

 RTA_PUT_U(32|64)(skb, type, value)
   Appends %value as fixed u32/u64 to %skb as TLV %type.

 RTA_PUT_[M]SECS(skb, type, jiffies)
   Converts %jiffies to secs/msecs and appends it as u64
   to %skb as TLV %type.

 RTA_PUT_STRING(skb, type, string)
   Appends %NUL terminated %string to %skb as TLV %type.

 RTA_NEST(skb, type)
   Starts a nested TLV %type and returns the nesting handle.

 RTA_NEST_END(skb, nesting_handle)
   Finishes the nested TLV %nesting_handle, must be called
   symmetric to RTA_NEST(). Returns skb->len

 RTA_NEST_CANCEL(skb, nesting_handle)
   Cancel the nested TLV %nesting_handle and trim nested TLV
   from skb again, returns -1.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:50:38 -07:00
Thomas Graf f88a10d656 [NETLINK]: New message building macros
NLMSG_PUT_ANSWER(skb, nlcb, type, length)
   Start a new netlink message as answer to a request,
   returns the message header.

 NLMSG_END(skb, nlh)
   End a netlink message, fixes total message length,
   returns skb->len.

 NLMSG_CANCEL(skb, nlh)
   Cancel the building process and trim whole message
   from skb again, returns -1.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:50:12 -07:00
Arnaldo Carvalho de Melo 0e87506fcc [NET] Generalise tcp_listen_opt
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.

Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:59 -07:00
Arnaldo Carvalho de Melo 60236fdd08 [NET] Rename open_request to request_sock
Ok, this one just renames some stuff to have a better namespace and to
dissassociate it from TCP:

struct open_request  -> struct request_sock
tcp_openreq_alloc    -> reqsk_alloc
tcp_openreq_free     -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free

With this most of the infrastructure closely resembles a struct
sock methods subset.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:47:21 -07:00
Arnaldo Carvalho de Melo 2e6599cb89 [NET] Generalise TCP's struct open_request minisock infrastructure
Kept this first changeset minimal, without changing existing names to
ease peer review.

Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:

->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
  a specific protocol

The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.

I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.

Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)

Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:46:52 -07:00
Arnaldo Carvalho de Melo 1944972d3b [SLAB] Introduce kmem_cache_name
This is for use with slab users that pass a dynamically allocated slab name in
kmem_cache_create, so that before destroying the slab one can retrieve the name
and free its memory.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:46:19 -07:00
Jamal Hadi Salim 26b15dad9f [IPSEC] Add complete xfrm event notification
Heres the final patch.
What this patch provides

- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
  and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-06-18 22:42:13 -07:00
Linus Torvalds 19fa95e9e9 Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6 2005-06-18 13:54:12 -07:00
Linus Torvalds 43fde784a6 Merge 'upstream-2.6.13' branch of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev 2005-06-18 13:08:39 -07:00
Linus Torvalds 0e396ee43e Manual merge of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
This is a fixed-up version of the broken "upstream-2.6.13" branch, where
I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
the history is all good.
2005-06-18 11:42:35 -07:00
Jeff Garzik f9d1fe9630 Merge /spare/repo/linux-2.6/ 2005-06-18 13:21:24 -04:00
David Woodhouse 0107b3cf32 Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-18 08:36:46 +01:00
Lee Revell b8112df71c [SCSI] Add DMA mask constants other than 32 and 64 bit
Signed-Off-By: Lee Revell <rlrevell@joe-job.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-06-17 20:37:11 -05:00
James Bottomley 3237ee78fc merge by hand (fix up qla_os.c merge error) 2005-06-17 18:42:23 -05:00
Jesper Juhl 986a80d5c1 [PATCH] avoid signed vs unsigned comparison in efi_range_is_wc()
warning when building with gcc -W : 

 include/linux/efi.h: In function `efi_range_is_wc':
 include/linux/efi.h:320: warning: comparison between signed and unsigned

It looks to me like a significantly large 'len' passed in could cause the 
loop to never end. Isn't it safer to make 'i' an unsigned long as well? 
Like this little patch below (which of course also kills the warning) :

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-06-16 16:27:14 -07:00
J. Simonetti 1c2fb7f93c [IPV4]: Sysctl configurable icmp error source address.
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.

In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.

The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 15:19:03 -07:00
Tom Rini 03722adce9 [NET]: linux/if_tr.h needs asm/byteorder.h
<linux/if_tr.h> uses __be16, but does not directly include
<asm/byteorder.h>.  Add this in, so that dhcp/net-tools token ring code
can compile again.

Signed-off-by: Tom Rini <trini@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-13 13:57:10 -07:00
Geert Uytterhoeven a58e76f254 [PATCH] Remove obsolete HAVE_ARCH_GET_SIGNAL_TO_DELIVER?
Now m68k no longer sets HAVE_ARCH_GET_SIGNAL_TO_DELIVER, can it be removed
completely? Or may ARM26 still need it? Note that its usage was removed from
kernel/signal.c about 2 months ago.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-12 20:43:21 -07:00
Linus Torvalds 5273a00d9c Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2005-06-08 16:36:31 -07:00
Thomas Graf 4890062960 [PKT_SCHED]: Allow socket attributes to be matched on via meta ematch
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 15:10:48 -07:00
Michael Chan 6d1cfbab4d [TG3]: Fix 5700/5701 DMA corruption on Apple G4.
Fix 5700/5701 DMA write corruption on Apple G4 by detecting the Apple
UniNorth PCI 1.5 chipset and adjusting the DMA write boundary to 16. DMA
test fails to detect the problem with this chipset.

Thanks to Manuel Perez Ayala for reporting the problem and helping to
debug it.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-08 14:13:14 -07:00
Alan Hourihane d0de98fa16 [PATCH] i945G patch for agpgart
Attached is a small patch for i945G support against 2.6.11.11.

From: Alan Hourihane <alanh@fairlite.demon.co.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-06-07 12:35:42 -07:00
David Mosberger 3f5948fa2c [PATCH] Include <linux/config.h> before testing CONFIG_ACPI
I'm not sure why this issue is suddenly showing, but without this
patchlet, the zx1 config won't compile anymore (e.g., to see the
compilation-error, look for "***" in [1]).

[1] http://www.gelato.unsw.edu.au/kerncomp/results//2005-06-06-17-00/zx1_defconfig-log.html

Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 17:02:03 -07:00
140fedb5f2 Automatic merge of /spare/repo/netdev-2.6 branch iff-running 2005-06-04 17:11:28 -04:00
91bcc018f9 Automatic merge of /spare/repo/netdev-2.6 branch we18 2005-06-04 17:08:24 -04:00
14d8ce70d5 Automatic merge of /spare/repo/netdev-2.6 branch hdlc 2005-06-04 17:03:09 -04:00
79121839aa Automatic merge of /spare/repo/netdev-2.6 branch dm9000 2005-06-04 17:02:29 -04:00
73561695b2 Automatic merge of /spare/repo/linux-2.6/.git branch HEAD 2005-06-03 23:54:56 -04:00
Roman Kagan 719df469cb [PATCH] USB: update urb documentation
On Wed, May 04, 2005 at 01:37:30PM -0700, David Brownell wrote:
> On Wednesday 04 May 2005 12:19 pm, Roman Kagan wrote:
> > struct urb {
> > 	/* private, usb core and host controller only fields in the urb */
> > 	...
> > 	struct list_head urb_list;	/* list pointer to all active urbs */
> > 	...
> > };
> >
> > Is it safe to use it for driver's purposes when the driver owns the urb,
> > that is, starting from the completion routine until the urb is submitted
> > with usb_submit_urb()?
>
> Right now, it should be.

Great!  FWIW I've briefly tested a modified version of usbatm using
the list head in struct urb instead of creating a wrapper struct, and I
haven't seen any failures yet.  So I tend to believe that your "should
be" actually means "is" :)

> > If it is, can it be guaranteed in future, e.g.
> > by moving the list head into the public section of struct urb?
>
> In fact I'm not sure why it ever got called "private" to usbcore/hcds.
> I thought the idea was that it should be like urb->status, reserved for
> whoever controls the URB.

OK then how about the following (essentially documentation) patch?

Signed-off-by: Roman Kagan <rkagan@mail.ru>
Acked-by: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-03 00:04:30 -07:00
Linus Torvalds aa447acb92 Automatic merge of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2005-06-02 17:39:49 -07:00