Commit Graph

53202 Commits

Author SHA1 Message Date
Tejun Heo b8cffc6ad8 libata: improve ata_std_prereset()
This patch updates ata_std_prereset() as follows.

* Don't fail on phy resume failure.  Just whine and continue.  Failure
  from prereset makes libata abort whole reset sequence and give up
  the port, so prereset() should be best effort.  This is more
  important with the coming EH updates as prereset() will be called
  with shorter timeout.

* If ata_wait_ready() fails, whine and request hardreset instead.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-01 07:49:54 -04:00
Tejun Heo 9b89391cc8 libata: improve 0xff status handling
For PATA, 0xff status indicates empty port.  For SATA, it depends on
how the controller emulates status register.  On some controllers,
0xff is used to represent broken link or certain stage during reset.

libata currently deals SATA the same.  This hasn't caused any problem
because problematic situations usually only occur after hotplug or
other link disruption events and libata blindly waited for the device
to spin up and settle after hotplug giving the link and device
whatever time to go through those stages.

libata is going to replace unconditional spinup wait with generic
timed sequence of resets, so not only getting 0xff handling right for
SATA is, well, the right thing to do, it's much more important now.

This patch makes the following changes.

* Make ata_bus_softreset() return -ENODEV if any of its wait fails
  due to 0xff status.

* Fail soft/hardreset if status wait returns -ENODEV indicating 0xff
  status while SStatus says the link is online.  e.g. Reset fails if
  status is 0xff after reset when SStatus reports the linke is online.
  If SCR registers are not available, everything is the same as
  before.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-01 07:49:54 -04:00
Tejun Heo d4b2bab4f2 libata: add deadline support to prereset and reset methods
Add @deadline to prereset and reset methods and make them honor it.
ata_wait_ready() which directly takes @deadline is implemented to be
used as the wait function.  This patch is in preparation for EH timing
improvements.

* ata_wait_ready() never does busy sleep.  It's only used from EH and
  no wait in EH is that urgent.  This function also prints 'be
  patient' message automatically after 5 secs of waiting if more than
  3 secs is remaining till deadline.

* ata_bus_post_reset() now fails with error code if any of its wait
  fails.  This is important because earlier reset tries will have
  shorter timeout than the spec requires.  If a device fails to
  respond before the short timeout, reset should be retried with
  longer timeout rather than silently ignoring the device.

  There are three behavior differences.

  1. Timeout is applied to both devices at once, not separately.  This
     is more consistent with what the spec says.

  2. When a device passes devchk but fails to become ready before
     deadline.  Previouly, post_reset would just succeed and let
     device classification remove the device.  New code fails the
     reset thus causing reset retry.  After a few times, EH will give
     up disabling the port.

  3. When slave device passes devchk but fails to become accessible
     (TF-wise) after reset.  Original code disables dev1 after 30s
     timeout and continues as if the device doesn't exist, while the
     patched code fails reset.  When this happens, new code fails
     reset on whole port rather than proceeding with only the primary
     device.

  If the failing device is suffering transient problems, new code
  retries reset which is a better behavior.  If the failing device is
  actually broken, the net effect is identical to it, but not to the
  other device sharing the channel.  In the previous code, reset would
  have succeeded after 30s thus detecting the working one.  In the new
  code, reset fails and whole port gets disabled.  IMO, it's a
  pathological case anyway (broken device sharing bus with working
  one) and doesn't really matter.

* ata_bus_softreset() is changed to return error code from
  ata_bus_post_reset().  It used to return 0 unconditionally.

* Spin up waiting is to be removed and not converted to honor
  deadline.

* To be on the safe side, deadline is set to 40s for the time being.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-05-01 07:49:53 -04:00
Linus Torvalds dc87c3985e libata: honour host controllers that want just one host
The Marvell IDE interface on my machine would hit a BUG_ON() in
lib/iomem.c because it was calling ata_pci_init_one() specifying just a
single port on the host, but that would actually end up trying to
initialize two ports, the second one with bogus information.

This fixes "ata_pci_init_one()" so that it actually passes down the
n_ports variable that it got from the low-level driver to the host
allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA
layer actually having the correct port number information.

And in order to make it all work, I also needed to fix a few places that
had incorrectly hard-coded the fact that a host always had exactly two
ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would
just always iterate over both ports).

Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 17:43:48 -07:00
David Rientjes 14e38ac823 pm: include EIO from errno-base.h
For backwards compatibility, call_platform_enable_wakeup() can return 0
instead of -EIO since we aren't guaranteed to have errno defined.

Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:41 -07:00
Jeremy Fitzhardinge 11443ec7d9 Add kvasprintf()
Add a kvasprintf() function to complement kasprintf().

No in-tree users yet, but I have some coming up.

[akpm@linux-foundation.org: EXPORT it]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Keir Fraser <keir@xensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg 9684e51cd1 power management: force pm_ops.valid callback to be assigned
This patch changes the docs and behaviour from "all states valid" to "no
states valid" if no .valid callback is assigned.  Users of pm_ops that only
need mem sleep can assign pm_valid_only_mem without any overhead, others
will require more elaborate callbacks.

Now that all users of pm_ops have a .valid callback this is a safe thing to
do and prevents things from getting messy again as they were before.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg e8c9c50269 power management: implement pm_ops.valid for everybody
Almost all users of pm_ops only support mem sleep, don't check in .valid and
don't reject any others in .prepare so users can be confused if they check
/sys/power/state, especially when new states are added (these would then
result in s-t-r although they're supposed to be something different).

This patch implements a generic pm_valid_only_mem function that is then
exported for users and puts it to use in almost all existing pm_ops.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@lists.linux-foundation.org
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg 11d77d0c01 power management: remove firmware disk mode
This patch removes the firmware disk suspend mode which is the wrong approach,
it is supposed to be used for implementing firmware-based disk suspend but
cannot actually be used for that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg fe0c935a6c rework pm_ops pm_disk_mode, kill misuse
This patch series cleans up some misconceptions about pm_ops.  Some users of
the pm_ops structure attempt to use it to stop the user from entering suspend
to disk, this, however, is not possible since the user can always use
"shutdown" in /sys/power/disk and then the pm_ops are never invoked.  Also,
platforms that don't support suspend to disk simply should not allow
configuring SOFTWARE_SUSPEND (read the help text on it, it only selects
suspend to disk and nothing else, all the other stuff depends on PM).

The pm_ops structure is actually intended to provide a way to enter
platform-defined sleep states (currently supported states are "standby" and
"mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured)
allows a platform to support a platform specific way to enter low-power mode
once everything has been saved to disk.  This is currently only used by ACPI
(S4).

This patch:

The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really
seems to understand what it actually does.

This patch clarifies the pm_disk_mode description.

It also removes all the arm and sh users that think they can veto suspend to
disk via pm_ops; not so since the user can always do echo shutdown >
/sys/power/disk, they need to find a better way involving Kconfig or such.

ACPI is the only user left with a non-zero pm_disk_mode.

The patch also sets the default mode to shutdown again, but when a new pm_ops
is registered its pm_disk_mode is selected as default, that way the default
stays for ACPI where it is apparently required.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Jeff Mahoney 1173a729fc reiserfs: suppress lockdep warning
We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix.

The xattr_sem can never be taken in the manner described. Internal inodes
are protected by I_PRIVATE.  Add the appropriate annotation.

Cc: <stable@kernel.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Robert Peterson 42e380832a Extend print_symbol capability
Today's print_symbol function dumps a kernel symbol with printk.  This
patch extends the functionality of kallsyms.c so that the symbol lookup
function may be used without the printk.  This is useful for modules that
want to dump symbols elsewhere, for example, to debugfs.  I intend to use
the new function call in the GFS2 file system (which will be a separate
patch).

[akpm@linux-foundation.org: build fix]
[clameter@sgi.com: sprint_symbol should return length of string like sprintf]
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:39 -07:00
David S. Miller de34ed91c4 [UDP]: Do not allow specific bind when wildcard bind exists.
When allocating local ports, do not allow a bind to a port
with a specific local address when a bind to that port with
a wildcard local address already exists.

Noticed by Linus.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 14:51:58 -07:00
David S. Miller b7b5f487ab [IPV4] UDP: Fix endianness bugs in hashing changes.
I accidently applied an earlier version of Eric Dumazet's patch, from
March 21st.  His version from March 30th didn't have these bugs, so
this just interdiffs to the correct patch.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 13:35:29 -07:00
Linus Torvalds 40caf5ea5a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (56 commits)
  ieee1394: remove garbage from Kconfig
  ieee1394: more help in Kconfig
  ieee1394: ohci1394: Fix mistake in printk message.
  ieee1394: ohci1394: remove unnecessary rcvPhyPkt bit flipping in LinkControl register
  ieee1394: ohci1394: fix cosmetic problem in error logging
  ieee1394: eth1394: send async streams at S100 on 1394b buses
  ieee1394: eth1394: fix error path in module_init
  ieee1394: eth1394: correct return codes in hard_start_xmit
  ieee1394: eth1394: hard_start_xmit is called in atomic context
  ieee1394: eth1394: some conditions are unlikely
  ieee1394: eth1394: clean up fragment_overlap
  ieee1394: eth1394: don't use alloc_etherdev
  ieee1394: eth1394: omit useless set_mac_address callback
  ieee1394: eth1394: CONFIG_INET is always defined
  ieee1394: eth1394: allow MTU bigger than 1500
  ieee1394: unexport highlevel_host_reset
  ieee1394: eth1394: contain host reset
  ieee1394: eth1394: shorter error messages
  ieee1394: eth1394: correct a memset argument
  ieee1394: eth1394: refactor .probe and .update
  ...
2007-04-30 08:59:57 -07:00
Linus Torvalds d6454706c3 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
  USB HID: don't warn on idVendor == 0
  USB HID: add 'quirks' module parameter
  USB HID: add support for dynamically-created quirks
  USB HID: clarify static quirk handling as squirks
  USB HID: encapsulate quirk handling into hid-quirks.c
  USB HID: EMS USBII device needs HID_QUIRK_MULTI_INPUT
  HID: update copyright and authorship macro
  HID: introduce proper zeroing of unused bits in output reports
  USB HID: add support for WiseGroup MP-8800 Quad Joypad
  USB HID: add FF support for Logitech Force 3D Pro Joystick
  USB HID: numlock quirk for dell W7658 keyboard
  USB HID: Logitech MX3000 keyboard needs report descriptor quirk
  USB HID: extend quirk for Logitech S510 keyboard
  USB HID: usbkbd/usbmouse - handle errors when registering devices
  USB HID: add QUIRK_HIDDEV for Belkin Flip KVM
  HID: enable dead keys on a belkin wireless keyboard
  USB HID: Thustmaster firestorm dual power v1 support
  USB HID: specify explicit size for hid_blacklist.quirks
  USB HID: fix retry & reset logic
  USB HID: consolidate vendor/product ids
  ...
2007-04-30 08:58:21 -07:00
Linus Torvalds 152a6a9da1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
  [IPV4] SNMP: Support InMcastPkts and InBcastPkts
  [IPV4] SNMP: Support InTruncatedPkts
  [IPV4] SNMP: Support InNoRoutes
  [SNMP]: Add definitions for {In,Out}BcastPkts
  [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
  [TCP] FRTO: Delay skb available check until it's mandatory
  [XFRM]: Restrict upper layer information by bundle.
  [TCP]: Catch skb with S+L bugs earlier
  [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
  [L2TP]: Add the ability to autoload a pppox protocol module.
  [SKB]: Introduce skb_queue_walk_safe()
  [AF_IUCV/IUCV]: smp_call_function deadlock
  [IPV6]: Fix slab corruption running ip6sic
  [TCP]: Update references in two old comments
  [XFRM]: Export SPD info
  [IPV6]: Track device renames in snmp6.
  [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
  [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
  [NETPOLL]: Remove CONFIG_NETPOLL_RX
  ...
2007-04-30 08:14:42 -07:00
Linus Torvalds cd9bb7e736 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] elevator: elv_list_lock does not need irq disabling
  [BLOCK] Don't pin lots of memory in mempools
  cfq-iosched: speedup cic rb lookup
  ll_rw_blk: add io_context private pointer
  cfq-iosched: get rid of cfqq hash
  cfq-iosched: tighten queue request overlap condition
  cfq-iosched: improve sync vs async workloads
  cfq-iosched: never allow an async queue idling
  cfq-iosched: get rid of ->dispatch_slice
  cfq-iosched: don't pass unused preemption variable around
  cfq-iosched: get rid of ->cur_rr and ->cfq_list
  cfq-iosched: slice offset should take ioprio into account
  [PATCH] cfq-iosched: style cleanups and comments
  cfq-iosched: sort IDLE queues into the rbtree
  cfq-iosched: sort RT queues into the rbtree
  [PATCH] cfq-iosched: speed up rbtree handling
  cfq-iosched: rework the whole round-robin list concept
  cfq-iosched: minor updates
  cfq-iosched: development update
  cfq-iosched: improve preemption for cooperating tasks
2007-04-30 08:12:39 -07:00
Linus Torvalds 24a77daf3d Merge branch 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (255 commits)
  [POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c
  [POWERPC] remove kernel module option for booke wdt
  [POWERPC] Avoid putting cpu node twice
  [POWERPC] Spinlock initializer cleanup
  [POWERPC] ppc4xx_sgdma needs dma-mapping.h
  [POWERPC] arch/powerpc/sysdev/timer.c build fix
  [POWERPC] get_property cleanups
  [POWERPC] Remove the unused HTDMSOUND driver
  [POWERPC] cell: cbe_cpufreq cleanup and crash fix
  [POWERPC] Declare enable_kernel_spe in a header
  [POWERPC] Add dt_xlate_addr() to bootwrapper
  [POWERPC] bootwrapper: CONFIG_ -> CONFIG_DEVICE_TREE
  [POWERPC] Don't define a custom bd_t for Xilixn Virtex based boards.
  [POWERPC] Add sane defaults for Xilinx EDK generated xparameters files
  [POWERPC] Add uartlite boot console driver for the zImage wrapper
  [POWERPC] Stop using ppc_sys for Xilinx Virtex boards
  [POWERPC] New registration for common Xilinx Virtex ppc405 platform devices
  [POWERPC] Merge common virtex header files
  [POWERPC] Rework Kconfig dependancies for Xilinx Virtex ppc405 platform
  [POWERPC] Clean up cpufreq Kconfig dependencies
  ...
2007-04-30 08:10:12 -07:00
Mitsuru Chinen 80787ebc2b [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
A transmitted IP multicast datagram should be counted as OutMcastPkts.
By the same token, a transmitted IP broadcast datagram should be
counted as OutBcastPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:32 -07:00
Mitsuru Chinen 5506b54b36 [IPV4] SNMP: Support InMcastPkts and InBcastPkts
A received IP multicast datagram should be counted as InMcastPkts.
By the same token, a received IP broadcast datagram should be
counted as InBcastPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:29 -07:00
Mitsuru Chinen 704aed53b4 [IPV4] SNMP: Support InTruncatedPkts
An IP datagram which is being discarded because the datagram frame
didn't carry enough data should be counted as InTruncatedPkts.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:26 -07:00
Mitsuru Chinen e91a47ebb1 [IPV4] SNMP: Support InNoRoutes
An IP datagram which is being discarded because of no routes in the
forwarding path should be counted as InNoRoutes.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:22 -07:00
Mitsuru Chinen 71ff6c0a85 [SNMP]: Add definitions for {In,Out}BcastPkts
The updated IP-MIB RFC (RFC4293) specifys new objects, InBcastPkts
and OutBcastPkts. This adds definitions for them.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:19 -07:00
Ilpo Järvinen d551e4541d [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.

Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:16 -07:00
Ilpo Järvinen 575ee7140d [TCP] FRTO: Delay skb available check until it's mandatory
No new data is needed until the first ACK comes, so no need to check
for application limitedness until then.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:12 -07:00
Masahide NAKAMURA 157bfc2502 [XFRM]: Restrict upper layer information by bundle.
On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.

It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.

1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.

2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:09 -07:00
Ilpo Järvinen 34588b4c04 [TCP]: Catch skb with S+L bugs earlier
SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is bug with SACK.
Eventually these bugs trigger traps in the tcp_clean_rtx_queue
with SACK but it's much more informative to do this here.

Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limitting for non-SACK TCP but not for SACK enabled
one. Perhaps the author had the opposite in mind but did the
logic accidently wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals this issue before
calling sync_left_out so this trapping can be done
unconditionally.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:57:33 -07:00
Eric Dumazet 6aaf47fa48 [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
Some people want to have many UDP sockets, binded to a single port but
many different addresses. We currently hash all those sockets into a
single chain.  Processing of incoming packets is very expensive,
because the whole chain must be examined to find the best match.

I chose in this patch to hash UDP sockets with a hash function that
take into account both their port number and address : This has a
drawback because we need two lookups : one with a given address, one
with a wildcard (null) address.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:26:00 -07:00
James Chapman 65def812ab [L2TP]: Add the ability to autoload a pppox protocol module.
This patch allows a name "pppox-proto-nnn" to be used in modprobe.conf
to autoload a PPPoX protocol nnn.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:21:02 -07:00
Jens Axboe 07e4470805 Merge branch 'cfq' into for-linus 2007-04-30 09:09:27 +02:00
Jens Axboe 2a12dcd71a [PATCH] elevator: elv_list_lock does not need irq disabling
It's never grabbed from irq context, so just make it plain spin_lock().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:08:17 +02:00
Jens Axboe 5972511b77 [BLOCK] Don't pin lots of memory in mempools
Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.

There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, lets
scale it down to 2 just to be on the safe side.

This patch saves ~150kb of pinned kernel memory on a 32-bit box.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:08:17 +02:00
James Chapman 46f8914e53 [SKB]: Introduce skb_queue_walk_safe()
This patch provides a method for walking skb lists while inserting or
removing skbs from the list.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:07:31 -07:00
Jens Axboe 597bc485d6 cfq-iosched: speedup cic rb lookup
We often lookup the same queue many times in succession, so cache
the last looked up queue to avoid browsing the rbtree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:23 +02:00
Jens Axboe 4e521c27ee ll_rw_blk: add io_context private pointer
To be used by as/cfq as they see fit.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:23 +02:00
Vasily Tarasov 91fac317a3 cfq-iosched: get rid of cfqq hash
cfq hash is no more necessary.  We always can get cfqq from io context.
cfq_get_io_context_noalloc() function is introduced, because we don't
want to allocate cic on merging and checking may_queue.  In order to
identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is
eliminated we need to use other criterion: sync flag for queue is added.
In all places where we dig in rb_tree we're in current context, so no
additional locking is required.

Advantages of this patch: no additional memory for hash, no seeking in
hash, code is cleaner. But it is necessary now to seek cic in per-ioc
rbtree, but it is faster:
- most processes work only with few devices
- most systems have only few block devices
- it is a rb-tree

Signed-off-by: Vasily Tarasov <vtaras@openvz.org>

Changes by me:

- Merge into CFQ devel branch
- Get rid of cfq_get_io_context_noalloc()
- Fix various bugs with dereferencing cic->cfqq[] with offset other
  than 0 or 1.
- Fix bug in cfqq setup, is_sync condition was reversed.
- Fix bug where only bio_sync() is used, we need to check for a READ too

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:23 +02:00
Jens Axboe cc19747977 cfq-iosched: tighten queue request overlap condition
For tagged devices, allow overlap of requests if the idle window
isn't enabled on the current active queue.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:23 +02:00
Jens Axboe 3ed9a2965c cfq-iosched: improve sync vs async workloads
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:23 +02:00
Jens Axboe 1be92f2fc7 cfq-iosched: never allow an async queue idling
We don't enable it by default, don't let it get enabled during
runtime.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 20e493a8d0 cfq-iosched: get rid of ->dispatch_slice
We can track it fairly accurately locally, let the slice handling
take care of the rest.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 6084cdda0e cfq-iosched: don't pass unused preemption variable around
We don't use it anymore in the slice expiry handling.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe edd75ffd92 cfq-iosched: get rid of ->cur_rr and ->cfq_list
It's only used for preemption now that the IDLE and RT queues also
use the rbtree. If we pass an 'add_front' variable to
cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion
at the front of the tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 67e6b49e39 cfq-iosched: slice offset should take ioprio into account
Use the max_slice-cur_slice as the multipler for the insertion offset.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 498d3aa2b4 [PATCH] cfq-iosched: style cleanups and comments
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 67060e3799 cfq-iosched: sort IDLE queues into the rbtree
Same treatment as the RT conversion, just put the sorted idle
branch at the end of the tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe 0c534e0a46 cfq-iosched: sort RT queues into the rbtree
Currently CFQ does a linked insert into the current list for RT
queues. We can just factor the class into the rb insertion,
and then we don't have to treat RT queues in a special way. It's
faster, too.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:22 +02:00
Jens Axboe cc09e2990f [PATCH] cfq-iosched: speed up rbtree handling
For cases where the rbtree is mainly used for sorting and min retrieval,
a nice speedup of the rbtree code is to maintain a cache of the leftmost
node in the tree.

Also spotted in the CFS CPU scheduler code.

Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the
leftmost hint in cfq_rb_first() if it isn't set, instead of only
updating it on insert.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:21 +02:00
Jens Axboe d9e7620e60 cfq-iosched: rework the whole round-robin list concept
Drawing on some inspiration from the CFS CPU scheduler design, overhaul
the pending cfq_queue concept list management. Currently CFQ uses a
doubly linked list per priority level for sorting and service uses.
Kill those lists and maintain an rbtree of cfq_queue's, sorted by when
to service them.

This unfortunately means that the ionice levels aren't as strong
anymore, will work on improving those later. We only scale the slice
time now, not the number of times we service. This means that latency
is better (for all priority levels), but that the distinction between
the highest and lower levels aren't as big.

The diffstat speaks for itself.

 cfq-iosched.c |  363 +++++++++++++++++---------------------------------
 1 file changed, 125 insertions(+), 238 deletions(-)

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:21 +02:00
Jens Axboe 1afba0451c cfq-iosched: minor updates
- Move the queue_new flag clear to when the queue is selected
- Only select the non-first queue in cfq_get_best_queue(), if there's
  a substantial difference between the best and first.
- Get rid of ->busy_rr
- Only select a close cooperator, if the current queue is known to take
  a while to "think".

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:01:21 +02:00