Commit Graph

180 Commits

Author SHA1 Message Date
David Howells 65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
Andrew Morton df66b8552b [PATCH] tidy "md: check bio address after mapping through partitions"
Neil's xterms are too wide.

Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03 12:27:55 -08:00
NeilBrown 5ddfe9691c [PATCH] md: check bio address after mapping through partitions.
Partitions are not limited to live within a device.  So we should range
check after partition mapping.

Note that 'maxsector' was being used for two different things.  I have
split off the second usage into 'old_sector' so that maxsector can be still
be used for it's primary usage later in the function.

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-31 08:07:01 -08:00
Andrew Morton 3fcfab16c5 [PATCH] separate bdi congestion functions from queue congestion functions
Separate out the concept of "queue congestion" from "backing-dev congestion".
Congestion is a backing-dev concept, not a queue concept.

The blk_* congestion functions are retained, as wrappers around the core
backing-dev congestion functions.

This proper layering is needed so that NFS can cleanly use the congestion
functions, and so that CONFIG_BLOCK=n actually links.

Cc: "Thomas Maier" <balagi@justmail.de>
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Howells <dhowells@redhat.com>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:35 -07:00
Thomas Maier 79e2de4bc5 [PATCH] export clear_queue_congested and set_queue_congested
Export the clear_queue_congested() and set_queue_congested() functions
located in ll_rw_blk.c

The functions are renamed to blk_clear_queue_congested() and
blk_set_queue_congested().

(needed in the pktcdvd driver's bio write congestion control)

Signed-off-by: Thomas Maier <balagi@justmail.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:35 -07:00
David C Somayajulu f583f4924d [PATCH] helper function for retrieving scsi_cmd given host based block layer tag
This was necessitated by the need for a function to get back
to a scsi_cmnd, when an hba the posts its (corresponding) completion
interrupt with a block layer tag as its reference.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-10-04 19:32:09 +02:00
Jens Axboe 059af497c2 [PATCH] blk_queue_start_tag() shared map race fix
If we share the tag map between two or more queues, then we cannot
use __set_bit() to set the bit. In fact we need to make sure we
atomically acquire this tag, so loop using test_and_set_bit() to
protect from that.

Noticed by Mike Christie <michaelc@cs.wisc.edu>

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:34 +02:00
Oleg Nesterov 25034d7a83 [PATCH] exit_io_context: don't disable irqs
We don't need to disable irqs to clear current->io_context, it is protected
by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of
current this irq_disable() can't help: current_io_context() will re-instantiate
->io_context after irq_enable().

We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't
prevent other CPUs from playing with our io_context anyway.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:31:18 +02:00
Jens Axboe 5404bc7a87 [PATCH] Allow file systems to differentiate between data and meta reads
We can use this information for making more intelligent priority
decisions, and it will also be useful for blktrace.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:42 +02:00
Jens Axboe da20a20f3b [PATCH] ll_rw_blk: allow more flexibility for read_ahead_kb store
It can make sense to set read-ahead larger than a single request.
We should not be enforcing such policy on the user. Additionally,
using the BLKRASET ioctl doesn't impose such a restriction. So
additionally we now expose identical behaviour through the two.

Issue also reported by Anton <cbou@mail.ru>

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:41 +02:00
Jens Axboe dc72ef4ae3 [PATCH] Add blk_start_queueing() helper
CFQ implements this on its own now, but it's really block layer
knowledge. Tells a device queue to start dispatching requests to
the driver, taking care to unplug if needed. Also fixes the issue
where as/cfq will invoke a stopped queue, which we really don't
want.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:40 +02:00
Jens Axboe b5deef9012 [PATCH] Make sure all block/io scheduler setups are node aware
Some were kmalloc_node(), some were still kmalloc(). Change them all to
kmalloc_node().

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:39 +02:00
Jens Axboe 1ea25ecb72 [PATCH] Audit block layer inlines
Kill a few inlines that bring in too much code to more than one location
Shrinks kernel text by about 300 bytes on 32-bit x86.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:38 +02:00
Jens Axboe fc46379daf [PATCH] cfq-iosched: kill cfq_exit_lock
cfq_exit_lock is protecting two things now:

- The per-ioc rbtree of cfq_io_contexts

- The per-cfqd linked list of cfq_io_contexts

The per-cfqd linked list can be protected by the queue lock, as it is (by
definition) per cfqd as the queue lock is.

The per-ioc rbtree is mainly used and updated by the process itself only.
The only outside use is the io priority changing. If we move the
priority changing to not browsing the rbtree, we can remove any locking
from the rbtree updates and lookup completely. Let the sys_ioprio syscall
just mark processes as having the iopriority changed and lazily update
the private cfq io contexts the next time io is queued, and we can
remove this locking as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:36 +02:00
Jens Axboe 51da90fcb6 [PATCH] ll_rw_blk: cleanup __make_request()
- Don't assign variables that are only used once.

- Kill spin_lock() prefetching, it's opportunistic at best.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:34 +02:00
Jens Axboe cb78b285c8 [PATCH] Drop useless bio passing in may_queue/set_request API
It's not needed for anything, so kill the bio passing.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:23 +02:00
Jens Axboe cdd6026217 [PATCH] Remove ->rq_status from struct request
After Christophs SCSI change, the only usage left is RQ_ACTIVE
and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing
the request, so any check for RQ_INACTIVE in a driver is a bug and
indicates use-after-free.

So kill/clean the remaining users, straight forward.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:23 +02:00
Jens Axboe 49171e5c6f [PATCH] Remove struct request_list from struct request
It is always identical to &q->rq, and we only use it for detecting
whether this request came out of our mempool or not. So replace it
with an additional ->flags bit flag.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:29:22 +02:00
Jens Axboe c00895ab2f [PATCH] Remove ->waiting member from struct request
As the comments indicates in blkdev.h, we can fold it into ->end_io_data
usage as that is really what ->waiting is. Fixup the users of
blk_end_sync_rq().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:29:12 +02:00
Jens Axboe 2e662b65f0 [PATCH] elevator: abstract out the rbtree sort handling
The rbtree sort/lookup/reposition logic is mostly duplicated in
cfq/deadline/as, so move it to the elevator core. The io schedulers
still provide the actual rb root, as we don't want to impose any sort
of specific handling on the schedulers.

Introduce the helpers and rb_node in struct request to help migrate the
IO schedulers.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:26:57 +02:00
Jens Axboe 9817064b68 [PATCH] elevator: move the backmerging logic into the elevator core
Right now, every IO scheduler implements its own backmerging (except for
noop, which does no merging). That results in duplicated code for
essentially the same operation, which is never a good thing. This patch
moves the backmerging out of the io schedulers and into the elevator
core. We save 1.6kb of text and as a bonus get backmerging for noop as
well. Win-win!

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:26:56 +02:00
Jens Axboe 4aff5e2333 [PATCH] Split struct request ->flags into two parts
Right now ->flags is a bit of a mess: some are request types, and
others are just modifiers. Clean this up by splitting it into
->cmd_type and ->cmd_flags. This allows introduction of generic
Linux block message types, useful for sending generic Linux commands
to block devices.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30 20:23:37 +02:00
Alexey Dobriyan 6c5c934153 [PATCH] ifdef blktrace debugging fields
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:09 -07:00
James Bottomley 1aedf2ccc6 Merge mulgrave-w:git/linux-2.6
Conflicts:

	include/linux/blkdev.h

Trivial merge to incorporate tag prototypes.
2006-09-23 21:03:52 -05:00
Trond Myklebust 275a082fe9 Add a real API for dealing with blk_congestion_wait()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:54 -04:00
James Bottomley 492dfb4896 [SCSI] block: add support for shared tag maps
The current block queue implementation already contains most of the
machinery for shared tag maps.  The only remaining pieces are a way to
allocate and destroy a tag map independently of the queues (so that
the maps can be managed on the life cycle of the overseeing entity)

Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-31 11:17:18 -04:00
Oleg Nesterov 9f83e45eb5 [PATCH] Fix current_io_context() vs set_task_ioprio() race
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe.

current_io_context() initializes "struct io_context", then sets ->io_context.
set_task_ioprio() running on another cpu may see the changes out of order, so
->set_ioprio(ioc) may use io_context which was not initialized properly.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-08-21 08:34:15 +02:00
Jens Axboe 1959d21232 [PATCH] Only the first two bits in bio->bi_rw and rq->flags match
Not three, as assumed. This causes the barrier bit to be needlessly set
for some IO.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-06 10:18:05 +02:00
Ingo Molnar 60be6b9a41 [PATCH] lockdep: annotate on-stack completions
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues
implicitly initialized by DECLARE_COMPLETION().  Annotate on-stack completions
accordingly.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:09 -07:00
Linus Torvalds 22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
Christoph Lameter f8891e5e1f [PATCH] Light weight event counters
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Chandra Seetharaman 5a67e4c5b6 [PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Chandra Seetharaman 054cc8a2d8 [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17
This patch reverts notifier_block changes made in 2.6.17

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Andreas Mohr d6e05edc59 spelling fixes
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:35:02 +02:00
Andi Kleen 8269730b38 [BLOCK] Fix bounce limit address check
Do a safer check for when to enable DMA. Currently we enable ISA DMA
for cases that do not need it, resulting in OOM conditions when ZONE_DMA
runs out of space.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe b31dc66a54 [PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Paolo 'Blaisorblade' Giarrusso a038e25364 [PATCH] blk_start_queue() must be called with irq disabled - add warning
The queue lock can be taken from interrupts so it must always be taken with
irq disabling primitives.  Some primitives already verify this.
blk_start_queue() is called under this lock, so interrupts must be
disabled.

Also document this requirement clearly in blk_init_queue(), where the queue
spinlock is set.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:38 +02:00
Oleg Nesterov 626ab0e69d [PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1.  We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:07 -07:00
Jens Axboe fd0ff8aa1d [PATCH] blk: fix gendisk->in_flight accounting during barrier sequence
While executing barrrier sequence, the bar_rq which carries actual
write was accounted as normal IO on completion, while it wasn't on
queueing.  This caused gendisk->in_flight to be decremented by 1 after
each barrier thus messed up statistics.

This patch makes bar_rq not accounted as normal IO.  As the containing
barrier request as a whole is accounted, part of it shouldn't be.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23 10:39:43 -07:00
Jens Axboe dac07ec121 [BLOCK] limit request_fn recursion
Don't recurse back into the driver even if the unplug threshold is met,
when the driver asks for a requeue. This is both silly from a logical
point of view (requeues typically happen due to driver/hardware
shortage), and also dangerous since we could hit an endless request_fn
-> requeue -> unplug -> request_fn loop and crash on stack overrun.

Also limit blk_run_queue() to one level of recursion, similar to how
blk_start_queue() works.

This patch fixed a real problem with SLES10 and lpfc, and it could hit
any SCSI lld that returns non-zero from it's ->queuecommand() handler.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-11 12:38:59 -07:00
Chandra Seetharaman 649bbaa484 [PATCH] Remove __devinitdata from notifier block definitions
Few of the notifier_chain_register() callers use __devinitdata in the
definition of notifier_block data structure.  It is incorrect as the
data structure should be available after the initializations (they do
not unregister them during initializations).

This was leading to an oops when notifier_chain_register() call is
invoked for those callback chains after initialization.

This patch fixes all such usages to _not_ have the notifier_block data
structure in the init data section.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 08:27:50 -07:00
Coywolf Qi Hunt 7daac49020 [patch] cleanup: use blk_queue_stopped
This cleanup the source to use blk_queue_stopped.

Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20 13:04:36 +02:00
Martin Waitz a580290c3e Documentation: fix minor kernel-doc warnings
This patch updates the comments to match the actual code.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:59:55 +02:00
Linus Torvalds 7baf398f12 Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [BLOCK] cfq-iosched: seek and async performance fixes
  [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
  [PATCH] cfq-iosched: small cfq_choose_req() optimization
  [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28 09:25:44 -08:00
KAMEZAWA Hiroyuki 0a94502277 [PATCH] for_each_possible_cpu: fixes for generic part
replaces for_each_cpu with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Jens Axboe 7143dd4b01 [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
This makes akpm more happy.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 09:00:28 +02:00
Jens Axboe e2d74ac066 [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:01 +02:00
Linus Torvalds 4fa639123d Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] Don't make debugfs depend on DEBUG_KERNEL
  [PATCH] Fix blktrace compile with sysfs not defined
  [PATCH] unused label in drivers/block/cciss.
  [BLOCK] increase size of disk stat counters
  [PATCH] blk_execute_rq_nowait-speedup
  [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
  [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
  [PATCH] kzalloc() conversion in drivers/block
  [PATCH] update max_sectors documentation
2006-03-27 08:46:49 -08:00
NeilBrown 89e5c8b5b8 [PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.
This flag should be set for a virtual device iff it is set for all underlying
devices.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Andrew Morton 4c5d0bbde9 [PATCH] blk_execute_rq_nowait-speedup
Both elv_add_request() and generic_unplug_device() grab the queue lock
and disable interrupts, do that locally and use the __ variants.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Jens Axboe f68110fc28 [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Jens Axboe 2056a782f8 [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 20:00:26 +01:00
Al Viro 483f4afc42 [PATCH] fix sysfs interaction and lifetime rules handling for queues 2006-03-18 18:34:37 -05:00
Al Viro 334e94de9b [PATCH] deal with rmmod/put_io_context() races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:15 -05:00
Al Viro c981ff9f89 [PATCH] fix locking in queue_requests_store()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:51 -05:00
Al Viro 8669aafdb5 [PATCH] fix double-free in blk_init_queue_node()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:49 -05:00
Andi Kleen 5ee1af9f51 [PATCH] block: disable block layer bouncing for most memory on 64bit systems
The low level PCI DMA mapping functions should handle it in most cases.

This should fix problems with depleting the DMA zone early. The old
code used precious GFP_DMA memory in many cases where it was not needed.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 18:10:31 -08:00
Tejun Heo 30e9656cc3 [PATCH] block: implement elv_insert and use it (fix ordcolor flipping bug)
q->ordcolor must only be flipped on initial queueing of a hardbarrier
request.

Constructing ordered sequence and requeueing used to pass through
__elv_add_request() which flips q->ordcolor when it sees a barrier
request.

This patch separates out elv_insert() from __elv_add_request() and uses
elv_insert() when constructing ordered sequence and requeueing.
elv_insert() inserts the given request at the specified position and
does nothing else.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-08 07:52:58 -08:00
Jens Axboe 9a7a67af8b [PATCH] fix ordering on requeued request drainage
Previously, if a fs request which was being drained failed and got
requeued, blk_do_ordered() didn't allow it to be reissued, which causes
queue stall.  This patch makes blk_do_ordered() use the sequence of each
request to determine whether a request can be issued or not.  This fixes
the bug and simplifies code.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Eric Dumazet 88a2a4ac6b [PATCH] percpu data: only iterate over possible CPUs
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h.  powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Jun'ichi "Nick" Nomura 3eaf840e0b [PATCH] device-mapper disk statistics: timing
Record I/O timing statistics

The start time is added to struct dm_io, an existing structure allocated
privately internally within dm and attached to each incoming bio.

We export disk_round_stats() from block/ll_rw_blk.c instead of creating a
private clone.

Signed-off-by: Jun'ichi "Nick" Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:11 -08:00
Jens Axboe fddfdeafa8 [BLOCK] A few kerneldoc fixups
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-31 15:24:34 +01:00
Tetsuo Takata 60481b12b8 [BLOCK] ll_rw_blk: fix setting of ->ordered on init
This makes XFS barrier mounts succeed on my SCSI system.

Signed-off-by: Tetsuo Takata <takatatt@intellilink.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:34:36 +01:00
Jens Axboe 53e86061b5 [BLOCK] ll_rw_blk: use preempt-disabling disk_stat_add() in completion
It can legally be called with interrupts/preemption enabled.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:06:19 +01:00
Jens Axboe 2cb2e147a6 [BLOCK] ll_rw_blk: make max_sectors and max_hw_sectors unsigned ints
IDE lba48 can support full 64k request size, which overflows the
max_hw_sectors variable.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-24 10:06:19 +01:00
Linus Torvalds e2688f00dc Merge branch 'blk-softirq' of git://brick.kernel.dk/data/git/linux-2.6-block
Manual merge for trivial #include changes
2006-01-09 09:26:40 -08:00
Jens Axboe ff856bad67 [BLOCK] ll_rw_blk: Enable out-of-order request completions through softirq
Request completion can be a quite heavy process, since it needs to
iterate through the entire request and complete the bio's it holds.
This patch adds blk_complete_request() which moves this processing
into a dedicated block softirq.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 16:02:34 +01:00
Jens Axboe 356cebea11 [BLOCK] Kill blk_attempt_remerge()
It's a broken interface, it's done way too late. And apparently it triggers
slab problems in recent kernels as well (most likely after the generic dispatch
code was merged). So kill it, ide-cd is the only user of it.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 15:30:20 +01:00
Nicolas Kaiser 1abee6d2d1 [BLOCK][TRIVIAL] ll_rw_blk: header included twice
linux/blkdev.h included twice

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-09 14:44:15 +01:00
Tejun Heo 797e7dbbee [BLOCK] reimplement handling of barrier request
Reimplement handling of barrier requests.

* Flexible handling to deal with various capabilities of
  target devices.
* Retry support for falling back.
* Tagged queues which don't support ordered tag can do ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:51:03 +01:00
Tejun Heo 52d9e67536 [BLOCK] ll_rw_blk: separate out bio init part from __make_request
Separate out bio initialization part from __make_request.  It
will be used by the following blk_ordered_reimpl.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:49:58 +01:00
Tejun Heo 8ffdc6550c [BLOCK] add @uptodate to end_that_request_last() and @error to rq_end_io_fn()
add @uptodate argument to end_that_request_last() and @error
to rq_end_io_fn().  there's no generic way to pass error code
to request completion function, making generic error handling
of non-fs request difficult (rq->errors is driver-specific and
each driver uses it differently).  this patch adds @uptodate
to end_that_request_last() and @error to rq_end_io_fn().

for fs requests, this doesn't really matter, so just using the
same uptodate argument used in the last call to
end_that_request_first() should suffice.  imho, this can also
help the generic command-carrying request jens is working on.

Signed-off-by: tejun heo <htejun@gmail.com>
Signed-Off-By: Jens Axboe <axboe@suse.de>
2006-01-06 09:49:03 +01:00
Arjan van de Ven 64100099ed [BLOCK] mark some block/ variables cons
the patch below marks various read-only variables in block/* as const,
so that gcc can optimize the use of them; eg gcc will replace the use by
the value directly now and will even remove the memory usage of these.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:46:02 +01:00
Jens Axboe 88ee5ef157 [BLOCK] ll_rw_blk: fastpath get_request()
Originally from: Nick Piggin <nickpiggin@yahoo.com.au>

Move current_io_context out of the get_request fastpth.  Also try to
streamline a few other things in this area.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:39:04 +01:00
Mike Christie defd94b754 [SCSI] seperate max_sectors from max_hw_sectors
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.

Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 15:11:40 -08:00
Mike Christie 6e39b69e7e [SCSI] export blk layer functions needed for blk_execute_rq_nowait
To send async requests we need these two functions exported.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-14 19:00:50 -08:00
Coywolf Qi Hunt eb97b73d75 [BLOCK] new block/ directory comment tidy
Some leftover comments referring to drivers/block that are now block/.
They don't add any information we don't already have, so kill them.

Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-18 21:59:31 +01:00
Linus Torvalds 333c47c847 Merge branch 'block-dir' of git://brick.kernel.dk/data/git/linux-2.6-block 2005-11-07 08:32:39 -08:00
Jens Axboe 3a65dfe8c0 [BLOCK] Move all core block layer code to new block/ directory
drivers/block/ is right now a mix of core and driver parts. Lets move
the core parts to a new top level directory. Al will move the fs/
related block parts to block/ next.

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-04 08:43:35 +01:00