OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jens Axboe	5044eed488	cfq-iosched: fix alias + front merge bug There's a really rare and obscure bug in CFQ, that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed the situation for how this can happen: if two concurrent requests with the exact same sector number (due to direct IO or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to do with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. Then we remove the possibility for a request to exist in the rbtree code, but not have ->next_rq correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-25 08:41:48 -07:00
Jens Axboe	a993800655	cfq-iosched: fix sequential write regression We have a 10-15% performance regression for sequential writes on TCQ/NCQ enabled drives in 2.6.21-rcX after the CFQ update went in. It has been reported by Valerie Clement <valerie.clement@bull.net> and the Intel testing folks. The regression is because of CFQ's now more aggressive queue control, limiting the depth available to the device. This patch fixes that regression by allowing a greater depth when only one queue is busy. It has been tested to not impact sync-vs-async workloads too much - we still do a lot better than 2.6.20. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Alan Stern	44ec95425c	[SCSI] sg: cap reserved_size values at max_sectors This patch (as857) modifies the SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at the device's request_queue's max_sectors value. This will permit cdrecord to obtain a legal value for the maximum transfer length, fixing Bugzilla #7026. The patch also caps the initial reserved_size value. There's no reason to have a reserved buffer larger than max_sectors, since it would be impossible to use the extra space. The corresponding ioctls in the block layer are modified similarly, and the initial value for the reserved_size is set as large as possible. This will effectively make it default to max_sectors. Note that the actual value is meaningless anyway, since block devices don't have a reserved buffer. Finally, the BLKSECTGET ioctl is added to sg, so that there will be a uniform way for users to determine the actual max_sectors value for any raw SCSI transport. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Douglas Gilbert <dougg@torque.net> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-04-17 18:09:56 -04:00
Andrew Morton	2363cc0264	[PATCH] remove protection of LANANA-reserved majors Revert all this. It can cause device-mapper to receive a different major from earlier kernels and it turns out that the Amanda backup program (via GNU tar, apparently) checks major numbers on files when performing incremental backups. Which is a bit broken of Amanda (or tar), but this feature isn't important enough to justify the churn. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 21:12:47 -07:00
Thibaut VARENE	1ffb96c587	make elv_register() output atomic Booting 2.6.21-rc3-g45592145 I noticed the following on one of my machines in the bootlog: io scheduler noop registered<6>Time: jiffies clocksource has been installed. io scheduler deadline registered (default) Looking at block/elevator.c, it appears that elv_register() uses two consecutive printks in a non-atomic way, leading to the above glitch. The attached trivial patch fixes this issue, by using a single printk. Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-03-27 08:53:04 +02:00
Vasily Tarasov	f772b3d9ca	block: blk_max_pfn is sometimes wrong There is a small problem in handling page bounce. At the moment blk_max_pfn equals max_pfn, which is in fact not maximum possible _number_ of a page frame, but the _amount_ of page frames. For example for the 32bit x86 node with 4Gb RAM, max_pfn = 0x100000, but not 0xFFFFF. request_queue structure has a member q->bounce_pfn and queue needs bounce pages for the pages _above_ this limit. This routine is handled by blk_queue_bounce(), where the following check is produced: if (q->bounce_pfn >= blk_max_pfn) return; Assume, that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn equals 0x10000. In such situation the check above fails and for each bio we always fall down for iterating over pages tied to the bio. I want to notice, that for quite a big range of device drivers (ide, md, ...) such problem doesn't happen because they use BLK_BOUNCE_ANY for bounce_pfn. BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, and then the check above doesn't fail. But for other drivers, which obtain required value from drivers, it fails. For example sata_nv uses ATA_DMA_MASK or dev->dma_mask. I propose to use (max_pfn - 1) for blk_max_pfn. And the same for blk_max_low_pfn. The patch also cleanses some checks related with bounce_pfn. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-03-27 08:52:47 +02:00
Peter Zijlstra	6d740cd5b1	[PATCH] lockdep: annotate BLKPG_DEL_PARTITION >============================================= >[ INFO: possible recursive locking detected ] >2.6.19-1.2909.fc7 #1 >--------------------------------------------- >anaconda/587 is trying to acquire lock: > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >but task is already holding lock: > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >other info that might help us debug this: >1 lock held by anaconda/587: > #0: (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24 > >stack backtrace: > [<c0405812>] show_trace_log_lvl+0x1a/0x2f > [<c0405db2>] show_trace+0x12/0x14 > [<c0405e36>] dump_stack+0x16/0x18 > [<c043bd84>] __lock_acquire+0x116/0xa09 > [<c043c960>] lock_acquire+0x56/0x6f > [<c05fb1fa>] __mutex_lock_slowpath+0xe5/0x24a > [<c05fb380>] mutex_lock+0x21/0x24 > [<c04d82fb>] blkdev_ioctl+0x600/0x76d > [<c04946b1>] block_ioctl+0x1b/0x1f > [<c047ed5a>] do_ioctl+0x22/0x68 > [<c047eff2>] vfs_ioctl+0x252/0x265 > [<c047f04e>] sys_ioctl+0x49/0x63 > [<c0404070>] syscall_call+0x7/0xb Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment clarifying the bd_mutex locking, because I confused myself and initially thought the lock order was wrong too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:16 -08:00
Andrew Morton	b446b60e4e	[PATCH] rework reserved major handling Several people have reported failures in dynamic major device number handling due to the recent changes in there to avoid handing out the local/experimental majors. Rolf reports that this is due to a gcc-4.1.0 bug. The patch refactors that code a lot in an attempt to provoke the compiler into behaving. Cc: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:13 -08:00
Jesper Juhl	a8e14b950c	update I/O sched Kconfig help texts - CFQ is now default, not AS. Change I/O scheduler description to correctly show CFQ as being the default scheduler and not the anticipatory scheduler that previously was default. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 20:08:22 +01:00
Arjan van de Ven	2b8693c061	[PATCH] mark struct file_operations const 3 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:45 -08:00
Andrew Morton	fdf892be32	[PATCH] register_blkdev(): don't hand out the LOCAL/EXPERIMENTAL majors As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic blockdev major allocation can hand out majors which LANANA has defined as being for local/experimental use. Cc: Torben Mathiasen <device@lanana.org> Cc: Greg KH <greg@kroah.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tomas Klas <tomas.klas@mepatek.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Jens Axboe	9ede209e83	cfq-iosched: improve continue or break logic in cfq_dispatch This improves performance considerably for sync requests when you have command queuing enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	28f95cbc3e	cfq-iosched: remove the implicit queue kicking in slice expire We only really need it for a process going away, so move it to those locations. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	3c6bd2f879	cfq-iosched: check whether a queue timed out in accounting Makes it more fair for the residual slice count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	cb8874119e	cfq-iosched: tweak the FIFO checking We currently check the FIFO once per slice. Optimize that a bit and only do it as the first thing for a new slice, so we don't end up doing a single request and then seek to the FIFO requests. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	1792669cc1	cfq-iosched: don't pass in queue for cfq_arm_slice_timer() It must always be the active queue, otherwise it's a bug. So just use the active_queue, don't pass it in explicitly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	c5b680f3b7	cfq-iosched: account for slice over/under time If a slice uses less than it is entitled to (or perhaps more), include that in the decision on how much time to give it the next time it gets serviced. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	44f7c16065	cfq-iosched: defer slice activation to first request being active This better matches what time the queue is actually spending doing IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	99f9628aba	[PATCH] cfq-iosched: use last service point as the fairness criteria Right now we use slice_start, which gives async queues an unfair advantage. Change that to service_last, and base the resorter on that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	b0b8d74941	cfq-iosched: document the cfqq flags Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	98e41c7dfc	[PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list() Move the on_rr check into cfq_resort_rr_list(), every call site needs to check it anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	aaf1228ddf	cfq-iosched: remove cfq_io_context last_queue It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	783660b2f6	elevator: don't sort reads between writes Don't allow elv_dispatch_sort() to mix reads and writes together, it's rarely a good idea. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	cad9751642	elevator: abstract out the activate and deactivate functions Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Linus Torvalds	c827ba4cb4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC64]: Add PCI MSI support on Niagara. [SPARC64] IRQ: Use irq_desc->chip_data instead of irq_desc->handler_data [SPARC64]: Add obppath sysfs attribute for SBUS and PCI devices. [PARTITION]: Add whole_disk attribute.	2007-02-11 11:37:45 -08:00
Mathieu Desnoyers	23c887522e	[PATCH] Relay: add CPU hotplug support Mathieu originally needed to add this for tracing Xen, but it's something that's needed for any application that can be tracing while cpus are added. unplug isn't supported by this patch. The thought was that at minimum a new buffer needs to be added when a cpu comes up, but it wasn't worth the effort to remove buffers on cpu down since they'd be freed soon anyway when the channel was closed. [zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Fabio Massimo Di Nitto	d18d7682c1	[PARTITION]: Add whole_disk attribute. Some partitioning systems create special partitions that span the entire disk. One example are Sun partitions, and this whole-disk partition exists to tell the firmware the extent of the entire device so it can load the boot block and do other things. Such partitions should not be treated as normal partitions, because all the other partitions overlap this whole-disk one. So we'd see multiple instances of the same UUID etc. which we do not want. udev and friends can thus search for this 'whole_disk' attribute and use it to decide to ignore the partition. Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:50:00 -08:00
Neil Brown	387bb17374	[PATCH] md: fix various bugs with aligned reads in RAID5 It is possible for raid5 to be sent a bio that is too big for an underlying device. So if it is a READ that we pass straight down to a device, it will fail and confuse RAID5. So in 'chunk_aligned_read' we check that the bio fits within the parameters for the target device and if it doesn't fit, fall back on reading through the stripe cache and making lots of one-page requests. Note that this is the earliest time we can check against the device because earlier we don't have a lock on the device, so it could change underneath us. Also, the code for handling a retry through the cache when a read fails has not been tested and was badly broken. This patch fixes that code. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Kai" <epimetreus@fastmail.fm> Cc: <stable@suse.de> Cc: <org@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:25:46 -08:00
Mike Christie	c0d4d573fe	[PATCH] Fix SG_IO timeout jiffy conversion Commit `85e04e371b` cleaned up the timeout conversion, but did it exactly the wrong way. We get msecs from user space, and should convert them into jiffies. Not the other way around. Here is a fix with the overflow check sg.c has added in. This fixes DVD burning with Nero. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> [ "you'll be wanting a comma there" - Andrew ] Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-29 20:32:03 -08:00
Linas Vepstas	95543179f1	[PATCH] elevator: move clearing of unplug flag earlier A flag was recently added to the elevator code to avoid performing an unplug when requests are being re-queued. The goal of this flag was to avoid a deep recursion that can occur when re-queueing requests after a SCSI device/host reset. See http://lkml.org/lkml/2006/5/17/254 However, that fix added the flag near the bottom of a case statement, where an earlier break (in an if statement) could transport one out of the case, without setting the flag. This patch sets the flag earlier in the case statement. I re-discovered the deep recursion recently during testing; I was told that it was a known problem, and the fix to it was in the kernel I was testing. Indeed it was ... but it didn't fix the bug. With the patch below, I no longer see the bug. Signed-off by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Jens Axboe <axboe@suse.de> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 11:01:17 -08:00
Jens Axboe	ec8acb6904	[PATCH] cfq-iosched: merging problem Two issues: - The final return 1 should be a return 0, otherwise comparing cfqq is a noop. - bio_sync() only checks the sync flag, while rq_is_sync() checks both for READ and sync. The latter is what we want. Expand the bio check to include reads, and relax the restriction to allow merging of async io into sync requests. In the future we want to clean up the SYNC logic, right now it means both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue. Leave that for later. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-02 09:46:16 -08:00
Jens Axboe	719d34027e	[PATCH] cfq-iosched: tighten allow merge criteria The logic in cfq_allow_merge() wasn't clear enough - basically allow merging for the same queues only. Do a fast check for 'rq and bio both sync/async' before doing the cfqq hash lookup. This is verified to work with the fixed elv_try_merge() from commit `bb4067e341`. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 14:13:08 -08:00
Randy Dunlap	af9997e426	[PATCH] fix kernel-doc warnings in 2.6.20-rc1 Fix kernel-doc warnings in 2.6.20-rc1. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Jens Axboe	bb4067e341	[PATCH] elevator: fixup typo in merge logic The recent io scheduler allow_merge commit left the block layer with no merging, oops. This patch fixes that up. That means the CFQ change needs to be verified again, it might not fix the original bug now. But that's a separate thing, I'll double check that tomorrow. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 22:01:04 -08:00
Jens Axboe	da77526502	[PATCH] cfq-iosched: don't allow sync merges across queues Currently we allow any merge, even if the io originates from different processes. This can cause really bad starvation and unfairness, if those ios happen to be synchronous (reads or direct writes). So add an allow_merge hook to the io scheduler ops, so an io scheduler can help decide whether a bio/process combination may be merged with an existing request. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-20 11:04:12 +01:00
Jens Axboe	8e5cfc45e7	[PATCH] Fixup blk_rq_unmap_user() API The blk_rq_unmap_user() API is not very nice. It expects the caller to know that rq->bio has to be reset to the original bio, and it will silently do nothing if that is not done. Instead make it explicit that we need to pass in the first bio, by expecting a bio argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 11:12:46 +01:00
Jens Axboe	48785bb9fa	[PATCH] __blk_rq_unmap_user() fails to return error If the bio is user copied, the copy back could return -EFAULT. Make sure we return any error seen during unmapping. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 11:07:59 +01:00
Jens Axboe	9c9381f942	[PATCH] __blk_rq_map_user() doesn't need to grab the queue_lock It was for driver private back_merge_fn hooks, but they don't exist anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:34:17 +01:00
Jens Axboe	1aa4f24fe9	[PATCH] Remove queue merging hooks We have full flexibility of merging parameters now, so we can remove the hooks that define back/front/request merge strategies. Nobody is using them anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:33:11 +01:00
Jens Axboe	2985259b0e	[PATCH] ->nr_sectors and ->hard_nr_sectors are not used for BLOCK_PC requests It's a file system thing, for block requests the only size used in the io paths is ->data_len as it is in bytes, not sectors. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-19 08:27:31 +01:00
Jens Axboe	c65fb61b3c	[PATCH] Allow as-iosched to be unloaded We implemented the missing bits to allow this some time ago, and they are integrated in AS. So remove the __module_get() to allow the module to be unloaded. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-13 13:25:18 +01:00
Jens Axboe	7749a8d423	[PATCH] Propagate down request sync flag We need to do this, otherwise the io schedulers don't get access to the sync flag. Then they cannot tell the difference between a regular write and an O_DIRECT write, which can cause a performance loss. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-13 13:02:26 +01:00
FUJITA Tomonori	335302618f	[PATCH] remove unnecessary blk_queue_bounce in SG_IO When I converted the original patch, I left unnecessary blk_queue_bounce in SG_IO. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:26:55 +01:00
FUJITA Tomonori	77d172ce27	[PATCH] fix SG_IO bio leak This patch fixes bio leaks in SG_IO. rq->bio can be changed after io completion, so we need to reset rq->bio before calling blk_rq_unmap_user() http://marc.theaimsgroup.com/?l=linux-kernel&m=116570666807983&w=2 Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:22:23 +01:00
Boaz Harrosh	2b02a17920	[PATCH] remove blk_queue_activity_fn While working on bidi support at struct request level I have found that blk_queue_activity_fn is actually never used. The only user is in ide-probe.c with this code: /* enable led activity for disk drives only */ if (drive->media == ide_disk && hwif->led_act) blk_queue_activity_fn(q, hwif->led_act, drive); And led_act is never initialized anywhere. (Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18) Unless it is all for future use of course. (this patch is against linux-2.6-block.git as of 2006/12/4) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-12 10:22:23 +01:00
Andrew Morton	faccbd4b26	[PATCH] io-accounting: read accounting Wire up read accounting for block devices, within submit_bio(). Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Akinobu Mita	c17bb49517	[PATCH] fault-injection capability for disk IO This patch provides fault-injection capability for disk IO. Boot option: fail_make_request=<probability>,<interval>,<space>,<times> <interval> -- specifies the interval of failures. <probability> -- specifies how often it should fail in percent. <space> -- specifies the size of free space where disk IO can be issued safely in bytes. <times> -- specifies how many times failures may happen at most. Debugfs: /debug/fail_make_request/interval /debug/fail_make_request/probability /debug/fail_make_request/specifies /debug/fail_make_request/times Example: fail_make_request=10,100,0,-1 echo 1 > /sys/blocks/hda/hda1/make-it-fail generic_make_request() on /dev/hda1 fails once per 10 times. Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:29:02 -08:00
Josef Sipek	c5a20b6c26	[PATCH] struct path: convert block Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:44 -08:00
Peter Zijlstra	2e7b651df1	[PATCH] remove the old bd_mutex lockdep annotation Remove the old complex and crufty bd_mutex annotation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:38 -08:00
Ingo Molnar	0231606785	[PATCH] hotplug CPU: clean up hotcpu_notifier() use There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn, prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus generating compiler warnings of unused symbols, hence forcing people to add #ifdefs. the compiler can skip truly unused functions just fine: text data bss dec hex filename 1624412 728710 3674856 6027978 5bfaca vmlinux.before 1624412 728710 3674856 6027978 5bfaca vmlinux.after [akpm@osdl.org: topology.c fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:39 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Alan Stern	a120586873	[PATCH] Allow NULL pointers in percpu_free The patch (as824b) makes percpu_free() ignore NULL arguments, as one would expect for a deallocation routine. (Note that free_percpu is #defined as percpu_free in include/linux/percpu.h.) A few callers are updated to remove now-unneeded tests for NULL. A few other callers already seem to assume that passing a NULL pointer to percpu_free() is okay! The patch also removes an unnecessary NULL check in percpu_depopulate(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
David Howells	4796b71fbb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/pcmcia/ds.c Fix up merge failures with Linus's head and fix new compile failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-06 15:01:18 +00:00
Linus Torvalds	ec0bf39a47	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (73 commits) [SCSI] aic79xx: Add ASC-29320LPE ids to driver [SCSI] stex: version update [SCSI] stex: change wait loop code [SCSI] stex: add new device type support [SCSI] stex: update device id info [SCSI] stex: adjust default queue length [SCSI] stex: add value check in hard reset routine [SCSI] stex: fix controller_info command handling [SCSI] stex: fix biosparam calculation [SCSI] megaraid: fix MMIO casts [SCSI] tgt: fix undefined flush_dcache_page() problem [SCSI] libsas: better error handling in sas_expander.c [SCSI] lpfc 8.1.11 : Change version number to 8.1.11 [SCSI] lpfc 8.1.11 : Misc Fixes [SCSI] lpfc 8.1.11 : Add soft_wwnn sysfs attribute, rename soft_wwn_enable [SCSI] lpfc 8.1.11 : Removed decoding of PCI Subsystem Id [SCSI] lpfc 8.1.11 : Add MSI (Message Signalled Interrupts) support [SCSI] lpfc 8.1.11 : Adjust LOG_FCP logging [SCSI] lpfc 8.1.11 : Fix Memory leaks [SCSI] lpfc 8.1.11 : Fix lpfc_multi_ring_support ...	2006-12-05 16:09:46 -08:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Further merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Matthew Wilcox	e62438630c	[PATCH] Centralise definitions of sector_t and blkcnt_t CONFIG_LBD and CONFIG_LSF are spread into asm/types.h for no particularly good reason. Centralising the definition in linux/types.h means that arch maintainers don't need to bother adding it, as well as fixing the problem with x86-64 users being asked to make a decision that has absolutely no effect. The H8/300 porters seem particularly confused since I'm not aware of any microcontrollers that need to support 2TB filesystems. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 19:41:15 -08:00
Jens Axboe	a863055b10	[PATCH] blktrace: don't return blktrace_seq from trace_note() Only the process notifier needs it, and it can set it manually. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-04 09:30:58 +01:00
Jens Axboe	d3d9d2a5ea	[PATCH] blktrace: uninline trace_note() It's too large to inline. Additionally clean it up, by fast pathing the likely path. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-04 09:27:41 +01:00
Jens Axboe	bb37b94c68	[BLOCK] Cleanup unused variable passing - ->init_queue() does not need the elevator passed in - ->put_request() is a hot path and need not have the queue passed in - cfq_update_io_seektime() does not need cfqd passed in Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:42:33 +01:00
Mike Christie	0e75f9063f	[PATCH] block: support larger block pc requests This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c users so that it supports requests larger than bio by chaining them together. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:40:55 +01:00
Olaf Kirch	be1c63411a	[PATCH] blktrace: add timestamp message This adds a new timestamp message to blktrace, giving the timeofday when we start tracing. This helps user space correlate block trace events with eg an application strace. This requires a (compatible) update to blkparse. The changed blkparse is still able to process traces generated by older kernels, and older versions of blkparse should silently ignore the new records (because they have a pid of 0). Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-12-01 10:39:12 +01:00
James Bottomley	0bd2af4683	Merge ../scsi-rc-fixes-2.6	2006-11-22 12:06:44 -06:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00
Tejun Heo	097b8457da	[PATCH] scsi: clear garbage after CDBs on SG_IO ATAPI devices transfer fixed number of bytes for CDBs (12 or 16). Some ATAPI devices choke when shorter CDB is used and the left bytes contain garbage. Block SG_IO cleared left bytes but SCSI SG_IO didn't. This patch makes SCSI SG_IO clear it and simplify CDB clearing in block SG_IO. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Mathieu Fluhr <mfluhr@nero.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Douglas Gilbert <dougg@torque.net> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-16 11:43:38 -08:00
Hannes Reinecke	85e04e371b	[SCSI] block: convert jiffies to msecs in scsi_ioctl() Use the proper conversion function for converting jiffies to msecs in sg_io(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-11-15 17:57:33 -06:00
Jens Axboe	616e8a091a	[PATCH] Fix bad data direction in SG_IO Contrary to what the name misleads you to believe, SG_DXFER_TO_FROM_DEV is really just a normal read seen from the device side. This patch fixes http://lkml.org/lkml/2006/10/13/100 Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 09:47:00 -08:00
Andrew Morton	df66b8552b	[PATCH] tidy "md: check bio address after mapping through partitions" Neil's xterms are too wide. Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:55 -08:00
Jens Axboe	5fccbf61be	[PATCH] CFQ: request <-> request merging rr_list fixup In very rare circumstances would we be pruning a merged request and at the same time delete the implicated cfqq from the rr_list, and not readd it when the merged request got added. This could cause io stalls until that process issued io again. Fix it up by putting the rr_list add handling into cfq_add_rq_rb(), identical to how pruning is handled in cfq_del_rq_rb(). This fixes a hang reproducible with fsx-linux. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:12:45 -08:00
NeilBrown	5ddfe9691c	[PATCH] md: check bio address after mapping through partitions. Partitions are not limited to live within a device. So we should range check after partition mapping. Note that 'maxsector' was being used for two different things. I have split off the second usage into 'old_sector' so that maxsector can still be used for its primary usage later in the function. Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:07:01 -08:00
Jens Axboe	c1b707d253	[PATCH] CFQ: bad locking in changed_ioprio() When the ioprio code recently got juggled a bit, a bug was introduced. changed_ioprio() is no longer called with interrupts disabled, so using plain spin_lock() on the queue_lock is a bug. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 11:01:50 -08:00
Jens Axboe	0261d6886e	[PATCH] CFQ: use irq safe locking in cfq_cic_link() If cfq_set_request() is called for a new process AND a non-fs io request (so that __GFP_WAIT may not be set), cfq_cic_link() may use spin_lock_irq() and spin_unlock_irq() with interrupts already disabled. Fix is to always use irq safe locking in cfq_cic_link() Acked-By: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 10:21:58 -08:00
Andrew Morton	3fcfab16c5	[PATCH] separate bdi congestion functions from queue congestion functions Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Thomas Maier	79e2de4bc5	[PATCH] export clear_queue_congested and set_queue_congested Export the clear_queue_congested() and set_queue_congested() functions located in ll_rw_blk.c The functions are renamed to blk_clear_queue_congested() and blk_set_queue_congested(). (needed in the pktcdvd driver's bio write congestion control) Signed-off-by: Thomas Maier <balagi@justmail.de> Cc: Peter Osterlund <petero2@telia.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Vasily Tarasov	c584164224	[PATCH] block layer: elv_iosched_show should get elv_list_lock elv_iosched_show function iterates over elv_list, hence elv_list_lock should be held. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Vasily Tarasov <jens.axboe@oracle.com>	2006-10-12 15:08:51 +02:00
Vasily Tarasov	a22b169df1	[PATCH] block layer: elevator_find function cleanup We can easily produce search through the elevator list without introducing additional elevator_type variable. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-12 15:08:51 +02:00
David C Somayajulu	f583f4924d	[PATCH] helper function for retrieving scsi_cmd given host based block layer tag This was necessitated by the need for a function to get back to a scsi_cmnd, when an hba the posts its (corresponding) completion interrupt with a block layer tag as its reference. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Somayajulu <david.somayajulu@qlogic.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2006-10-04 19:32:09 +02:00
Alasdair G Kergon	7006f6eca8	[PATCH] dm: export blkdev_driver_ioctl Export blkdev_driver_ioctl for device-mapper. If we get as far as the device-mapper ioctl handler, we know the ioctl is not a standard block layer BLK* one, so we don't need to check for them a second time and can call blkdev_driver_ioctl() directly. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:13 -07:00
Peter Zijlstra	6e9a4738c9	[PATCH] completions: lockdep annotate on stack completions All on stack DECLARE_COMPLETIONs should be replaced by: DECLARE_COMPLETION_ONSTACK Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:24 -07:00
Jens Axboe	51d7513a8a	[PATCH] Only enable CONFIG_BLOCK option for embedded It's too easy for people to shoot themselves in the foot, and it only makes sense for embedded folks anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 21:14:05 +02:00
Jens Axboe	059af497c2	[PATCH] blk_queue_start_tag() shared map race fix If we share the tag map between two or more queues, then we cannot use __set_bit() to set the bit. In fact we need to make sure we atomically acquire this tag, so loop using test_and_set_bit() to protect from that. Noticed by Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:34 +02:00
Jens Axboe	0fe2347957	[PATCH] Update axboe@suse.de email address As people often look for the copyright in files to see who to mail, update the link to a neutral one. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:34 +02:00
David Howells	9361401eb7	[PATCH] BLOCK: Make it possible to disable the block layer [try #6 ] Make it possible to disable the block layer. Not all embedded devices require it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require the block layer to be present. This patch does the following: (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev support. (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls an item that uses the block layer. This includes: (*) Block I/O tracing. (*) Disk partition code. (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS. (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the block layer to do scheduling. Some drivers that use SCSI facilities - such as USB storage - end up disabled indirectly from this. (*) Various block-based device drivers, such as IDE and the old CDROM drivers. (*) MTD blockdev handling and FTL. (*) JFFS - which uses set_bdev_super(), something it could avoid doing by taking a leaf out of JFFS2's book. (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is, however, still used in places, and so is still available. (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and parts of linux/fs.h. (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK. (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK. (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK is not enabled. (*) fs/no-block.c is created to hold out-of-line stubs and things that are required when CONFIG_BLOCK is not set: (*) Default blockdev file operations (to give error ENODEV on opening). (*) Makes some /proc changes: (*) /proc/devices does not list any blockdevs. (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK. (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if given command other than Q_SYNC or if a special device is specified. (*) In init/do_mounts.c, no reference is made to the blockdev routines if CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2. (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return error ENOSYS by way of cond_syscall if so). (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if CONFIG_BLOCK is not set, since they can't then happen. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:31 +02:00
Martin Peschke	4090959aee	[PATCH] blktrace: cleanup using on_each_cpu This patch kills a few lines of code in blktrace by making use of on_each_cpu(). Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:31:19 +02:00
Oleg Nesterov	25034d7a83	[PATCH] exit_io_context: don't disable irqs We don't need to disable irqs to clear current->io_context, it is protected by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of current this irq_disable() can't help: current_io_context() will re-instantiate ->io_context after irq_enable(). We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't prevent other CPUs from playing with our io_context anyway. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:31:18 +02:00
Jens Axboe	7457e6e2d7	[PATCH] blktrace: support for logging metadata reads Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:43 +02:00
Jens Axboe	374f84ac39	[PATCH] cfq-iosched: use metadata read flag Give meta data reads preference over regular reads, as the process often needs to get that out of the way to do the io it was actually interested in. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:43 +02:00
Jens Axboe	5404bc7a87	[PATCH] Allow file systems to differentiate between data and meta reads We can use this information for making more intelligent priority decisions, and it will also be useful for blktrace. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:42 +02:00
Jens Axboe	da20a20f3b	[PATCH] ll_rw_blk: allow more flexibility for read_ahead_kb store It can make sense to set read-ahead larger than a single request. We should not be enforcing such policy on the user. Additionally, using the BLKRASET ioctl doesn't impose such a restriction. So additionally we now expose identical behaviour through the two. Issue also reported by Anton <cbou@mail.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:41 +02:00
Jens Axboe	bf57225670	[PATCH] cfq-iosched: improve queue preemption Don't touch the current queues, just make sure that the wanted queue is selected next. Simplifies the logic. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:41 +02:00
Jens Axboe	dc72ef4ae3	[PATCH] Add blk_start_queueing() helper CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:40 +02:00
Jens Axboe	981a79730d	[PATCH] cfq-iosched: kill the empty_list No point in having a place holder list just for empty queues, so remove it. It's not used for anything other than to keep ->cfq_list busy. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:40 +02:00
Jens Axboe	53b03744e5	[PATCH] cfq-iosched: Kill O(N) runtime of cfq_resort_rr_list() Currently it scales with number of processes in that priority group, which is potentially not very nice as it's called quite often. Basically we always need to do tail inserts, except for the case of a new process. So just mark/detect a queue as such. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:39 +02:00
Jens Axboe	b5deef9012	[PATCH] Make sure all block/io scheduler setups are node aware Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:39 +02:00
Jens Axboe	1ea25ecb72	[PATCH] Audit block layer inlines Kill a few inlines that bring in too much code to more than one location Shrinks kernel text by about 300 bytes on 32-bit x86. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:38 +02:00
Jens Axboe	4050cf1674	[PATCH] cfq-iosched: use new io context counting mechanism It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:37 +02:00
Jens Axboe	e4313dd423	[PATCH] as-iosched: use new io context counting mechanism It's ok if the read path is a lot more costly, as long as inc/dec is really cheap. The inc/dec will happen for each created/freed io context, while the reading only happens when a disk queue exits. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:37 +02:00
Jens Axboe	fc46379daf	[PATCH] cfq-iosched: kill cfq_exit_lock cfq_exit_lock is protecting two things now: - The per-ioc rbtree of cfq_io_contexts - The per-cfqd linked list of cfq_io_contexts The per-cfqd linked list can be protected by the queue lock, as it is (by definition) per cfqd as the queue lock is. The per-ioc rbtree is mainly used and updated by the process itself only. The only outside use is the io priority changing. If we move the priority changing to not browsing the rbtree, we can remove any locking from the rbtree updates and lookup completely. Let the sys_ioprio syscall just mark processes as having the iopriority changed and lazily update the private cfq io contexts the next time io is queued, and we can remove this locking as well. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:36 +02:00
Jens Axboe	89850f7ee9	[PATCH] cfq-iosched: cleanups, fixes, dead code removal A collection of little fixes and cleanups: - We don't use the 'queued' sysfs exported attribute, since the may_queue() logic was rewritten. So kill it. - Remove dead defines. - cfq_set_active_queue() can be rewritten cleaner with else if conditions. - Several places had cfq_exit_cfqq() like logic, abstract that out and use that. - Annotate the cfqq kmem_cache_alloc() so the allocator knows that this is a repeat allocation if it fails with __GFP_WAIT set. Allows the allocator to start freeing some memory, if needed. CFQ already loops for this condition, so might as well pass the hint down. - Remove cfqd->rq_starved logic. It's not needed anymore after we dropped the crq allocation in cfq_set_request(). - Remove unneeded parameter passing. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:35 +02:00
Jens Axboe	51da90fcb6	[PATCH] ll_rw_blk: cleanup __make_request() - Don't assign variables that are only used once. - Kill spin_lock() prefetching, it's opportunistic at best. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:34 +02:00
Jens Axboe	cb78b285c8	[PATCH] Drop useless bio passing in may_queue/set_request API It's not needed for anything, so kill the bio passing. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:23 +02:00
Jens Axboe	cdd6026217	[PATCH] Remove ->rq_status from struct request After Christoph's SCSI change, the only usage left is RQ_ACTIVE and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing the request, so any check for RQ_INACTIVE in a driver is a bug and indicates use-after-free. So kill/clean the remaining users, straightforward. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:23 +02:00
Jens Axboe	49171e5c6f	[PATCH] Remove struct request_list from struct request It is always identical to &q->rq, and we only use it for detecting whether this request came out of our mempool or not. So replace it with an additional ->flags bit flag. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:29:22 +02:00
Jens Axboe	c00895ab2f	[PATCH] Remove ->waiting member from struct request As the comments indicate in blkdev.h, we can fold it into ->end_io_data usage as that is really what ->waiting is. Fixup the users of blk_end_sync_rq(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:29:12 +02:00
Jens Axboe	8a8e674cb1	[PATCH] as-iosched: kill arq Get rid of the as_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any arq allocation/free for each request. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:02 +02:00
Jens Axboe	5e70537479	[PATCH] cfq-iosched: kill crq Get rid of the cfq_rq request type. With the added elevator_private2, we have enough room in struct request to get rid of any crq allocation/free for each request. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:02 +02:00
Jens Axboe	5380a101d3	[PATCH] cfq-iosched: remove the crq flag functions/variable There's just one flag currently (SYNC), and that one can be grabbed from the request. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:01 +02:00
Jens Axboe	8840faa1ee	[PATCH] deadline-iosched: remove elevator private drq request type A big win, we now save an allocation/free on each request! With the previous rb/hash abstractions, we can just reuse queuelist/donelist for the FIFO data and be done with it. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	9e2585a8a2	[PATCH] as-iosched: remove arq->is_sync member We can track this in struct request. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	d4f2f4629e	[PATCH] as-iosched: reuse rq for fifo Saves some space in arq. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:27:00 +02:00
Jens Axboe	95e8810b28	[PATCH] cfq-iosched: convert to using the FIFO elevator defines Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:59 +02:00
Jens Axboe	b8aca35af5	[PATCH] deadline-iosched: migrate to using the elevator rb functions This removes the rbtree handling from deadline. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:58 +02:00
Jens Axboe	21183b07ee	[PATCH] cfq-iosched: migrate to using the elevator rb functions This removes the rbtree handling from CFQ. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:58 +02:00
Jens Axboe	e37f346e34	[PATCH] as-iosched: migrate to using the elevator rb functions This removes the rbtree handling from AS. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de>	2006-09-30 20:26:57 +02:00
Jens Axboe	2e662b65f0	[PATCH] elevator: abstract out the rbtree sort handling The rbtree sort/lookup/reposition logic is mostly duplicated in cfq/deadline/as, so move it to the elevator core. The io schedulers still provide the actual rb root, as we don't want to impose any sort of specific handling on the schedulers. Introduce the helpers and rb_node in struct request to help migrate the IO schedulers. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:57 +02:00
Jens Axboe	10fd48f237	[PATCH] rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev The conditions got reversed. Also make rb_next() and rb_prev() check for the empty condition. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:56 +02:00
Jens Axboe	9817064b68	[PATCH] elevator: move the backmerging logic into the elevator core Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:26:56 +02:00
Jens Axboe	4aff5e2333	[PATCH] Split struct request ->flags into two parts Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-09-30 20:23:37 +02:00
Alexey Dobriyan	6c5c934153	[PATCH] ifdef blktrace debugging fields Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:09 -07:00
Randy Dunlap	87a5726110	[PATCH] block: handle subsystem_register() init errors Check and handle init errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:05 -07:00
Theodore Ts'o	8e18e2941c	[PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
James Bottomley	1aedf2ccc6	Merge mulgrave-w:git/linux-2.6 Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.	2006-09-23 21:03:52 -05:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
James Bottomley	492dfb4896	[SCSI] block: add support for shared tag maps The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2006-08-31 11:17:18 -04:00
Oleg Nesterov	2d8f613160	elv_unregister: fix possible crash on module unload An exiting task or process which didn't do I/O yet has no io context; elv_unregister() should check it is not NULL. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-08-22 12:52:23 -07:00
Oleg Nesterov	be33c3a67b	[PATCH] cfq_cic_link: fix usage of wrong cfq_io_context Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context? The dead key is from __cic, so drop that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-08-21 10:02:54 +02:00
Oleg Nesterov	9f83e45eb5	[PATCH] Fix current_io_context() vs set_task_ioprio() race I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe. current_io_context() initializes "struct io_context", then sets ->io_context. set_task_ioprio() running on another cpu may see the changes out of order, so ->set_ioprio(ioc) may use io_context which was not initialized properly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-08-21 08:34:15 +02:00
Jens Axboe	44eb123126	[PATCH] cfq-iosched: don't use a hard jiffies value, translate from msecs The CIC_SEEKY() test really wants to use the minimum of either: - 2 msecs (not jiffies) - or, the pending slice time So code it like that. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-25 15:05:21 +02:00
Milton Miller	ad01b1ca79	[PATCH] blktrace: fix read-ahead bit It should be toggling the same bit on and off, fix it up. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-25 15:04:13 +02:00
Arjan van de Ven	ddca60c590	[PATCH] lockdep: annotate the BLKPG_DEL_PARTITION ioctl The delete partition IOCTL takes the bd_mutex for both the disk and the partition; these have an obvious hierarchical relationship and this patch annotates this relationship for lockdep. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:53:53 -07:00
Jens Axboe	1959d21232	[PATCH] Only the first two bits in bio->bi_rw and rq->flags match Not three, as assumed. This causes the barrier bit to be needlessly set for some IO. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-06 10:18:05 +02:00
Nathan Scott	40359ccb83	[PATCH] blktrace: readahead support Provide the needed kernel support for distinguishing readahead from regular read requests when tracing block devices. Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-07-06 10:03:28 +02:00
Ingo Molnar	60be6b9a41	[PATCH] lockdep: annotate on-stack completions lockdep needs to have the waitqueue lock initialized for on-stack waitqueues implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions accordingly. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:09 -07:00
Linus Torvalds	22a3e233ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S	2006-06-30 15:39:30 -07:00
Christoph Lameter	f8891e5e1f	[PATCH] Light weight event counters The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only lose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows building Linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single increment instruction being emitted (i386, x86_64) or in the increment being hidden through instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use.
References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-30 11:25:36 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Chandra Seetharaman	5a67e4c5b6	[PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places Make use of the newly defined hotplug version of cpu_notifier functionality wherever appropriate. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 17:32:41 -07:00
Chandra Seetharaman	054cc8a2d8	[PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17 This patch reverts notifier_block changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-27 17:32:41 -07:00
Andreas Mohr	d6e05edc59	spelling fixes acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-26 18:35:02 +02:00
Andi Kleen	8269730b38	[BLOCK] Fix bounce limit address check Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	dd67d05152	[PATCH] rbtree: support functions used by the io schedulers They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	fd61af0384	[PATCH] cfq-iosched: rq update fixes - Remember to set ->last_sector so that the cfq_choose_req() logic works correctly. - Remove redundant call to cfq_choose_req() Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	caaa5f9f0a	[PATCH] cfq-iosched: many performance fixes This is a collection of patches that greatly improve CFQ performance in some circumstances. - Change the idling logic to only kick in after a request is done and we are deciding what to do. Before the idling included the request service time, so it was hard to adjust. Now it's true think/idle time. - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads. - Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes. - Don't rearm idle timer if there are no busy queues. Just leave the system idle. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	35e6077cb1	[PATCH] cfq-iosched: correctly set ioprio on both targets Patch originally from Vasily Tarasov <vtaras@sw.ru> If you set io-priority of process 1 using sys_ioprio_set system call by another process 2 (like ionice does), then cfq_init_prio_data() function sets priority of process 2 (current) on queue of process 1 and clears the flag that designates change of ioprio. So the process 1 will work like with priority of process 2. I propose not to call cfq_init_prio_data() on io-priority change, but only mark queue as queue with changed priority. Every time when new request comes cfq-scheduler checks for this flag and automatically changes priority of queue to new value. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	b17fd9bceb	[PATCH] Make CFQ the default IO scheduler Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	b31dc66a54	[PATCH] Kill PF_SYNCWRITE flag A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:39 +02:00
Jens Axboe	271f18f102	[PATCH] cfq-iosched: Don't set the queue batching limits We cannot update them if the user changes nr_requests, so don't set it in the first place. The gains are pretty questionable as well. The batching loss has been shown to decrease throughput. Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Dave Jones	acf4217555	[PATCH] remove dead code from elevator switching We already drop the refcount in elevator_exit(), and as we're setting 'e' to NULL, we'll never take that branch anyway. Finally, as 'e' is a local var that isn't referenced afterwards, setting it to NULL is pointless. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Paolo 'Blaisorblade' Giarrusso	a038e25364	[PATCH] blk_start_queue() must be called with irq disabled - add warning The queue lock can be taken from interrupts so it must always be taken with irq disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>	2006-06-23 17:10:38 +02:00
Akinobu Mita	bae386f788	[PATCH] iosched: use hlist for request hashtable Use hlist instead of list_head for request hashtable in deadline-iosched and as-iosched. It also can remove the flag to know hashed or unhashed. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Jens Axboe <axboe@suse.de> block/as-iosched.c \| 45 +++++++++++++++++++-------------------------- block/deadline-iosched.c \| 39 ++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 49 deletions(-)	2006-06-23 17:10:38 +02:00

369 Commits