OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ed Cashin	fea05a26c3	aoe: update copyright year in touched files Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	b21faa25c6	aoe: remove unused code and add cosmetic improvements This change removes some unused code and attempts to increase code consistency. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	1b86fda9ad	aoe: increase net_device reference count while using it This change eliminates the danger that the user could rmmod the driver for a network interface that is being used for AoE by the aoe driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	64a80f5ac7	aoe: associate frames with the AoE storage target In the driver code, "target" and aoetgt refer to a particular remote interface on the AoE storage target. The latter is identified by its AoE major and minor addresses. Commands that are being sent to an AoE storage target {major, minor} can be sent or retransmitted to any of the remote MAC addresses associated with the AoE storage target. That is, frames are naturally associated with not an aoetgt (AoE major, AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor). Making the code reflect that reality simplifies the driver, especially when the path to a remote MAC address becomes unusable. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	6583303c5e	aoe: disallow unsupported AoE minor addresses A guard is inserted to prevent AoE minor addresses (slot addresses) higher than 15 from being used, as they are not yet supported by the driver. There is a change coming that will allow the aoe driver to overcome this limit by using system device minor numbers dynamically, but until then, this guard prevents unexpected targets from being used by the driver when AoE targets with high minor numbers are on the AoE network. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	d54d35ac66	aoe: failover remote interface based on aoe_deadsecs parameter The aoe_deadsecs module parameter allows the user to specify a hard limit on the number of seconds an AoE command can be retransmitted before the AoE block device is considered to have failed. Using aoe_deadsecs to determine the time we try using a different remote interface helps to ensure that the hard limit is not reached before we've tried to recover by sending to a different remote port. As a data storage target, the AoE target is unambiguously identified by its {major, minor} AoE address tuple, and an AoE target can have multiple MAC addresses. However, note that "target" in the driver code and comments means a {major, minor, MAC address} tuple, as in "somewhere to send packets". Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	3f0f013374	aoe: use packets that work with the smallest-MTU local interface Users with several network interfaces dedicated to AoE generally do not configure them to support different-sized AoE data payloads on purpose. For a given AoE target, there will be a set of local network interfaces that can reach it. Using only the payload that will fit in the smallest-sized MTU of all those local interfaces greatly simplifies the driver, especially in failure scenarios. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	eb086ec596	aoe: use a kernel thread for transmissions The dev_queue_xmit function needs to have interrupts enabled, so the most simple way to get the locking right but still fulfill that requirement is to use a process that can call dev_queue_xmit serially over queued transmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	69cf2d85de	aoe: become I/O request queue handler for increased user control To allow users to choose an elevator algorithm for their particular workloads, change from a make_request-style driver to an I/O-request-queue-handler-style driver. We have to do a couple of things that might be surprising. We manipulate the page _count directly on the assumption that we still have no guarantee that users of the block layer are prohibited from submitting bios containing pages with zero reference counts.[1] If such a prohibition now exists, I can get rid of the _count manipulation. Just as before this patch, we still keep track of the sk_buffs that the network layer still hasn't finished yet and cap the resources we use with a "pool" of skbs.[2] Now that the block layer maintains the disk stats, the aoe driver's diskstats function can go away. 1. https://lkml.org/lkml/2007/3/1/374 2. https://lkml.org/lkml/2007/7/6/241 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	896831f590	aoe: kernel thread handles I/O completions for simple locking Make the frames the aoe driver uses to track the relationship between bios and packets more flexible and detached, so that they can be passed to an "aoe_ktio" thread for completion of I/O. The frames are handled much like skbs, with a capped amount of preallocation so that real-world use cases are likely to run smoothly and degenerate gracefully even under memory pressure. Decoupling I/O completion from the receive path and serializing it in a process makes it easier to think about the correctness of the locking in the driver, especially in the case of a remote MAC address becoming unusable. [dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	3d5b06051c	aoe: for performance support larger packet payloads This patch adds the ability to work with large packets composed of a number of segments, using the scatter gather feature of the block layer (biovecs) and the network layer (skb frag array). The motivation is the performance gained by using a packet data payload greater than a page size and by using the network card's scatter gather feature. Users of the out-of-tree aoe driver already had these changes, but since early 2011, they have complained of increased memory utilization and higher CPU utilization during heavy writes.[1] The commit below appears related, as it disables scatter gather on non-IP protocols inside the harmonize_features function, even when the NIC supports sg. commit `f01a5236bd` Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). With that regression in place, transmits always linearize sg AoE packets, but in-kernel users did not have this patch. Before 2.6.38, though, these changes were working to allow sg to increase performance. 1. http://www.spinics.net/lists/linux-mm/msg15184.html Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	8babe8cc65	aoe: assert AoE packets marked as requiring no checksum In order for the network layer to see that AoE requires no checksumming in a generic way, the packets must be marked as requiring no checksum, so we make this requirement explicit with the assertion. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-20 22:23:40 -04:00
Eric Dumazet	840a185ddd	aoe: remove dev_base_lock use from aoecmd_cfg_pkts() dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock) Convert aoecmd_cfg_pkts() to RCU locking. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:07 -08:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Andrew Morton	6ec1480d85	aoe: switch to the new bio_flush_dcache_pages() interface Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Horton <phorton@bitbox.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-12-22 09:12:48 +01:00
Peter Horton	0a1f127a05	aoe: prevent cache aliases Prevent the AoE block driver from creating cache aliases of page cache pages on machines with virtually indexed caches. Building kernels on an AT91SAM9G20 board without this patch fails with segmentation faults after a couple of passes. Signed-off-by: Peter Horton <zero@colonel-panic.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-01 16:32:20 -08:00
David S. Miller	438263ac58	aoe: Remove superfluous clearing of skb fields in new_skb(). This code uses alloc_skb() which clears them out for us. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-27 17:09:44 -07:00
Bartlomiej Zolnierkiewicz	04b3ab52a0	aoe: WIN_* -> ATA_CMD_* * Use ATA_CMD_* defines instead of WIN_* ones. * Include <linux/ata.h> directly instead of through <linux/hdreg.h>. Cc: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:24 +02:00
Harvey Harrison	411c41eea5	aoe: remove private mac address format function Add %pm to omit the colons when printing a mac address. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 00:40:37 -08:00
Linus Torvalds	4dd9ec4946	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1075 commits) myri10ge: update driver version number to 1.4.3-1.369 r8169: add shutdown handler r8169: preliminary 8168d support r8169: support additional 8168cp chipset r8169: change default behavior for mildly identified 8168c chipsets r8169: add a new 8168cp flavor r8169: add a new 8168c flavor (bis) r8169: add a new 8168c flavor r8169: sync existing 8168 device hardware start sequences with vendor driver r8169: 8168b Tx performance tweak r8169: make room for more specific 8168 hardware start procedure r8169: shuffle some registers handling around (8168 operation only) r8169: new phy init parameters for the 8168b r8169: update phy init parameters r8169: wake up the PHY of the 8168 af_key: fix SADB_X_SPDDELETE response ath9k: Fix return code when ath9k_hw_setpower() fails on reset ath9k: remove nasty FAIL macro from ath9k_hw_reset() gre: minor cleanups in netlink interface gre: fix copy and paste error ...	2008-10-11 09:33:18 -07:00
Tejun Heo	074a7aca7a	block: move stats from disk to part0 Move stats related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that... * part_stat_*() now updates part0 together if the specified partition is not part0. ie. part_stat_*() are now essentially all_stat_*(). {disk|all}_stat_*() are gone. part_round_stats() is updated similarly. It handles part0 stats automatically and disk_round_stats() is killed. * part_{inc|dec}_in_flight() is implemented which automatically updates part0 stats for parts other than part0. * disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special case handling in callers unnecessary. * Separate stats show code paths for disk are collapsed into part stats show code paths. * Rename disk_stat_lock/unlock() to part_stat_lock/unlock() While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:08 +02:00
Tejun Heo	80795aefb7	block: move capacity from disk to part0 Move disk->capacity to part0->nr_sects and convert all users who directly accessed the field to use {get|set}_capacity(). This is done early to allow the __dev field to be moved. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:07 +02:00
Tejun Heo	c995905916	block: fix diskstats access There are two variants of stat functions - ones prefixed with double underbars which don't care about preemption and ones without which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemption is disabled on entry as some callers don't do that. This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock() which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disable preemption. They're removed and double underbars ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted. disk_stat_lock() uses get_cpu() and returns the cpu index and all diskstat functions which access per-cpu counters now have @cpu argument to help RT. This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:06 +02:00
Tejun Heo	e71bf0d0ee	block: fix disk->part[] dereferencing race disk->part[] is protected by its matching bdev's lock. However, non-critical accesses like collecting stats and printing out sysfs and proc information used to be performed without any locking. As partitions can come and go dynamically, partitions can go away underneath those non-critical accesses. As some of those accesses are writes, this theoretically can lead to silent corruption. This patch fixes the race by using RCU for the partition array and dev reference counter to hold partitions. * Rename disk->part[] to disk->__part[] to make sure no one outside genhd layer proper accesses it directly. * Use RCU for disk->__part[] dereferencing. * Implement disk_{get|put}_part() which can be used to get and put partitions from gendisk respectively. * Iterators are implemented to help iterate through all partitions safely. * Functions which require RCU readlock are marked with _rcu suffix. * Use disk_put_part() in __blkdev_put() instead of directly putting the contained kobject. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:06 +02:00
Tejun Heo	310a2c1012	block: misc updates This patch makes the following misc updates in preparation for disk->part dereference fix and extended block devt support. * implement part_to_disk() * fix comment about gendisk->part indexing * rename get_part() to disk_map_sector() * don't use n which is always zero while printing disk information in diskstats_show() Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:04 +02:00
David S. Miller	e9bb8fb0b6	aoe: Use SKB interfaces for list management instead of home-grown stuff. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 22:36:49 -07:00
Harvey Harrison	823ed72e8f	block: use get_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-04 09:28:32 +02:00
Jens Axboe	28f13702f0	block: avoid duplicate calls to get_part() in disk stat code get_part() is fairly expensive, as it O(N) loops over partitions to find the right one. In lots of normal IO paths we end up looking up the partition twice, to make matters even worse. Change the stat add code to accept a passed in partition instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-05-07 10:15:46 +02:00
Harvey Harrison	f885f8d127	drivers/block: use get_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Ed L. Cashin <ecashin@coraid.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:27 -07:00
Linus Torvalds	03054de1e0	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Enhanced partition statistics: documentation update Enhanced partition statistics: remove old partition statistics Enhanced partition statistics: procfs Enhanced partition statistics: sysfs Enhanced partition statistics: aoe fix Enhanced partition statistics: update partition statitics Enhanced partition statistics: core statistics block: fixup rq_init() a bit Manually fixed conflict in drivers/block/aoe/aoecmd.c due to statistics support.	2008-02-08 09:42:46 -08:00
Ed L. Cashin	52e112b3ab	aoe: update copyright date Update the year in the copyright notices. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	578c4aa0b4	aoe: make error messages more specific Andrew Morton pointed out that the "too many targets" message in patch 2 could be printed for failing GFP_ATOMIC allocations. This patch makes the messages more specific. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	1d75981a80	aoe: the aoeminor doesn't need a long format The aoedev aoeminor member doesn't need a long format. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	7df620d852	aoe: add module parameter for users who need more outstanding I/O An AoE target provides an estimate of the number of outstanding commands that the AoE initiator can send before getting a response. The aoe_maxout parameter provides a way to set an even lower limit. It will not allow a user to use more outstanding commands than the target permits. If a user discovers a problem with a large setting, this parameter provides a way for us to work with them to debug the problem. We expect to improve the dynamic window sizing algorithm and drop this parameter. For the time being, it is a debugging aid. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	6b9699bbd2	aoe: only install new AoE device once An aoe driver user who had about 70 AoE targets found that he was hitting a BUG in sysfs_create_file because the aoe driver was trying to tell the kernel about an AoE device more than once. Each AoE device was reachable by several local network interfaces, and multiple ATA device identify responses were returning from that single device. This patch eliminates a race condition so that aoe always informs the block layer of a new AoE device once in the presence of multiple incoming ATA device identify responses. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	9bb237b6a6	aoe: dynamically allocate a capped number of skbs when necessary What this Patch Does: Even before this recent series of 12 patches to 2.6.22-rc4, the aoe driver was reusing a small set of skbs that were allocated once and were only used for outbound AoE commands. The network layer cannot be allowed to put_page on the data that is still associated with a bio we haven't returned to the block layer, so the aoe driver (even before the patch under discussion) is still the owner of skbs that have been handed to the network layer for transmission. We need to keep track of these skbs so that we can free them, but by tracking them, we can also easily re-use them. The new patch was a response to the behavior of certain network drivers. We cannot reuse an skb that the network driver still has in its transmit ring. Network drivers can defer transmit ring cleanup and then use the state in the skb to determine how many data segments to clean up in its transmit ring. The tg3 driver is one driver that behaves in this way. When the network driver defers cleanup of its transmit ring, the aoe driver can find itself in a situation where it would like to send an AoE command, and the AoE target is ready for more work, but the network driver still has all of the pre-allocated skbs. In that case, the new patch just calls alloc_skb, as you'd expect. We don't want to get carried away, though. We try not to do excessive allocation in the write path, so we cap the number of skbs we dynamically allocate. Probably calling it a "dynamic pool" is misleading. We were already trying to use a small fixed-size set of pre-allocated skbs before this patch, and this patch just provides a little headroom (with a ceiling, though) to accommodate network drivers that hang onto skbs, by allocating when needed. The d->skbpool_hd list of allocated skbs is necessary so that we can free them later. We didn't notice the need for this headroom until AoE targets got fast enough. Alternatives: If the network layer never did a put_page on the pages in the bios we get from the block layer, then it would be possible for us to hand skbs to the network layer and forget about them, allowing the network layer to free skbs itself (and thereby calling our own skb->destructor callback function if we needed that). In that case we could get rid of the pre-allocated skbs and also the d->skbpool_hd, instead just calling alloc_skb every time we wanted to transmit a packet. The slab allocator would effectively maintain the list of skbs. Besides a loss of CPU cache locality, the main concern with that approach is the danger that it would increase the likelihood of deadlock when VM is trying to free pages by writing dirty data from the page cache through the aoe driver out to persistent storage on an AoE device. Right now we have a situation where we have pre-allocation that corresponds to how much we use, which seems ideal. Of course, there's still the separate issue of receiving the packets that tell us that a write has successfully completed on the AoE target. When memory is low and VM is using AoE to flush dirty data to free up pages, it would be perfect if there were a way for us to register a fast callback that could recognize write command completion responses. But I don't think the current problems with the receive side of the situation are a justification for exacerbating the problem on the transmit side. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	1eb0da4cea	aoe: mac_addr: avoid 64-bit arch compiler warnings By returning unsigned long long, mac_addr does not generate compiler warnings on 64-bit architectures. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Ed L. Cashin	68e0d42f39	aoe: handle multiple network paths to AoE device A remote AoE device is something that can process ATA commands and is identified by an AoE shelf number and an AoE slot number. Such a device might have more than one network interface, and it might be reachable by more than one local network interface. This patch tracks the network paths available to each AoE device, allowing them to be used more efficiently. Andrew Morton asked about the call to msleep_interruptible in the revalidate function. Yes, if a signal is pending, then msleep_interruptible will not return 0. That means we will not loop but will call aoenet_xmit with a NULL skb, which is a noop. If the system is too low on memory or the aoe driver is too low on frames, then the user can hit control-C to interrupt the attempt to do a revalidate. I have added a comment to the code summarizing that. Andrew Morton asked whether the allocation performed inside addtgt could use a more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev lock has been locked with spin_lock_irqsave. It would be nice to allocate the memory under fewer restrictions, but targets are only added when the device is being discovered, and if the target can't be added right now, we can try again in a minute when the next AoE config query broadcast goes out. Andrew Morton pointed out that the "too many targets" message could be printed for failing GFP_ATOMIC allocations. The last patch in this series makes the messages more specific. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Jerome Marchand	a890d62b9e	Enhanced partition statistics: aoe fix Updates the enhanced partition statistics in ATA over Ethernet driver (not tested). Signed-off-by: Jerome Marchand <jmarchan@redhat.com>	2008-02-08 12:41:57 +01:00
Ed L. Cashin	abdbf94d7c	aoe: remove unecessary wrapper function We can just use skb_mac_header now, and we don't need a wrapper function to perform the cast. Instead of requiring the reader to check aoe.h to look up what an aoe_hdr function does, I'd rather do without it. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:52 -07:00
Linus Torvalds	038a5008b2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits) [SKY2]: status polling loop (post merge) [NET]: Fix NAPI completion handling in some drivers. [TCP]: Limit processing lost_retrans loop to work-to-do cases [TCP]: Fix lost_retrans loop vs fastpath problems [TCP]: No need to re-count fackets_out/sacked_out at RTO [TCP]: Extract tcp_match_queue_to_sack from sacktag code [TCP]: Kill almost unused variable pcount from sacktag [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L [TCP]: Add bytes_acked (ABC) clearing to FRTO too [IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2 [NETFILTER]: x_tables: add missing ip6t_modulename aliases [NETFILTER]: nf_conntrack_tcp: fix connection reopening [QETH]: fix qeth_main.c [NETLINK]: fib_frontend build fixes [IPv6]: Export userland ND options through netlink (RDNSS support) [9P]: build fix with !CONFIG_SYSCTL [NET]: Fix dev_put() and dev_hold() comments [NET]: make netlink user -> kernel interface synchronious [NET]: unify netlink kernel socket recognition [NET]: cleanup 3rd argument in netlink_sendskb ... Fix up conflicts manually in Documentation/feature-removal-schedule.txt and my new least favourite crap, the "mod_devicetable" support in the files include/linux/mod_devicetable.h and scripts/mod/file2alias.c. (The latter files seem to be explicitly _designed_ to get conflicts when different subsystems work with them - that have an absolutely horrid lack of subsystem separation!) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-11 19:40:14 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anything in the core of the network stack that was affected by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundamentally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
NeilBrown	6712ecf8f6	Drop 'size' argument from bio_endio and bi_end_io As bi_end_io is only called once when the request is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Arnaldo Carvalho de Melo	029720f15d	[AOE]: Introduce aoe_hdr() For consistency with other skb->mac.raw users. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:28 -07:00
David S. Miller	43ecf5295b	[AOE]: Add get_unaligned() calls where needed. Based upon a report by Andrew Walrond. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-02 15:22:55 -08:00
Ed L. Cashin	19900cdee2	[PATCH] fix aoe without scatter-gather [Bug 7662] Fix a bug that only appears when AoE goes over a network card that does not support scatter-gather. The headers in the linear part of the skb appeared to be larger than they really were, resulting in data that was offset by 24 bytes. This patch eliminates the offset data on cards that don't support scatter-gather or have had scatter-gather turned off. There remains an unrelated issue that I'll address in a separate email. Fixes bugzilla #7662 Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Cc: <stable@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: <boddingt@optusnet.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:49 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
Ed L. Cashin	a12c93f08b	aoe: revert printk macros This patch addresses the concern that the aoe driver should not introduce unnecessary conventions that must be learned by the reader. It reverts patch 6. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:51 -07:00
Ed L. Cashin	392e4845f9	aoe: use bio->bi_idx Instead of starting with bio->bi_io_vec, use the offset in bio->bi_idx. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	b751e8b659	aoe: module parameter for device timeout The aoe_deadsecs module parameter sets the number of seconds that elapse before a nonresponsive AoE device is marked as dead. This is runtime settable in sysfs or settable with a module load or kernel boot parameter. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	4f51dc5e9a	aoe: zero copy write 2 of 2 Avoid memory copy on writes. (This patch follows patch 4.) Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	dced3a053d	aoe: improve retransmission heuristics Add a dynamic minimum timer for better retransmission behavior. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	ddec63e867	aoe: jumbo frame support 2 of 2 Add support for jumbo ethernet frames. (This patch follows patch 5.) Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	6bb6285fdb	aoe: clean up printks via macros Use simple macros to clean up the printks. (This patch is reverted by the 14th patch to follow.) Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	19bf26353c	aoe: jumbo frame support 1 of 2 Add support for jumbo ethernet frames. (This patch depends on patch 7 to follow.) Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	e407a7f6cd	aoe: zero copy write 1 of 2 Avoid memory copy on writes. (This patch depends on fixes in patch 9 to follow.) Although skb->len should not be set when working with linear skbuffs, the skb->tail pointer maintained by skb_put/skb_trim is not relevant to what happens when the skb_fill_page_desc function is called. This issue was raised without comment in linux-kernel and netdev earlier this month: http://thread.gmane.org/gmane.linux.kernel/446474/ http://thread.gmane.org/gmane.linux.network/45444/ So until there is something analogous to skb_put that works for zero-copy write skbuffs, we will do what the other callers of skb_fill_page_desc are doing. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	2611464d7f	aoe: update copyright date Update the copyright year to 2006. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-10-18 12:53:50 -07:00
Ed L. Cashin	9d41965b78	[PATCH] aoe [2/3]: don't request ATA device ID on ATA error On an ATA error response, take the device down instead of sending another ATA device identify command. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>	2006-03-23 22:01:57 -08:00
Ed L. Cashin	1c6f3fcac0	[PATCH] aoe: do not stop retransmit timer when device goes down This patch is a bugfix that follows and depends on the eight aoe driver patches sent January 19th. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-03-23 22:01:56 -08:00
Ed L. Cashin	2dd5e42269	[PATCH] aoe [5/8]: allow network interface migration on packet retransmit Retransmit to the current network interface for an AoE device. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>	2006-03-23 22:01:56 -08:00
Ed L. Cashin	eaf0a3cbe5	[PATCH] aoe [3/8]: increase allowed outstanding packets Increase the number of AoE packets per device that can be outstanding at one time, increasing performance. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-03-23 22:01:55 -08:00
Ed L. Cashin	3ae1c24e39	[PATCH] aoe [2/8]: support dynamic resizing of AoE devices Allow the driver to recognize AoE devices that have changed size. Devices not in use are updated automatically, and devices that are in use are updated at user request. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-03-23 22:01:55 -08:00
Ed L. Cashin	50bba752ca	[PATCH] aoe [1/8]: zero packet data after skb allocation Zero the data in new socket buffers to prevent leaking information. Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-03-23 22:01:55 -08:00
Jens Axboe	496456c24f	[BLOCK] aoe: update for combined io statistics Signed-off-by: Jens Axboe <axboe@suse.de>	2005-11-01 09:54:23 +01:00
Ed L. Cashin	475172fb18	[PATCH] aoe: use get_unaligned for accesses in ATA id buffer Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Use get_unaligned for possibly-unaligned multi-byte accesses to the ATA device identify response buffer.	2005-10-28 09:52:49 -07:00
ecashin@coraid.com	a4b3836409	[PATCH] aoe 12/12: send outgoing packets in order I can't use list.h, since sk_buff doesn't have a list_head but instead has two struct sk_buff pointers, and I want to avoid any extra memory allocation. send outgoing packets in order Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-04-18 22:00:22 -07:00
ecashin@coraid.com	0c6f0e7920	[PATCH] aoe 11/12: add support for disk statistics add support for disk statistics Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-04-18 22:00:22 -07:00
ecashin@coraid.com	63e9cc5d6f	[PATCH] aoe 6/12: Alexey Dobriyan sparse cleanup Alexey Dobriyan sparse cleanup Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru> Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-04-18 22:00:20 -07:00
ecashin@coraid.com	32465c6506	[PATCH] aoe 2/12: allow multiple aoe devices with same MAC allow multiple aoe devices with same MAC addr Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-04-18 22:00:18 -07:00
ecashin@coraid.com	fc458dcda2	[PATCH] aoe 1/12: remove too-low cap on minor number remove too-low cap on minor number Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2005-04-18 22:00:17 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00


124 Commits