OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dave Chinner	a1ecac3b06	loop: Make explicit loop device destruction lazy xfstests has always had random failures of tests due to loop devices failing to be torn down and hence leaving filesytems that cannot be unmounted. This causes test runs to immediately stop. Over the past 6 or 7 years we've added hacks like explicit unmount -d commands for loop mounts, losetup -d after unmount -d fails, etc, but still the problems persist. Recently, the frequency of loop related failures increased again to the point that xfstests 259 will reliably fail with a stray loop device that was not torn down. That is despite the fact the test is above as simple as it gets - loop 5 or 6 times running mkfs.xfs with different paramters: lofile=$(losetup -f) losetup $lofile "$testfile" "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null \|\| echo "mkfs failed!" sync losetup -d $lofile And losteup -d $lofile is failing with EBUSY on 1-3 of these loops every time the test is run. Turns out that blkid is running simultaneously with losetup -d, and so it sees an elevated reference count and returns EBUSY. But why is blkid running? It's obvious, isn't it? udev has decided to try and find out what is on the block device as a result of a creation notification. And it is racing with mkfs, so might still be scanning the device when mkfs finishes and we try to tear it down. So, make losetup -d force autoremove behaviour. That is, when the last reference goes away, tear down the device. xfstests wants it gone, not causing random teardown failures when we know that all the operations the tests have specifically run on the device have completed and are no longer referencing the loop device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:31 +01:00
Selvan Mani	4453bc88f0	mtip32xx:Added appropriate timeout value for secure erase Added appropriate timeout value for secure erase based on identify device data Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:27 +01:00
Oliver Chick	1f999572f2	xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool Changing the type of bdev parameters to be unsigned int :1, rather than bool. This is more consistent with the types of other features in the block drivers. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:20 +01:00
Akinobu Mita	b7010ede43	cciss: select CONFIG_CHECK_SIGNATURE The patch cciss-use-check_signature.patch in -mm tree introduced a build error: drivers/built-in.o: In function `CISS_signature_present': drivers/block/cciss.c:4270: undefined reference to `check_signature' Add missing CONFIG_CHECK_SIGNATURE to fix this issue. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Fengguang Wu <wfg@linux.intel.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:00 +01:00
Wei Yongjun	2541aa799f	cciss: remove unneeded memset() The memory return by kzalloc() or kmem_cache_zalloc() has already be set to zero, so remove useless memset(0). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:58 +01:00
Wei Yongjun	654dbef214	xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:36:27 +01:00
Herton Ronaldo Krzesinski	1a4ae43e4f	floppy: remove dr, reuse drive on do_floppy_init This is a small cleanup, that also may turn error handling of unitialized disks more readable. We don't need a separate variable to track allocated disks, remove dr and reuse drive variable instead. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:07 +01:00
Herton Ronaldo Krzesinski	8d3ab4ebfd	floppy: use common function to check if floppies can be registered The same checks to see if a drive can be or is registered are repeated through the code, factor out the checks in a common function and replace the repeated checks with it. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	d60e7ec18c	floppy: properly handle failure on add_disk loop On floppy initialization, if something failed inside the loop we call add_disk, there was no cleanup of previous iterations in the error handling. Cc: stable@vger.kernel.org Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	238ab78469	floppy: do put_disk on current dr if blk_init_queue fails If blk_init_queue fails, we do not call put_disk on the current dr (dr is decremented first in the error handling loop). Cc: stable@vger.kernel.org Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	b54e1f8889	floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop Since commit `070ad7e` ("floppy: convert to delayed work and single-thread wq"), we end up calling alloc_ordered_workqueue multiple times inside the loop, which shouldn't be intended. Besides the leak, other side effect in the current code is if blk_init_queue fails, we would end up calling unregister_blkdev even if we didn't call yet register_blkdev. Just moved the allocation of floppy_wq before the loop, and adjusted the code accordingly. Cc: stable@vger.kernel.org # 3.5+ Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:24 +01:00
Konrad Rzeszutek Wilk	2911758f14	xen/blkback: Fix compile warning drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static? drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static? Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:32:43 +01:00
Kees Cook	b8977285ec	drivers/block: remove CONFIG_EXPERIMENTAL This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Asai Thambi S P <asamymuthupa@micron.com> CC: Pete Zaitcev <zaitcev@redhat.com> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-23 22:30:38 +02:00
Linus Torvalds	ce40be7a82	Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block Pull block IO update from Jens Axboe: "Core block IO bits for 3.7. Not a huge round this time, it contains: - First series from Kent cleaning up and generalizing bio allocation and freeing. - WRITE_SAME support from Martin. - Mikulas patches to prevent O_DIRECT crashes when someone changes the block size of a device. - Make bio_split() work on data-less bio's (like trim/discards). - A few other minor fixups." Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew Morton. It is due to the VM no longer using a prio-tree (see commit 6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree"). So make set_blocksize() use mapping_mapped() instead of open-coding the internal VM knowledge that has changed. * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits) block: makes bio_split support bio without data scatterlist: refactor the sg_nents scatterlist: add sg_nents fs: fix include/percpu-rwsem.h export error percpu-rw-semaphore: fix documentation typos fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared blockdev: turn a rw semaphore into a percpu rw semaphore Fix a crash when block device is read and block size is changed at the same time block: fix request_queue->flags initialization block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() block: ioctl to zero block ranges block: Make blkdev_issue_zeroout use WRITE SAME block: Implement support for WRITE SAME block: Consolidate command flag and queue limit checks for merges block: Clean up special command handling logic block/blk-tag.c: Remove useless kfree block: remove the duplicated setting for congestion_threshold block: reject invalid queue attribute values block: Add bio_clone_bioset(), bio_clone_kmalloc() block: Consolidate bio_alloc_bioset(), bio_kmalloc() ...	2012-10-11 09:04:23 +09:00
Linus Torvalds	7035cdf36d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The bulk of this pull is a series from Alex that refactors and cleans up the RBD code to lay the groundwork for supporting the new image format and evolving feature set. There are also some cleanups in libceph, and for ceph there's fixed validation of file striping layouts and a bugfix in the code handling a shrinking MDS cluster." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits) ceph: avoid 32-bit page index overflow ceph: return EIO on invalid layout on GET_DATALOC ioctl rbd: BUG on invalid layout ceph: propagate layout error on osd request creation libceph: check for invalid mapping ceph: convert to use le32_add_cpu() ceph: Fix oops when handling mdsmap that decreases max_mds rbd: update remaining header fields for v2 rbd: get snapshot name for a v2 image rbd: get the snapshot context for a v2 image rbd: get image features for a v2 image rbd: get the object prefix for a v2 rbd image rbd: add code to get the size of a v2 rbd image rbd: lay out header probe infrastructure rbd: encapsulate code that gets snapshot info rbd: add an rbd features field rbd: don't use index in __rbd_add_snap_dev() rbd: kill create_snap sysfs entry rbd: define rbd_dev_image_id() rbd: define some new format constants ...	2012-10-08 06:38:18 +09:00
Linus Torvalds	dc92b1f9ab	Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio changes from Rusty Russell: "New workflow: same git trees pulled by linux-next get sent straight to Linus. Git is awkward at shuffling patches compared with quilt or mq, but that doesn't happen often once things get into my -next branch." * 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits) lguest: fix occasional crash in example launcher. virtio-blk: Disable callback in virtblk_done() virtio_mmio: Don't attempt to create empty virtqueues virtio_mmio: fix off by one error allocating queue drivers/virtio/virtio_pci.c: fix error return code virtio: don't crash when device is buggy virtio: remove CONFIG_VIRTIO_RING virtio: add help to CONFIG_VIRTIO option. virtio: support reserved vqs virtio: introduce an API to set affinity for a virtqueue virtio-ring: move queue_index to vring_virtqueue virtio_balloon: not EXPERIMENTAL any more. virtio-balloon: dependency fix virtio-blk: fix NULL checking in virtblk_alloc_req() virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path virtio-blk: Add bio-based IO path for virtio-blk virtio: console: fix error handling in init() function tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace tools: Add guest trace agent as a user tool virtio/console: Allocate scatterlist according to the current pipe size ...	2012-10-07 21:04:56 +09:00
Linus Torvalds	f1c6872e49	Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI= =gxJD -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...	2012-10-07 07:13:01 +09:00
Ed Cashin	322c9ec009	aoe: update aoe-internal version number to 50 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	1ac9e60262	aoe: remove unused code Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	08b6062351	aoe: make dynamic block minor numbers the default Because udev use is so widespread, making the old static mapping the default is too conservative, given the severe limitations it places on usable AoE addresses. Storage virtualization and larger shelves have made the old limitations too confining. These changes make the dynamic block device minor numbers the default, removing the limitations on usable AoE addresses. The static arrangement is still available with aoe_dyndevs=0, and the aoe-stat tool from the userland aoetools package, the user space counterpart to the aoe driver, recognizes the case where there is a mismatch between the minor number in sysfs and the minor number in a special device file. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	7159e969d1	aoe: update and specify AoE address guards and error messages In general, specific is better when it comes to messages about AoE usage problems. Also, explicit checks for the AoE broadcast addresses are added. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	4bcce1a355	aoe: retain static block device numbers for backwards compatibility The old mapping between AoE target shelf and slot addresses and the block device minor number is retained as a backwards-compatible feature, with a new "aoe_dyndevs" module parameter available for enabling dynamic block device minor numbers. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	0c96621458	aoe: support more AoE addresses with dynamic block device minor numbers The ATA over Ethernet protocol uses a major (shelf) and minor (slot) address to identify a particular storage target. These changes remove an artificial limitation the aoe driver imposes on the use of AoE addresses. For example, without these changes, the slot address has a maximum of 15, but users commonly use slot numbers much greater than that. The AoE shelf and slot address space is often used sparsely. Instead of using a static mapping between AoE addresses and the block device minor number, the block device minor numbers are now allocated on demand. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	fea05a26c3	aoe: update copyright year in touched files Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	7392fbe5ad	aoe: update internal version number to 49 The internal version number of the aoe driver appears in a console message when the driver loads and is usually obtained by the user with the userland aoe-version tool, part of the aoetools.[1] Although this patchset includes bugfixes backported from higher-numbered versions published on the coraid.com website, it is a form of version 49. 1. http://aoetools.sourceforge.net/ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	b21faa25c6	aoe: remove unused code and add cosmetic improvements This change removes some unused code and attempts to increase code consistency. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	1b86fda9ad	aoe: increase net_device reference count while using it This change eliminates the danger that the user could rmmod the driver for a network interface that is being used for AoE by the aoe driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	64a80f5ac7	aoe: associate frames with the AoE storage target In the driver code, "target" and aoetgt refer to a particular remote interface on the AoE storage target. The latter is identified by its AoE major and minor addresses. Commands that are being sent to an AoE storage target {major, minor} can be sent or retransmitted to any of the remote MAC addresses associated with the AoE storage target. That is, frames are naturally associated with not an aoetgt (AoE major, AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor). Making the code reflect that reality simplifies the driver, especially when the path to a remote MAC address becomes unusable. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	6583303c5e	aoe: disallow unsupported AoE minor addresses A guard is inserted to prevent AoE minor addresses (slot addresses) higher than 15 to be used, as they are not yet supported by the driver. There is a change coming that will allow the aoe driver to overcome this limit by using system device minor numbers dynamically, but until then, this guard prevents unexpected targets from being used by the driver when AoE targets with high minor numbers are on the AoE network. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	25f4d75ea4	aoe: do revalidation steps in order The discovery process begins with an optional AoE config query command and an AoE config query response. Normally when an aoe device is already open, the config query response does not trigger an ATA identify device command to be sent out, since the response contains storage capacity information that, if changed, could surprise the user of the device. The userland "aoe-revalidate" tool uses a character device to trigger an AoE config query for a particular AoE storage target and an ATA device identify command, even when the device is open. This change causes the config query to go out first, reflecting the normal discovery sequence. The responses could come back in any order, so this change is fairly cosmetic. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	d54d35ac66	aoe: failover remote interface based on aoe_deadsecs parameter The aoe_deadsecs module parameter allows the user to specify a hard limit on the number of seconds an AoE command can be retransmitted before the AoE block device is considered to have failed. Using aoe_deadsecs to determine the time we try using a different remote interface helps to ensure that the hard limit is not reached before we've tried to recover by sending to a different remote port. As a data storage target, the AoE target is unambiguously identified by its {major, minor} AoE address tuple, and an AoE target can have multiple MAC addresses. However, note that "target" in the driver code and comments means a {major, minor, MAC address} tuple, as in "somewhere to send packets". Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	3f0f013374	aoe: use packets that work with the smallest-MTU local interface Users with several network interfaces dedicated to AoE generally do not configure them to support different-sized AoE data payloads on purpose. For a given AoE target, there will be a set of local network interfaces that can reach it. Using only the payload that will fit in the smallest-sized MTU of all those local interfaces greatly simplifies the driver, especially in failure scenarios. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	eb086ec596	aoe: use a kernel thread for transmissions The dev_queue_xmit function needs to have interrupts enabled, so the most simple way to get the locking right but still fulfill that requirement is to use a process that can call dev_queue_xmit serially over queued transmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	69cf2d85de	aoe: become I/O request queue handler for increased user control To allow users to choose an elevator algorithm for their particular workloads, change from a make_request-style driver to an I/O-request-queue-handler-style driver. We have to do a couple of things that might be surprising. We manipulate the page _count directly on the assumption that we still have no guarantee that users of the block layer are prohibited from submitting bios containing pages with zero reference counts.[1] If such a prohibition now exists, I can get rid of the _count manipulation. Just as before this patch, we still keep track of the sk_buffs that the network layer still hasn't finished yet and cap the resources we use with a "pool" of skbs.[2] Now that the block layer maintains the disk stats, the aoe driver's diskstats function can go away. 1. https://lkml.org/lkml/2007/3/1/374 2. https://lkml.org/lkml/2007/7/6/241 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	896831f590	aoe: kernel thread handles I/O completions for simple locking Make the frames the aoe driver uses to track the relationship between bios and packets more flexible and detached, so that they can be passed to an "aoe_ktio" thread for completion of I/O. The frames are handled much like skbs, with a capped amount of preallocation so that real-world use cases are likely to run smoothly and degenerate gracefully even under memory pressure. Decoupling I/O completion from the receive path and serializing it in a process makes it easier to think about the correctness of the locking in the driver, especially in the case of a remote MAC address becoming unusable. [dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	3d5b06051c	aoe: for performance support larger packet payloads tAdd adds the ability to work with large packets composed of a number of segments, using the scatter gather feature of the block layer (biovecs) and the network layer (skb frag array). The motivation is the performance gained by using a packet data payload greater than a page size and by using the network card's scatter gather feature. Users of the out-of-tree aoe driver already had these changes, but since early 2011, they have complained of increased memory utilization and higher CPU utilization during heavy writes.[1] The commit below appears related, as it disables scatter gather on non-IP protocols inside the harmonize_features function, even when the NIC supports sg. commit `f01a5236bd` Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). With that regression in place, transmits always linearize sg AoE packets, but in-kernel users did not have this patch. Before 2.6.38, though, these changes were working to allow sg to increase performance. 1. http://www.spinics.net/lists/linux-mm/msg15184.html Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	a336d29870	nbd: handle discard requests Add discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	2f01250888	nbd: add set flags ioctl Add a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:23 +09:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Sage Weil	6cae3717cd	rbd: BUG on invalid layout This shouldn't actually be possible because the layout struct is constructed from the RBD header and validated then. [elder@inktank.com: converted BUG() call to equivalent rbd_assert()] Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-10-01 17:20:00 -05:00
Linus Torvalds	d9a807461f	USB merge for 3.7-rc1 Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlBp3+AACgkQMUfUDdst+ym5vwCfe93FyJyXn/RDkGz7iBemvWFd vrwAoIxjaOa4/yWZWcgrWc5bP4aO3ssc =jYDr -----END PGP SIGNATURE----- Merge tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB changes from Greg Kroah-Hartman: "Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fix up several annoying - but fairly mindless - conflicts due to the termios structure having moved into the tty device, and often clashing with dbg -> dev_dbg conversion. * tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (339 commits) USB: ezusb: move ezusb.c from drivers/usb/serial to drivers/usb/misc USB: uas: fix gcc warning USB: uas: fix locking USB: Fix race condition when removing host controllers USB: uas: add locking USB: uas: fix abort USB: uas: remove aborted field, replace with status bit. USB: uas: fix task management USB: uas: keep track of command urbs xhci: Intel Panther Point BEI quirk. powerpc/usb: remove checking PHY_CLK_VALID for UTMI PHY USB: ftdi_sio: add TIAO USB Multi-Protocol Adapter (TUMPA) support Revert "usb : Add sysfs files to control port power." USB: serial: remove vizzini driver usb: host: xhci: Fix Null pointer dereferencing with `71c731a` for non-x86 systems Increase XHCI suspend timeout to 16ms USB: ohci-at91: fix null pointer in ohci_hcd_at91_overcurrent_irq USB: sierra_ms: don't keep unused variable fsl/usb: Add support for USB controller version 2.4 USB: qcaux: add Pantech vendor class match ...	2012-10-01 13:23:01 -07:00
Alex Elder	6e14b1a6c3	rbd: update remaining header fields for v2 There are three fields that are not yet updated for format 2 rbd image headers: the version of the header object; the encryption type; and the compression type. There is no interface defined for fetching the latter two, so just initialize them explicitly to 0 for now. Change rbd_dev_v2_snap_context() so the caller can be supplied the version for the header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	b8b1e2db52	rbd: get snapshot name for a v2 image Define rbd_dev_v2_snap_name() to fetch the name for a particular snapshot in a format 2 rbd image. Define rbd_dev_v2_snap_info() to to be a wrapper for getting the name, size, and features for a particular snapshot, using an interface that matches the equivalent function for version 1 images. Define rbd_dev_snap_info() wrapper function and use it to call the appropriate function for getting the snapshot name, size, and features, dependent on the rbd image format. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	35d489f946	rbd: get the snapshot context for a v2 image Fetch the snapshot context for an rbd format 2 image by calling the "get_snapcontext" method on its header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	b1b5402aa9	rbd: get image features for a v2 image The features values for an rbd format 2 image are fetched from the server using a "get_features" method. The same method is used for getting the features for a snapshot, so structure this addition with a generic helper routine that can get this information for either. The server will provide two 64-bit feature masks, one representing the features potentially in use for this image (or its snapshot), and one representing features that must be supported by the client in order to work with the image. For the time being, neither of these is really used so we keep things simple and just record the first feature vector. Once we start using these feature masks, what we record and what we expose to the user will most likely change. Signed-off-by: Alex Elder <elder@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	1e1301998e	rbd: get the object prefix for a v2 rbd image The object prefix of an rbd format 2 image is fetched from the server using a "get_object_prefix" method. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	9d475de5d1	rbd: add code to get the size of a v2 rbd image The size of an rbd format 2 image is fetched from the server using a "get_size" method. The same method is used for getting the size of a snapshot, so structure this addition with a generic helper routine that we can get this information for either. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	a30b71b999	rbd: lay out header probe infrastructure This defines a new function rbd_dev_probe() as a top-level function for populating detailed information about an rbd device. It first checks for the existence of a format 2 rbd image id object. If it exists, the image is assumed to be a format 2 rbd image, and another function rbd_dev_v2() is called to finish populating header data for that image. If it does not exist, it is assumed to be an old (format 1) rbd image, and calls a similar function rbd_dev_v1() to populate its header information. A new field, rbd_dev->format, is defined to record which version of the rbd image format the device represents. For a valid mapped rbd device it will have one of two values, 1 or 2. So far, the format 2 images are not really supported; this is laying out the infrastructure for fleshing out that support. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	cd892126c6	rbd: encapsulate code that gets snapshot info Create a function that encapsulates looking up the name, size and features related to a given snapshot, which is indicated by its index in an rbd device's snapshot context array of snapshot ids. This interface will be used to hide differences between the format 1 and format 2 images. At the moment this (looking up the name anyway) is slightly less efficient than what's done currently, but we may be able to optimize this a bit later on by cacheing the last lookup if it proves to be a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00

1 2 3 4 5 ...

2685 Commits