This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R)
Virtualization Technology for Directed I/O Architecture and the hardware spec
for the same can be found here
http://www.intel.com/technology/virtualization/index.htm
FAQ! (questions from akpm, answers from ak)
> So... what's all this code for?
>
> I assume that the intent here is to speed things up under Xen, etc?
Yes in some cases, but not this code. That would be the Xen version of this
code that could potentially assign whole devices to guests. I expect this to
be only useful in some special cases though because most hardware is not
virtualizable and you typically want an own instance for each guest.
Ok at some point KVM might implement this too; i likely would use this code
for this.
> Do we
> have any benchmark results to help us to decide whether a merge would be
> justified?
The main advantage for doing it in the normal kernel is not performance, but
more safety. Broken devices won't be able to corrupt memory by doing random
DMA.
Unfortunately that doesn't work for graphics yet, for that need user space
interfaces for the X server are needed.
There are some potential performance benefits too:
- When you have a device that cannot address the complete address range an
IOMMU can remap its memory instead of bounce buffering. Remapping is likely
cheaper than copying.
- The IOMMU can merge sg lists into a single virtual block. This could
potentially speed up SG IO when the device is slow walking SG lists. [I
long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but
it probably depends a lot on the HBA]
And you get better driver debugging because unexpected memory accesses from
the devices will cause a trappable event.
>
> Does it slow anything down?
It adds more overhead to each IO so yes.
This patch:
Add support for early detection and parsing of DMAR's (DMA Remapping) reported
to OS via ACPI tables.
DMA remapping(DMAR) devices support enables independent address translations
for Direct Memory Access(DMA) from Devices. These DMA remapping devices are
reported via ACPI tables and includes pci device scope covered by these DMA
remapping device.
For detailed info on the specification of "Intel(R) Virtualization Technology
for Directed I/O Architecture" please see
http://www.intel.com/technology/virtualization/index.htm
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current memory notifier has some defects yet. (Fortunately, nothing uses
it.) This patch is to fix and rearrange for them.
- Add information of start_pfn, nr_pages, and node id if node status is
changes from/to memoryless node for callback functions.
Callbacks can't do anything without those information.
- Add notification going-online status.
It is necessary for creating per node structure before the node's
pages are available.
- Move GOING_OFFLINE status notification after page isolation.
It is good place for return memory like cache for callback,
because returned page is not used again.
- Make CANCEL events for rollingback when error occurs.
- Delete MEM_MAPPING_INVALID notification. It will be not used.
- Fix compile error of (un)register_memory_notifier().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of hera.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6: (29 commits)
[PARISC] fix uninitialized variable warning in asm/rtc.h
[PARISC] Port checkstack.pl to parisc
[PARISC] Make palo target work when $obj != $src
[PARISC] Zap unused variable warnings in pci.c
[PARISC] Fix tests in palo target
[PARISC] Fix palo target
[PARISC] Restore palo target
[PARISC] Attempt to clean up parisc/Makefile
[PARISC] Fix infinite loop in /proc/iomem
[PARISC] Quiet sysfs_create_link __must_check warnings in pdc_stable
[PARISC] Squelch pci_enable_device __must_check warning in superio
[PARISC] Kill off broken irqstack code
[PARISC] Remove hardcoded uses of PAGE_SIZE
[PARISC] Clean up pointless ASM_PAGE_SIZE_DIV use
[PARISC] Kill off the last vestiges of ASM_PAGE_SIZE
[PARISC] Kill off ASM_PAGE_SIZE use
[PARISC] Beautify parisc vmlinux.lds.S
[PARISC] Clean up a resource_size_t warning in sba_iommu
[PARISC] Kill incorrect cast warning in unwinder
[PARISC] Kill zone_to_nid printk warning
...
Fixed trivial conflict in include/asm-parisc/tlbflush.h manually
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...
Fix build break:
drivers/net/tsi108_eth.c: In function 'tsi108_init_one':
drivers/net/tsi108_eth.c:1633: error: expected ')' before 'dev'
drivers/net/tsi108_eth.c:1633: warning: too few arguments for format
make[2]: *** [drivers/net/tsi108_eth.o] Error 1
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
[bugme-daemon@bugzilla.kernel.org wrote:]
From: Randy Dunlap <randy.dunlap@oracle.com>
Drivers that use lro functions should depend on INET, otherwise they
may not link correctly. Let's not select INET. Select should be used
only for library-like code, not to enable subsystems.
ERROR: "lro_flush_all" [drivers/net/myri10ge/myri10ge.ko] undefined!
ERROR: "lro_receive_frags" [drivers/net/myri10ge/myri10ge.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
- make the kconfig NAPI option prompt consistent across all net drivers
(other than EXPERIMENTAL; can it now be removed also, or is the new
napi_struct implementation now EXPERIMENTAL ?)
- remove comment about the now-deleted NAPI_HOWTO.txt file
- clean up typos in Tulip NAPI & Interrupt Mitigation
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Missing MODULE_LICENSE(), loading this module taints the kernel.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
DM9000 driver returns success even if it is failed to detect the chip.
Below patch fixes it.
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
drivers/net/dm9000.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_sis has the same restrictions as other SFF controllers, and so must
use LIBATA_MAX_PRD to denote that SCSI may only fill ATA_MAX_PRD/2
entries, due to our need to handle IOMMU merging.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
SCR read for controllers which uses PCI configuration space for SCR
access got broken while adding @val argument to SCR accessors. Fix
it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix libata kernel-doc parameter name.
Warning(linux-2.6.23-git13//drivers/ata/libata-core.c:1415): No description found for parameter 'sgl'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add crypt prefix to dec_pending to avoid confusing it in backtraces with
the dm core function of the same name.
No functional change here.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds calls to dm_path_event for a failed path and a reinstated
path.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds support for the dm_path_event dm_send_event functions which
create and send udev events.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds a uevent skeleton to device-mapper.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds a function to obtain a copy of a mapped device's name and uuid.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Store a pointer to the owning mirror_set structure within each mirror
structure for a subsequent patch to use.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
There are now two phases to a suspend in device-mapper -
presuspend and postsuspend. This patch removes the
single 'suspend' in the logging API and replaces it with
'presuspend' and 'postsuspend' functions to align it
better with core device-mapper.
A subsequent patch will make use of 'presuspend'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds retries to the hp hardware handler, and utilizes the
MP_RETRY flag of dm-multipath. For now in the hp handler, if we get a
pg_init completed with a check condition we just assume we can retry the
pg_init command. We make this assumption because of incomplete data on
specific check condition code of the HP hardware, and because testing
has shown the HP path initialization command to be idempotent.
The number of times we retry is settable via the "pg_init_retries"
multipath map feature.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch adds the most basic dm-multipath hardware support for the
HP active/passive arrays.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch allows a failed path group initialisation command to be retried.
It adds a generic MP_RETRY flag and a "pg_init_retries" feature to
device-mapper multipath which limits the number of retries.
1. A hw handler sends a path initialization command to the storage and
the command completes with an error code indicating the command
should be retried.
2. The hardware handler calls dm_pg_init_complete() with MP_RETRY
set in err_flags to ask the dm multipath core to retry.
3. If the retry limit has not been exceeded, pg_init() is retried.
Otherwise fail_path() is called.
If you are using the userspace multipath-tools or device-mapper-multipath
package, you can set pg_init_retries in the 'device' section of your
/etc/multipath.conf file. For example:
features "2 pg_init_retries 7"
The number of PG retries attempted is reported in the 'dmsetup status' output.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Replace numbers with names in labels in error paths, to avoid confusion
when new one get added between existing ones.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Clean up, convert some spaces to tabs.
No functional change here.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add post-processing queue (per crypt device) for read operations.
Current implementation uses only one queue for all operations
and this can lead to starvation caused by many requests waiting
for memory allocation. But the needed memory-releasing operation
is queued after these requests (in the same queue).
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Use a separate single-threaded workqueue for each crypt device
instead of one global workqueue.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove BIO_LIST and DEFINE_BIO_LIST macros that gain us nothing
since contents are initialised to NULL.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In drivers/md/dm-ioctl.c::copy_params() there's a call to vmalloc()
where we currently cast the return value, but that's pretty pointless
given that vmalloc() returns "void *".
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Use bio_io_error() in only two places and tidy the code,
preparing for later patches.
There is no functional change in this patch.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kcopyd uses a semaphore as mutex. Use the mutex API instead of the (binary)
semaphore,
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replacing n & (n - 1) for power of 2 check by is_power_of_2(n)
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch fixes a bd_mount_sem counter corruption bug in device-mapper.
thaw_bdev() should be called only when freeze_bdev() was called for the
device.
Otherwise, thaw_bdev() will up bd_mount_sem and corrupt the semaphore counter.
struct block_device with the corrupted semaphore may remain in slab cache
and be reused later.
Attached patch will fix it by calling unlock_fs() instead.
unlock_fs() will determine whether it should call thaw_bdev()
by checking the device is frozen or not.
Easy reproducer is:
#!/bin/sh
while [ 1 ]; do
dmsetup --notable create a
dmsetup --nolockfs suspend a
dmsetup remove a
done
It's not easy to see the effect of corrupted semaphore.
So I have tested with putting printk below in bdev_alloc_inode():
if (atomic_read(&ei->bdev.bd_mount_sem.count) != 1)
printk(KERN_DEBUG "Incorrect semaphore count = %d (%p)\n",
atomic_read(&ei->bdev.bd_mount_sem.count),
&ei->bdev);
Without the patch, I saw something like:
Incorrect semaphore count = 17 (f2ab91c0)
With the patch, the message didn't appear.
The bug was introduced in 2.6.16 with this bug fix:
commit d9dde59ba0
Date: Fri Feb 24 13:04:24 2006 -0800
[PATCH] dm: missing bdput/thaw_bdev at removal
Need to unfreeze and release bdev otherwise the bdev inode with
inconsistent state is reused later and cause problem.
and backported to 2.6.15.5.
It occurs only in free_dev(), which is called only when the dm device is
removed. The buggy code is executed only if md->suspended_bdev is
non-NULL and that can happen only when the device was suspended without
noflush.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
Fix missing space in dm-delay target status output
if separate read and write delay are configured.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add a missing 'dm_put_device' in an error path in crypt target constructor.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Make size of dm_ioctl struct always 312 bytes on all supported
architectures.
This change retains compatibility with already-compiled code because
it uses an embedded offset to locate the payload that follows the
structure.
On 64-bit architectures there is no change at all; on 32-bit
we are increasing the size of dm-ioctl from 308 to 312 bytes.
Currently with 32-bit userspace / 64-bit kernel on x86_64
some ioctls (including rename, message) are incorrectly rejected
by the comparison against 'param + 1'. This breaks userspace
lvrename and multipath 'fail_if_no_path' changes, for example.
(BTW Device-mapper uses its own versioning and ignores the ioctl
size bits. Only the generic ioctl compat code on mixed arches
checks them, and that will continue to accept both sizes for now,
but we intend to list 308 as deprecated and eventually remove it.)
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Guido Guenther <agx@sigxcpu.org>
Cc: Kevin Corry <kevcorry@us.ibm.com>
Cc: stable@kernel.org