Commit Graph

153360 Commits

Author SHA1 Message Date
Mikulas Patocka 374bf7e7f6 dm: stripe support flush
Flush support for the stripe target.

This sets ti->num_flush_requests to the number of stripes and
remaps individual flush requests to the appropriate stripe devices.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:22 +01:00
Mikulas Patocka 433bcac564 dm: linear support flush
Flush support for the linear target.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:22 +01:00
Mikulas Patocka 52b1fd5a27 dm: send empty barriers to targets in dm_flush
Pass empty barrier flushes to the targets in dm_flush().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:21 +01:00
Alasdair G Kergon 9015df24a8 dm: initialise tio in alloc_tio
Move repeated dm_target_io initialisation inside alloc_tio().

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:21 +01:00
Mikulas Patocka f9ab94cee3 dm: introduce num_flush_requests
Introduce num_flush_requests for a target to set to say how many flush
instructions (empty barriers) it wants to receive.  These are sent by
__clone_and_map_empty_barrier with map_info->flush_request going from 0
to (num_flush_requests - 1).

Old targets without flush support won't receive any flush requests.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:20 +01:00
Mikulas Patocka 27eaa14975 dm: remove check that prevents mapping empty bios
Remove the check that the size of the cloned bio is not zero because a
subsequent patch needs to send zero-sized barriers down this path.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:20 +01:00
Mikulas Patocka fdb9572b73 dm: remove EOPNOTSUPP for barriers
If the underlying device doesn't support barriers and dm receives a
barrier, it waits until all requests on that device drain so it no
longer needs to report -EOPNOTSUPP to the caller.

This patch deals with the confusing situation when moving a volume from
one physical device to another triggers an EOPNOTSUPP on a volume that
didn't report it before.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:19 +01:00
Mikulas Patocka 5aa2781d96 dm: store only first barrier error
With the following patches, more than one error can occur during
processing.  Change md->barrier_error so that only the first one is
recorded and returned to the caller.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:18 +01:00
Mikulas Patocka 2761e95fe4 dm: process requeue in dm_wq_work
If barrier request was returned with DM_ENDIO_REQUEUE,
requeue it in dm_wq_work instead of dec_pending.

This allows us to correctly handle a situation when some targets
are asking for a requeue and other targets signal an error.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:18 +01:00
Mikulas Patocka 531fe96364 dm: make dm_flush return void
Make dm_flush return void.

The first error during flush is stored in md->barrier_error instead.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:17 +01:00
Mikulas Patocka 32a926da5a dm: always hold bdev reference
Fix a potential deadlock when creating multiple snapshots by holding a
reference to struct block_device for the whole lifecycle of every dm
device instead of obtaining it independently at each point it is needed.

bdget_disk() was called while the device was being suspended, in
dm_suspend().  However there could be other devices already suspended,
for example when creating additional snapshots of a device. bdget_disk()
can wait for IO and allocate memory resulting in waiting for the
already-suspended device - deadlock.

This patch changes the code so that it gets the reference to struct
block_device when struct mapped_device is allocated and initialized in
alloc_dev() where it is always OK to allocate memory or wait for I/O.
It drops the reference when it is destroyed in free_dev().  Thus there
is no call to bdget_disk() while any device is suspended.

Previously unlock_fs() was called only if bdev was held.  Now it is
called unconditionally, but the superfluous calls are harmless because
it returns immediately if the filesystem was not previously frozen.

This patch also now allows the device size to be changed in a
noflush suspend because the bdev is held.  This has no adverse effect.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:17 +01:00
Mikulas Patocka db8fef4fab dm: rename suspended_bdev to bdev
Rename suspended_bdev to bdev.

This patch doesn't change any functionality, just renames the variable.
In the next patch, the variable will be used even for non-suspended device.

(Pre-requisite for the per-target barrier support patches.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:15 +01:00
Jonathan Brassow f6bd4eb73c dm exception store: fix exstore lookup to be case insensitive
When snapshots are created using 'p' instead of 'P' as the
exception store type, the device-mapper table loading fails.

This patch makes the code case insensitive as intended and fixes some
regressions reported with device-mapper snapshots.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:15 +01:00
Mikulas Patocka 5657e8fa45 dm: use i_size_read
Use i_size_read() instead of reading i_size.

If someone changes the size of the device simultaneously, i_size_read
is guaranteed to return a valid value (either the old one or the new one).

i_size can return some intermediate invalid value (on 32-bit computers
with 64-bit i_size, the reads to both halves of i_size can be interleaved
with updates to i_size, resulting in garbage being returned).

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:14 +01:00
Mikulas Patocka 8cbeb67ad5 dm: avoid unsupported spanning of md stripe boundaries
A bio that has two or more vector entries, size less than or equal to
page size, that crosses a stripe boundary of an underlying md device is
accepted by device mapper (it conforms to all its limits) but not by the
underlying device.

The fix is: If device mapper selects the one-page maximum request size,
it also needs to set its own q->merge_bvec_fn to reject any bios with
multiple vector entries that span more pages.

The problem was discovered in the following scenario:
  * MD - RAID-0
  * LV on the top of it (raid1, snapshot or striped with chunk
size/stripe larger than RAID-0 stripe)
  * one of the logical volumes is exported to xen domU
  * inside xen domU it is partitioned, the key point is that the partition
must be unaligned on page boundary (fdisk normally aligns the partition to
63 sectors which will trigger it)
  * install the system on the partitioned disk in domU
This causes I/O failures in dom0.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=223947

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:14 +01:00
Mikulas Patocka 53b351f972 dm mpath: flush keventd queue in destructor
The commit fe9cf30eb8 moves dm table event
submission from kmultipath queue to kernel kevent queue to avoid a
deadlock.

There is a possibility of race condition because kevent queue is not flushed
in the multipath destructor. The scenario is:
- some event happens and is queued to keventd
- keventd thread is delayed due to scheuling latency or some other work
- multipath device is destroyed
- keventd now attempts to process work_struct that is residing in already
  released memory.

The patch flushes the keventd queue in multipath constructor.
I've already fixed similar bug in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2009-06-22 10:12:13 +01:00
Mikulas Patocka a72986c562 dm raid1: keep retrying alloc if mempool_alloc failed
If the code can't handle allocation failures, use __GFP_NOFAIL so that
in case of memory pressure the allocator will retry indefinitely and
won't return NULL which would cause a crash in the function.

This is still not a correct fix, it may cause a classic deadlock when
memory manager waits for I/O being done and I/O waits for some free memory.
I/O code shouldn't allocate any memory. But in this case it probably
doesn't matter much in practice, people usually do not swap on RAID.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:13 +01:00
Chandra Seetharaman e54f77ddda dm mpath: call activate fn for each path in pg_init
Fixed a problem affecting reinstatement of passive paths.

Before we moved the hardware handler from dm to SCSI, it performed a pg_init
for a path group and didn't maintain any state about each path in hardware
handler code.

But in SCSI dh, such state is now maintained, as we want to fail I/O early on a
path if it is not the active path.

All the hardware handlers have a state now and set to active or some form of
inactive.  They have prep_fn() which uses this state to fail the I/O without
it ever being sent to the device.

So in effect when dm-multipath calls scsi_dh_activate(), activate is
sent to only one path and the "state" of that path is changed appropriately
to "active" while other paths in the same path group are never changed
as they never got an "activate".

In order make sure all the paths in a path group gets their state set
properly when a pg_init happens, we need to call scsi_dh_activate() on
all paths in a path group.

Doing this at the hardware handler layer is not a good option as we
want the multipath layer to define the relationship between path and path
groups and not the hardware handler.

Attached patch sends an "activate" on each path in a path group when a
path group is switched. It also sends an activate when a path is reinstated.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:12 +01:00
Hannes Reinecke a0cf7ea954 dm mpath: change attached scsi_dh
When specifying a different hardware handler via multipath
features we should be able to override the built-in defaults.

The problem here is the hardware table from scsi_dh is compiled
in and cannot be changed from userland. The multipath.conf OTOH
is purely user-defined and, what's more, the user might have a valid
reason for modifying it.
(EG EMC Clariion can well be run in PNR mode even though ALUA is
active, or the user might want to try ALUA on any as-of-yet unknown
devices)

So _not_ allowing multipath to override the device handler setting
will just add to the confusion and makes error tracking even more
difficult.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:11 +01:00
Milan Broz 4d89b7b4e4 dm: sysfs skip output when device is being destroyed
Do not process sysfs attributes when device is being destroyed.

Otherwise code can cause
  BUG_ON(test_bit(DMF_FREEING, &md->flags));
in dm_put() call.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:11 +01:00
Mikulas Patocka e094f4f15f dm mpath: validate hw_handler argument count
Fix arg count parsing error in hw handlers.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:12:10 +01:00
Mikulas Patocka 0e0497c0c0 dm mpath: validate table argument count
The parser reads the argument count as a number but doesn't check that
sufficient arguments are supplied. This command triggers the bug:

dmsetup create mpath --table "0 `blockdev --getsize /dev/mapper/cr0`
    multipath 0 0 2 1 round-robin 1000 0 1 1 /dev/mapper/cr0
    round-robin 0 1 1 /dev/mapper/cr1 1000"
kernel BUG at drivers/md/dm-mpath.c:530!

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22 10:08:02 +01:00
Linus Torvalds f234012f52 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  sdhci: remove needless double parenthesis
  sdhci: Specific quirk vor VIA SDHCI controller in VX855ES
  s3cmci: fix dma configuration call
  mmc: Add new via-sdmmc host controller driver
  sdhci: Add support for hosts that are only capable of 1-bit transfers
  MAINTAINERS: add myself as atmel-mci maintainer (sd/mmc interface)
  sdhci: Add SDHCI_QUIRK_NO_MULTIBLOCK quirk
  sdhci: Add better ADMA error reporting
  sdhci-s3c: Samsung S3C based SDHCI controller glue
2009-06-21 13:14:22 -07:00
Linus Torvalds 00d94a6a5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aes-ni - Remove CRYPTO_TFM_REQ_MAY_SLEEP from fpu template
  crypto: aes-ni - Do not sleep when using the FPU
  crypto: aes-ni - Fix cbc mode IV saving
  crypto: padlock-aes - work around Nano CPU errata in CBC mode
  crypto: padlock-aes - work around Nano CPU errata in ECB mode
2009-06-21 13:14:07 -07:00
Linus Torvalds 8b12e2505a Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: Select frame pointers on x86
  dma-debug: be more careful when building reference entries
  dma-debug: check for sg_call_ents in best-fit algorithm too
2009-06-21 13:13:53 -07:00
Linus Torvalds 413318444f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Add model=6530g option
  ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
  ALSA: snd_usb_caiaq: fix legacy input streaming
  ASoC: Kill BUS_ID_SIZE
  ALSA: HDA - Correct trivial typos in comments.
  ALSA: HDA - Name-fixes in code (tagra/targa)
  ALSA: HDA - Add pci-quirk for MSI MS-7350 motherboard.
  ALSA: hda - Fix memory leak at codec creation
2009-06-21 13:13:08 -07:00
Linus Torvalds d06063cc22 Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault().  All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:08:22 -07:00
Linus Torvalds 30c9f3a9fa Remove internal use of 'write_access' in mm/memory.c
The fault handling routines really want more fine-grained flags than a
single "was it a write fault" boolean - the callers will want to set
flags like "you can return a retry error" etc.

And that's actually how the VM works internally, but right now the
top-level fault handling functions in mm/memory.c all pass just the
'write_access' boolean around.

This switches them over to pass around the FAULT_FLAG_xyzzy 'flags'
variable instead.  The 'write_access' calling convention still exists
for the exported 'handle_mm_fault()' function, but that is next.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:06:05 -07:00
Johannes Weiner 232086b199 ipc: unbreak 32-bit shmctl/semctl/msgctl
31a985f "ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h" would
choose the implementation of ipc_parse_version() based on a symbol
defined in <asm/unistd.h>.

But it failed to also include this header and thus broke
IPC_64-passing 32-bit userspace because the flag wasn't masked out
properly anymore and the command not understood.

Include <linux/unistd.h> to give the architecture a chance to ask for
the no-no-op ipc_parse_version().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 12:48:43 -07:00
Pierre Ossman 11a2f1b78a sdhci: remove needless double parenthesis
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:01:00 +02:00
Harald Welte 557b06971b sdhci: Specific quirk vor VIA SDHCI controller in VX855ES
The SDHCI controller found in the VX855ES requires 10ms
delay between applying power and applying clock.

This issue has been discovered and documented by the OLPC XO1.5 team.

Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:59 +02:00
Ben Dooks fe9db6cbf1 s3cmci: fix dma configuration call
This was missed in the DMA changes during the s3c24xx
updates in commit 8970ef47d5.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:59 +02:00
Harald Welte f0bf7f61b8 mmc: Add new via-sdmmc host controller driver
This adds the via-sdmmc driver for the SD/MMC-controller of VIA,
which is found in a number of recent integrated VIA chipset
products.

Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:59 +02:00
Anton Vorontsov 5fe23c7f51 sdhci: Add support for hosts that are only capable of 1-bit transfers
Some hosts (hardware configurations, or particular SD/MMC slots) may
not support 4-bit bus. For example, on MPC8569E-MDS boards we can
switch between serial (1-bit only) and nibble (4-bit) modes, thought
we have to disable more peripherals to work in 4-bit mode.

Along with some small core changes, this patch modifies sdhci-of
driver, so that now it looks for "sdhci,1-bit-only" property in the
device-tree, and if specified we enable a proper quirk.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:59 +02:00
Nicolas Ferre 04ac2f46d6 MAINTAINERS: add myself as atmel-mci maintainer (sd/mmc interface)
Add MAINTAINERS entry for atmel-mci driver.
This driver was maintained by its author: Haavard Skinnemoen. I take the
maintainance of it.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:58 +02:00
Ben Dooks 1388eefd5a sdhci: Add SDHCI_QUIRK_NO_MULTIBLOCK quirk
Add quirk to show the controller cannot do multi-block IO.

This is mainly for the Samsung SDHCI controller that currently
cannot manage to do multi-block PIO without timing out.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:58 +02:00
Ben Dooks 6882a8c071 sdhci: Add better ADMA error reporting
Update the ADMA error reporting to not only show the
overall controller state but also to print the ADMA
descriptor list.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:58 +02:00
Ben Dooks 0d1bb41ad4 sdhci-s3c: Samsung S3C based SDHCI controller glue
Add support for the 'HSMMC' block(s) in the Samsung SoC
line. These are compatible with the SDHCI driver so add
the necessary setup and driver binding for the platform
devices.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
2009-06-21 21:00:57 +02:00
Takashi Iwai 47166281d2 Merge branch 'topic/hda' into for-linus
* topic/hda:
  ALSA: hda - Add model=6530g option
  ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
  ALSA: HDA - Correct trivial typos in comments.
  ALSA: HDA - Name-fixes in code (tagra/targa)
  ALSA: HDA - Add pci-quirk for MSI MS-7350 motherboard.
  ALSA: hda - Fix memory leak at codec creation
2009-06-21 10:59:12 +02:00
Takashi Iwai 0b6306f69f Merge branch 'topic/caiaq' into for-linus
* topic/caiaq:
  ALSA: snd_usb_caiaq: fix legacy input streaming
2009-06-21 10:59:10 +02:00
Takashi Iwai 9fd0d96e79 Merge branch 'topic/asoc' into for-linus
* topic/asoc:
  ASoC: Kill BUS_ID_SIZE
2009-06-21 10:59:04 +02:00
Takashi Iwai b1a914690c ALSA: hda - Add model=6530g option
Add the new model string corresponding to the previous Acer Aspire
6530G support.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-06-21 10:57:16 +02:00
Tony Vroon d2fd4b09c0 ALSA: hda - Acer Inspire 6530G model for Realtek ALC888
The selected 4930G model seemed to keep the subwoofer 'tuba'
function from operating correctly. Removing the existing PCI
ID match made this work again, but it was mapped to 'Side'
instead of to LFE as one would expect.
This attempts to enable all functionality and keep the amount
of available mixer sliders low. Any slider that had no audible
effect on the output audio has been removed, and as such EAPD
is not currently enabled.

Signed-off-by: Tony Vroon <tony@linx.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-06-21 10:52:14 +02:00
Peter Zijlstra 00540e5d54 lockdep: Select frame pointers on x86
x86 stack traces are a piece of crap without frame pointers, and its not
like the 'performance gain' of not having stack pointers matters when you
selected lockdep.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-21 10:14:33 +02:00
Johannes Weiner c277331d5f mm: page_alloc: clear PG_locked before checking flags on free
da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved
the PG_mlocked clearing after the flag sanity checking which makes mlocked
pages always trigger 'bad page'.  Fix this by clearing the bit up front.

Reported--and-debugged-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-20 16:08:22 -07:00
Linus Torvalds 9063c61fd5 x86, 64-bit: Clean up user address masking
The discussion about using "access_ok()" in get_user_pages_fast() (see
commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
'access_ok()' as a range check in get_user_pages_fast()" for details and
end result), made us notice that x86-64 was really being very sloppy
about virtual address checking.

So be way more careful and straightforward about masking x86-64 virtual
addresses:

 - All the VIRTUAL_MASK* variants now cover half of the address
   space, it's not like we can use the full mask on a signed
   integer, and the larger mask just invites mistakes when
   applying it to either half of the 48-bit address space.

 - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
   obvious when it transforms a file offset into a
   (kernel-half) virtual address.

 - Unify/simplify the 32-bit and 64-bit USER_DS definition to
   be based on TASK_SIZE_MAX.

This cleanup and more careful/obvious user virtual address checking also
uncovered a buglet in the x86-64 implementation of strnlen_user(): it
would do an "access_ok()" check on the whole potential area, even if the
string itself was much shorter, and thus return an error even for valid
strings. Our sloppy checking had hidden this.

So this fixes 'strnlen_user()' to do this properly, the same way we
already handled user strings in 'strncpy_from_user()'.  Namely by just
checking the first byte, and then relying on fault handling for the
rest.  That always works, since we impose a guard page that cannot be
mapped at the end of the user space address space (and even if we
didn't, we'd have the address space hole).

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-20 15:40:00 -07:00
Linus Torvalds 2453d6ff6f Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq, irq.h: Fix kernel-doc warnings
  genirq: fix comment to say IRQ_WAKE_THREAD
2009-06-20 11:30:01 -07:00
Linus Torvalds 12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds 1eb51c33b2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix out of scope variable access in sched_slice()
  sched: Hide runqueues from direct refer at source code level
  sched: Remove unneeded __ref tag
  sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20 10:57:40 -07:00
Linus Torvalds b0b7065b64 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  tracing/urgent: warn in case of ftrace_start_up inbalance
  tracing/urgent: fix unbalanced ftrace_start_up
  function-graph: add stack frame test
  function-graph: disable when both x86_32 and optimize for size are configured
  ring-buffer: have benchmark test print to trace buffer
  ring-buffer: do not grab locks in nmi
  ring-buffer: add locks around rb_per_cpu_empty
  ring-buffer: check for less than two in size allocation
  ring-buffer: remove useless compile check for buffer_page size
  ring-buffer: remove useless warn on check
  ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
  tracing: update sample event documentation
  tracing/filters: fix race between filter setting and module unload
  tracing/filters: free filter_string in destroy_preds()
  ring-buffer: use commit counters for commit pointer accounting
  ring-buffer: remove unused variable
  ring-buffer: have benchmark test handle discarded events
  ring-buffer: prevent adding write in discarded area
  tracing/filters: strloc should be unsigned short
  tracing/filters: operand can be negative
  ...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20 10:56:46 -07:00