Commit Graph

413268 Commits

Author SHA1 Message Date
Ilya Dryomov 9a3b490a20 crush: use breadth-first search for indep mode
Reflects ceph.git commit 86e978036a4ecbac4c875e7c00f6c5bbe37282d3.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:16 +02:00
Ilya Dryomov c6d98a603a crush: return CRUSH_ITEM_UNDEF for failed placements with indep
For firstn mode, if we fail to make a valid placement choice, we just
continue and return a short result to the caller.  For indep mode, however,
we need to make the position stable, and return an undefined value on
failed placements to avoid shifting later results to the left.

Reflects ceph.git commit b1d4dd4eb044875874a1d01c01c7d766db5d0a80.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:15 +02:00
Ilya Dryomov e8ef19c4ad crush: eliminate CRUSH_MAX_SET result size limitation
This is only present to size the temporary scratch arrays that we put on
the stack.  Let the caller allocate them as they wish and remove the
limitation.

Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:14 +02:00
Ilya Dryomov 2a4ba74ef6 crush: fix some comments
Reflects ceph.git commit 3cef755428761f2481b1dd0e0fbd0464ac483fc5.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:13 +02:00
Ilya Dryomov 8f99c85b7a crush: reduce scope of some local variables
Reflects ceph.git commit e7d47827f0333c96ad43d257607fb92ed4176550.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:12 +02:00
Ilya Dryomov bfb16d7d69 crush: factor out (trivial) crush_destroy_rule()
Reflects ceph.git commit 43a01c9973c4b83f2eaa98be87429941a227ddde.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:11 +02:00
Ilya Dryomov b3b33b0e43 crush: pass weight vector size to map function
Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end.  This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state.  It's
also a bit more defensive.

Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:10 +02:00
Ilya Dryomov 2b3e0c905a libceph: update ceph_features.h
This updates ceph_features.h so that it has all feature bits defined in
ceph.git.  In the interim since the last update, ceph.git crossed the
"32 feature bits" point, and, the addition of the 33rd bit wasn't
handled correctly.  The work-around is squashed into this commit and
reflects ceph.git commit 053659d05e0349053ef703b414f44965f368b9f0.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:09 +02:00
Ilya Dryomov 12b4629a9f libceph: all features fields must be u64
In preparation for ceph_features.h update, change all features fields
from unsigned int/u32 to u64.  (ceph.git has ~40 feature bits at this
point.)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:08 +02:00
Ilya Dryomov e37180c0f2 rbd: tear down watch request if rbd_dev_device_setup() fails
Tear down watch request if rbd_dev_device_setup() fails.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:32:07 +02:00
Ilya Dryomov fca2706539 rbd: introduce rbd_dev_header_unwatch_sync() and switch to it
Rename rbd_dev_header_watch_sync() to __rbd_dev_header_watch_sync() and
introduce two helpers: rbd_dev_header_{,un}watch_sync() to make it more
clear what is going on.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:32:06 +02:00
Alex Elder 527a88b9ef MAINTAINERS: update an e-mail address
I no longer have direct access to my Inktank e-mail.  I still pay
attention to rbd, so update its entry in MAINTAINERS accordingly.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:05 +02:00
Ilya Dryomov 7e513d4366 rbd: enable extended devt in single-major mode
If single-major device number allocation scheme is turned on, instead
of reserving 256 minors per device, which imposes a limit of 4096
images mapped at once, reserve 16 minors per device and enable extended
devt feature.  This results in a theoretical limit of 65536 images
mapped at once, and still allows to have more than 15 partititions:
partitions starting with 16th are mapped under major 259 (Block
Extended Major):

$ rbd showmapped
id pool image snap device
0  rbd  b5    -    /dev/rbd0    # no partitions
1  rbd  b2    -    /dev/rbd1    # 40 partitions
2  rbd  b3    -    /dev/rbd2    #  2 partitions

$ cat /proc/partitions
 251        0       1024 rbd0
 251       16       1024 rbd1
 251       17          0 rbd1p1
 251       18          0 rbd1p2
 ...
 251       30          0 rbd1p14
 251       31          0 rbd1p15
 259        0          0 rbd1p16
 259        1          0 rbd1p17
 ...
 259       23          0 rbd1p39
 259       24          0 rbd1p40
 251       32       1024 rbd2
 251       33          0 rbd2p1
 251       34          0 rbd2p2

(major 251 was assigned dynamically at module load time)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:32:04 +02:00
Li Wang 183028052b ceph fscache: Uncaching no data page from fscache in readpage()
Currently, if one new page allocated into fscache in readpage(), however,
with no data read into due to error encountered during reading from OSDs,
the slot in fscache is not uncached. This patch fixes this.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
2013-12-31 20:32:03 +02:00
Li Wang 3f42bc4bea ceph fscache: Introduce a routine for uncaching single no data page from fscache
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
2013-12-31 20:32:02 +02:00
Guangliang Zhao 7221fe4c2e ceph: add acl for cephfs
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Li Wang <li.wang@ubuntykylin.com>
Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
2013-12-31 20:32:01 +02:00
Yan, Zheng 61f6881621 ceph: check caps in filemap_fault and page_mkwrite
Adds cap check to the page fault handler. The check prevents page
fault handler from adding new page to the page cache while Fcb caps
are being revoked. This solves Fc revoking hang in multiple clients
mmap IO workload.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-31 20:32:00 +02:00
Ilya Dryomov 9b60e70b3b rbd: add support for single-major device number allocation scheme
Currently each rbd device is allocated its own major number, which
leads to a hard limit of 230-250 images mapped at once.  This commit
adds support for a new single-major device number allocation scheme,
which is hidden behind a new single_major boolean module parameter and
is disabled by default for backwards compatibility reasons.  (Old
userspace cannot correctly unmap images mapped under single-major
scheme and would essentially just unmap a random image, if that.)

$ rbd showmapped
id pool image snap device
0  rbd  b100  -    /dev/rbd0
1  rbd  b101  -    /dev/rbd1
2  rbd  b102  -    /dev/rbd2
3  rbd  b103  -    /dev/rbd3

Old scheme (modprobe rbd):

$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253, 0 Dec 10 12:24 /dev/rbd0
brw-rw---- 1 root disk 252, 0 Dec 10 12:28 /dev/rbd1
brw-rw---- 1 root disk 252, 1 Dec 10 12:28 /dev/rbd1p1
brw-rw---- 1 root disk 252, 2 Dec 10 12:28 /dev/rbd1p2
brw-rw---- 1 root disk 252, 3 Dec 10 12:28 /dev/rbd1p3
brw-rw---- 1 root disk 251, 0 Dec 10 12:28 /dev/rbd2
brw-rw---- 1 root disk 251, 1 Dec 10 12:28 /dev/rbd2p1
brw-rw---- 1 root disk 250, 0 Dec 10 12:24 /dev/rbd3

New scheme (modprobe rbd single_major=Y):

$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253,   0 Dec 10 12:30 /dev/rbd0
brw-rw---- 1 root disk 253, 256 Dec 10 12:30 /dev/rbd1
brw-rw---- 1 root disk 253, 257 Dec 10 12:30 /dev/rbd1p1
brw-rw---- 1 root disk 253, 258 Dec 10 12:30 /dev/rbd1p2
brw-rw---- 1 root disk 253, 259 Dec 10 12:30 /dev/rbd1p3
brw-rw---- 1 root disk 253, 512 Dec 10 12:30 /dev/rbd2
brw-rw---- 1 root disk 253, 513 Dec 10 12:30 /dev/rbd2p1
brw-rw---- 1 root disk 253, 768 Dec 10 12:30 /dev/rbd3

(major 253 was assigned dynamically at module load time)

The new limit is 4096 images mapped at once, and it comes from the fact
that, as before, 256 minor numbers are reserved for each mapping.
(A follow-up commit changes the number of minors reserved and the way
we deal with partitions over that number.)

If single_major is set to true, two new sysfs interfaces show up:
/sys/bus/rbd/{add,remove}_single_major.  These are to be used instead
of /sys/bus/rbd/{add,remove}, which are disabled for backwards
compatibility reasons outlined above.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:59 +02:00
Ilya Dryomov 92c76dc036 rbd: wire up is_visible() sysfs callback for rbd bus
In preparation for single-major device number allocation scheme, wire
up attribute_group::is_visible() callback for rbd bus.  This allows us
to make the new single-major attributes conditional.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:58 +02:00
Ilya Dryomov dd82fff1e8 rbd: add 'minor' sysfs rbd device attribute
Introduce /sys/bus/rbd/devices/<id>/minor sysfs attribute for exporting
rbd whole disk minor numbers.  This is a step towards single-major
device number allocation scheme, but also a good thing on its own.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:57 +02:00
Ilya Dryomov f8a22fc238 rbd: switch to ida for rbd id assignments
Currently rbd ids are allocated using an atomic variable that keeps
track of the highest id currently in use and each new id is simply one
more than the value of that variable.  That's nice and cheap, but it
does mean that rbd ids are allowed to grow boundlessly, and, more
importantly, it's completely unpredictable.  So, in preparation for
single-major device number allocation scheme, which is going to
establish and rely on a constant mapping between rbd ids and device
numbers, switch to ida for rbd id assignments.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:56 +02:00
Ilya Dryomov e1b4d96dea rbd: refactor rbd_init() a bit
Refactor rbd_init() a bit to make it more clear what's going on.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:55 +02:00
Ilya Dryomov 90da258b88 rbd: tweak "loaded" message and module description
Tweak "loaded" message, so that it looks like

[   30.184235] rbd: loaded

instead of

[   38.056564] rbd: loaded rbd (rados block device)

Also move (and slightly tweak) MODULE_DESCRIPTION so that all authors
are next to each other in modinfo output.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:54 +02:00
Ilya Dryomov 70eebd2006 rbd: rbd_device::dev_id is an int, format it as such
rbd_device::dev_id is an int, format it as such.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31 20:31:53 +02:00
Josh Durgin 9a1ea2dbff libceph: resend all writes after the osdmap loses the full flag
With the current full handling, there is a race between osds and
clients getting the first map marked full. If the osd wins, it will
return -ENOSPC to any writes, but the client may already have writes
in flight. This results in the client getting the error and
propagating it up the stack. For rbd, the block layer turns this into
EIO, which can cause corruption in filesystems above it.

To avoid this race, osds are being changed to drop writes that came
from clients with an osdmap older than the last osdmap marked full.
In order for this to work, clients must resend all writes after they
encounter a full -> not full transition in the osdmap. osds will wait
for an updated map instead of processing a request from a client with
a newer map, so resent writes will not be dropped by the osd unless
there is another not full -> full transition.

This approach requires both osds and clients to be fixed to avoid the
race. Old clients talking to osds with this fix may hang instead of
returning EIO and potentially corrupting an fs. New clients talking to
old osds have the same behavior as before if they encounter this race.

Fixes: http://tracker.ceph.com/issues/6938

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-13 23:04:28 +02:00
Josh Durgin d29adb34a9 libceph: block I/O when PAUSE or FULL osd map flags are set
The PAUSEWR and PAUSERD flags are meant to stop the cluster from
processing writes and reads, respectively. The FULL flag is set when
the cluster determines that it is out of space, and will no longer
process writes.  PAUSEWR and PAUSERD are purely client-side settings
already implemented in userspace clients. The osd does nothing special
with these flags.

When the FULL flag is set, however, the osd responds to all writes
with -ENOSPC. For cephfs, this makes sense, but for rbd the block
layer translates this into EIO.  If a cluster goes from full to
non-full quickly, a filesystem on top of rbd will not behave well,
since some writes succeed while others get EIO.

Fix this by blocking any writes when the FULL flag is set in the osd
client. This is the same strategy used by userspace, so apply it by
default.  A follow-on patch makes this configurable.

__map_request() is called to re-target osd requests in case the
available osds changed.  Add a paused field to a ceph_osd_request, and
set it whenever an appropriate osd map flag is set.  Avoid queueing
paused requests in __map_request(), but force them to be resent if
they become unpaused.

Also subscribe to the next osd map from the monitor if any of these
flags are set, so paused requests can be unblocked as soon as
possible.

Fixes: http://tracker.ceph.com/issues/6079

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-13 09:13:29 -08:00
Libo Chen aa8b60e077 fs: ceph: new helper: file_inode(file)
Signed-off-by: Libo Chen <clbchenlibo.chen@huawei.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 09:13:29 -08:00
Li Wang 37c89bde5d ceph: Add necessary clean up if invalid reply received in handle_reply()
Wake up possible waiters, invoke the call back if any, unregister the request

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 09:13:29 -08:00
Li Wang f36132a75a ceph: Clean up if error occurred in finish_read()
Clean up if error occurred rather than going through normal process

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 09:13:28 -08:00
majianpeng 8eb4efb091 ceph: implement readv/preadv for sync operation
For readv/preadv sync-operatoin, ceph only do the first iov.
Now implement this.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-12-13 09:13:17 -08:00
majianpeng e8344e6689 ceph: Implement writev/pwritev for sync operation.
For writev/pwritev sync-operatoin, ceph only do the first iov.

I divided the write-sync-operation into two functions. One for
direct-write, other for none-direct-sync-write. This is because for
none-direct-sync-write we can merge iovs to one. But for direct-write,
we can't merge iovs.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 09:13:17 -08:00
Yan, Zheng 9f12bd119e ceph: drop unconnected inodes
Positve dentry and corresponding inode are always accompanied in MDS reply.
So no need to keep inode in the cache after dropping all its aliases.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 09:13:16 -08:00
Li Wang 56f91aad69 ceph: Avoid data inconsistency due to d-cache aliasing in readpage()
If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush d-cache
for data consistency after finishing reading. This patches fixes
this.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-12-13 09:11:38 -08:00
Yan, Zheng 86b58d1313 ceph: initialize inode before instantiating dentry
commit b18825a7c8 (Put a small type field into struct dentry::d_flags)
put a type field into struct dentry::d_flags. __d_instantiate() set the
field by checking inode->i_mode. So we should initialize inode before
instantiating dentry when handling mds reply.

Fixes: http://tracker.ceph.com/issues/6930
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13 09:11:36 -08:00
Linus Torvalds 374b105797 Linux 3.13-rc3 2013-12-06 09:34:04 -08:00
Linus Torvalds 843f4f4bb1 A regression showed up that there's a large delay when enabling
all events. This was prevalent when FTRACE_SELFTEST was enabled which
 enables all events several times, and caused the system bootup to
 pause for over a minute.
 
 This was tracked down to an addition of a synchronize_sched() performed
 when system call tracepoints are unregistered.
 
 The synchronize_sched() is needed between the unregistering of the
 system call tracepoint and a deletion of a tracing instance buffer.
 But placing the synchronize_sched() in the unreg of *every* system call
 tracepoint is a bit overboard. A single synchronize_sched() before
 the deletion of the instance is sufficient.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSofYNAAoJEKQekfcNnQGuTUcIAJOx8745KlkFjN4VX+nCWNfP
 xrrpnLBymbGMA9lQ4fXk+kdiuhH8DjRYdKq9fU4T481MFYKkToUIZH6NeaLI5fr1
 0nBPPjVyAlJ+yt9JbOAYa1jEYnAr27ORDHEtdQnqb6OJSky3oh9jCQi+toxmh2qX
 Sv1tIYeAf3K2V/h5xt6uSl9oiZ6KBtwE3f+xkHWNizaU9i2rq2gxd77fSbPNTIps
 wLdsESYziA2UeAm13eh8xXo1uqRbfvx7bPr59cu0+3AqdOoaXFG+JoE/MqmljIO5
 HkyCKJVqP8HD+QieuEB+hw4zpSsl6A6iKSlaiHm8NMnRRocfM6qMKMQXQ28KhxA=
 =sYm/
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "A regression showed up that there's a large delay when enabling all
  events.  This was prevalent when FTRACE_SELFTEST was enabled which
  enables all events several times, and caused the system bootup to
  pause for over a minute.

  This was tracked down to an addition of a synchronize_sched()
  performed when system call tracepoints are unregistered.

  The synchronize_sched() is needed between the unregistering of the
  system call tracepoint and a deletion of a tracing instance buffer.
  But placing the synchronize_sched() in the unreg of *every* system
  call tracepoint is a bit overboard.  A single synchronize_sched()
  before the deletion of the instance is sufficient"

* tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Only run synchronize_sched() at instance deletion time
2013-12-06 08:34:16 -08:00
Linus Torvalds c537aba00e Merge git://git.kvack.org/~bcrl/aio-next
Pull aio fix from Benjamin LaHaise:
 "AIO fix from Gu Zheng that fixes a GPF that Dave Jones uncovered with
  trinity"

* git://git.kvack.org/~bcrl/aio-next:
  aio: clean up aio ring in the fail path
2013-12-06 08:32:59 -08:00
Linus Torvalds 7adfff587b SCSI fixes on 20131206
This is a set of nine fixes (and one author update). The libsas one should fix
 discovery in eSATA devices, the WRITE_SAME one is the largest, but it should
 fix a lot of problems we've been getting with the emulated RAID devices
 (they've been effectively lying about support and then firmware has been
 choking on the commands).  The rest are various crash, hang or warn driver
 fixes.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSoeU/AAoJEDeqqVYsXL0McPkH/0PdSofj8emMfravBWM5knWk
 YPc+JnEYF+LaOmBYE7X4mEBGq4utbRll5IuFPxuw19X7cWS3A0uOE5cBvzOWKOIw
 5xjGx5ik6rd04SRl/8wdERTtXJWUcTQFICjS+2IKogUyBuWgPOr+Ked2aDt2UXCQ
 zz1QkpLvE9ogKvVo8uuRynFDu7yCVCQB6QFAEiVxKSe+8IphfORQ3co3tczhWwPY
 Tp030xHEQFV6fKAO1ZnrtImih3tj5gp4Sp73x54FyjCYajWj95XkSplhAvGE/mXI
 xEGbtpuxHkVuS/+u1ziRuL3cVrgZvXRPyZAj3GSB4lOkrqA8danTStb+oRYbGCc=
 =nIqW
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of nine fixes (and one author update).

  The libsas one should fix discovery in eSATA devices, the WRITE_SAME
  one is the largest, but it should fix a lot of problems we've been
  getting with the emulated RAID devices (they've been effectively lying
  about support and then firmware has been choking on the commands).

  The rest are various crash, hang or warn driver fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] bfa: Fix crash when symb name set for offline vport
  [SCSI] enclosure: fix WARN_ON in dual path device removing
  [SCSI] pm80xx: Tasklets synchronization fix.
  [SCSI] pm80xx: Resetting the phy state.
  [SCSI] pm80xx: Fix for direct attached device.
  [SCSI] pm80xx: Module author addition
  [SCSI] hpsa: return 0 from driver probe function on success, not 1
  [SCSI] hpsa: do not discard scsi status on aborted commands
  [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers
  [SCSI] libsas: fix usage of ata_tf_to_fis
2013-12-06 08:30:18 -08:00
Linus Torvalds 470abdcfda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull IMA fixes from James Morris:
 "Here are two more fixes for IMA"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: properly free ima_template_entry structures
  ima: Do not free 'entry' before it is initialized
2013-12-06 08:28:35 -08:00
Linus Torvalds 24cb412041 - Various DT binding documentation updates.
- Add Kumar Gala and remove Stephen Warren as DT binding maintainers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSnXWIAAoJEMhvYp4jgsXi5DwH/jAgH9cU3dAg97GtPgZsEBKS
 nzkFudH++oMaE/erA+vYgzwuSqaEO1lOtj+ficF1Q1kmOgJJysE8urbmNq9Mw4qP
 HEtChfijzRDEmLuD73yY/cvHKqNBQkzTWuTYqCLne23nmHDXImSIIUDxJdpYkErs
 DlBvclhWLYc6BLlGOAUI1+81+6T8W/6BuGI1q84edAcAb2g5UQly/BFW4lTl9UfK
 SZuPcxrGEsrU7pimAVqpNIlGn8r3HJzRrpSn9V/ZbjiyYveKjxjDz9HEXsOkV329
 hmSdgdT9MK7gmaP36sSEpWQtXhOrL3QhV7qPCLvZfZcyJ2IYyyycH5OVVVTuAZ0=
 =aZ5t
 -----END PGP SIGNATURE-----

Merge tag 'dt-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:
 - Various DT binding documentation updates
 - Add Kumar Gala and remove Stephen Warren as DT binding maintainers

* tag 'dt-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt: binding: reword PowerPC 8xxx GPIO documentation
  ARM: tegra: delete nvidia,tegra20-spi.txt binding
  hwmon: ntc_thermistor: Fix typo (pullup-uV -> pullup-uv)
  of: add vendor prefix for GMT
  clk: exynos: Fix typos in DT bindings documentation
  of: Add vendor prefix for LG Corporation
  Documentation: net: fsl-fec.txt: Add phy-supply entry
  ARM: dts: doc: Document missing binding for omap5-mpu
  dt-bindings: add ARMv8 PMU binding
  MAINTAINERS: remove swarren from DT bindings
  MAINTAINERS: Add Kumar to Device Tree Binding maintainers group
2013-12-06 08:27:47 -08:00
Gu Zheng d1b9432712 aio: clean up aio ring in the fail path
Clean up the aio ring file in the fail path of aio_setup_ring
and ioctx_alloc. And maybe it can fix the GPF issue reported by
Dave Jones:
https://lkml.org/lkml/2013/11/25/898

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-12-06 10:22:55 -05:00
James Morris bfb26328b9 Merge branch 'free-memory' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into for-linus 2013-12-07 01:21:02 +11:00
Linus Torvalds 002acf1fc1 Power management fixes for 3.13-rc3
- cpufreq regression fix from Bjørn Mork restoring the pre-3.12
    behavior of the framework during system suspend/hibernation to
    avoid garbage sysfs files from being left behind in case of a
    suspend error.
 
  - PNP regression fix to restore the correct states of devices after
    resume from hibernation broken in 3.12.  From Dmitry Torokhov.
 
  - cpuidle fix to prevent cpuidle device unregistration from crashing
    due to a NULL pointer dereference if cpuidle has been disabled
    from the kernel command line.  From Konrad Rzeszutek Wilk.
 
  - intel_idle fix for the C6 state definition on Intel Avoton/Rangeley
    processors from Arne Bockholdt.
 
  - Power capping framework fix to make the energy_uj sysfs attribute
    work in accordance with the documentation.  From Srinivas Pandruvada.
 
  - epoll fix to make it ignore the EPOLLWAKEUP flag if the kernel has
    been compiled with CONFIG_PM_SLEEP unset (in which case that flag
    should not have any effect).  From Amit Pundir.
 
  - cpufreq fix to prevent governor sysfs files from being lost over
    system suspend/resume in some (arguably unusual) situations.  From
    Viresh Kumar.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSoTE5AAoJEILEb/54YlRxh/sP/jFZhLTc8g4MC3XzhguuROzT
 u0Pu9VJVfACqz8LyiCOtfOvvb2EPV7VSq7qYqszL5y9Dn5gwIHvQMMBK2PjZ4cc2
 MtSiw02Bk/DEESXYOjt++n5ja/0lc05CtJTlb3uoJXBOCqp3cMrvW7+QqnLbEfbG
 S+TcPFBr+4Owt/J7r2Z2JBYGZ6NbVol/x1hAFjiM+rBan6UGw7uNcg2LgQrVHcs4
 S1Cm6lsJTwRcSiswvJv9/C+ML9Z/1gYYUyu7ijQnGdbNUolyzHY6AxLWZdnSkAQO
 s8JVDRKy9+V44LtnWSENnJNftjlOoXWcZRJxvDePyM3dVpxESBa8Z/AxYWwCcmcB
 e4rsgm/WOF86DMhRu+gfeTF+1OkU7KhuPhbXskbw+JDcZKCui2FP/xti6IAaTsU/
 9M30/VeOpD1UBqckLnDTGcsFif7hVZ9LOHH5wK8OctjyaTMfUYtPd7WxfTQCpcSc
 1M0NQapwfXHASmPmMW4SszAaeduecUdgXU1epOPx0EpOMQhvuLeENJBgVC5uu1cA
 KAQ7suOx9ReS3slso8lpTlTEw2rsDPRuiHcF3hv7YpklNXV2jvtxsl4upHT5VN2t
 3L59Unq8vY2lt3SdzWMypaAquphc3Te5woYoFwgsSPfw40Kr+jg+oUtO6IHqM/ga
 OATUkTffzp+Rp3pg036Y
 =gHBV
 -----END PGP SIGNATURE-----

Merge tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:

 - cpufreq regression fix from Bjørn Mork restoring the pre-3.12
   behavior of the framework during system suspend/hibernation to avoid
   garbage sysfs files from being left behind in case of a suspend error

 - PNP regression fix to restore the correct states of devices after
   resume from hibernation broken in 3.12.  From Dmitry Torokhov.

 - cpuidle fix to prevent cpuidle device unregistration from crashing
   due to a NULL pointer dereference if cpuidle has been disabled from
   the kernel command line.  From Konrad Rzeszutek Wilk.

 - intel_idle fix for the C6 state definition on Intel Avoton/Rangeley
   processors from Arne Bockholdt.

 - Power capping framework fix to make the energy_uj sysfs attribute
   work in accordance with the documentation.  From Srinivas Pandruvada.

 - epoll fix to make it ignore the EPOLLWAKEUP flag if the kernel has
   been compiled with CONFIG_PM_SLEEP unset (in which case that flag
   should not have any effect).  From Amit Pundir.

 - cpufreq fix to prevent governor sysfs files from being lost over
   system suspend/resume in some (arguably unusual) situations.  From
   Viresh Kumar.

* tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PowerCap: Fix mode for energy counter
  PNP: fix restoring devices after hibernation
  cpuidle: Check for dev before deregistering it.
  epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled
  cpufreq: fix garbage kobjects on errors during suspend/resume
  cpufreq: suspend governors on system suspend/hibernate
  intel_idle: Fixed C6 state on Avoton/Rangeley processors
2013-12-05 18:26:40 -08:00
Rafael J. Wysocki 8e7030097e Merge branches 'pm-epoll', 'pnp' and 'powercap'
* pm-epoll:
  epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled

* pnp:
  PNP: fix restoring devices after hibernation

* powercap:
  PowerCap: Fix mode for energy counter
2013-12-06 02:18:28 +01:00
Rafael J. Wysocki 7cdcec991c Merge branches 'pm-cpuidle' and 'pm-cpufreq'
* pm-cpuidle:
  cpuidle: Check for dev before deregistering it.
  intel_idle: Fixed C6 state on Avoton/Rangeley processors

* pm-cpufreq:
  cpufreq: fix garbage kobjects on errors during suspend/resume
  cpufreq: suspend governors on system suspend/hibernate
2013-12-06 02:17:59 +01:00
Linus Torvalds b52b342d31 Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile ftrace bug fix from Chris Metcalf:
 "This fixes a build failure with allyesconfig reported by Fengguang Wu
  and fixed by Tony Lu"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  ftrace: default to tilegx if ARCH=tile is specified
2013-12-05 15:37:44 -08:00
Linus Torvalds 5ee540613d Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A small collection of fixes for the current series. It contains:

   - A fix for a use-after-free of a request in blk-mq.  From Ming Lei

   - A fix for a blk-mq bug that could attempt to dereference a NULL rq
     if allocation failed

   - Two xen-blkfront small fixes

   - Cleanup of submit_bio_wait() type uses in the kernel, unifying
     that.  From Kent

   - A fix for 32-bit blkg_rwstat reading.  I apologize for this one
     looking mangled in the shortlog, it's entirely my fault for missing
     an empty line between the description and body of the text"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix use-after-free of request
  blk-mq: fix dereference of rq->mq_ctx if allocation fails
  block: xen-blkfront: Fix possible NULL ptr dereference
  xen-blkfront: Silence pfn maybe-uninitialized warning
  block: submit_bio_wait() conversions
  Update of blkg_stat and blkg_rwstat may happen in bh context
2013-12-05 15:33:27 -08:00
Linus Torvalds 29be6345bb NFS client bugfixes
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
 - Stable fix for a loop on irrecoverable errors when returning delegations
 - Fix a 3-way deadlock between layoutreturn, open, and state recovery
 - Update the MAINTAINERS file with contact information for Trond Myklebust
 - Close needs to handle NFS4ERR_ADMIN_REVOKED
 - Enabling v4.2 should not recompile nfsd and lockd
 - Fix a couple of compile warnings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSoLTpAAoJEGcL54qWCgDy2dgQAIKkKAXccg3OG2b1SxJmiaja
 PcrovNmgg3HvYQ7clUMqtrMByiXEpSybl6tAeXYUWE3sS1DISSBVEwO3MoOiASiM
 951Ssx+CoyhsHYo5aH83sUIiWFl/YsRhpKmSr2cdQd13DQTFbPq896k64Inf6L2/
 9fngoqOD7FunQHn8AiVPoDOQzObB0OuKhYCwuwLt47oPiwgmm12JQNCDxU1i4sxb
 lkGUBLkPMs6D5IyI8XHaMyX3+8MvmPiIsjIKaNJRdhkuX/k7ollucTJXyvyEQKK0
 PhBIWyUULmKcAXYwCfHf9UoyGZFvmj47YggyKcBd26OZUEFekcWrULfym46F1xak
 EcO6D4mlTy5i5W0RBqYCj1oGud57rixZBmhLTbeq6sSJaiqBfGEs225Q17H7rsEB
 YIghHiEFNnBmVWELhHxbJHQoY6HOugmZOuc0dxopaikN/7to8gnYoVyTIVlMfe/t
 UNXZoer6GOOohJGtZ7s7v4Al7EzvwnVnBCBklEAKFJ7Ca2LEmq+b58oQW3nJ1mPn
 y4TnihxYXsSEbqy+Lds9rumRhJLG1oVTpwficAm7N3HdK3abzCIPEt6iOHoCmXQz
 J1B4gmwOKsDqVlCSpBsnc3ZiBlSJGOn6MmVQUCNFpzv/DetWn/BxEUPE8cNm8DaI
 WioD0grC0/9bR8oD1m+w
 =UZ51
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Stable fix for a NFSv4.1 delegation and state recovery deadlock
 - Stable fix for a loop on irrecoverable errors when returning
   delegations
 - Fix a 3-way deadlock between layoutreturn, open, and state recovery
 - Update the MAINTAINERS file with contact information for Trond
   Myklebust
 - Close needs to handle NFS4ERR_ADMIN_REVOKED
 - Enabling v4.2 should not recompile nfsd and lockd
 - Fix a couple of compile warnings

* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix do_div() warning by instead using sector_div()
  MAINTAINERS: Update contact information for Trond Myklebust
  NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
  SUNRPC: do not fail gss proc NULL calls with EACCES
  NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
  NFSv4: Update list of irrecoverable errors on DELEGRETURN
  NFSv4 wait on recovery for async session errors
  NFS: Fix a warning in nfs_setsecurity
  NFS: Enabling v4.2 should not recompile nfsd and lockd
2013-12-05 13:05:48 -08:00
Tony Lu 2d8eedad92 ftrace: default to tilegx if ARCH=tile is specified
This matches the existing behavior in arch/tile/Makefile for defconfig.

Reported-by: fengguang.wu@intel.com
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tony Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-12-05 15:59:26 -05:00
Steven Rostedt 3ccb012392 tracing: Only run synchronize_sched() at instance deletion time
It has been reported that boot up with FTRACE_SELFTEST enabled can take a
very long time. There can be stalls of over a minute.

This was tracked down to the synchronize_sched() called when a system call
event is disabled. As the self tests enable and disable thousands of events,
this makes the synchronize_sched() get called thousands of times.

The synchornize_sched() was added with d562aff93b "tracing: Add support
for SOFT_DISABLE to syscall events" which caused this regression (added
in 3.13-rc1).

The synchronize_sched() is to protect against the events being accessed
when a tracer instance is being deleted. When an instance is being deleted
all the events associated to it are unregistered. The synchronize_sched()
makes sure that no more users are running when it finishes.

Instead of calling synchronize_sched() for all syscall events, we only
need to call it once, after the events are unregistered and before the
instance is deleted. The event_mutex is held during this action to
prevent new users from enabling events.

Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home

Reported-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-05 14:22:30 -05:00