Commit Graph

1410 Commits

Author SHA1 Message Date
Kent Overstreet f73a1c7d11 block: Add bio_end_sector()
Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
CC: Chris Mason <chris.mason@fusionio.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2013-03-23 14:15:29 -07:00
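
As a rough illustration of the convenience macro described above, here is a minimal userspace sketch. It assumes the macro is simply the start sector plus the length in 512-byte sectors; the struct is a simplified stand-in for the kernel's struct bio, and the commit message itself does not spell out the definition.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct bio; only the two
 * fields the macro needs are modelled here. */
typedef uint64_t sector_t;

struct bio {
    sector_t     bi_sector; /* first sector of the I/O (512-byte units) */
    unsigned int bi_size;   /* I/O size in bytes */
};

/* Number of 512-byte sectors covered by the bio. */
#define bio_sectors(bio)    ((bio)->bi_size >> 9)

/* First sector past the end of the bio (assumed definition). */
#define bio_end_sector(bio) ((bio)->bi_sector + bio_sectors(bio))
```

Callers that previously open-coded "start plus size shifted by 9" can use the macro instead, which keeps them independent of where the fields live.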
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives, allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as its module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regard to the user's permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesystem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
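
The "fs-" prefixing described above can be sketched in userspace as follows. The helper name fs_module_alias is hypothetical and only shows how the requested module name is derived from the filesystem type; it is not the kernel's code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the module alias that mount would
 * request for a filesystem type, e.g. "autofs" -> "fs-autofs".
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int fs_module_alias(char *buf, size_t buflen, const char *fstype)
{
    int n = snprintf(buf, buflen, "fs-%s", fstype);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Because only names of the form fs-<type> are ever requested, a module such as autofs4 is found through its fs-autofs alias, and a non-filesystem module cannot be pulled in through mount by naming it directly.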
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Zhao Hongjiang 4173581876 fs: change return values from -EACCES to -EPERM
According to SUSv3:

[EACCES] Permission denied. An attempt was made to access a file in a way
forbidden by its file access permissions.

[EPERM] Operation not permitted. An attempt was made to perform an operation
limited to processes with appropriate privileges or to the owner of a file
or other resource.

So -EPERM should be returned if a capability check fails.

Strictly speaking this is an API change since the error code user sees is
altered.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:14 -05:00
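
The distinction drawn above between the two error codes can be sketched like this. The function and its boolean parameters are hypothetical stand-ins for a capability check and an ordinary file-mode check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: privileged_op marks an operation that needs a
 * capability; mode_allows is the result of a file permission check. */
static int check_access(bool has_capability, bool privileged_op,
                        bool mode_allows)
{
    if (privileged_op && !has_capability)
        return -EPERM;   /* operation limited to privileged processes */
    if (!mode_allows)
        return -EACCES;  /* file access permissions forbid the access */
    return 0;
}
```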
Namjae Jeon 94e07a7590 fs: encode_fh: return FILEID_INVALID if invalid fid_type
This patch is a follow-up to the patch below:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdcb

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:10 -05:00
Linus Torvalds 94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhancements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Darrick J. Wong 1d1d1a7672 mm: only enforce stable page writes if the backing device requires it
Create a helper function to check if a backing device requires stable
page writes and, if so, perform the necessary wait.  Then, make it so
that all points in the memory manager that handle making pages writable
use the helper function.  This should provide stable page write support
to most filesystems, while eliminating unnecessary waiting for devices
that don't require the feature.

Before this patchset, all filesystems would block, regardless of whether
or not it was necessary.  ext3 would wait, but still generate occasional
checksum errors.  The network filesystems were left to do their own
thing, so they'd wait too.

After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it.  ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait.  Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all.
The blocking behavior is back to what it was before 3.0 if you don't
have a disk requiring stable page writes.

Here's the result of using dbench to test latency on ext2:

3.8.0-rc3:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        109347     0.028    59.817
 ReadX         347180     0.004     3.391
 Flush          15514    29.828   287.283

Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms

3.8.0-rc3 + patches:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        105556     0.029     4.273
 ReadX         335004     0.005     4.112
 Flush          14982    30.540   298.634

Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms

As you can see, the maximum write latency drops considerably with this
patch enabled.  The other filesystems (ext3/ext4/xfs/btrfs) behave
similarly, but see the cover letter for those results.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21 17:22:19 -08:00
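
The helper described above, which waits only when the backing device requires stable pages, might look roughly like the following userspace sketch. All types and names here (struct backing_dev, wait_if_stable_required, mock_wait_on_writeback) are illustrative assumptions, not the kernel's definitions.

```c
#include <stdbool.h>

struct backing_dev {
    bool stable_pages_required;  /* e.g. device checksums data in flight */
};

struct page {
    struct backing_dev *bdi;
    bool under_writeback;
    int  waits;                  /* counts blocking waits, for illustration */
};

/* Mock: "waits" for writeback to finish by clearing the flag. */
static void mock_wait_on_writeback(struct page *p)
{
    if (p->under_writeback) {
        p->waits++;
        p->under_writeback = false;
    }
}

/* Block only when the backing device needs page contents to stay
 * unchanged while writeback is in progress. */
static void wait_if_stable_required(struct page *p)
{
    if (!p->bdi->stable_pages_required)
        return;
    mock_wait_on_writeback(p);
}
```

Every path that makes a page writable would call the one helper, so devices without the requirement never pay for the wait.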
Eric W. Biederman d054642642 gfs2: Convert uids and gids between dinodes and vfs inodes.
When reading dinodes from the disk convert uids and gids
into kuids and kgids to store in vfs data structures.

When writing dinodes to the disk convert kuids and kgids
in the in memory structures into plain uids and gids.

For now all on disk data structures are assumed to be
stored in the initial user namespace.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:11 -08:00
Eric W. Biederman 6b24c0d279 gfs2: Use uid_eq and gid_eq where appropriate
Where kuid_t values are compared use uid_eq and where kgid_t values
are compared use gid_eq.  This is unfortunately necessary because
of the type safety that keeps someone from accidentally mixing
kuids and kgids with other types.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:10 -08:00
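
The type safety mentioned in the two commits above comes from wrapping raw ids in distinct struct types, so they cannot be compared with == or mixed up by accident. A minimal sketch follows; disk_to_kuid and kuid_to_disk are hypothetical names that assume an identity mapping (the initial user namespace), whereas the real conversion helpers take a namespace argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Distinct wrapper types: a kuid_t cannot be passed where a kgid_t
 * (or a plain integer) is expected without a compile error. */
typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } kgid_t;

/* Structs cannot be compared with ==, hence explicit helpers. */
static inline bool uid_eq(kuid_t a, kuid_t b) { return a.val == b.val; }
static inline bool gid_eq(kgid_t a, kgid_t b) { return a.val == b.val; }

/* Hypothetical on-disk conversions, assuming the initial namespace. */
static inline kuid_t   disk_to_kuid(uint32_t uid) { return (kuid_t){ uid }; }
static inline uint32_t kuid_to_disk(kuid_t kuid)  { return kuid.val; }
```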
Eric W. Biederman 7c06b5d672 gfs2: Use kuid_t and kgid_t types where appropriate.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:09 -08:00
Eric W. Biederman 236c64e4b7 gfs2: Remove the QUOTA_USER and QUOTA_GROUP defines
Remove the QUOTA_USER and QUOTA_GROUP defines.  Remove
the last vestigial users of QUOTA_USER and QUOTA_GROUP.

Now that struct kqid is used throughout the gfs2 quota
code there is no need to use QUOTA_USER and QUOTA_GROUP
and the defines are just extraneous and confusing.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:08 -08:00
Eric W. Biederman 05e0a60d80 gfs2: Store qd_id in struct gfs2_quota_data as a struct kqid
- Change qd_id in struct gfs2_quota_data to struct kqid.
- Remove the now unnecessary QDF_USER bit field in qd_flags.
- Propagate this change through the code generally making
  things simpler along the way.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:07 -08:00
Eric W. Biederman ed87dabcc3 gfs2: Convert gfs2_quota_refresh to take a kqid
- In quota_refresh_user_store convert the user supplied uid
  into a kqid and pass it to gfs2_quota_refresh.

- In quota_refresh_group_store convert the user supplied gid
  into a kqid and pass it to gfs2_quota_refresh.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:06 -08:00
Eric W. Biederman b59c8b6f9d gfs2: Modify qdsb_get to take a struct kqid
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:05 -08:00
Eric W. Biederman e08d8d7f20 gfs2: Modify struct gfs2_quota_change_host to use struct kqid
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:04 -08:00
Eric W. Biederman 2f6c9896f7 gfs2: Introduce qd2index
Both qd_alloc and qd2offset perform the exact same computation
to get an index from a gfs2_quota_data.   Make life a little
simpler and factor out this index computation.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:03 -08:00
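
Factoring the shared computation into one helper could look like the sketch below. The commit message does not state the formula; the interleaved layout (user entry at an even index, group entry at the following odd index) and the types are assumptions made for illustration.

```c
#include <stdint.h>

enum quota_type { USRQUOTA = 0, GRPQUOTA = 1 };

struct quota_id {
    enum quota_type type;
    uint32_t        id;
};

/* Assumed layout: user and group records for the same id are adjacent. */
static uint64_t qd2index(struct quota_id q)
{
    return 2 * (uint64_t)q.id + (q.type == USRQUOTA ? 0 : 1);
}

/* Byte offset is derived from the index instead of repeating the math. */
static uint64_t qd2offset(struct quota_id q, uint64_t record_size)
{
    return qd2index(q) * record_size;
}
```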
Eric W. Biederman 558e85289f gfs2: Report quotas in the caller's user namespace.
When a quota is queried return the uid or the gid mapped into
the caller's user namespace.  In addition perform the munged version
of the mapping so that instead of -1 a value that does not map is
reported as the overflowuid or the overflowgid.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:02 -08:00
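
The "munged" mapping described above can be sketched as follows. The single-extent struct id_map and both function names are hypothetical; the overflow value 65534 is the kernel's default overflowuid.

```c
#include <stdint.h>

#define INVALID_ID  ((uint32_t)-1)
#define OVERFLOW_ID 65534u  /* default overflowuid/overflowgid */

/* Hypothetical single-extent id map for a user namespace. */
struct id_map {
    uint32_t ns_first;      /* first id as seen inside the namespace */
    uint32_t kernel_first;  /* first id as seen by the kernel */
    uint32_t count;         /* length of the mapped range */
};

/* Plain mapping: returns INVALID_ID ((uint32_t)-1) when unmapped. */
static uint32_t map_id(const struct id_map *m, uint32_t kid)
{
    if (kid >= m->kernel_first && kid - m->kernel_first < m->count)
        return m->ns_first + (kid - m->kernel_first);
    return INVALID_ID;
}

/* Munged mapping: report the overflow id instead of -1. */
static uint32_t map_id_munged(const struct id_map *m, uint32_t kid)
{
    uint32_t id = map_id(m, kid);

    return id == INVALID_ID ? OVERFLOW_ID : id;
}
```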
Eric W. Biederman f4108a607f gfs2: Split NO_QUOTA_CHANGE into NO_UID_QUOTA_CHANGE and NO_GID_QUOTA_CHANGE
Split NO_QUOTA_CHANGE into NO_UID_QUOTA_CHANGE and NO_GID_QUOTA_CHANGE
so the constants may be well typed.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:01 -08:00
Eric W. Biederman 393551e989 gfs2: Remove improper checks in gfs2_set_dqblk.
In set_dqblk it is an error to look at fdq->d_id or fdq->d_flags.
Userspace quota applications do not set these fields when calling
quotactl(Q_XSETQLIM,...), and the kernel does not set those fields
when quota_setquota calls set_dqblk.

gfs2 never looks at fdq->d_id or fdq->d_flags after checking
to see if they match the id and type supplied to set_dqblk.

No other linux filesystem in set_dqblk looks at either fdq->d_id
or fdq->d_flags.

Therefore remove these bogus checks from gfs2 and allow normal
quota setting applications to work.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:00 -08:00
Steven Whitehouse fd95e81cb1 GFS2: Reinstate withdraw ack system
This patch reinstates the ack system which withdraw should be using. It
appears to have been accidentally forgotten when the lock module was
merged into GFS2, due to two different sysfs files having the same name.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-13 12:21:40 +00:00
Bob Peterson d2b47cfb26 GFS2: Get a block reservation before resizing a file
This patch allocates a block reservation structure before growing
or shrinking a file. Without this structure, the grow or shrink code
can reference a bad pointer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-01 20:37:33 +00:00
Steven Whitehouse 4506a519f2 GFS2: Split glock lru processing into two parts
The intent here is to split the processing of the glock lru
list into two parts, so that the selection of glocks and the
disposal are separate functions. The plan is then, that further
updates can then be made to these functions in the future
to improve the selection of glocks and also the efficiency of
glock disposal.

The new feature which this patch brings is sorting the
glocks to be disposed of into glock number (and thus also
disk block number) order. Not all glocks will need i/o in
order to dispose of them, but some will, and at least we'll
generate mostly disk block order i/o now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-01 20:36:03 +00:00
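
Sorting the disposal candidates by glock number, as described above, can be illustrated with a comparator over an array of pointers. This sketch uses qsort() on an array for brevity, which is a substitution; the kernel works on linked lists, and struct glock here is a one-field stand-in.

```c
#include <stdint.h>
#include <stdlib.h>

struct glock {
    uint64_t number;  /* glock number, which tracks the disk block number */
};

/* Order by glock number without risking overflow from subtraction. */
static int glock_cmp(const void *a, const void *b)
{
    const struct glock *ga = *(const struct glock *const *)a;
    const struct glock *gb = *(const struct glock *const *)b;

    return (ga->number > gb->number) - (ga->number < gb->number);
}

/* Sort the selected glocks so disposal I/O is mostly in block order. */
static void sort_dispose_list(struct glock **list, size_t n)
{
    qsort(list, n, sizeof(*list), glock_cmp);
}
```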
Steven Whitehouse 4513899092 GFS2: Use ->writepages for ordered writes
Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.

The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.

The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:17 +00:00
Steven Whitehouse d564053f07 GFS2: Clean up freeze code
The freeze code has not been looked at a lot recently. Upstream has
moved on, and this is an attempt to catch us back up again. There
is a vfs level interface for the freeze code which can be called
from our (obsolete, but kept for backward compatibility purposes)
sysfs freeze interface. This means freezing this way vs. doing it
from the ioctl should now work in identical fashion.

As a result of this, the freeze function is only called once
and we can drop our own special purpose code for counting the
number of freezes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:05 +00:00
Steven Whitehouse c76c4d96bd GFS2: Merge gfs2_attach_bufdata() into trans.c
The locking in gfs2_attach_bufdata() was type specific (data/meta)
which made the function rather confusing. This patch moves the core
of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata()
and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta()

As a result all of the locking related to adding data and metadata to
the journal is now in these two functions. This should help to clarify
what is going on, and give us some opportunities to simplify in
some cases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:44 +00:00
Steven Whitehouse 767f433f34 GFS2: Copy gfs2_trans_add_bh into new data/meta functions
This patch copies the body of gfs2_trans_add_bh into the two newly
added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can
then move the .lo_add functions from lops.c into trans.c and call
them directly.

As a result of this, we no longer need to use the .lo_add functions
at all, so that is removed from the log operations structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:28 +00:00
Steven Whitehouse 350a9b0a72 GFS2: Split gfs2_trans_add_bh() into two
There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:04 +00:00
Steven Whitehouse 75f2b879ae GFS2: Merge revoke adding functions
This moves the lo_add function for revokes into trans.c, removing
a function call and making the code easier to read.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:27:46 +00:00
Steven Whitehouse 2a00585593 GFS2: Separate LRU scanning from shrinker
This breaks out the LRU scanning function from the shrinker in
preparation for adding other callers to the LRU scanner.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:27:28 +00:00
David Teigland d4e0bfec9b GFS2: fix skip unlock condition
The recent commit fb6791d100
included the wrong logic.  The lvbptr check was incorrectly
added after the patch was tested.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-28 09:49:15 +00:00
Bob Peterson 13d2eb0129 GFS2: Reset rd_last_alloc when it reaches the end of the rgrp
In function rg_mblk_search, it's searching for multiple blocks in
a given state (e.g. "free"). If there's an active block reservation
its goal is the next free block of that. If the resource group
contains the dinode's goal block, that's used for the search. But
if neither is the case, it uses the rgrp's last allocated block.
That way, consecutive allocations appear after one another on media.
The problem comes in when you hit the end of the rgrp; it would never
start over and search from the beginning. This became a problem,
since if you deleted all the files and data from the rgrp, it would
never start over and find free blocks. So it had to keep searching
further out on the media to allocate blocks. This patch resets the
rd_last_alloc after it does an unsuccessful search at the end of
the rgrp.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-02 10:05:27 +00:00
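
The reset behaviour described above can be shown with a simplified model in which a resource group is an array of in-use flags. The names struct rgrp, last_alloc and rgrp_find_free are illustrative only and the real search is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

struct rgrp {
    const bool *in_use;     /* true = block is allocated */
    size_t      nblocks;
    size_t      last_alloc; /* search start, in the role of rd_last_alloc */
};

/* Returns the index of a free block at or after last_alloc, or -1.
 * After an unsuccessful search the start position is reset, so the
 * next attempt scans from the beginning of the resource group. */
static long rgrp_find_free(struct rgrp *rg)
{
    for (size_t i = rg->last_alloc; i < rg->nblocks; i++) {
        if (!rg->in_use[i]) {
            rg->last_alloc = i;
            return (long)i;
        }
    }
    rg->last_alloc = 0;
    return -1;
}
```

Without the reset, blocks freed near the start of the group would never be found again once last_alloc had moved past them.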
Bob Peterson 15bd50ad82 GFS2: Stop looking for free blocks at end of rgrp
This patch adds a return code check after calling function
gfs2_rbm_from_block while determining the free extent size.
That way, when the end of an rgrp is reached, it won't try
to process unaligned blocks after the end.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-02 10:05:10 +00:00
Abhijith Das f1213cacc7 GFS2: Fix race in gfs2_rs_alloc
QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible
to come out of the function with a valid ip->i_res allocation but it gets
freed before use resulting in a NULL ptr dereference.

This patch moves the initial short-circuit check for non-NULL ip->i_res
inside the mutex lock. With this patch, I was able to successfully run the
reproducer test multiple times.

Resolves: rhbz#878476
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-02 10:04:53 +00:00
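
The shape of the fix described above, with the non-NULL check made while holding the lock, can be sketched in userspace with a pthread mutex. struct inode_info and rs_alloc are hypothetical stand-ins, and the 64-byte allocation is arbitrary.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct inode_info {
    pthread_mutex_t lock;
    void           *res;  /* in the role of ip->i_res */
};

/* Allocate the reservation once. The check happens under the lock,
 * so a concurrent free cannot slip in between check and return. */
static int rs_alloc(struct inode_info *ip)
{
    int err = 0;

    pthread_mutex_lock(&ip->lock);
    if (ip->res == NULL) {
        ip->res = calloc(1, 64);
        if (ip->res == NULL)
            err = -ENOMEM;
    }
    pthread_mutex_unlock(&ip->lock);
    return err;
}
```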
Nathan Straz ec1487528b GFS2: Initialize hex string to '0'
When generating the DLM lock name, a value of 0 would skip
the loop and leave the string unchanged.  This left locks with
a value of 0 unlabeled.  Initializing the string to '0' fixes this.

Signed-off-by: Nathan Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-02 10:04:00 +00:00
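
The zero-value case described above is easy to reproduce in a small sketch. The function writes hex digits right to left, ending at the given position; it assumes the caller has pre-filled the buffer with spaces, and the name reverse_hex is illustrative.

```c
#include <stdint.h>

/* Write value as lowercase hex, right to left, ending at *c.
 * Setting *c to '0' first covers value == 0, where the loop body
 * never runs and the position would otherwise stay untouched. */
static void reverse_hex(char *c, uint64_t value)
{
    static const char hex_asc[] = "0123456789abcdef";

    *c = '0';
    while (value) {
        *c-- = hex_asc[value & 0x0f];
        value >>= 4;
    }
}
```

For non-zero values the initial '0' is simply overwritten by the first digit, so the extra store changes nothing in the common case.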
Andrew Morton 965c8e59cf lseek: the "whence" argument is called "whence"
But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:12 -08:00
Linus Torvalds 08242bc221 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse:
 "The main feature this time is the new Orlov allocator and the patches
  leading up to it which allow us to allocate new inodes from their own
  allocation context, rather than borrowing that of their parent
  directory.  It is this change which then allows us to choose a
  different location for subdirectories when required.  This works
  exactly as per the ext3 implementation from the user's point of view.

  In addition to that, we've got a speed up in gfs2_rbm_from_block()
  from Bob Peterson, three locking related improvements from Dave
  Teigland plus a selection of smaller bug fixes and clean ups."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Set gl_object during inode create
  GFS2: add error check while allocating new inodes
  GFS2: don't reference inode's glock during block allocation trace
  GFS2: remove redundant lvb pointer
  GFS2: only use lvb on glocks that need it
  GFS2: skip dlm_unlock calls in unmount
  GFS2: Fix one RG corner case
  GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode
  GFS2: Use dirty_inode in gfs2_dir_add
  GFS2: Fix truncation of journaled data files
  GFS2: Add Orlov allocator
  GFS2: Use proper allocation context for new inodes
  GFS2: Add test for resource group congestion status
  GFS2: Rename glops go_xmote_th to go_sync
  GFS2: Speed up gfs2_rbm_from_block
  GFS2: Review bug traps in glops.c
2012-12-15 12:34:21 -08:00
Rafael Aquini 252aa6f5be mm: redefine address_space.assoc_mapping
Overhaul struct address_space.assoc_mapping renaming it to
address_space.private_data and its type is redefined to void*.  By this
approach we consistently name the .private_* elements from struct
address_space as well as allow extended usage for address_space
association with other data structures through ->private_data.

Also, all users of old ->assoc_mapping element are converted to reflect
its new name and type change (->private_data).

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:26 -08:00
Bob Peterson 1e2d9d44f3 GFS2: Set gl_object during inode create
This patch fixes a cluster coherency problem that occurs when one
node creates a file, does several writes, then a different node
tries to write to the same file. When the inode's glock is demoted,
the inode wasn't synced to the media properly because the gl_object
wasn't set. Later, the flush daemon noticed the uncommitted data
and tried to flush it, only to discover the glock was no longer locked
properly in exclusive mode. That caused an assert withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-21 14:49:21 +00:00
Bob Peterson be4f245dbb GFS2: add error check while allocating new inodes
This patch adds a return code check after attempting to allocate
a new inode during dinode creation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-16 14:26:57 +00:00
Bob Peterson b7804161a3 GFS2: don't reference inode's glock during block allocation trace
This patch changes the block allocation trace so that it references
the rgd's glock rather than the inode's glock. Now that the order
of inode creation is switched, this prevents a reference to the
glock which may not be set yet.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-16 14:21:48 +00:00
David Teigland 4e2f8849de GFS2: remove redundant lvb pointer
The lksb struct already contains a pointer to the lvb,
so another directly from the glock struct is not needed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-15 10:17:22 +00:00
David Teigland dba2d70c5d GFS2: only use lvb on glocks that need it
Save the effort of allocating, reading and writing
the lvb for most glocks that do not use it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-15 10:16:59 +00:00
David Teigland fb6791d100 GFS2: skip dlm_unlock calls in unmount
When unmounting, gfs2 does a full dlm_unlock operation on every
cached lock.  This can create a very large amount of work and can
take a long time to complete.  However, the vast majority of these
dlm unlock operations are unnecessary because after all the unlocks
are done, gfs2 leaves the dlm lockspace, which automatically clears
the locks of the leaving node, without unlocking each one individually.
So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
remove the locks implicitly.  The one exception is when the lock's lvb is
being used.  In this case, dlm_unlock is called because it may update the
lvb of the resource.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-14 09:37:04 +00:00
Steven Whitehouse aa8920c968 GFS2: Fix one RG corner case
For filesystems with only a single resource group, we need to be careful
that the allocation loop will not land up with a NULL resource group. This
fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function
was being used instead of gfs2_rgrpd_get_first()

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 14:50:35 +00:00
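
The corner case above comes down to what happens when "next" runs off the end of the list. A minimal sketch, using a hypothetical singly linked list of resource groups:

```c
#include <stddef.h>

struct rgrp {
    struct rgrp *next;  /* NULL at the end of the list */
};

struct rgrp_list {
    struct rgrp *first;
};

/* Advance to the next resource group, wrapping to the first one at
 * the end. With a single group this returns that same group rather
 * than NULL. */
static struct rgrp *rgrp_next_wrapping(const struct rgrp_list *l,
                                       const struct rgrp *cur)
{
    return cur->next ? cur->next : l->first;
}
```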
Bob Peterson 4327a9bf71 GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode
Since we now have a dirty_inode that takes care of manipulating the
inode buffer and writing from the inode to the buffer, we can
eliminate some unnecessary buffer manipulations in gfs2_unlink_inode
that are now redundant.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 09:55:26 +00:00
Bob Peterson 343cd8f0d7 GFS2: Use dirty_inode in gfs2_dir_add
This patch changes the gfs2_dir_add function so that it uses
the dirty_inode function (via mark_inode_dirty) rather than manually
updating the dinode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 09:54:54 +00:00
Steven Whitehouse fa731fc4e0 GFS2: Fix truncation of journaled data files
This patch fixes an issue relating to not having enough revokes
available when truncating journaled data files. In order to ensure
that we do not run out, the truncation is broken into separate pieces
if it is large enough.
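
The idea of splitting a large truncation into bounded pieces can be
sketched in plain C. This is only an illustration: the function name,
the max_chunk parameter and the per-piece accounting are hypothetical,
and the real patch sizes its pieces according to the revokes available.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: shrink *size down to newsize in steps of at most
 * max_chunk bytes, so that no single step needs unbounded resources.
 * Returns the number of pieces that were needed.
 */
static unsigned int truncate_in_pieces(uint64_t *size, uint64_t newsize,
				       uint64_t max_chunk)
{
	unsigned int pieces = 0;

	if (max_chunk == 0)
		return 0;	/* nothing sensible to do without a chunk size */

	while (*size > newsize) {
		uint64_t chunk = *size - newsize;

		if (chunk > max_chunk)
			chunk = max_chunk;
		/* One bounded transaction would run here. */
		*size -= chunk;
		pieces++;
	}
	return pieces;
}
```

A truncation that fits in one chunk still takes a single step, so only
large truncations pay for the extra pieces.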

Tested using fsx on a journaled data file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 09:50:28 +00:00
Steven Whitehouse 9dbe9610b9 GFS2: Add Orlov allocator
Just like ext3, this works on the root directory and any directory
with the +T flag set. Also, just like ext3, any subdirectory created
in one of the just mentioned cases will be allocated to a random
resource group (GFS2 equivalent of a block group).

If you are creating a set of directories, each of which will contain a
job running on a different node, then by setting +T on the parent
directory before creating the subdirectories, each will land up in a
different resource group, and thus resource group contention between
nodes will be kept to a minimum.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:33:17 +00:00
Steven Whitehouse c9aecf7371 GFS2: Use proper allocation context for new inodes
Rather than using the parent directory's allocation context, this
patch allocates the new inode earlier in the process and then uses
it to contain all the information required. As a result, we can now
use the new inode's own allocation context to allocate it rather
than having to use the parent directory's context. This gives us a
lot more flexibility in where the inode is placed on disk.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:32:42 +00:00
Steven Whitehouse bcd97c0630 GFS2: Add test for resource group congestion status
This patch uses information gathered by the recent glock statistics
patch in order to derive a boolean verdict on the congestion
status of a resource group. This is then used when making decisions
on which resource group to choose during block allocation.

The aim is to avoid resource groups which are heavily contended
by other nodes, while still ensuring locality of access wherever
possible.

Once a reservation has been made in a particular resource group
we continue to use that resource group until a new reservation is
required. This should help to ensure that we do not change resource
groups too often.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:32:21 +00:00
Bob Peterson 06dfc30641 GFS2: Rename glops go_xmote_th to go_sync
[Editorial: This is a nit, but has been a minor irritation for a long time:]

This patch renames glops structure item for go_xmote_th to go_sync.
The functionality is unchanged; it's just for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:31:57 +00:00
Bob Peterson a68a0a352a GFS2: Speed up gfs2_rbm_from_block
This patch is a rewrite of function gfs2_rbm_from_block. Rather than
looping to find the right bitmap, the code now does a few simple
math calculations.
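
The difference between the two approaches can be sketched as follows.
This is a simplified userspace illustration, not the gfs2 code: the
struct and function names are hypothetical, and it assumes the first
bitmap covers first_blocks blocks while every later bitmap covers
per_bitmap blocks (per_bitmap must be non-zero).

```c
#include <stdint.h>

/* Hypothetical result type: which bitmap, and the offset inside it. */
struct bitmap_pos {
	uint32_t index;
	uint32_t offset;
};

/* Loop style: walk the bitmaps until the offset falls inside one. */
static struct bitmap_pos locate_by_loop(uint32_t rblock, uint32_t first_blocks,
					uint32_t per_bitmap)
{
	struct bitmap_pos pos = { 0, rblock };
	uint32_t span = first_blocks;

	while (pos.offset >= span) {
		pos.offset -= span;
		pos.index++;
		span = per_bitmap;	/* all later bitmaps have the same size */
	}
	return pos;
}

/* Arithmetic style: one comparison, one division, one remainder. */
static struct bitmap_pos locate_by_math(uint32_t rblock, uint32_t first_blocks,
					uint32_t per_bitmap)
{
	struct bitmap_pos pos = { 0, rblock };

	if (rblock < first_blocks)
		return pos;	/* block lives in the first bitmap */

	rblock -= first_blocks;
	pos.index = 1 + rblock / per_bitmap;
	pos.offset = rblock % per_bitmap;
	return pos;
}
```

Both functions return the same position for any input; the arithmetic
version simply avoids a cost that grows with the bitmap index.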

I compared the performance of both algorithms side by side and the new
algorithm is noticeably faster. Sample instrumentation output from a
"fast" machine:

5 million calls: millisec spent: Orig: 166 New: 113
5 million calls: millisec spent: Orig: 189 New: 114

In addition, I ran postmark (on a somewhat slower CPU) before and after
the new algorithm was put in place and postmark showed a decent
improvement:

Before the new algorithm:
-------------------------
Time:
	645 seconds total
	584 seconds of transactions (171 per second)

Files:
	150087 created (232 per second)
		Creation alone: 100000 files (2083 per second)
		Mixed with transactions: 50087 files (85 per second)
	49995 read (85 per second)
	49991 appended (85 per second)
	150087 deleted (232 per second)
		Deletion alone: 100174 files (7705 per second)
		Mixed with transactions: 49913 files (85 per second)

Data:
	273.42 megabytes read (434.08 kilobytes per second)
	852.13 megabytes written (1.32 megabytes per second)

With the new algorithm:
-----------------------
Time:
	599 seconds total
	530 seconds of transactions (188 per second)

Files:
	150087 created (250 per second)
		Creation alone: 100000 files (1886 per second)
		Mixed with transactions: 50087 files (94 per second)
	49995 read (94 per second)
	49991 appended (94 per second)
	150087 deleted (250 per second)
		Deletion alone: 100174 files (6260 per second)
		Mixed with transactions: 49913 files (94 per second)

Data:
	273.42 megabytes read (467.42 kilobytes per second)
	852.13 megabytes written (1.42 megabytes per second)

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:31:36 +00:00
Steven Whitehouse 8eae1ca003 GFS2: Review bug traps in glops.c
Two of the bug traps here could really be warnings. The others are
converted from BUG() to GLOCK_BUG_ON() since we'll most likely
need to know the glock state in order to debug any issues which
arise. As a result of this, __dump_glock has to be renamed and
is no longer static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:31:07 +00:00
Benjamin Marzinski 96e5d1d3ad GFS2: Test bufdata with buffer locked and gfs2_log_lock held
In gfs2_trans_add_bh(), gfs2 was testing if there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any
time.  This patch moves the locking before the test.  If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:43:03 +00:00
Benjamin Marzinski 3d1626889a GFS2: Don't call file_accessed() with a shared glock
file_accessed() was being called by gfs2_mmap() with a shared glock. If it
needed to update the atime, it was crashing because it dirtied the inode in
gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
checked if the caller was already holding a glock, but it didn't make sure that
the glock was in the exclusive state. Now, instead of calling file_accessed()
while holding the shared lock in gfs2_mmap(), file_accessed() is called after
grabbing and releasing the glock to update the inode.  If file_accessed() needs
to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().

gfs2_dirty_inode() now also checks to make sure that if the calling process has
already locked the glock, it has an exclusive lock.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:42:49 +00:00
Lukas Czerner 076f0faa76 GFS2: Fix FITRIM argument handling
The current implementation in gfs2 treats the FITRIM arguments as if they
were in file system block units, which is wrong. The FITRIM arguments
(fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are
actually in bytes.

Moreover, checks for the start argument being beyond the end of the file
system, the len argument being smaller than a file system block, and the
minlen argument being bigger than the biggest resource group were missing.

This commit converts the FITRIM arguments to file system blocks and also
adds the checks mentioned above.

All the problems were recognised by xfstests 251 and 260.
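
The conversion and the three checks can be sketched in userspace C. The
struct and function names below are hypothetical stand-ins (the kernel
uses struct fstrim_range), clamping the range to the end of the file
system is an assumption of this sketch, and the actual patch may order
or express the checks differently.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fstrim_range: all values in bytes. */
struct trim_request {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};

/* The same request expressed in file system blocks. */
struct trim_blocks {
	uint64_t start;
	uint64_t end;		/* inclusive */
	uint64_t minlen;
};

/*
 * bsize_shift:   log2 of the block size (12 for 4096-byte blocks)
 * fs_blocks:     total number of blocks in the file system
 * max_rg_blocks: size in blocks of the biggest resource group
 * Returns 0 on success, -EINVAL if the request fails a check.
 */
static int trim_bytes_to_blocks(const struct trim_request *r,
				unsigned int bsize_shift,
				uint64_t fs_blocks, uint64_t max_rg_blocks,
				struct trim_blocks *out)
{
	uint64_t bsize = (uint64_t)1 << bsize_shift;
	uint64_t start = r->start >> bsize_shift;
	uint64_t minlen = r->minlen >> bsize_shift;
	uint64_t nblocks;

	if (start >= fs_blocks)		/* start beyond end of file system */
		return -EINVAL;
	if (r->len < bsize)		/* len smaller than one block */
		return -EINVAL;
	if (minlen > max_rg_blocks)	/* minlen can never be satisfied */
		return -EINVAL;

	nblocks = r->len >> bsize_shift;
	if (nblocks > fs_blocks - start)	/* clamp to the end of the fs */
		nblocks = fs_blocks - start;

	out->start = start;
	out->end = start + nblocks - 1;
	out->minlen = minlen;
	return 0;
}
```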

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:41:58 +00:00
Lukas Czerner 3a238adefb GFS2: Require user to provide argument for FITRIM
When the fstrim_range argument is not provided by the user in the FITRIM
ioctl we should just return EFAULT and not promote bad behaviour by filling
in the structure in the kernel. Let the user deal with it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:41:37 +00:00
Andrew Price 73738a77f4 GFS2: Clean up some unused assignments
Cleans up two cases where variables were assigned values but then never
used again.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:41:07 +00:00
Andrew Price cd0ed19fb6 GFS2: Fix possible null pointer deref in gfs2_rs_alloc
Despite the return value from kmem_cache_zalloc() being checked, the
error wasn't being returned until after a possible null pointer
dereference. This patch returns the error immediately, allowing the
removal of the error variable.
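
The pattern described here, returning as soon as the allocation fails
instead of carrying an error variable past a dereference, looks roughly
like this in userspace C. The types and the function name are
hypothetical, and calloc() stands in for kmem_cache_zalloc().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the reservation and the inode holding it. */
struct reservation {
	int rgrp_hint;
};

struct inode_ctx {
	struct reservation *res;
};

/* Attach a zeroed reservation to ip unless one is already present. */
static int reservation_alloc(struct inode_ctx *ip)
{
	struct reservation *res;

	if (ip->res)
		return 0;	/* already allocated, nothing to do */

	res = calloc(1, sizeof(*res));
	if (!res)
		return -ENOMEM;	/* bail out before res is ever touched */

	ip->res = res;
	return 0;
}
```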

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:40:39 +00:00
Andrew Price aaaf68c562 GFS2: Fix an unchecked error from gfs2_rs_alloc
Check the return value of gfs2_rs_alloc(ip) and avoid a possible null
pointer dereference.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 09:40:05 +00:00
Hugh Dickins 35c2a7f490 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
	u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
 [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.
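
A minimal sketch of the missing check, assuming a handle whose inode
number is stored in raw[1] and raw[2] and whose fh_len is counted in
32-bit words. The function name is hypothetical; each filesystem has
its own handle layout and minimum length.

```c
#include <stdint.h>

/* Number of 32-bit words this hypothetical handle layout requires. */
#define HANDLE_WORDS_NEEDED 3

/*
 * Decode the inode number from a file handle.
 * Returns 0 and stores the number in *inum, or -1 if the handle is too
 * short to contain raw[2].
 */
static int handle_to_inum(const uint32_t *raw, int fh_len, uint64_t *inum)
{
	/* Validate the length before touching any word of the handle. */
	if (fh_len < HANDLE_WORDS_NEEDED)
		return -1;

	*inum = ((uint64_t)raw[2] << 32) | raw[1];
	return 0;
}
```

With the check first, a short handle is rejected without reading past
the end of the buffer that holds it.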

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:55 -04:00
Konstantin Khlebnikov 0b173bc4da mm: kill vma flag VM_CAN_NONLINEAR
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:17 +09:00
Linus Torvalds 437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values as far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting type-incompatibility compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privilege checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user namespace to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds 033d9959ed Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
2012-10-02 09:54:49 -07:00
Benjamin Marzinski 2216db70c9 GFS2: Write out dirty inode metadata in delayed deletes
If a dirty GFS2 inode was being deleted but was in use by another node, its
metadata was not getting written out before GFS2 checked for dirty buffers in
gfs2_ail_flush().  GFS2 was relying on inode_go_sync() to write out the
metadata when the other node tried to free the file, but it failed the error
check before it got that far. This patch writes out the metadata before calling
gfs2_ail_flush().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:30 +01:00
Eric Sandeen a0b4df2943 GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
gfs2_ail_empty_gl() contains an "inline version" of gfs2_trans_begin(),
so it needs an explicit sb_start_intwrite() as well, to balance the
sb_end_intwrite() which will be called by gfs2_trans_end().

With this, xfstest 068 passes on lock_nolock local gfs2.
Without it, we reach a writer count of -1 and get stuck.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:29 +01:00
Bob Peterson 3701530aed GFS2: Fix infinite loop in rbm_find
This patch fixes an infinite loop in gfs2_rbm_find that was introduced
by the previous patch. The problem occurred when the length was less
than 3 but the rbm block was byte-aligned, causing it to improperly
return an extent length of zero, which caused it to spin.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Barry Marson <bmarson@redhat.com>
2012-09-24 10:47:27 +01:00
Steven Whitehouse ff7f4cb461 GFS2: Consolidate free block searching functions
With the recently added block reservation code, an additional function
was added to search for free blocks. This had a restriction of only being
able to search for aligned extents of free blocks. As a result the
allocation patterns when reserving blocks were suboptimal when the
existing allocation of blocks for an inode was not aligned to the same
boundary.

This patch resolves that problem by adding the ability for gfs2_rbm_find
to search for extents of a particular minimum size. We can then use
gfs2_rbm_find for both looking for reservations, and also looking for
free blocks on an individual basis when we actually come to do the
allocation later on. As a result we only need a single set of code
to deal with both situations.

The function gfs2_rbm_from_block() is moved up rgrp.c so that it
occurs before all of its callers.

Many thanks are due to Bob for helping track down the final issue in
this patch. That fix to the rb_tree traversal and to not share
block reservations from a directory to its children is included here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:26 +01:00
Jan Kara 56aa72d0fc GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2 uses i_mutex on its system quota inode to synchronize writes to
quota file. Since this is an internal inode to GFS2 (not part of directory
hierarchy or visible to the user) we are safe to define locking rules for it.
So let's just give it its own locking class to make it clear.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:24 +01:00
Bob Peterson 0688a5ecea GFS2: Stop block extents at the end of bitmaps
This patch stops multiple block allocations if a nonzero
return code is received from gfs2_rbm_from_block. Without
this patch, if enough pressure is put on the file system,
you get a kernel warning quickly followed by:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2]
With this patch, things run normally.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:23 +01:00
Steven Whitehouse c743ffd09f GFS2: Fix unclaimed_blocks() wrapping bug and clean up
When rgd->rd_free_clone is less than rgd->rd_reserved, the
unclaimed_blocks() calculation would wrap and produce
incorrect results. This patch checks for this condition
when this function is called from gfs2_mblk_search()
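
The wrap is ordinary unsigned arithmetic and is easy to reproduce
outside the kernel. In the sketch below the function names are
hypothetical, and the guard is folded into a helper for brevity,
whereas the patch places the check at the call site.

```c
#include <stdint.h>

/* Unchecked: wraps to a huge value when reserved > free_clone. */
static uint32_t unclaimed_naive(uint32_t free_clone, uint32_t reserved)
{
	return free_clone - reserved;
}

/* Checked: report zero unclaimed blocks instead of wrapping. */
static uint32_t unclaimed_checked(uint32_t free_clone, uint32_t reserved)
{
	if (free_clone < reserved)
		return 0;
	return free_clone - reserved;
}
```

With 3 free blocks and 5 reserved, the unchecked version yields a value
just below 2^32, which a caller would read as plenty of free space.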

In addition, the use of this particular function in other
places in the code has been dropped by means of a general
clean up of gfs2_inplace_reserve(). This function is now
much easier to follow.

Also the setting of the rgd->rd_last_alloc field is corrected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:21 +01:00
Steven Whitehouse 9e733d3923 GFS2: Improve block reservation tracing
This patch improves the tracing of block reservations by
removing some corner cases and also providing more useful
detail in the traces.

A new field is added to the reservation structure to contain
the inode number. This is used since in certain contexts it is
not possible to access the inode itself to obtain this information.
As a result we can then display the inode number for all tracepoints
and also in case we dump the resource group.

The "del" tracepoint operation has been removed. This could be called
with the reservation rgrp set to NULL. That resulted in not printing
the device number, and thus making the information largely useless
anyway. Also, the conditional on the rgrp being NULL can then be
removed from the tracepoint. After this change, all the block
reservation tracepoint calls will be called with the rgrp information.

The existing ins, clm and tdel calls to the block reservation tracepoint
are sufficient to track the entire life of the block reservation.

In gfs2_block_alloc() the error detection is updated to print out
the inode number of the problematic inode. This can then be compared
against the information in the glock dump, tracepoints, etc.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:20 +01:00
Steven Whitehouse 137834a696 GFS2: Fall back to ignoring reservations, if there are no other blocks left
When we get to the stage of allocating blocks, we know that the
resource group in question must contain enough free blocks, otherwise
gfs2_inplace_reserve() would have failed. So if we are left with only
free blocks which are reserved, then we must use those. This can happen
if another node has sneaked in and used some blocks reserved on this
node, for example. Generally this will happen very rarely and only
when the resource group is nearly full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:19 +01:00
Steven Whitehouse 2b9731e8bb GFS2: Fix ->show_options() for statfs slow
The ->show_options() function for GFS2 was not correctly displaying
the value when statfs slow is in use.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Milos Jakubicek <xjakub@fi.muni.cz>
2012-09-24 10:47:17 +01:00
Steven Whitehouse 3e6339dd28 GFS2: Use rbm for gfs2_setbit()
Use the rbm structure for gfs2_setbit() in order to simplify the
arguments to the function. We have to add a bool to control whether
the clone bitmap should be updated (if it exists) but otherwise it
is a more or less direct substitution.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:16 +01:00
Steven Whitehouse c04a2ef3a8 GFS2: Use rbm for gfs2_testbit()
Change the arguments to gfs2_testbit() so that it now just takes an
rbm specifying the position of the two bit entry to return.
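
Reading a two bit entry from a packed bitmap can be sketched as below.
This is a userspace illustration with hypothetical names: it takes a
raw byte array and an entry offset rather than an rbm, and it assumes
four entries per byte with entry 0 in the lowest bits.

```c
#include <stdint.h>

#define BITS_PER_ENTRY   2	/* two bits of state per block */
#define ENTRIES_PER_BYTE 4
#define ENTRY_MASK       0x3

/* Return the two bit state stored at the given entry offset. */
static unsigned int test_entry(const uint8_t *bitmap, uint32_t offset)
{
	uint8_t byte = bitmap[offset / ENTRIES_PER_BYTE];
	unsigned int shift = (offset % ENTRIES_PER_BYTE) * BITS_PER_ENTRY;

	return (byte >> shift) & ENTRY_MASK;
}
```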

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:14 +01:00
Bob Peterson 29c05b205d GFS2: Eliminate unnecessary check for state > 3 in bitfit
Function gfs2_bitfit was checking for state > 3, but that's
impossible since it is only called from rgblk_search, which receives
only GFS2_BLKST_ constants.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:13 +01:00
Bob Peterson e5dc76b9af GFS2: Eliminate redundant calls to may_grant
Function add_to_queue was checking may_grant for the passed-in
holder for every iteration of its gh2 loop. Now it only checks it
once at the beginning to see if a try lock is futile.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:12 +01:00
Bob Peterson 81e1d45061 GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote
Function gfs2_glock_dq_wait called two-line function wait_on_demote,
so they were combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:10 +01:00
Bob Peterson 07a7904942 GFS2: Combine functions gfs2_glock_wait and wait_on_holder
Function gfs2_glock_wait only called function wait_on_holder and
returned its return code, so they were combined for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:09 +01:00
Bob Peterson 4abb6ad9ea GFS2: inline __gfs2_glock_schedule_for_reclaim
Since function gfs2_glock_schedule_for_reclaim is only two
significant lines, we can eliminate it, simplifying the code
and making it more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:07 +01:00
Bob Peterson 8e711e100f GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq
This patch changes function gfs2_direct_IO so that it uses a normal
call to gfs2_glock_dq rather than a call to a multiple-dq of one item.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:06 +01:00
Bob Peterson 8d8b752a0f GFS2: rbm code cleanup
This patch fixes a few small rbm related things. First, it fixes
a corner case where the rbm needs to switch bitmaps and wasn't
adjusting its buffer pointer. Second, there's a white space issue
fixed. Third, the logic in function gfs2_rbm_from_block was optimized
a bit. Lastly, a check for goal block overflows was added to function
gfs2_alloc_blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:04 +01:00
Steven Whitehouse 5d50d53246 GFS2: Fix case where reservation finished at end of rgrp
One corner case which the original patch failed to take into
account was when there is a reservation which ended such that
the following block was one beyond the end of the rgrp in
question. This extra test fixes that case.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:03 +01:00
Michel Lespinasse 24d634e8f3 GFS2: Use RB_CLEAR_NODE() rather than rb_init_node()
gfs2 calls RB_EMPTY_NODE() to check if nodes are not on an rbtree.
The corresponding initialization function is RB_CLEAR_NODE().
rb_init_node() was never clearly defined and is going away.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:02 +01:00
Steven Whitehouse 3b1d0b9d0b GFS2: Update rgblk_free() to use rbm
Replace open coded version with a call to gfs2_rbm_from_block()

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:00 +01:00
Steven Whitehouse 3983903a71 GFS2: Update gfs2_get_block_type() to use rbm
Use the new gfs2_rbm_from_block() function to replace an open
coded version of the same code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:59 +01:00
Steven Whitehouse 5b924ae2dc GFS2: Replace rgblk_search with gfs2_rbm_find
This is part of a series of patches which are introducing the
gfs2_rbm structure throughout the block allocation code. The
main aim of this part is to create a search function which can
deal directly with struct gfs2_rbm. In this case it specifies
the initial position at which to start the search and also the
point at which the search terminates.

The net result of this is to clean up the search code and make
it rather more readable, and the various possible exceptions which
may occur during the search are partitioned into their own functions.

There are some bug fixes too. We should not be checking the reservations
while allocating extents - the time for that is when we are searching
for where to put the extent, not when we've already made that decision.

Also, rgblk_search had two uses, and in only one of those cases did
it make sense to check for reservations. This is fixed in the new
gfs2_rbm_find function, which has a cleaner interface.

The reservation checking has been improved by always checking for
contiguous reservations, and returning the first free block after
all contiguous reservations. This is done under the spin lock to
ensure consistency of the tree.

The allocation of extents is now in all cases done by the existing
allocation code, and if there is an active reservation, that is updated
after the fact. Again this is done under the spin lock, since it entails
changing the lookup key for the reservation in question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:57 +01:00
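
The "first free block after all contiguous reservations" rule from 5b924ae2dc can be sketched as follows. This is a simplified illustration under stated assumptions: reservations are held in a sorted, non-overlapping array rather than an rbtree, the struct and function names are hypothetical, and the spin lock that the commit message mentions is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reservation: blocks [start, start + len) are reserved. */
struct sketch_resv {
	uint64_t start;
	uint32_t len;
};

/*
 * Given a candidate block, return the first block at or after it that is
 * not covered by any reservation. rs[] must be sorted by start and must
 * not overlap. Back-to-back reservations are skipped in a single pass.
 */
static uint64_t sketch_next_unreserved(const struct sketch_resv *rs,
				       size_t n, uint64_t block)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t end = rs[i].start + rs[i].len;

		if (block < rs[i].start)
			break;           /* gap found before this reservation */
		if (block < end)
			block = end;     /* inside it: jump past its end */
	}
	return block;
}
```

With reservations covering blocks 10-14 and 15-19, a search starting at block 12 is pushed to block 20 in one call instead of stopping at 15 and having to be retried.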
Steven Whitehouse 4a993fb150 GFS2: Add structure to contain rgrp, bitmap, offset tuple
This patch introduces a new structure, gfs2_rbm, which is a
tuple of a resource group, a bitmap within the resource group
and an offset within that bitmap. This is designed to make
manipulating these sets of variables easier. There is also a
new helper function which converts this representation back
to a disk block address.

In addition, the rbtree nodes which are used for the reservations
were not being correctly initialised, which is now fixed. Also,
the tracing was not passing through the inode where it should
have been. That is mostly fixed aside from one corner case. This
needs to be revisited since there can also be a NULL rgrp in
some cases which results in the device being incorrect in the
trace.

This is intended to be the first step towards cleaning up some
of the allocation code, and some further bug fixes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:56 +01:00
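
The tuple introduced by 4a993fb150, and the conversion in both directions, might look roughly like this in a user-space model. All sketch_* types and field names are hypothetical simplifications and are not the GFS2 definitions; in particular, each bitmap is assumed to record the index of the first block it covers, relative to the start of the resource group's data area.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap descriptor: covers nblocks blocks from first_blk. */
struct sketch_bitmap {
	uint32_t first_blk;   /* relative to the rgrp's first data block */
	uint32_t nblocks;
};

/* Hypothetical resource group: a data area described by several bitmaps. */
struct sketch_rgrp {
	uint64_t data0;                     /* disk address of first data block */
	const struct sketch_bitmap *bits;
	uint32_t nbits;
};

/* The (rgrp, bitmap, offset) tuple. */
struct sketch_rbm {
	const struct sketch_rgrp *rgd;
	const struct sketch_bitmap *bi;
	uint32_t offset;                    /* block index within *bi */
};

/* Convert the tuple back to a disk block address. */
static uint64_t sketch_rbm_to_block(const struct sketch_rbm *rbm)
{
	assert(rbm->offset < rbm->bi->nblocks);
	return rbm->rgd->data0 + rbm->bi->first_blk + rbm->offset;
}

/* Fill in the tuple for a block; returns 0, or -1 if outside the rgrp. */
static int sketch_rbm_from_block(struct sketch_rbm *rbm,
				 const struct sketch_rgrp *rgd, uint64_t block)
{
	if (block < rgd->data0)
		return -1;
	uint64_t rel = block - rgd->data0;

	for (uint32_t i = 0; i < rgd->nbits; i++) {
		const struct sketch_bitmap *bi = &rgd->bits[i];

		if (rel >= bi->first_blk && rel - bi->first_blk < bi->nblocks) {
			rbm->rgd = rgd;
			rbm->bi = bi;
			rbm->offset = (uint32_t)(rel - bi->first_blk);
			return 0;
		}
	}
	return -1;
}
```

Keeping the three values in one structure means a caller passes a single pointer around instead of three loosely related variables, which is the simplification the commit message describes.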
Steven Whitehouse 71f890f7f7 GFS2: Remove rs_requested field from reservations
The rs_requested field is left over from the original allocation
code, however this should have been a parameter passed to the
various functions from gfs2_inplace_reserve() and not a member of the
reservation structure as the value is not required after the
initial allocation.

This also helps simplify the code since we no longer need to set
rs_requested to zero. The gfs2_inplace_release() function can be
simplified as well, since the reservation structure will always be
defined when it is called, and the only remaining task is to unlock
the rgrp if required. It can now be called unconditionally,
resulting in a further simplification.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:54 +01:00
Steven Whitehouse 1f98169743 GFS2: Merge two nearly identical xattr functions
There were two functions in the xattr code which were nearly
identical, the only difference being that one was copying data into
the unstuffed xattrs and the other was copying data out of them.

This patch merges the two functions such that the code which deals
with iteration over the unstuffed xattrs is no longer duplicated.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:53 +01:00
Eric W. Biederman 431f19744d userns: Convert quota netlink aka quota_send_warning
Modify quota_send_warning to take struct kqid instead of a type and
identifier pair.

When sending netlink broadcasts always convert uids and quota
identifiers into the initial user namespace.  There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.

Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota_send_warning.  When
all of the user namespace conversions are complete, struct kqid
values will be available without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.

Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:40 -07:00
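
The change of calling convention in 431f19744d, from a loose (type, id) pair to a single typed value, can be sketched in user space as below. The sketch_* names are hypothetical and only model the idea; the user-namespace mapping that the kernel performs when creating or unpacking such a value is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quota types for this sketch. */
enum sketch_qtype { SKETCH_USRQUOTA, SKETCH_GRPQUOTA };

/* One value that carries both the quota type and the identifier. */
struct sketch_kqid {
	enum sketch_qtype type;
	union {
		uint32_t uid;   /* valid when type == SKETCH_USRQUOTA */
		uint32_t gid;   /* valid when type == SKETCH_GRPQUOTA */
	} id;
};

/* Build a typed identifier from a raw (type, id) pair. */
static struct sketch_kqid sketch_make_kqid(enum sketch_qtype type, uint32_t qid)
{
	struct sketch_kqid k = { .type = type };

	if (type == SKETCH_USRQUOTA)
		k.id.uid = qid;
	else
		k.id.gid = qid;
	return k;
}

/* Recover the raw identifier, e.g. for a message sent to user space. */
static uint32_t sketch_from_kqid(struct sketch_kqid k)
{
	return k.type == SKETCH_USRQUOTA ? k.id.uid : k.id.gid;
}

/* Two identifiers match only if both type and value match. */
static bool sketch_kqid_eq(struct sketch_kqid a, struct sketch_kqid b)
{
	return a.type == b.type && sketch_from_kqid(a) == sketch_from_kqid(b);
}
```

Because the type travels with the value, a user id of 1000 and a group id of 1000 can no longer be confused by a callee that receives only one argument.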
Eric W. Biederman 74a8a10378 userns: Convert qutoactl
Update the quotactl user space interface to successfully compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.

The quota on function is not converted because, while it takes a quota
type and an id, the id is the on disk quota format to use, which
is something completely different.

The signatures of two struct quotactl_ops methods, get_dqblk and
set_dqblk, were changed to take struct kqid arguments.

The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.

This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace.  Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.

Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:39 -07:00
Eric W. Biederman 5f3a4a28ec userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr
 - Pass the user namespace the uid and gid values in the xattr are stored
   in into posix_acl_from_xattr.

 - Pass the user namespace kuid and kgid values should be converted into
   when storing uid and gid values in an xattr in posix_acl_to_xattr.

 - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
   pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer.  In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
Linus Torvalds 3f0c3c8fe3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
 "Here are three GFS2 fixes for the current kernel tree.  These are all
  related to the block reservation code which was added at the merge
  window.  That code will be getting an update at the forthcoming merge
  window too.  In the mean time though there are a few smaller issues
  which should be fixed.

  The first patch resolves an issue with write sizes of greater than 32
  bits with the size hinting code.  The second ensures that the
  allocation data structure is initialised when using xattrs and the
  third takes into account allocations which may have been made by other
  nodes which affect a reservation on the local node."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Take account of blockages when using reserved blocks
  GFS2: Fix missing allocation data for set/remove xattr
  GFS2: Make write size hinting code common
2012-09-14 18:05:14 -07:00
Steven Whitehouse 62e252eeef GFS2: Take account of blockages when using reserved blocks
The claim_reserved_blks() function was not taking account of
the possibility of "blockages" while performing allocation.
This can be caused by another node allocating something in
the same extent which has been reserved locally.

This patch tests for this condition and then skips the remainder
of the reservation in this case. This is a relatively rare event,
so that it should not affect the general performance improvement
which the block reservations provide.

The claim_reserved_blks() function also appears not to be able
to deal with reservations which cross bitmap boundaries, but
that can be dealt with in a future patch since we don't generate
boundary crossing reservations currently.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2012-09-13 10:30:58 +01:00
Steven Whitehouse 645b2ccc75 GFS2: Fix missing allocation data for set/remove xattr
These entry points were missed in the original patch to allocate
this data structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-13 10:30:34 +01:00
Steven Whitehouse da1dfb6af8 GFS2: Make write size hinting code common
This collects up the write size hinting code which is used by the
block reservation subsystem into a single function. At the same
time this also corrects the rounding for this calculation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-13 10:30:00 +01:00
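
The two issues this fix touches, rounding the hint and coping with write sizes wider than 32 bits, can be illustrated with a small helper. This is a hedged sketch under assumptions, not the GFS2 function: the name sketch_size_hint is hypothetical, the hint is assumed to be a block count stored in an int, and the block size is assumed to be a power of two given as a shift.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a write size in bytes into a block-count hint.
 * Rounds up so that a partial trailing block still counts as one block,
 * and clamps so that very large writes cannot overflow the int hint.
 */
static int sketch_size_hint(uint64_t size, unsigned int bsize_shift)
{
	assert(bsize_shift < 32);

	uint64_t bsize = UINT64_C(1) << bsize_shift;
	/* Whole blocks plus one more if any bytes are left over. */
	uint64_t blks = (size >> bsize_shift) + ((size & (bsize - 1)) != 0);

	return blks > (uint64_t)INT_MAX ? INT_MAX : (int)blks;
}
```

Doing the arithmetic in 64 bits and clamping only at the end avoids silently truncating a large write size before it has been converted to blocks.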
Tejun Heo 43829731dd workqueue: deprecate flush[_delayed]_work_sync()
flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
2012-08-20 14:51:24 -07:00