Commit Graph

27911 Commits

Author SHA1 Message Date
Jan Kara d0e91b13eb vfs: Remove unnecessary flushing of block devices
It is not necessary to write block devices twice. The reason why we first did
flush and then proper sync is that
  for_each_bdev() {
    write_bdev()
    wait_for_completion()
  }
is much slower than
  for_each_bdev()
    write_bdev()
  for_each_bdev()
    wait_for_completion()
when there is bigger amount of data. But as is seen in the above, there's no real
need to scan pages and submit them twice. We just need to separate the submission
and waiting part. This patch does that.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:53 +04:00
Jan Kara a8c7176b6d vfs: Make sys_sync writeout also block device inodes
In case block device does not have filesystem mounted on it, sys_sync will just
ignore it and doesn't writeout its dirty pages. This is because writeback code
avoids writing inodes from superblock without backing device and
blockdev_superblock is such a superblock.  Since it's unexpected that sync
doesn't writeout dirty data for block devices be nice to users and change the
behavior to do so. So now we iterate over all block devices on blockdev_super
instead of iterating over all superblocks when syncing block devices.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:49 +04:00
Jan Kara 5c0d6b60a0 vfs: Create function for iterating over block devices
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:45 +04:00
Jan Kara b3de653105 vfs: Reorder operations during sys_sync
Change the order of operations during sync from

for_each_sb {
        writeback_inodes_sb();
        sync_fs(nowait);
        __sync_blockdev(nowait);
}
for_each_sb {
        sync_inodes_sb();
        sync_fs(wait);
        __sync_blockdev(wait);
}

to

for_each_sb
        writeback_inodes_sb();
for_each_sb
        sync_fs(nowait);
for_each_sb
        __sync_blockdev(nowait);
for_each_sb
        sync_inodes_sb();
for_each_sb
        sync_fs(wait);
for_each_sb
        __sync_blockdev(wait);

This is a preparation for the following patches in this series.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:41 +04:00
Jan Kara a117782571 quota: Move quota syncing to ->sync_fs method
Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:34 +04:00
Jan Kara ceed17236a quota: Split dquot_quota_sync() to writeback and cache flushing part
Split off part of dquot_quota_sync() which writes dquots into a quota file
to a separate function. In the next patch we will use the function from
filesystems and we do not want to abuse ->quota_sync quotactl callback more
than necessary.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:19 +04:00
Jan Kara 6eedc70150 vfs: Move noop_backing_dev_info check from sync into writeback
In principle, a filesystem may want to have ->sync_fs() called during sync(1)
although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
Only writeback code really needs bdi set to something reasonable. So move the
checks where they are more logical.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:18 +04:00
Artem Bityutskiy 9e9ad5f408 fs/ufs: get rid of write_super
This patch makes UFS stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The way we implement this is that we schedule a delay job instead relying on
's_dirt' and '->write_super()'.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Tested using fsstress from the LTP project.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:16 +04:00
Artem Bityutskiy 7bd54ef722 fs/ufs: re-arrange the code a bit
This patch does not do any functional changes. It only moves 3 functions
in fs/ufs/super.c a little bit up in order to prepare for further changes
where I'll need this new arrangement to avoid forward declarations.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:14 +04:00
Artem Bityutskiy 65e5e83f7d fs/ufs: remove extra superblock write on unmount
UFS calls 'ufs_write_super()' from 'ufs_put_super()' in order to write the
superblocks to the media. However, it is not needed because VFS calls
'->sync_fs()' before calling '->put_super()' - so by the time we are in
'ufs_write_super()', the superblocks are already synchronized.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:14 +04:00
Artem Bityutskiy 9d46be294d fs/sysv: stop using write_super and s_dirt
It does not look like sysv FS needs 'write_super()' at all, because all it
does is a timestamp update. I cannot test this patch, because this
file-system is so old and probably has not been used by anyone for years,
so there are no tools to create it in Linux. But from the code I see that
marking the superblock as dirty is basically marking the superblock buffers as
drity and then setting the s_dirt flag. And when 'write_super()' is executed to
handle the s_dirt flag, we just update the timestamp and again mark the
superblock buffer as dirty. Seems pointless.

It looks like we can update the timestamp more opprtunistically - on unmount
or remount of sync, and nothing should change.

Thus, this patch removes 'sysv_write_super()' and 's_dirt'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:12 +04:00
Artem Bityutskiy eee458936b fs/sysv: remove another useless write_super call
We do not need to call 'sysv_write_super()' from 'sysv_remount()',
because VFS has called 'sysv_sync_fs()' before calling '->remount()'.
So remove it. Remove also '(un)lock_super()' which obvioulsy is becoming
useless in this function.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:11 +04:00
Artem Bityutskiy a4d05d315a fs/sysv: remove useless write_super call
We do not need to call 'sysv_write_super()' from 'sysv_put_super()',
because VFS has called 'sysv_sync_fs()' before calling '->put_super()'.
So remove it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:10 +04:00
Artem Bityutskiy 5687b5780e hfs: get rid of hfs_sync_super
This patch makes hfs stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Tested using fsstress from the LTP project.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:09 +04:00
Artem Bityutskiy b16ca62635 hfs: introduce VFS superblock object back-reference
Add an 'sb' VFS superblock back-reference to the 'struct hfs_sb_info' data
structure - we will need to find the VFS superblock from a
'struct hfs_sb_info' object in the next patch, so this change is jut a
preparation.

Remove few useless newlines while on it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:08 +04:00
Artem Bityutskiy 4527440d5d hfs: simplify a bit checking for R/O
We have the following pattern in 2 places in HFS

if (!RDONLY)
	hfs_mdb_commit();

This patch pushes the RDONLY check down to 'hfs_mdb_commit()'. This will
make the following patches a bit simpler.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:07 +04:00
Artem Bityutskiy a3742d4828 hfs: remove extra mdb write on unmount
HFS calls 'hfs_write_super()' from 'hfs_put_super()' in order to write the MDB
to the media. However, it is not needed because VFS calls '->sync_fs()' before
calling '->put_super()' - so by the time we are in 'hfs_write_super()', the MDB
is already synchronized.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:07 +04:00
Artem Bityutskiy b59352359d hfs: get rid of lock_super
Stop using lock_super for serializing the MDB changes - use the buffer-head own
lock instead. Tested with fsstress.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:06 +04:00
Artem Bityutskiy 715189d836 hfs: push lock_super down
HFS uses 'lock_super()'/'unlock_super()' around 'hfs_mdb_commit()' in order
to serialize MDB (Master Directory Block) changes. Push it down to
'hfs_mdb_commit()' in order to simplify the code a bit.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:05 +04:00
Artem Bityutskiy 9e6c5829b0 hfsplus: get rid of write_super
This patch makes hfsplus stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Tested using fsstress from the LTP project.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:04 +04:00
Artem Bityutskiy 58770d7e83 hfsplus: remove useless check
This check is useless because we always have 'sb->s_fs_info' to be non-NULL.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:03 +04:00
Artem Bityutskiy b7a90e8043 hfsplus: amend debugging print
Print correct function name in the debugging print of the
'hfsplus_sync_fs()' function.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:02 +04:00
Artem Bityutskiy 0a81861978 hfsplus: make hfsplus_sync_fs static
... because it is used only in fs/hfsplus/super.c.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:01 +04:00
Al Viro 3ffa3c0e3f aio: now fput() is OK from interrupt context; get rid of manual delayed __fput()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:57:59 +04:00
Al Viro 4a9d4b024a switch fput to task_work_add
... and schedule_work() for interrupt/kernel_thread callers
(and yes, now it *is* OK to call from interrupt).

We are guaranteed that __fput() will be done before we return
to userland (or exit).  Note that for fput() from a kernel
thread we get an async behaviour; it's almost always OK, but
sometimes you might need to have __fput() completed before
you do anything else.  There are two mechanisms for that -
a general barrier (flush_delayed_fput()) and explicit
__fput_sync().  Both should be used with care (as was the
case for fput() from kernel threads all along).  See comments
in fs/file_table.c for details.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:57:58 +04:00
Al Viro 1e0ea00144 use __lookup_hash() in kern_path_parent()
No need to bother with lookup_one_len() here - it's an overkill

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:57:53 +04:00
Linus Torvalds ce9f8d6b39 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
 "These are catastrophic fixes to the pnfs objects-layout that were just
  discovered.  They are also destined for @stable.

  I have found these and worked on them at around RC1 time but
  unfortunately went to the hospital for kidney stones and had a very
  slow recovery.  I refrained from sending them as is, before proper
  testing, and surly I have found a bug just yesterday.

  So now they are all well tested, and have my sign-off.  Other then
  fixing the problem at hand, and assuming there are no bugs at the new
  code, there is low risk to any surrounding code.  And in anyway they
  affect only these paths that are now broken.  That is RAID5 in pnfs
  objects-layout code.  It does also affect exofs (which was not broken)
  but I have tested exofs and it is lower priority then objects-layout
  because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO
2012-07-20 11:43:53 -07:00
Linus Torvalds 1793416287 Fix a bug in UBIFS free space fix-up reported already twice recently:
http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
 http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html
 
 and we finally have the fix. I am quite confident the fix is correct
 because I could reproduce the problem with nandsim and verify the
 fix. It was also verified by Iwo (the reporter).
 
 I am also confident that this is OK to merge the fix so late because
 this patch affects only the fixup functionality, which is not used by
 most users.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQCQZ1AAoJECmIfjd9wqK0fosP/RD3Ruo5ILvTtBThKJPUoeld
 kihD9w3rk26cILlpGA3Cs/kaoOj/wPtjMVKGVkw50cWKRQemFLMh4ZcbCepfae+b
 g+YsH+ihkINgjdpKM351lgSCS+NEPJ695zmxNJ+/zjM5+ewfP6vK0qivnjF7w81k
 jLAVt80a1nhjNPyDMeQVr69HxBegYuX927LL4onJULYqvmrSiX/5tXzI+02emjDf
 9gA99fyc4pLNAJzzQyr44pogNaSME+Q90p4PAd11tlaVfn1kXgCXA3Ybv2cy7cer
 ipQfHQzfMjiCMO7Kpt5Ja3necuTarZsHV4UtmXhc4uIOr5p57dJX7RfBzA3j4RmV
 2ZFynqjl7n6ZT0pAM/0F9h9FyjZrCcgg1BGcEsqfJv2Yu7txOX1Qo2gkEvYJl8Sx
 Q2G6xNdzyib8MXClm4L2Zix16WqAF7CyUZo+szUTpdO8PPzgJ/vNpAk+3yqoVeep
 0Dr0HmTMRuP6tJGa9TH58QlvhClkXGSb7ukk1UlV4RVXtvvYtjVBwUXoHSUHNDJO
 HB9B+7ViTIjm9fdILqCX5wtrnZZQgFd1hBiQ/13/ZFrtB1hz5WfOdgfLRIBifjbq
 hGkwQyb5zsWTm7KGTOV0Yncmbnkut4zSJpMCbjZvcPJ2r5zwNwImKdvLBJ7oCKmd
 nPZ2dJmJYdKw2L00SzGZ
 =bUPG
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs

Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
 "It's been reported already twice recently:

    http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
    http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html

  and we finally have the fix.  I am quite confident the fix is correct
  because I could reproduce the problem with nandsim and verify the fix.
  It was also verified by Iwo (the reporter).

  I am also confident that this is OK to merge the fix so late because
  this patch affects only the fixup functionality, which is not used by
  most users."

* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix a bug in empty space fix-up
2012-07-20 11:42:30 -07:00
Bob Peterson 15e1c96022 GFS2: Eliminate 64-bit divides
This patch removes the 64-bit divides introduced in the previous patch
in favor of shifting, so that it will compile properly on 32-bit machines.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-20 19:15:09 +01:00
Boaz Harrosh c999ff6802 pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
	Change the API to ->__r4w_get_page() so a NULL can be
	returned without being considered as error, since XOR API
	treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:50:31 +03:00
Boaz Harrosh 9909d45a85 pnfs-obj: don't leak objio_state if ore_write/read fails
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:50:30 +03:00
Boaz Harrosh 537632e0a5 ore: Unlock r4w pages in exact reverse order of locking
The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.

I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:49:25 +03:00
Boaz Harrosh 62b62ad873 ore: Remove support of partial IO request (NFS crash)
Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.

TODO:
	Support real forward progress with some reserved allocations
	of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:47:43 +03:00
Boaz Harrosh 9ff19309a9 ore: Fix NFS crash by supporting any unaligned RAID IO
In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.

The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.

The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.

Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.

This solution is much better for NFS than the previous supposedly
solution because the short write was dealt  with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.

hurray!!

[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:45:28 +03:00
Julia Lawall 7074e5eb23 UBIFS: remove invalid reference to list iterator variable
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure.  Thus this value should not be used after
the end of the iterator.  Replace a field access from orphan by NULL in two
places.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier c;
expression E;
iterator name list_for_each_entry;
statement S;
@@

list_for_each_entry(c,...) { ... when != break;
                                 when forall
                                 when strict
}
...
(
c = E
|
*c
)
// </smpl>

Artem: fortunately, this did not cause any issues because we iterate the orphan
list using the elements count, so we never dereferenced the corrupted pointer.
This is why I do not send this patch to -stable. But otherwise - well spotted!

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
2012-07-20 10:27:25 +03:00
Artem Bityutskiy d51f17ea0a UBIFS: simplify reply code a bit
In the log reply code we assume that 'c->lhead_offs' is known and may be
non-zero, which is not the case because we do not store it in the master
node and have to find out by scanning on every mount. Knowing this fact
allows us to simplify the log scanning loop a bit and remove a couple
of unneeded local variables.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
2012-07-20 10:27:25 +03:00
Artem Bityutskiy 06bef9451a UBIFS: add debugfs knob to switch to R/O mode
This patch adds another debugfs knob which switches UBIFS to R/O mode.
I needed it while trying to reproduce the 'first log node is not CS node'
bug. Without this debugfs knob you have to perform a power cut to repruduce
the bug. The knob is named 'ro_error' and all it does is it sets the
'ro_error' UBIFS flag which makes UBIFS disallow any further writes - even
write-back will fail with -EROFS. Useful for debugging.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
2012-07-20 10:27:25 +03:00
Alexandre Pereira da Silva 782759b9f5 UBIFS: fix compilation warning
Fix the following compilation warning:

fs/ubifs/dir.c: In function 'ubifs_rename':
fs/ubifs/dir.c:972:15: warning: 'saved_nlink' may be used uninitialized
in this function

Use the 'uninitialized_var()' macro to get rid of this false-positive.

Artem: massaged the patch a bit.

Signed-off-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-07-20 10:27:25 +03:00
Artem Bityutskiy c6727932cf UBIFS: fix a bug in empty space fix-up
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
2012-07-20 10:13:27 +03:00
Bob Peterson 8e2e004735 GFS2: Reduce file fragmentation
This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common dirctory.

The implementation uses an rbtree of reservations attached to each
resource group (and each inode).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-19 14:51:08 +01:00
Linus Torvalds a9866ba47c Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: always update the inode cache with the results from a FIND_*
  cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
  cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space
  Initialise mid_q_entry before putting it on the pending queue
2012-07-18 09:28:11 -07:00
Al Viro 331ae4962b ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT
Caused, AFAICS, by mismerge in commit ff9cb1c4ee ("Merge branch
'for_linus' into for_linus_merged")

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org  # 3.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-18 08:59:46 -07:00
Abhijith Das 294f2ad5a5 GFS2: kernel panic with small gfs2 filesystems - 1 RG
In the unlikely setup where there's only one resource group in the gfs2
filesystem, gfs2_rgrpd_get_next() returns a NULL rgd that is not dealt with
properly, causing a kernel NULL ptr dereference. This patch fixes this issue.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-18 16:45:13 +01:00
Anton Vorontsov cbe7cbf5a6 pstore/ram: Make tracing log versioned
Decoding the binary trace w/ a different kernel might be troublesome
since we convert addresses to symbols. For kernels with minimal changes,
the mappings would probably match, but it's not guaranteed at all.
(But still we could convert the addresses by hand, since we do print
raw addresses.)

If we use modules, the symbols could be loaded at different addresses
from the previously booted kernel, and so this would also fail, but
there's nothing we can do about it.

Also, the binary data format that pstore/ram is using in its ringbuffer
may change between the kernels, so here we too must ensure that we're
running the same kernel.

So, there are two questions really:

1. How to compute the unique kernel tag;
2. Where to store it.

In this patch we're using LINUX_VERSION_CODE, just as hibernation
(suspend-to-disk) does. This way we are protecting from the kernel
version mismatch, making sure that we're running the same kernel
version and patch level. We could use CRC of a symbol table (as
suggested by Tony Luck), but for now let's not be that strict.

And as for storing, we are using a small trick here. Instead of
allocating a dedicated buffer for the tag (i.e. another prz), or
hacking ram_core routines to "reserve" some control data in the
buffer, we are just encoding the tag into the buffer signature
(and XOR'ing it with the actual signature value, so that buffers
not needing a tag can just pass zero, which will result into the
plain old PRZ signature).

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Colin Cross <ccross@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 16:48:09 -07:00
Linus Torvalds a5e135122c Last-minute PM update for 3.5
This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future
 reuse of the capability in question in related cases.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJQBcRhAAoJEKhOf7ml8uNsnIoP/2XhSul9N/AWC5jfEAh4Af07
 QdhfJmYXnXC1Irndh/IoAITu+vHQecm0XjbvAy/9QOBn9oSkM7kNilvOLrCrdzzQ
 j9/BRMRCJRcu/vMyJmt37z0OIgfiktgDoOBaE6nC5t+1nHotcByAMWdy/AGwqqaL
 q3lbYcoRtDDQpDr9XPm68cyRdddvWnq81gXb90gNovvfgCjNFVvscshXmMGv3Luy
 Dx29zROJHJNOWG3kV1Xq7PdNffZj1ChCgIsBRKkzKWROcVEGPEuH5O0wjf4I4rCV
 PW6nRV9WOykqJI5CAnrWzr9bf8AvpclXtGYWFiwPvUF0kMggSoNFb5xQyRy45SBC
 nC+daLZNO123yU8xKb3qXaotsKPJ0qRTKAWUqWaGkRkQ0Mg90VmanyYkmP5PkeUX
 ZABNS4QlxnLGDtZuhSBioUO5pf0iDdzSrYkIOuYD81DGM8yKWWmUyxupOoVW5Kmu
 QD0d34+ZgEndv9znZzBF8DdGxkwjwljJW6sIBw7PGDq3qXcYdzd4awgtPlnGEOh/
 oi6iG24r8oysB8w5IJpwj20/zCvJyYVR+m+eHXxEs373xIGpbAfJbHYRKHqkYgTo
 nYkZyLgE0g46Izqbb42yrN7y5dUhSsrbImTI8L5xaLVkBYhspEuSO/eSLgoklWiw
 VgbmreU3R0apj0hwPcA5
 =oZrz
 -----END PGP SIGNATURE-----

Merge tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull a last-minute PM update from Rafael J. Wysocki:
 "This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future
  reuse of the capability in question in related cases."

* tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
2012-07-17 14:15:43 -07:00
Michael Kerrisk d9914cf661 PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
As discussed in
http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
the capability introduced in 4d7e30d989
to govern EPOLLWAKEUP seems misnamed: this capability is about governing
the ability to suspend the system, not using a particular API flag
(EPOLLWAKEUP). We should make the name of the capability more general
to encourage reuse in related cases. (Whether or not this capability
should also be used to govern the use of /sys/power/wake_lock is a
question that needs to be separately resolved.)

This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
that the old capability name doesn't make it out into the wild, could you
please apply and push up the tree to ensure that it is incorporated
for the 3.5 release.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-17 21:37:27 +02:00
Anton Vorontsov 67a101f573 pstore: Headers should include all stuff they use
Headers should really include all the needed prototypes, types, defines
etc. to be self-contained. This is a long-standing issue, but apparently
the new tracing code unearthed it (SMP=n is also a prerequisite):

In file included from fs/pstore/internal.h:4:0,
                 from fs/pstore/ftrace.c:21:
include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type

While at it, I also added the following:

linux/types.h -> size_t, phys_addr_t, uXX and friends
linux/spinlock.h -> spinlock_t
linux/errno.h -> Exxxx
linux/time.h -> struct timespec (struct passed by value)
struct module and rs_control forward declaration (passed via pointers).

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 12:15:30 -07:00
Anton Vorontsov a694d1b591 pstore/ram: Add ftrace messages handling
The ftrace log size is configurable via ramoops.ftrace_size
module option, and the log itself is available via
<pstore-mount>/ftrace-ramoops file.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 10:14:17 -07:00
Anton Vorontsov c2b7113261 pstore/ram: Convert to write_buf callback
Don't use pstore.buf directly, instead convert the code to write_buf callback
which passes a pointer to a buffer as an argument.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 10:07:09 -07:00
Anton Vorontsov 060287b8c4 pstore: Add persistent function tracing
With this support kernel can save function call chain log into a
persistent ram buffer that can be decoded and dumped after reboot
through pstore filesystem. It can be used to determine what function
was last called before a reset or panic.

We store the log in a binary format and then decode it at read time.

p.s.
Mostly the code comes from trace_persistent.c driver found in the
Android git tree, written by Colin Cross <ccross@android.com>
(according to sign-off history). I reworked the driver a little bit,
and ported it to pstore.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 10:05:52 -07:00
Anton Vorontsov 897dba0274 pstore: Introduce write_buf backend callback
For function tracing we need to stop using pstore.buf directly, since
in a tracing callback we can't use spinlocks, and thus we can't safely
use the global buffer.

With write_buf callback, backends no longer need to access pstore.buf
directly, and thus we can pass any buffers (e.g. allocated on stack).

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:51:38 -07:00
Anton Vorontsov c1743cbc8d pstore/ram_core: Get rid of prz->ecc enable/disable flag
Nowadays we can use prz->ecc_size as a flag, no need for the special
member in the prz struct.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:46:52 -07:00
Anton Vorontsov 5ca5d4e61d pstore/ram: Make ECC size configurable
This is now pretty straightforward: instead of using bool, just pass
an integer. For backwards compatibility ramoops.ecc=1 means 16 bytes
ECC (using 1 byte for ECC isn't much of use anyway).

Suggested-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:46:52 -07:00
Anton Vorontsov 4a53ffae6a pstore/ram_core: Get rid of prz->ecc_symsize and prz->ecc_poly
The struct members were never used anywhere outside of
persistent_ram_init_ecc(), so there's actually no need for them
to be in the struct.

If we ever want to make polynomial or symbol size configurable,
it would make more sense to just pass initialized rs_decoder
to the persistent_ram init functions.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:46:52 -07:00
Andrew Morton 17f79be93d sysfs: fail dentry revalidation after namespace change fix
don't assume that KOBJ_NS_TYPE_NONE==0.  Also save a test-n-branch.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:43:55 -07:00
Glauber Costa e5bcac6147 sysfs: fail dentry revalidation after namespace change
When we change the namespace tag of a sysfs entry, the associated dentry
is still kept around. readdir() will work correctly and not display the
old entries, but open() will still succeed, so will reads and writes.

This will no longer happen if sysfs is remounted, hinting that this is a
cache-related problem.

I am using the following sequence to demonstrate that:

shell1:
ip link add type veth
unshare -nm

shell2:
ip link set veth1 <pid_of_shell_1>
cat /sys/devices/virtual/net/veth1/ifindex

Before that patch, this will succeed (fail to fail). After it, it will
correctly return an error. Differently from a normal rename, which we
handle fine, changing the object namespace will keep it's path intact.
So this check seems necessary as well.

[ v2: get type from parent, as suggested by Eric Biederman ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Tejun Heo <tj@kernel.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-17 09:43:55 -07:00
Jeff Layton cd60042cc1 cifs: always update the inode cache with the results from a FIND_*
When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.

Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.

Cc: <stable@vger.kernel.org> # .31.x
Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org>
Reported-by: Bill Robertson <bill_robertson@debortoli.com.au>
Reported-by: Dion Edwards <dion_edwards@debortoli.com.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-16 23:57:23 -05:00
Jeff Layton 3cf003c08b cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Cc: <stable@vger.kernel.org>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-16 23:57:14 -05:00
Jeff Layton 3ae629d98b cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space
We currently rely on being able to kmap all of the pages in an async
read or write request. If you're on a machine that has CONFIG_HIGHMEM
set then that kmap space is limited, sometimes to as low as 512 slots.

With 512 slots, we can only support up to a 2M r/wsize, and that's
assuming that we can get our greedy little hands on all of them. There
are other users however, so it's possible we'll end up stuck with a
size that large.

Since we can't handle a rsize or wsize larger than that currently, cap
those options at the number of kmap slots we have. We could consider
capping it even lower, but we currently default to a max of 1M. Might as
well allow those luddites on 32 bit arches enough rope to hang
themselves.

A more robust fix would be to teach the send and receive routines how
to contend with an array of pages so we don't need to marshal up a kvec
array at all. That's a fairly significant overhaul though, so we'll need
this limit in place until that's ready.

Cc: <stable@vger.kernel.org>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-16 23:57:09 -05:00
Sachin Prabhu ffc61ccbb9 Initialise mid_q_entry before putting it on the pending queue
A user reported a crash in cifs_demultiplex_thread() caused by an
incorrectly set mid_q_entry->callback() function. It appears that the
callback assignment made in cifs_call_async() was not flushed back to
memory suggesting that a memory barrier was required here. Changing the
code to make sure that the mid_q_entry structure was completely
initialised before it was added to the pending queue fixes the problem.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-07-16 23:57:02 -05:00
Greg Kroah-Hartman 28a78e46f0 Merge 3.5-rc7 into driver-core-next
This pulls in the printk fixes to the driver-core-next branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 18:19:55 -07:00
David Teigland 96006ea6d4 dlm: fix missing dir remove
I don't know exactly how, but in some cases, a dir
record is not removed, or a new one is created when
it shouldn't be.  The result is that the dir node
lookup returns a master node where the rsb does not
exist.  In this case, The master node will repeatedly
return -EBADR for requests, and the lock requests will
be stuck.

Until all possible ways for this to happen can be
eliminated, a simple and effective way to recover from
this situation is for the supposed master node to send
a standard remove message to the dir node when it
receives a request for a resource it has no rsb for.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:24:43 -05:00
David Teigland c503a62103 dlm: fix conversion deadlock from recovery
The process of rebuilding locks on a new master during
recovery could re-order the locks on the convert queue,
creating an "in place" conversion deadlock that would
not be resolved.  Fix this by not considering queue
order when granting conversions after recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:18:22 -05:00
David Teigland 6d768177c2 dlm: use wait_event_timeout
Use wait_event_timeout to avoid using a timer
directly.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:18:12 -05:00
David Teigland 05c32f47bf dlm: fix race between remove and lookup
It was possible for a remove message on an old
rsb to be sent after a lookup message on a new
rsb, where the rsbs were for the same resource
name.  This could lead to a missing directory
entry for the new rsb.

It is fixed by keeping a copy of the resource
name being removed until after the remove has
been sent.  A lookup checks if this in-progress
remove matches the name it is looking up.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:18:01 -05:00
David Teigland 1d7c484eeb dlm: use idr instead of list for recovered rsbs
When a large number of resources are being recovered,
a linear search of the recover_list takes a long time.
Use an idr in place of a list.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:17:52 -05:00
David Teigland c04fecb4d9 dlm: use rsbtbl as resource directory
Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory.  It has always been an unnecessary
duplication of information.

This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.

This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.

In many cases there will be memory savings because
the dir hash table no longer exists.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:16:19 -05:00
Linus Torvalds fce667c574 xfs: regression fixes for 3.5-rc7
- Really fix a cursor leak in xfs_alloc_ag_vextent_near
  - Fix a performance regression related to doing allocation in workqueues
  - Prevent recursion in xfs_buf_iorequest which is causing stack overflows
  - Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQAGW0AAoJENaLyazVq6ZOOyEP/0xLuQKFF71eB/VL8BFylEHY
 H22/aDEWTo9pWDxxwBirVioBeCU07ByEv7zeQM1nqEm9pXESTSzsBUYKX2tWSRzY
 YclXYA0rpdCK2cdXpuz+0kWBFr9Y1Q1BIYNll6C3ZqhADgubAMHa13rKVUQlQqpD
 EZhvGrh42ujRhckwmi1E3+g3Ll79fty47WzyzEOa18ij3LI3q5Dm7WZpZlhv/MVW
 Fj975Q+LdJVchoQZ7gTiddMkZ936TwjLxM4EtWZd46CMG2/YRcPHx2YaItI0k6Xa
 Q34pbUHidZjqzng28iO6Y6BaB/rPLX/f7KcoZib+rc85zt8sShoDFXS9eq+DdTeH
 f5AgkPzpFfr3QpK3Fv5ZjICj5SkC6KhI14qnxLZhVGIWgJLGD8lb0fYDwN2y5BHq
 HmA7G4hALBaZ+TOpXQ2+XfxFyrffkQcM/Ja57yfDj37aOKYxMubAco4JlADBzBkg
 m1gF2QebQrbZ49k9x85vpIvoNFXcAg6FeuesYXciMcORCpMXp4f0oLjGlfwf13Fo
 upZ2AGfcpIcMl3fZzliw64VuJpX4QswTzUZFqXH8+Vn9eB6cwFqA8PJpLxBSshCk
 Y6E+LE6oYbsN+T6R+6Xj0DsPl4OIE0eWlGPaZZu9eWku02e0vZMj57RxUWpNcOmH
 6lq8TJaS/N5Qri51NPVs
 =TP1A
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs

Pull xfs regression fixes from Ben Myers:
 - Really fix a cursor leak in xfs_alloc_ag_vextent_near
 - Fix a performance regression related to doing allocation in
   workqueues
 - Prevent recursion in xfs_buf_iorequest which is causing stack
   overflows
 - Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks

* tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs:
  xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
  xfs: prevent recursion in xfs_buf_iorequest
  xfs: don't defer metadata allocation to the workqueue
  xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
2012-07-16 10:02:36 -07:00
Anders Kaseorg 05d290d66b fifo: Do not restart open() if it already found a partner
If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns.  In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted.  Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.

The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open().  Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come.  (On my systems, this happens
pretty reliably within about 5 to 500 iterations.  Others report that
it manages to loop ~forever sometimes; YMMV.)

  #include <sys/stat.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <fcntl.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)

  void handler(int signum) {}

  int main()
  {
      struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
      CHECK(sigaction(SIGCHLD, &act, NULL));
      CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
      for (;;) {
          int fd;
          pid_t pid;
          putc('.', stderr);
          CHECK(pid = fork());
          if (pid == 0) {
              CHECK(fd = open("fifo", O_RDONLY));
              _exit(0);
          }
          CHECK(fd = open("fifo", O_WRONLY));
          CHECK(close(fd));
          CHECK(waitpid(pid, NULL, 0));
      }
  }

This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:

	http://bugs.debian.org/678852

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-16 08:33:14 -07:00
David Howells 0bdaea9017 VFS: Split inode_permission()
Split inode_permission() into inode- and superblock-dependent parts.

This is aimed at unionmounts where the superblock from the upper layer has to
be checked rather than the superblock from the lower layer as the upper layer
may be writable, thus allowing an unwritable file from the lower layer to be
copied up and modified.

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com> (Further development)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:36 +04:00
David Howells 9249e17fe0 VFS: Pass mount flags to sget()
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:34 +04:00
David Howells f015f1267b VFS: Comment mount following code
Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:32 +04:00
David Howells be34d1a3bc VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
copy_tree() can theoretically fail in a case other than ENOMEM, but always
returns NULL which is interpreted by callers as -ENOMEM.  Change it to return
an explicit error.

Also change clone_mnt() for consistency and because union mounts will add new
error cases.

Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix.
[AV: folded braino fix by Dan Carpenter]

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Valerie Aurora <valerie.aurora@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:37:27 +04:00
David Howells 55e4def0a6 VFS: Make chown() and lchown() call fchownat()
Make the chown() and lchown() syscalls jump to the fchownat() syscall with the
appropriate extra arguments.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:54 +04:00
Al Viro c3c4f69424 do_dentry_open(): close the race with mark_files_ro() in failure exit
we want to take it out of mark_files_ro() reach *before* we start
checking if we ought to drop write access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:50 +04:00
Al Viro 85d7d618c1 mark_files_ro(): don't bother with mntget/mntput
mnt_drop_write_file() is safe under any lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:46 +04:00
Andrew Morton c4107b3097 notify_change(): check that i_mutex is held
Cc: Djalal Harouni <tixxdz@opendz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:42 +04:00
Christoph Hellwig b5fb63c183 fs: add nd_jump_link
Add a helper that abstracts out the jump to an already parsed struct path
from ->follow_link operation from procfs.  Not only does this clean up
the code by moving the two sides of this game into a single helper, but
it also prepares for making struct nameidata private to namei.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:40 +04:00
Christoph Hellwig 408ef013cc fs: move path_put on failure out of ->follow_link
Currently the non-nd_set_link based versions of ->follow_link are expected
to do a path_put(&nd->path) on failure.  This calling convention is unexpected,
undocumented and doesn't match what the nd_set_link-based instances do.

Move the path_put out of the only non-nd_set_link based ->follow_link
instance into the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:35 +04:00
Al Viro ac481d6ca4 debugfs: get rid of useless arguments to debugfs_{mkdir,symlink}
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:30 +04:00
Al Viro cfa57c11b0 debugfs: fold debugfs_create_by_name() into the only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:25 +04:00
Al Viro c3b1a35084 debugfs: make sure that debugfs_create_file() gets used only for regulars
It, debugfs_create_dir() and debugfs_create_link() use the common helper
now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:19 +04:00
Al Viro ee3efa91e2 __d_unalias() should refuse to move mountpoints
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:15 +04:00
Al Viro e77fb7cef8 sysfs: just use d_materialise_unique()
same as for nfs et.al.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:12 +04:00
Al Viro 469796d105 sysfs: switch to ->s_d_op and ->d_release()
a) ->d_iput() is wrong here - what we do to inode is completely usual, it's
dentry->d_fsdata that we want to drop.  Just use ->d_release().

b) switch to ->s_d_op - no need to play with d_set_d_op()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:06 +04:00
Al Viro 79714f72d3 get rid of kern_path_parent()
all callers want the same thing, actually - a kinda-sorta analog of
kern_path_create().  I.e. they want parent vfsmount/dentry (with
->i_mutex held, to make sure the child dentry is still their child)
+ the child dentry.

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:35:02 +04:00
David Howells 1acf0af9b9 VFS: Fix the banner comment on lookup_open()
Since commit 197e37d9, the banner comment on lookup_open() no longer matches
what the function returns.  It used to return a struct file pointer or NULL and
now it returns an integer and is passed the struct file pointer it is to use
amongst its arguments.  Update the comment to reflect this.

Also add a banner comment to atomic_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:57 +04:00
Al Viro 312b63fba9 don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:50 +04:00
Al Viro ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro 72bd866a01 fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:40 +04:00
Al Viro 00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro 201f956e43 fs/namei.c: don't pass namedata to lookup_dcache()
just the flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:25 +04:00
Al Viro 4ce16ef3fe fs/namei.c: don't pass nameidata to d_revalidate()
since the method wrapped by it doesn't need that anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:21 +04:00
Al Viro 0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro fa3c56bbda fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:07 +04:00
Al Viro facc3530fb nfs_lookup_verify_inode() - nd is *always* non-NULL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:02 +04:00
Al Viro 93420b40bb switch nfs_lookup_check_intent() away from nameidata
just pass the flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:57 +04:00
Al Viro 02e5180d99 do_dentry_open(): take initialization of file->f_path to caller
... and get rid of a couple of arguments and a pointless reassignment
in finish_open() case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:54 +04:00
Al Viro 2a027e7a18 fold __dentry_open() into its sole caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:52 +04:00
Al Viro 96b7e579ad switch do_dentry_open() to returning int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:49 +04:00
Al Viro e45198a6ac make finish_no_open() return int
namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:45 +04:00
Al Viro 2675a4eb6a fs/namei.c: get do_last() and friends return int
Same conventions as for ->atomic_open().  Trimmed the
forest of labels a bit, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:43 +04:00
Al Viro 30d9049474 kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:39 +04:00
Al Viro a4a3bdd778 kill opendata->{mnt,dentry}
->filp->f_path is there for purpose...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:37 +04:00
Al Viro d95852777b make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:35 +04:00
Al Viro 3d8a00d209 don't modify od->filp at all
make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:33 +04:00
Al Viro 47237687d7 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:31 +04:00
Miklos Szeredi a8277b9baa vfs: move O_DIRECT check to common code
Perform open_check_o_direct() in a common place in do_last after opening the
file.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:28 +04:00
Miklos Szeredi f60dc3db6e vfs: do_last(): clean up retry
Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:26 +04:00
Miklos Szeredi 77d660a8a8 vfs: do_last(): clean up bool
Consistently use bool for boolean values in do_last().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:24 +04:00
Miklos Szeredi e83db16722 vfs: do_last(): clean up labels
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:21 +04:00
Miklos Szeredi aa4caadb70 vfs: do_last(): clean up error handling
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:20 +04:00
Miklos Szeredi 015c3bbcd8 vfs: remove open intents from nameidata
All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:18 +04:00
Miklos Szeredi e43ae79c54 9p: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:17 +04:00
Miklos Szeredi 2d83bde9a1 ceph: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Miklos Szeredi 3819219b59 ceph: remove unused arg from ceph_lookup_open()
What was the purpose of this?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Miklos Szeredi d2c127197d cifs: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:15 +04:00
Miklos Szeredi c8ccbe032f fuse: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:14 +04:00
Miklos Szeredi eda72afb9e nfs: don't use intents for checking atomic open
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.

It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail.  This makes little sense now that this
function isn't used for determining whether to actually open the file or not.

The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:12 +04:00
Miklos Szeredi 50de348c36 nfs: don't use nd->intent.open.flags
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:10 +04:00
Miklos Szeredi 8867fe5899 nfs: clean up ->create in nfs_rpc_ops
Don't pass nfs_open_context() to ->create().  Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents.  That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.

Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain.  So allocate a context here.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:08 +04:00
Miklos Szeredi 0dd2b474d0 nfs: implement i_op->atomic_open()
Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation
and use the generic nfs_lookup for other lookups.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:06 +04:00
Miklos Szeredi d18e9008c3 vfs: add i_op->atomic_open()
Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation.  If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup.  Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open().  If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place.  This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:04 +04:00
Miklos Szeredi 54ef487241 vfs: lookup_open(): expand lookup_hash()
Copy __lookup_hash() into lookup_open().  The next patch will insert the atomic
open call just before the real lookup.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:02 +04:00
Miklos Szeredi d58ffd35c1 vfs: add lookup_open()
Split out lookup + maybe create from do_last().  This is the part under i_mutex
protection.

The function is called lookup_open() and returns a filp even though the open
part is not used yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:01 +04:00
Miklos Szeredi 7157486541 vfs: do_last(): common slow lookup
Make the slow lookup part of O_CREAT and non-O_CREAT opens common.

This allows atomic_open to be hooked into the slow lookup part.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:00 +04:00
Miklos Szeredi b6183df7b2 vfs: do_last(): separate O_CREAT specific code
Check O_CREAT on the slow lookup paths where necessary.  This allows the rest to
be shared with plain open.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:59 +04:00
Miklos Szeredi 37d7fffc9c vfs: do_last(): inline lookup_slow()
Copy lookup_slow() into do_last().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:58 +04:00
Al Viro 6d7b5aaed7 namei.c: let follow_link() do put_link() on failure
no need for kludgy "set cookie to ERR_PTR(...) because we failed
before we did actual ->follow_link() and want to suppress put_link()",
no pointless check in put_link() itself.

Callers checked if follow_link() has failed anyway; might as well
break out of their loops if that happened, without bothering
to call put_link() first.

[AV: folded fixes from hch]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:56 +04:00
Al Viro 1d674107ea coda: use list_for_each_entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:56 +04:00
Al Viro b3d9b7a3c7 vfs: switch i_dentry/d_alias to hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:55 +04:00
Al Viro 9f713878f2 ext4: get rid of open-coded d_find_any_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:54 +04:00
Al Viro a614a092bf ocfs2: use list_for_each_entry in ocfs2_find_local_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:53 +04:00
Al Viro 12447c4039 affs: unobfuscate affs_fix_dcache()
and add a comment on what it's doing

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:52 +04:00
Al Viro 3084ee95f0 affs: get rid of open-coded list_for_each_entry()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:52 +04:00
Al Viro 7968ce12e9 adfs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:50 +04:00
Al Viro e6f9f8d029 cifs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:49 +04:00
Al Viro 63a44583f3 qnx6: don't bother with ->i_dentry in inode-freeing callback
we'll initialize it in inode_init_always() when we allocate that
object again.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro 6ce6e24e72 get rid of magic in proc_namespace.c
don't rely on proc_mounts->m being the first field; container_of()
is there for purpose.  No need to bother with ->private, while
we are at it - the same container_of will do nicely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro f7a99c5b7c get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:47 +04:00
Julia Lawall d187663ef2 fs/direct-io.c: adjust suspicious bit operation
READ is 0, so the result of the bit-and operation is 0.  Rewrite with == as
done elsewhere in the same file.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:46 +04:00
Artem Bityutskiy 3dd847820d affs: get rid of affs_sync_super
This patch makes affs stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy a215fef7ed affs: introduce VFS superblock object back-reference
Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data
structure - we will need to find the VFS superblock from a 'struct
affs_sb_info' object in the next patch, so this change is jut a preparation.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy a837107439 affs: stop using lock_super
The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted
and just wait for a brave knight who'd kill them. This patch makes AFFS stop
using them and use the buffer-head's own lock instead.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:44 +04:00
Artem Bityutskiy e0471c8d8a affs: re-structure superblock locking a bit
AFFS wants to serialize the superblock (the root block in AFFS terms) updates
and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the
locking down to the 'affs_commit_super()' from the callers.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:43 +04:00
Artem Bityutskiy 0164b1a32e affs: remove useless superblock writeout on remount
We do not need to write out the superblock from '->remount_fs()' because
VFS has already called '->sync_fs()' by this time and the superblock has
already been written out. Thus, remove the 'affs_write_super()'
infocation from 'affs_remount()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy c9753b1d20 affs: remove useless superblock writeout on unmount
We do not need to write out the superblock from '->put_super()' because VFS has
already called '->sync_fs()' by this time and the superblock has already been
written out. Thus, remove the 'affs_commit_super()' infocation from
'affs_put_super()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy bc86256d2e affs: stop setting bm_flags
AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when
it prefers one or another. AFFS writes '1' only from '->put_super()', while
'->sync_fs()' and '->write_super()' store value '2'.  So on the first glance,
it looks like we want to have '1' if we unmount.  However, this does not really
happen in these cases:
  1. superblock is written via 'write_super()' then we unmount;
  2. we re-mount R/O, then unmount.
which are quite typical.

I could not find good documentation describing this field, except of one random
piece of documentation in the internet which says that -1 means that the root
block is valid, which is not consistent with what we have in the Linux AFFS
driver.

Jan Kara commented on this: "I have some vague recollection that on Amiga
boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been
ages..."

Thus, my conclusion is that value of '1' is as good as value of '2' and we can
just always use '2'. An Jan Kara suggested to go further: "generally bm_flags
handling looks strange. If they are 0, we mount fs read only and thus cannot
change them.  If they are != 0, we write 2 there. So IMHO if you just removed
bm_flags setting, nothing will really happen."

So this patch removes the bm_flags setting completely. This makes the "clean"
argument of the 'affs_commit_super()' function unneeded, so it is also removed.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:41 +04:00
Christoph Hellwig 1632dcc93f xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier.  In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:49 -05:00
Christoph Hellwig 40a9b7963d xfs: prevent recursion in xfs_buf_iorequest
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest.  This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.

As a short term workaround make sure we always call the iodone handler in
workqueue context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:39 -05:00
Dave Chinner aa292847b9 xfs: don't defer metadata allocation to the workqueue
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:27 -05:00
Dave Chinner e3a746f5aa xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:18 -05:00
Linus Torvalds 4264e6a263 NFS client bugfixes for Linux 3.5
- Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQADo8AAoJEGcL54qWCgDyu8wP/2entLv4jYgM7zp+/kOBhps6
 RpAv57YYv86pLk5gxoZg/JJs3z4QzS3CjMRXK11CEzRMt1NUqJfI6587TIl11DYW
 0UJcrAQo4X3EgNR7CCKAJaHUmYLayjUgquL8OjdBTuXCrpJzGdHcnBa71oF3xaEb
 jJhsA0W+qqBpa0295HH8sdHkURX1+46OdSd47+Dg+PZurkr0CRhjZo+DX+AkaSsU
 blM+pYUgu20NFaNUEfBphib3XMnnCGXc84g3gqjeOldRMijJGv1VcmhfsmJwmyLA
 1mImwZ2HDzySVLrbA/D7yeddZfXuTaf8krFmyP07U6I7hIEXCAEr16DmqCQCF1UB
 ppzZM9lu8f6df9tIlmjdA/hGOfs5OSQzD1L9Irn8xpRIWYmg+mrxXSzIG6wZE8zu
 n1EURW5CrLrlz2d7rdhqA+Y/GIkAvIO0/iBbJnvkMUdMPsd9/ikFPkHGzMhsK1KN
 AxEi2r+n0e8Hwaueu0EP685QjrEjOyLDdKIsAzNy/UEGN4v297UoW8ZyYnrzEIJG
 mQg6l4ke34zuw3AxoR3X4SuW329rGpG7x0pXNwqJG292T9jki334dPzJCy1LbaGl
 o1g+Z0BcyMe9VC7tRwOFEiGsbcd/OYPRbVVhu/3RlrWuVvj46QDvVpQ7+1pOr2s/
 4Xu70AlKw3B02ge621+p
 =bX2K
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus

* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix an NFSv4 mount regression
  NFS: Fix list manipulation snafus in fs/nfs/direct.c
2012-07-13 10:58:45 -07:00
Dave Jones 8d657eb3b4 Remove easily user-triggerable BUG from generic_setlease
This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 10:50:23 -07:00
Jeff Moyer 91f68c89d8 block: fix infinite loop in __getblk_slow
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.

The problem was initially reported by Torsten Hilbrich here:

    https://lkml.org/lkml/2012/6/18/54

and also reported independently here:

    http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

and then Richard W.M.  Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue.  This patch has been
confirmed to fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=835019

The main problem is here, in __getblk_slow:

        for (;;) {
                struct buffer_head * bh;
                int ret;

                bh = __find_get_block(bdev, block, size);
                if (bh)
                        return bh;

                ret = grow_buffers(bdev, block, size);
                if (ret < 0)
                        return NULL;
                if (ret == 0)
                        free_more_memory();
        }

__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page.  I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding.  However, we also continue to loop for other cases, like the
block lying beond the end of the disk.  So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).

The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org>  # 3.0+
[ Jens is on vacation, taking this directly  - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 08:36:35 -07:00
Mathias Krause 0143fc5e9f udf: avoid info leak on export
For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Mathias Krause fe685aabf7 isofs: avoid info leak on export
For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Liu Bo 10983f2e8d Btrfs: fix typo in convert_extent_bit
It should be convert_extent_bit.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-12 11:27:34 +02:00
Arne Jansen 6f72c7e20d Btrfs: add qgroup inheritance
When creating a subvolume or snapshot, it is necessary
to initialize the qgroup account with a copy of some
other (tracking) qgroup. This patch adds parameters
to the ioctls to pass the information from which qgroup
to inherit.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-12 10:54:40 +02:00
Arne Jansen 5d13a37bd5 Btrfs: add qgroup ioctls
Ioctls to control the qgroup feature like adding and
removing qgroups and assigning qgroups.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-12 10:54:39 +02:00
Arne Jansen c556723794 Btrfs: hooks to reserve qgroup space
Like block reserves, reserve a small piece of space on each
transaction start and for delalloc. These are the hooks that
can actually return EDQUOT to the user.
The amount of space reserved is tracked in the transaction
handle.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-12 10:54:39 +02:00
Jan Schmidt 546adb0d81 Btrfs: hooks for qgroup to record delayed refs
Hooks into qgroup code to record refs and into transaction commit.
This is the main entry point for qgroup. Basically every change in
extent backrefs got accounted to the appropriate qgroups.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-07-12 10:54:38 +02:00
Arne Jansen bcef60f249 Btrfs: quota tree support and startup
Init the quota tree along with the others on open_ctree
and close_ctree. Add the quota tree to the list of well
known trees in btrfs_read_fs_root_no_name.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-12 10:54:38 +02:00
Jan Schmidt edf39272db Btrfs: call the qgroup accounting functions
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-07-12 10:54:37 +02:00
Arne Jansen bed92eae26 Btrfs: qgroup implementation and prototypes
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-07-12 10:54:21 +02:00
Steven J. Magnani 5d8ecbbc28 fat: fix non-atomic NFS i_pos read
fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic.  Make it use the same accessor as the rest of the
FAT code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Bob Liu fea9f718b3 fs: ramfs: file-nommu: add SetPageUptodate()
There is a bug in the below scenario for !CONFIG_MMU:

 1. create a new file
 2. mmap the file and write to it
 3. read the file can't get the correct value

Because

  sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()

which causes the page to be zeroed.

Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Luis Henriques a4e08d001f ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()
As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.

Addresses http://bugs.launchpad.net/bugs/1006012

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Bret Towe <magnade@gmail.com>
Tested-by: Bret Towe <magnade@gmail.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:43 -07:00
Dong Aisheng 475d009429 of: Improve prom_update_property() function
prom_update_property() currently fails if the property doesn't
actually exist yet which isn't what we want. Change to add-or-update
instead of update-only, then we can remove a lot duplicated lines.

Suggested-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 15:26:51 +10:00
Trond Myklebust f1daf666dd NFSv4: Fix an NFSv4 mount regression
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.

The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-10 13:25:39 -04:00
Jan Kara 57b9655d01 udf: Improve table length check to avoid possible overflow
When a partition table length is corrupted to be close to 1 << 32, the
check for its length may overflow on 32-bit systems and we will think
the length is valid. Later on the kernel can crash trying to read beyond
end of buffer. Fix the check to avoid possible overflow.

CC: stable@vger.kernel.org
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-10 18:02:17 +02:00
Arne Jansen 709c0486b9 Btrfs: Test code to change the order of delayed-ref processing
Normally delayed refs get processed in ascending bytenr order. This
correlates in most cases to the order added. To expose dependencies
on this order, we start to process the tree in the middle instead of
the beginning.
This code is only effective when SCRAMBLE_DELAYED_REFS is defined.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:44 +02:00
Arne Jansen 416ac51da9 Btrfs: qgroup state and initialization
Add state to fs_info.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:44 +02:00
Arne Jansen 20897f5c86 Btrfs: added helper to create new trees
This creates a brand new tree. Will be used to create
the quota tree.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:43 +02:00
Arne Jansen d13603ef6e Btrfs: check the root passed to btrfs_end_transaction
This patch only add a consistancy check to validate that the
same root is passed to start_transaction and end_transaction.
Subvolume quota depends on this.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:43 +02:00
Arne Jansen 2f38b3e190 Btrfs: add helper for tree enumeration
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:42 +02:00
Arne Jansen 630dc772ea Btrfs: qgroup on-disk format
Not all features are in use by the current version
and thus may change in the future.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-10 15:14:42 +02:00
Jan Schmidt 097b8a7c9e Btrfs: join tree mod log code with the code holding back delayed refs
We've got two mechanisms both required for reliable backref resolving (tree
mod log and holding back delayed refs). You cannot make use of one without
the other. So instead of requiring the user of this mechanism to setup both
correctly, we join them into a single interface.

Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list
as we did before, which was of no value.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-07-10 15:14:41 +02:00
Jan Schmidt cf5388307a Btrfs: fix buffer leak in btrfs_next_old_leaf
When calling btrfs_next_old_leaf, we were leaking an extent buffer in the
rare case of using the deadlock avoidance code needed for the tree mod log.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-07-10 15:14:41 +02:00
Jan Kara 44f4f729e7 ext3: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:40:46 +02:00
Jan Kara 349ecd6a3c jbd: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:38:36 +02:00
Zheng Liu 729f52c6be ext4: add a new nolock flag in ext4_map_blocks
EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that we don't need
to acquire i_data_sem lock in ext4_map_blocks.  Meanwhile, it changes
ext4_get_block() to not start a new journal because when we do a
overwrite dio, there is no any metadata that needs to be modified.

We define a new function called ext4_get_block_write_nolock, which is
used in dio overwrite nolock.  In this function, it doesn't try to
acquire i_data_sem lock and doesn't start a new journal as it does a
lookup.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
CC: Robin Dong <hao.bigrat@gmail.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-09 16:29:29 -04:00
Zheng Liu fbe104942d ext4: split ext4_file_write into buffered IO and direct IO
ext4_file_dio_write is defined in order to split buffered IO and
direct IO in ext4.  This patch just refactor some stuff in write path.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
CC: Robin Dong <hao.bigrat@gmail.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-09 16:29:29 -04:00
Haibo Liu 62a1391ddd ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
In this patch, the statement "poff = block % blocks_per_page"
in ext4_mb_get_buddy_page_lock has no effect.

It will be optimized out by the compiler, but it's better to remove it.

Signed-off-by: Haibo Liu <HaiboLiu6@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-09 16:29:28 -04:00
HaiboLiu e7bcf82304 ext4: fix out-of-date comments in extents.c
In this patch, ext4_ext_try_to_merge has been change to merge 
an extent both left and right.  So we need to update the comment
in here.

Signed-off-by: HaiboLiu <HaiboLiu6@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-09 16:29:28 -04:00
Tao Ma 41eb70dde4 ext4: use s_csum_seed instead of i_csum_seed for xattr block
In xattr block operation, we use h_refcount to indicate whether the
xattr block is shared among many inodes. And xattr block csum uses
s_csum_seed if it is shared and i_csum_seed if it belongs to
one inode. But this has a problem. So consider the block is shared
first bewteen inode A and B, and B has some xattr update and CoW
the xattr block. When it updates the *old* xattr block(because
of the h_refcount change) and calls ext4_xattr_release_block, we
has no idea that inode A is the real owner of the *old* xattr
block and we can't use the i_csum_seed of inode A either in xattr
block csum calculation. And I don't think we have an easy way to
find inode A.

So this patch just removes the tricky i_csum_seed and we now uses
s_csum_seed every time for the xattr block csum. The corresponding
patch for the e2fsprogs will be sent in another patch.

This is spotted by xfstests 117.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2012-07-09 16:29:27 -04:00
Tao Ma ef58f69c3c ext4: use proper csum calculation in ext4_rename
In ext4_rename, when the old name is a dir, we need to
change ".." to its new parent and journal the change, so
with metadata_csum enabled, we have to re-calc the csum.

As the first block of the dir can be either a htree root
or a normal directory block and we have different csum
calculation for these 2 types, we have to choose the right
one in ext4_rename.

btw, it is found by xfstests 013.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2012-07-09 16:29:05 -04:00
Theodore Ts'o 952fc18ef9 ext4: fix overhead calculation used by ext4_statfs()
Commit f975d6bcc7 introduced bug which caused ext4_statfs() to
miscalculate the number of file system overhead blocks.  This causes
the f_blocks field in the statfs structure to be larger than it should
be.  This would in turn cause the "df" output to show the number of
data blocks in the file system and the number of data blocks used to
be larger than they should be.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-07-09 16:27:05 -04:00
Jan Kara 17dc59ba41 udf: Do not decrement i_blocks when freeing indirect extent block
Indirect extent block is not accounted in i_blocks during allocation
thus we should not decrement i_blocks when we are freeing such block
during truncation.

Reported-by: Steve Nickel <snickel58@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 13:24:21 +02:00
Jan Kara bff943af6f udf: Fix memory leak when mounting
When we are mounting filesystem, we can load one partition table before
finding out that we cannot complete processing of logical volume descriptor
and trying the reserve descriptor. Free the table properly before trying
the reserve descriptor.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Wanlong Gao e124a32043 ext2: cleanup the confused goto label
Cleanup the confused goto label, since the big lock has been removed.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Ashish Sangwan a0e589b485 UDF: Remove unnecessary variable "offset" from udf_fill_inode
The variable "offset" is not needed. Remove it.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Artem Bityutskiy db8109ef98 udf: stop using s_dirt
The UDF file-system does not need the 's_dirt' superblock flag because it does
not define the 'write_super()' method. This flag was set to 1 in few places and
set to 0 in '->sync_fs()' and was basically useless. Stop using it because it
is on its way out.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Eric Sandeen f007dbf8e5 ext3: force ro mount if ext3_setup_super() fails
If ext3_setup_super() fails i.e. due to a too-high revision,
the error is logged in dmesg but the fs is not mounted RO as
indicated.

Tested by:

[164152.114551] EXT3-fs (sdb6): error: revision level too high, forcing read-only mode
/dev/sdb6 /mnt/test2 ext3 rw,seclabel,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0
                          ^^

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Jeff Liu f3da93105b quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
checkpatch.pl warns:

"WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>"

Below patch fixes it.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Trond Myklebust 4035c2487f NFS: Fix list manipulation snafus in fs/nfs/direct.c
Fix 2 bugs in nfs_direct_write_reschedule:

 - The request needs to be removed from the 'reqs' list before it can
   be added to 'failed'.
 - Fix an infinite loop if the 'failed' list is non-empty.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-08 10:32:08 -04:00
Linus Torvalds 332a2e1244 vfs: make O_PATH file descriptors usable for 'fchdir()'
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors.  This should make it comparable
to the O_SEARCH of Solaris.  In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.

Noticed during development of multithread support for ksh93.

Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org    # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-07 17:19:02 -07:00
Linus Torvalds 26c439d400 Fixes an incorrect access mode check when preparing to open a file in the lower
filesystem. This isn't an urgent fix, but it is simple and the check was
 obviously incorrect.
 
 Also fixes a couple important bugs in the eCryptfs miscdev interface. These
 changes are low risk due to the small number of users that use the miscdev
 interface. I was able to keep the changes minimal and I have some cleaner, more
 complete changes queued up for the next merge window that will build on these
 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCgAGBQJP92LsAAoJENaSAD2qAscKCWkP/3BWpv0AS0fnrZPniXv/+vjf
 gdV4NcQhE/86VsQ7CtZS7jqfSVTzm+YTta9BTKj6jWZuGUZGcjXsZdyMpleBZukh
 TvRSW3HKCRtC8XNHzle3YUukD1o465nMEiCUQOYcWjAa3in7cZTiFU+3S2Unn5UF
 yh2Slfzjxkl2EUHEbcBiBayzaMH2gqwAvRR4sjM0P175m/jjDF6pDGT5vc0skvcP
 kLzFr/3Ia9BW1nU0yblTtSNcHzYV8GTJVEpj1NR7q59x2gVJubF6hBDtbZdaaGK0
 rYlKV+w9mRwzUCuVdb4zPCa9EGrbqH4gYvIWsCW+R0zoK57rfIRolQVYEglGE2TU
 K3HHL6UOsPASZCQqhi+K+tCmYtZaCfeMhDRgxyDOaxS4rQ6dy+XO6f9zM30qw1UB
 QHeVEQl7bM0IpByCcjVbuNJT4zTlW7xmsLm/pbGv60UBdZpqaUZptEBEpgUFjq30
 shgNLlHHWvelhf52gbff+ytCHf+IDVPT/Q2aGjhC2fgqWiQno44vR88gtMQz6b7g
 4yEL7t0TqBB9jCBu/ikTITGpRH5S149e3oYGm2P/+YYZUGlw0Gf9N6TBkctJFSg/
 /vk6aobMnjfxmeM80xOKey5Y1zDis660sgt1hX8NVAuo4hp7VQfWGhEZ8lYqzCzP
 aJci4ZXaDzwXx6UCC5w2
 =TEei
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Fixes an incorrect access mode check when preparing to open a file in
  the lower filesystem.  This isn't an urgent fix, but it is simple and
  the check was obviously incorrect.

  Also fixes a couple important bugs in the eCryptfs miscdev interface.
  These changes are low risk due to the small number of users that use
  the miscdev interface.  I was able to keep the changes minimal and I
  have some cleaner, more complete changes queued up for the next merge
  window that will build on these patches."

* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
  eCryptfs: Fix lockdep warning in miscdev operations
  eCryptfs: Properly check for O_RDONLY flag before doing privileged open
2012-07-06 15:32:18 -07:00
Tyler Hicks 8dc6780587 eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.

In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.

https://launchpad.net/bugs/994247

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
2012-07-06 15:51:12 -05:00
Linus Torvalds 1b7fa4c271 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
Pull ocfs2 fixes from Joel Becker.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  aio: make kiocb->private NUll in init_sync_kiocb()
  ocfs2: Fix bogus error message from ocfs2_global_read_info
  ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
  ocfs2: use spinlock irqsave for downconvert lock.patch
  ocfs2: Misplaced parens in unlikley
  ocfs2: clear unaligned io flag when dio fails
2012-07-06 10:04:39 -07:00