Commit Graph

40467 Commits

Author SHA1 Message Date
Miklos Szeredi 7a3b2c7547 fuse: simplify req states
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:00 +02:00
Miklos Szeredi c47752673a fuse: don't hold lock over request_wait_answer()
Only hold fc->lock over sections of request_wait_answer() that actually
need it.  If wait_event_interruptible() returns zero, it means that the
request finished.  Need to add memory barriers, though, to make sure that
all relevant data in the request is synchronized.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:00 +02:00
Miklos Szeredi 7d2e0a099c fuse: simplify unique ctr
Since it's a 64bit counter, it's never gonna wrap around.  Remove code
dealing with that possibility.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:00 +02:00
Miklos Szeredi 41f982747e fuse: rework abort
Splice fc->pending and fc->processing lists into a common kill list while
holding fc->lock.

By the time we release fc->lock, pending and processing lists are empty and
the io list contains only locked requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:59 +02:00
Miklos Szeredi b716d42538 fuse: fold helpers into abort
Fold end_io_requests() and end_queued_requests() into fuse_abort_conn().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:59 +02:00
Miklos Szeredi dc00809a53 fuse: use per req lock for lock/unlock_request()
Reuse req->waitq.lock for protecting FR_ABORTED and FR_LOCKED flags.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi 825d6d3395 fuse: req use bitops
Finer grained locking will mean there's no single lock to protect
modification of bitfileds in fuse_req.

So move to using bitops.  Can use the non-atomic variants for those which
happen while the request definitely has only one reference.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi 0d8e84b043 fuse: simplify request abort
- don't end the request while req->locked is true

 - make unlock_request() return an error if the connection was aborted

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi ccd0a0bd16 fuse: call fuse_abort_conn() in dev release
fuse_abort_conn() does all the work done by fuse_dev_release() and more.
"More" consists of:

	end_io_requests(fc);
	wake_up_all(&fc->waitq);
	kill_fasync(&fc->fasync, SIGIO, POLL_IN);

All of which should be no-op (WARN_ON's added).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi f0139aa819 fuse: fold fuse_request_send_nowait() into single caller
And the same with fuse_request_send_nowait_locked().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi de15522646 fuse: check conn_error earlier
fc->conn_error is set once in FUSE_INIT reply and never cleared.  Check it
in request allocation, there's no sense in doing all the preparation if
sending will surely fail.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi 5437f24172 fuse: account as waiting before queuing for background
Move accounting of fc->num_waiting to the point where the request actually
starts waiting.  This is earlier than the current queue_request() for
background requests, since they might be waiting on the fc->bg_queue before
being queued on fc->pending.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:56 +02:00
Miklos Szeredi 73e0e73844 fuse: reset waiting
Reset req->waiting in fuse_put_request().  This is needed for correct
accounting in fc->num_waiting for reserved requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:25:56 +02:00
Miklos Szeredi 42dc6211c5 fuse: fix background request if not connected
request_end() expects fc->num_background and fc->active_background to have
been incremented, which is not the case in fuse_request_send_nowait()
failure path.  So instead just call the ->end() callback (which is actually
set by all callers).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:56 +02:00
Miklos Szeredi 0ad0b3255a fuse: initialize fc->release before calling it
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.

[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Cc: <stable@vger.kernel.org> #v2.6.31+
2015-07-01 16:25:55 +02:00
Sasha Levin 161f873b89 vfs: read file_handle only once in handle_to_path
We used to read file_handle twice.  Once to get the amount of extra
bytes, and once to fetch the entire structure.

This may be problematic since we do size verifications only after the
first read, so if the number of extra bytes changes in userspace between
the first and second calls, we'll have an incoherent view of
file_handle.

Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-02 10:29:07 -07:00
Linus Torvalds 8ba64dc338 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Off-by-one in d_walk()/__dentry_kill() race fix.

  It's very hard to hit; possible in the same conditions as the original
  bug, except that you need the skipped branch to contain all the
  remaining evictables, so that the d_walk()-calling loop in
  d_invalidate() decides there's nothing more to do and doesn't go for
  another pass - otherwise that next pass will sweep the sucker.

  So it's not too urgent, but seeing that the fix is obvious and the
  original commit has spread into all -stable branches..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  d_walk() might skip too much
2015-05-31 16:00:34 -07:00
Linus Torvalds 1be44e234b xfs: update for 4.1-rc6
Changes in this update:
 o regression fix for new rename whiteout code
 o regression fixes for new superblock generic per-cpu counter code
 o fix for incorrect error return sign introduced in 3.17
 o metadata corruption fixes that need to go back to -stable kernels
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJVaO0JAAoJEK3oKUf0dfodd4UQANRdfXnUrpyQGhVS7HFoFoVt
 FIQ52pPGbMu72+DqHc+Q41uvgAPe65LFB2VUL6CUGCMExstF72F5+QonzppMgkMo
 unPER3eB8ya03SY+Kp+803ZGgzI2Nl2M6w8Kof730/RUk56PTGYIx4eLXd6iZSli
 RsYjw8JDbeue5OQo5FPmLCSQ/Kr5ZJXbgWVPyWkKg9aCcXLN5YSJIV3xcMTK9Q2I
 LqqODkyatnGc6YxGAKddS7Xzt1ntlZgbe5mndQw04a2g0Lf6emPH5r8b0UJXIu96
 advOBX0pEbad4FeFS6Mo5D+nNCaaNP4WzN7wgdb+BYNVw3ss4Ebam7+yY6Gexg6y
 bzZOEkk9saL4YeBDgyYICNu7kG4BRVKRQiiX220G6SFXM3nqbl7qBPb3kVFyDpcI
 RRuFJ0ZV0kFJ+3IQ4xVnIh6nootceRk/mvZaK5HhLhQLzklpZ8fj4HF3oBDUAnvN
 wNd+7GoZy7zldjCkbF4BP3GjUeW+b9ngrCNc+bFXi5cUbdECXAa2krjxyY+MlQF2
 veNVVcsoRdfeM0VjJh2/piGJxMWIlRqXdKzPKsfMWnlIaJ6YyslfbSq+2K7LxgGR
 Ho3Sjt0oUuPMZ9F/Mjj+XDqwmzgooUHXNyDBxhGXBNBPjApcRLcb2vQ2SrWEmeGJ
 vZmC2R1ZoGdBJg8a55BT
 =w5SP
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
 "This is a little larger than I'd like late in the release cycle, but
  all the fixes are for regressions introduced in the 4.1-rc1 merge, or
  are needed back in -stable kernels fairly quickly as they are
  filesystem corruption or userspace visible correctness issues.

  Changes in this update:

   - regression fix for new rename whiteout code

   - regression fixes for new superblock generic per-cpu counter code

   - fix for incorrect error return sign introduced in 3.17

   - metadata corruption fixes that need to go back to -stable kernels"

* tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: fix broken i_nlink accounting for whiteout tmpfile inode
  xfs: xfs_iozero can return positive errno
  xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
  xfs: extent size hints can round up extents past MAXEXTLEN
  xfs: inode and free block counters need to use __percpu_counter_compare
  percpu_counter: batch size aware __percpu_counter_compare()
  xfs: use percpu_counter_read_positive for mp->m_icount
2015-05-29 16:45:45 -07:00
Al Viro 2159184ea0 d_walk() might skip too much
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-28 23:45:30 -04:00
Bob Copeland 5a6b2b36a8 omfs: fix potential integer overflow in allocator
Both 'i' and 'bits_per_entry' are signed integers but the result is a
u64 block number.  Cast i to u64 to avoid truncation on 32-bit targets.

Found by Coverity (CID 200679).

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:19 -07:00
Bob Copeland c0345ee57d omfs: fix sign confusion for bitmap loop counter
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block.  The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.

Unfortunately, a recent change made this unsigned along with some other
related fields.  The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.

Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:19 -07:00
Bob Copeland 3a281f9466 omfs: set error return when d_make_root() fails
A static checker found the following issue in the error path for
omfs_fill_super:

    fs/omfs/inode.c:552 omfs_fill_super()
    warn: missing error code here? 'd_make_root()' failed. 'ret' = '0'

Fix by returning -ENOMEM in this case.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Sasha Levin dcbff39da3 fs, omfs: add NULL terminator in the end up the token list
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop.  Not having one causes it to overrun
to invalid memory.

In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Andrew Morton 2b1d3ae940 fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
load_elf_binary() returns `retval', not `error'.

Fixes: a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Brian Foster 22419ac9fe xfs: fix broken i_nlink accounting for whiteout tmpfile inode
XFS uses the internal tmpfile() infrastructure for the whiteout inode
used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
the inode, drops di_nlink, adds the inode to the agi unlinked list,
calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
and then finishes the common inode setup (e.g., clear I_NEW and unlock).

The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
pulled up out of that function as part of the following commit to
resolve a deadlock issue:

	330033d6 xfs: fix tmpfile/selinux deadlock and initialize security

As a result, callers of xfs_create_tmpfile() are responsible for either
calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
it back into the source dentry and calls xfs_bumplink().

Update the assert in xfs_rename() to help detect this problem in the
future and update xfs_rename_alloc_whiteout() to decrement the link
count as part of the manual tmpfile inode setup.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 08:14:55 +10:00
Dave Chinner cddc116228 xfs: xfs_iozero can return positive errno
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.

Thanks to Brian Foster for noticing it.

cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:32 +10:00
Dave Chinner 6dfe5a049f xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.

This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.

To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of it's state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.

Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.

cc: <stable@vger.kernel.org> # 3.12 to 4.0
Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:08 +10:00
Dave Chinner 6dea405eee xfs: extent size hints can round up extents past MAXEXTLEN
This results in BMBT corruption, as seen by this test:

# mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
....
# mount /dev/vdc /mnt/scratch
# xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo

which results in this failure on a debug kernel:

XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
....
Call Trace:
 [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
 [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
 [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
 [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
 [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
 [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
 [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
 [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
 [<ffffffff81ddcc29>] ? down_write+0x29/0x60
 [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
 [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
 [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
 [<ffffffff811cd680>] vfs_fallocate+0x140/0x260
 [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
 [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17

The tracepoint that indicates the extent that triggered the assert
failure is:

xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1

Clearly indicating that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:

xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
					    ^^^^^^^        ^^^^^^^

We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in it's allocation
sizes and so needs help here.

The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.

Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact be the underlying bug that
occassionally causes transaction overruns in delayed allocation
extent conversion, so now we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.

Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:06 +10:00
Dave Chinner 8c1903d308 xfs: inode and free block counters need to use __percpu_counter_compare
Because the counters use a custom batch size, the comparison
functions need to be aware of that batch size otherwise the
comparison does not work correctly. This leads to ASSERT failures
on generic/027 like this:

 XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
 ------------[ cut here ]------------
....
 Call Trace:
  [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
  [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
  [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
  [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
  [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
  [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
  [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
  [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
  [<ffffffff811f3958>] evict+0xb8/0x190
  [<ffffffff811f433b>] iput+0x18b/0x1f0
  [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
  [<ffffffff811d548a>] ? filp_close+0x5a/0x80
  [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
  [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71

This is a regression introduced by commit 501ab32 ("xfs: use generic
percpu counters for inode counter").

This patch fixes the same problem for both the inode counter and the
free block counter in the superblocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:39:34 +10:00
George Wang 74f9ce1cf2 xfs: use percpu_counter_read_positive for mp->m_icount
Function percpu_counter_read just return the current counter, which can be
negative. This will cause the checking of "allocated inode
counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
solve this problem, and be consistent with the purpose to introduce percpu
mechanism to xfs.

Signed-off-by: George Wang <xuw2015@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:39:34 +10:00
Linus Torvalds de182468d1 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Back from SambaXP - now have 8 small CIFS bug fixes to merge"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE
  Fix to convert SURROGATE PAIR
  cifs: potential missing check for posix_lock_file_wait
  Fix to check Unique id and FileType when client refer file directly.
  CIFS: remove an unneeded NULL check
  [cifs] fix null pointer check
  Fix that several functions handle incorrect value of mapchars
  cifs: Don't replace dentries for dfs mounts
2015-05-27 14:09:16 -07:00
Linus Torvalds 3cfd4ba7d3 Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull two overlayfs fixes from Miklos Szeredi:
 "Overlayfs rmdir() failed to check for emptiness in one case; this was
  introduced in 4.0.  The other bug was there since day one: failure to
  mount if upper fs is full, which bit some OpenWRT folks"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: mount read-only if workdir can't be created
  ovl: don't remove non-empty opaque directory
2015-05-27 09:47:57 -07:00
Linus Torvalds 7ce14f6ff2 Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I fixed up a regression from 4.0 where conversion between different
  raid levels would sometimes bail out without converting.

  Filipe tracked down a race where it was possible to double allocate
  chunks on the drive.

  Mark has a fix for fiemap.  All three will get bundled off for stable
  as well"

* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix regression in raid level conversion
  Btrfs: fix racy system chunk allocation when setting block group ro
  btrfs: clear 'ret' in btrfs_check_shared() loop
2015-05-23 11:14:10 -07:00
Federico Sauter 4afe260bab CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE
This patch fixes a race condition that occurs when connecting
to a NT 3.51 host without specifying a NetBIOS name.
In that case a RFC1002_NEGATIVE_SESSION_RESPONSE is received
and the SMB negotiation is reattempted, but under some conditions
it leads SendReceive() to hang forever while waiting for srv_mutex.
This, in turn, sets the calling process to an uninterruptible sleep
state and makes it unkillable.

The solution is to unlock the srv_mutex acquired in the demux
thread *before* going to sleep (after the reconnect error) and
before reattempting the connection.
2015-05-20 13:25:55 -05:00
Nakajima Akira b29103076b Fix to convert SURROGATE PAIR
Garbled characters happen by using surrogate pair for filename.
  (replace each 1 character to ??)

[Steps to Reproduce for bug]
client# touch $(echo -e '\xf0\x9d\x9f\xa3')
client# touch $(echo -e '\xf0\x9d\x9f\xa4')
client# ls -li
  You see same inode number, same filename(=?? and ??) .

Fix the bug about these functions do not consider about surrogate pair (and IVS).
cifs_utf16_bytes()
cifs_mapchar()
cifs_from_utf16()
cifsConvertToUTF16()

Reported-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20 13:12:51 -05:00
Chengyu Song 00b8c95b68 cifs: potential missing check for posix_lock_file_wait
posix_lock_file_wait may fail under certain circumstances, and its result is
usually checked/returned. But given the complexity of cifs, I'm not sure if
the result is intentially left unchecked and always expected to succeed.

Signed-off-by: Chengyu Song <csong84@gatech.edu>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20 13:08:33 -05:00
Nakajima Akira 7196ac113a Fix to check Unique id and FileType when client refer file directly.
When you refer file directly on cifs client,
 (e.g. ls -li <filename>, cd <dir>, stat <filename>)
 the function return old inode number and filetype from old inode cache,
 though server has different inode number or filetype.

When server is Windows, cifs client has same problem.
When Server is Windows
, This patch fixes bug in different filetype,
  but does not fix bug in different inode number.
Because QUERY_PATH_INFO response by Windows does not include inode number(Index Number) .

BUG INFO
https://bugzilla.kernel.org/show_bug.cgi?id=90021
https://bugzilla.kernel.org/show_bug.cgi?id=90031

Reported-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20 13:05:25 -05:00
Chris Mason 153c35b6cc Btrfs: fix regression in raid level conversion
Commit 2f0810880f changed
btrfs_set_block_group_ro to avoid trying to allocate new chunks with the
new raid profile during conversion.  This fixed failures when there was
no space on the drive to allocate a new chunk, but the metadata
reserves were sufficient to continue the conversion.

But this ended up causing a regression when the drive had plenty of
space to allocate new chunks, mostly because reduce_alloc_profile isn't
using the new raid profile.

Fixing btrfs_reduce_alloc_profile is a bigger patch.  For now, do a
partial revert of 2f0810880, and don't error out if we hit ENOSPC.

Signed-off-by: Chris Mason <clm@fb.com>
Tested-by: Dave Sterba <dsterba@suse.cz>
Reported-by: Holger Hoffstaette <holger.hoffstaette@googlemail.com>
2015-05-20 11:03:38 -07:00
Dan Carpenter 65c3b205eb CIFS: remove an unneeded NULL check
Smatch complains because we dereference "ses->server" without checking
some lines earlier inside the call to get_next_mid(ses->server).

	fs/cifs/cifssmb.c:4921 CIFSGetDFSRefer()
	warn: variable dereferenced before check 'ses->server' (see line 4899)

There is only one caller for this function get_dfs_path() and it always
passes a non-null "ses->server" pointer so this NULL check can be
removed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20 11:36:16 -05:00
Steve French 1dc92c450a [cifs] fix null pointer check
Dan Carpenter pointed out an inconsistent null pointer check
in smb2_hdr_assemble that was pointed out by static checker.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Sachin Prabhu <sprabhu@redhat.com>
CC: Dan Carpenter <dan.carpenter@oracle.com>w
2015-05-20 09:32:21 -05:00
Filipe Manana a96295965b Btrfs: fix racy system chunk allocation when setting block group ro
If while setting a block group read-only we end up allocating a system
chunk, through check_system_chunk(), we were not doing it while holding
the chunk mutex which is a problem if a concurrent chunk allocation is
happening, through do_chunk_alloc(), as it means both block groups can
end up using the same logical addresses and physical regions in the
device(s). So make sure we hold the chunk mutex.

Cc: stable@vger.kernel.org  # 4.0+
Fixes: 2f0810880f ("btrfs: delete chunk allocation attemp when
                      setting block group ro")

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-05-19 18:04:17 -07:00
Mark Fasheh 2c2ed5aa01 btrfs: clear 'ret' in btrfs_check_shared() loop
btrfs_check_shared() is leaking a return value of '1' from
find_parent_nodes(). As a result, callers (in this case, extent_fiemap())
are told extents are shared when they are not. This in turn broke fiemap on
btrfs for kernels v3.18 and up.

The fix is simple - we just have to clear 'ret' after we are done processing
the results of find_parent_nodes().

It wasn't clear to me at first what was happening with return values in
btrfs_check_shared() and find_parent_nodes() - thanks to Josef for the help
on irc. I added documentation to both functions to make things more clear
for the next hacker who might come across them.

If we could queue this up for -stable too that would be great.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-05-19 18:04:17 -07:00
Linus Torvalds 1113cdfe7d NFS client bugfixes for Linux 4.1
Highlights include:
 
 - Fix a Linux-4.1 regression affecting stat()
 - Take an extra reference to fl->fl_file when running a setlk
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVW1vvAAoJEGcL54qWCgDyI+YP/iqjM0YV+ZNMoLTXvO2+N1tw
 Bwc5zYDTG1De2OQTup4Ed4+3duBwchMW3NohTZP+/f1Sbhl8ZF2nYI6bBEMUvZbP
 fECmYGRZebqwloJCdu95QIaEZ67bn3+fUSwfm+krJzi7Thzwcqj+DiMaDDJzgcr1
 j+Acd5WHrdTBEpx3yXqCPkwX3L71CYj3SO2eO7cimAX9JQrHz8IkQtkf1UsUqSGw
 Wsb2l6wOIGGn+2PLyvvLttO83lTp1WjP7F6wG+zYcJCTl/f/j5VPAFIfXdi/ZoOw
 9KUE8+bUvmnn2wBlHj8hlVodfRBxRq+X/e6yfy2roMvpzQKXc30pN/xKJOQqmT2i
 hn48hAFNTfo+dO0oPmbrgq28ooO/Xl7krQeJPpMRsOL51LNkjLovfBImYZcXqmxs
 THC/SnSVQyL6YbBfHPGCzu7iam8kxY2ivwfsrrTcg9Mja4EMwJ7+FW8ezn1TSDB8
 T4047eCiQQAAuxICSQr2v+967gjKtOqFESEq6he9EN8bKN2x6KJ7f8u9CUagA7SK
 /iaQVqXT7Iq9JjSOIXN1uWkzQJg/x35YyXBb5HRQstaxhDO1QBMPMAN091xnZiwz
 ZSOxMseRjjBHRjuxkMoZ7CIa0refRNHRlhEh/IBivbhYP6K0ra43hWMi8lHQMBIp
 dN6EpW/CmQzNOom+n82w
 =ChTV
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull two NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - fix a Linux-4.1 regression affecting stat()

   - take an extra reference to fl->fl_file when running a setlk"

* tag 'nfs-for-4.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: take extra reference to fl->fl_file when running a setlk
  nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0
2015-05-19 11:20:48 -07:00
Miklos Szeredi cc6f67bcaf ovl: mount read-only if workdir can't be created
OpenWRT folks reported that overlayfs fails to mount if upper fs is full,
because workdir can't be created.  Wordir creation can fail for various
other reasons too.

There's no reason that the mount itself should fail, overlayfs can work
fine without a workdir, as long as the overlay isn't modified.

So mount it read-only and don't allow remounting read-write.

Add a couple of WARN_ON()s for the impossible case of workdir being used
despite being read-only.

Reported-by: Bastian Bittorf <bittorf@bluebottle.com> 
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.18+
2015-05-19 14:30:12 +02:00
Linus Torvalds 92752b5cdd Merge branch 'for-linus-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML hostfs fix from Richard Weinberger:
 "This contains a single fix for a regression introduced in 4.1-rc1"

* 'for-linus-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  hostfs: Use correct mask for file mode
2015-05-16 16:33:59 -07:00
Linus Torvalds 6a8098a447 Fix a number of ext4 bugs; the most serious of which is a bug in the
lazytime mount optimization code where we could end up updating the
 timestamps to the wrong inode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVV1l1AAoJEPL5WVaVDYGjdPwH/RzNut4bfgq7yK2yUVNqPpPN
 QzjR848fT1lj7C1eN7eEh+NRG+KNM2QnmMJBU8jVnwq2l3r8AGFV/bDRC+Zx4U8L
 cz9mZJMU7ZDP5TH/WVyimySGAXpaFKruXA+3L8CyC3LQEI6TUOxKt5CqNi0/9nND
 B8HoF+Ei7jIILrcW7KKj55/fSfh4iiy+iUb0kjrSnZj0y5sROfFG2QhQwIhJRk7I
 /8aeg2HYbhWXCKQHnQ5F4lLNCf44kdJ/EoCpz6aOHtVwrnBcQ44yeqm5MtHSh6Qw
 lj8iPCIlcHYGZE4im+pWAavDMeHBm/VnOnH9545t6nNFq6W7WNdkD99ZJ/AQyWQ=
 =JJxO
 -----END PGP SIGNATURE-----

Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a number of ext4 bugs; the most serious of which is a bug in the
  lazytime mount optimization code where we could end up updating the
  timestamps to the wrong inode"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix an ext3 collapse range regression in xfstests
  jbd2: fix r_count overflows leading to buffer overflow in journal recovery
  ext4: check for zero length extent explicitly
  ext4: fix NULL pointer dereference when journal restart fails
  ext4: remove unused function prototype from ext4.h
  ext4: don't save the error information if the block device is read-only
  ext4: fix lazytime optimization
2015-05-16 15:55:31 -07:00
Linus Torvalds c7309e88a6 Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The first commit is a fix from Filipe for a very old extent buffer
  reuse race that triggered a BUG_ON.  It hasn't come up often, I looked
  through old logs at FB and we hit it a handful of times over the last
  year.

  The rest are other corners he hit during testing"

* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix race when reusing stale extent buffers that leads to BUG_ON
  Btrfs: fix race between block group creation and their cache writeout
  Btrfs: fix panic when starting bg cache writeout after IO error
  Btrfs: fix crash after inode cache writeback failure
2015-05-16 15:50:58 -07:00
Linus Torvalds 4b470f1208 Merge branch 'parisc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "One important patch which fixes crashes due to stack randomization on
  architectures where the stack grows upwards (currently parisc and
  metag only).

  This bug went unnoticed on parisc since kernel 3.14 where the flexible
  mmap memory layout support was added by commit 9dabf60dc4.  The
  changes in fs/exec.c are inside an #ifdef CONFIG_STACK_GROWSUP section
  and will not affect other platforms.

  The other two patches rename args of the kthread_arg() function and
  fixes a printk output"

* 'parisc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures
  parisc: copy_thread(): rename 'arg' argument to 'kthread_arg'
  parisc: %pf is only for function pointers
2015-05-15 13:06:06 -07:00
Theodore Ts'o b9576fc362 ext4: fix an ext3 collapse range regression in xfstests
The xfstests test suite assumes that an attempt to collapse range on
the range (0, 1) will return EOPNOTSUPP if the file system does not
support collapse range.  Commit 280227a75b56: "ext4: move check under
lock scope to close a race" broke this, and this caused xfstests to
fail when run when testing file systems that did not have the extents
feature enabled.

Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-15 00:24:10 -04:00
Vladimir Davydov 499611ed45 kernfs: do not account ino_ida allocations to memcg
root->ino_ida is used for kernfs inode number allocations. Since IDA has
a layered structure, different IDs can reside on the same layer, which
is currently accounted to some memory cgroup. The problem is that each
kmem cache of a memory cgroup has its own directory on sysfs (under
/sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a
directory or any file in it gets allocated from a layer accounted to the
cgroup which the cache is created for, the cgroup will get pinned for
good, because one has to free all kmem allocations accounted to a cgroup
in order to release it and destroy all its kmem caches. That said we
must not account layers of ino_ida to any memory cgroup.

Since per net init operations may create new sysfs entries directly
(e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
per each namespace, which, in turn, creates new sysfs entries), an easy
way to reproduce this issue is by creating network namespace(s) from
inside a kmem-active memory cgroup.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>	[4.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-14 17:55:51 -07:00