Stop reusing dquots from the freelist when allocating new ones directly, and
implement a shrinker that actually follows the specifications for the
interface. The shrinker implementation is still highly suboptimal at this
point, but we can gradually work on it.
This also fixes an bug in the previous lock ordering, where we would take
the hash and dqlist locks inside of the freelist lock against the normal
lock ordering. This is only solvable by introducing the dispose list,
and thus not when using direct reclaim of unused dquots for new allocations.
As a side-effect the quota upper bound and used to free ratio values in
/proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
new world order.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
To ensure that we don't just reuse the bad delegation when we attempt to
recover the nfs4_state that received the bad stateid error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
nsegs is read from userspace. Limit its value and avoid overflowing nsegs
* sizeof(__u64) in the subsequent call to memdup_user().
This patch complements 481fe17e97 ("nilfs2: potential integer overflow
in nilfs_ioctl_clean_segments()").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: Haogang Chen <haogangchen@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There were two places bio_get_nr_vecs() could overflow:
First, it did a left shift to convert from sectors to bytes immediately
before dividing by PAGE_SIZE. If PAGE_SIZE ever was less than 512 a great
many things would break, so dividing by PAGE_SIZE >> 9 is safe and will
generate smaller code too.
The nastier overflow was in the DIV_ROUND_UP() (that's what the code was
effectively doing, anyways). If n + d overflowed, the whole thing would
return 0 which breaks things rather effectively.
bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the
DIV_ROUND_UP() is silly; we could do a straight divide except if a
device's queue_max_sectors was less than PAGE_SIZE we'd return 0. So we
just add 1; this should always be safe - things will break badly if
bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start
failing) but it's queue_max_segments that must guard against this, if
queue_max_sectors is preventing this from happen things are going to
explode on architectures with different PAGE_SIZE.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
standard_receive3 will check the validity of the response from the
server (via checkSMB). It'll pass the result of that check to handle_mid
which will dequeue it and mark it with a status of
MID_RESPONSE_MALFORMED if checkSMB returned an error. At that point,
standard_receive3 will also return an error, which will make the
demultiplex thread skip doing the callback for the mid.
This is wrong -- if we were able to identify the request and the
response is marked malformed, then we want the demultiplex thread to do
the callback. Fix this by making standard_receive3 return 0 in this
situation.
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently, it's always set to 0 (no oplock requested).
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
* git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops in session setup code for null user mounts
[CIFS] Update cifs Kconfig title to match removal of experimental dependency
cifs: fix printk format warnings
cifs: check offset in decode_ntlmssp_challenge()
cifs: NULL dereference on allocation failure
put_io_context() performed a complex trylock dancing to avoid
deferring ioc release to workqueue. It was also broken on UP because
trylock was always assumed to succeed which resulted in unbalanced
preemption count.
While there are ways to fix the UP breakage, even the most
pathological microbench (forced ioc allocation and tight fork/exit
loop) fails to show any appreciable performance benefit of the
optimization. Strip it out. If there turns out to be workloads which
are affected by this change, simpler optimization from the discussion
thread can be applied later.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Setting the task name is done within setup_new_exec() by accessing
bprm->filename. However this happens after flush_old_exec().
This may result in a use after free bug, flush_old_exec() may
"complete" vfork_done, which will wake up the parent which in turn
may free the passed in filename.
To fix this add a new tcomm field in struct linux_binprm which
contains the now early generated task name until it is used.
Fixes this bug on s390:
Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
Call Trace:
([<0000000000282e2c>] setup_new_exec+0x38/0x374)
[<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
[<0000000000280a42>] search_binary_handler+0x38e/0x5bc
[<0000000000282b6c>] do_execve_common+0x410/0x514
[<0000000000282cb6>] do_execve+0x46/0x58
[<00000000005bce58>] kernel_execve+0x28/0x70
[<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
[<00000000005bc8da>] kernel_thread_starter+0x6/0xc
[<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
Last Breaking-Event-Address:
[<00000000002830f0>] setup_new_exec+0x2fc/0x374
Kernel panic - not syncing: Fatal exception: panic_on_oops
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix breakage with MTD suspend caused by the API rework
- Fix a problem with resetting the MX28 BCH module
- A couple of other trivial fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAk8s6HsACgkQdwG7hYl686MIiACgxpNoUWFvq8z+2UGXxsLnNrio
hhcAn31H7TY3KUuIQBo4CqG2dEjNwpCw
=DRWp
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.3' of git://git.infradead.org/~dwmw2/mtd-3.3
- Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1
- Fix breakage with MTD suspend caused by the API rework
- Fix a problem with resetting the MX28 BCH module
- A couple of other trivial fixes
* tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3:
Revert "mtd: atmel_nand: optimize read/write buffer functions"
mtd: fix MTD suspend
jffs2: do not initialize variable unnecessarily
mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23
mtd: nand: fix typo in comment
Commit bf118a342f (NFSv4: include bitmap
in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
where we may need to decode multi-page data. However it fails to take
into account the fact that the variable may be NULL (for the case where
we're not doing multi-page decode), and it also attaches it to the
encoding xdr_stream rather than the decoding one.
The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
call to page_address() with a NULL page pointer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: stable@vger.kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix safety of rbd_put_client()
rbd: fix a memory leak in rbd_get_client()
ceph: create a new session lock to avoid lock inversion
ceph: fix length validation in parse_reply_info()
ceph: initialize client debugfs outside of monc->mutex
ceph: change "ceph.layout" xattr to be "ceph.file.layout"
For null user mounts, do not invoke string length function
during session setup.
Cc: <stable@kernel.org
Reported-and-Tested-by: Chris Clayton <chris2553@googlemail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
This fixes the race in process_vm_core found by Oleg (see
http://article.gmane.org/gmane.linux.kernel/1235667/
for details).
This has been updated since I last sent it as the creation of the new
mm_access() function did almost exactly the same thing as parts of the
previous version of this patch did.
In order to use mm_access() even when /proc isn't enabled, we move it to
kernel/fork.c where other related process mm access functions already
are.
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lockdep was reporting a possible circular lock dependency in
dentry_lease_is_valid(). That function needs to sample the
session's s_cap_gen and and s_cap_ttl fields coherently, but needs
to do so while holding a dentry lock. The s_cap_lock field was
being used to protect the two fields, but that can't be taken while
holding a lock on a dentry within the session.
In most cases, the s_cap_gen and s_cap_ttl fields only get operated
on separately. But in three cases they need to be updated together.
Implement a new lock to protect the spots updating both fields
atomically is required.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
"len" is read from network and thus needs validation. Otherwise, given
a bogus "len" value, p+len could be an out-of-bounds pointer, which is
used in further parsing.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
The virtual extended attribute named "ceph.layout" is meaningful
only for regular files. Change its name to be "ceph.file.layout" to
more directly reflect that in the ceph xattr namespace. Preserve
the old "ceph.layout" name for the time being (until we decide it's
safe to get rid of it entirely).
Add a missing initializer for "readonly" in the terminating entry.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
Once /proc/pid/mem is opened, the memory can't be released until
mem_release() even if its owner exits.
Change mem_open() to do atomic_inc(mm_count) + mmput(), this only
pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
before access_remote_vm(), this verifies that this mm is still alive.
I am not sure what should mem_rw() return if atomic_inc_not_zero()
fails. With this patch it returns zero to match the "mm == NULL" case,
may be it should return -EINVAL like it did before e268337d.
Perhaps it makes sense to add the additional fatal_signal_pending()
check into the main loop, to ensure we do not hold this memory if
the target task was oom-killed.
Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes, cleanup and preparation.
mem_read() and mem_write() are very similar. Move this code into the
new common helper, mem_rw(), which takes the additional "int write"
argument.
Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_release() can hit mm == NULL, add the necessary check.
Cc: stable@kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes merge conflict resolution breakage introduced by merge
d3712b9dfc ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream").
The commit changed 'mtd_can_have_bb()' function and made it always
return zero, which is incorrect. Instead, we need it to return whether
the underlying flash device can have bad eraseblocks or not. UBI needs
this information because it affects how it handles the underlying flash.
E.g., if the underlying flash is NOR, it cannot have bad blocks and any
write or erase error is fatal, and all we can do is to switch to R/O
mode. We do not need to reserve a pool of good eraseblocks for bad
eraseblocks handling, and so on.
This patch also removes 'mtd_can_have_bb()' invocations from Logfs to
ensure correct Logfs behavior.
I've tested that with this patch UBI works on top of NOR and NAND
flashes emulated by mtdram and nandsim correspondingly.
This patch is based on patch from Linus Torvalds.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jörn Engel <joern@logfs.org>
Acked-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the
tearing down the original bdi. Fix trace_writeback_single_inode to
use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
teared down bdi.
Cc: <stable@kernel.org>
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
btrfs_fallocate tries to allocate space only if ranges in the file don't
already exist. But the enospc checks it does are not allowed with
extents locked.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Removed the dependency on CONFIG_EXPERIMENTAL but forgot to update
the text description to be consistent.
Signed-off-by: Steve French <smfrench@gmail.com>
The kmem_realloc() in xfs is given KM_* memory allocation flags. And it
allocates memory using kmalloc() after they are converted to gfp_mask
flags. In xlog_recover_add_to_cont_trans(), 0u is passed to kmem_realloc(),
instead of them. I guess it is preferred to use them, and here memory must
be allocated but don't have to be done with GFP_ATOMIC. So, this patch
changes it to KM_SLEEP.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
There are few important bug fixes for LogFS
Shortlog:
Joern Engel (5):
logfs: Prevent memory corruption
logfs: remove useless BUG_ON
logfs: Free areas before calling generic_shutdown_super()
logfs: Grow inode in delete path
Logfs: Allow NULL block_isbad() methods
Prasad Joshi (5):
logfs: update page reference count for pined pages
logfs: take write mutex lock during fsync and sync
logfs: set superblock shutdown flag after generic sb shutdown
logfs: Propagate page parameter to __logfs_write_inode
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
Diffstat:
MAINTAINERS | 1 +
fs/logfs/dev_mtd.c | 26 +++++++++++-------------
fs/logfs/dir.c | 2 +-
fs/logfs/file.c | 2 +
fs/logfs/gc.c | 2 +-
fs/logfs/inode.c | 4 ++-
fs/logfs/journal.c | 1 -
fs/logfs/logfs.h | 5 +++-
fs/logfs/readwrite.c | 51 +++++++++++++++++++++++++++++++++----------------
fs/logfs/segment.c | 51 ++++++++++++++++++++++++++++++++++++++-----------
fs/logfs/super.c | 3 +-
11 files changed, 99 insertions(+), 49 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPKByhAAoJEDFA/f+3K+ZNQSUP/3gACcIwcsl+FnXPWBtz9XIG
g0DjXoRDd/sR0u25nLgjCVdBJgx5FVEyA+PvLgvvUU2KCAsqI5F/EQ+fLJs21YEN
TzepBO5aHtFZbNEjo6WiXOlDbBePTtk44WrN6jqoCHM/aDeT4Wof3NZBmHWNN1PX
B2RtEZ0ypJ7/b1OY2LUNcQfTaJXNgVoP8Hkx4KGY5LUVxVrBXxvDTU7YbkS8a+ys
1Yje/EQ4XD4RyZB42TmFEuTenvGPRgMGVFdnkJKuON8EmJQ8Hc61jEf5d7Q8sWef
dH5F/ptoAaR9a9LbbO8LoYuBZ8MR8848NPsrNPpr/gWntj46Z79yII8Jr7YoSDyw
zq5G2dZbwlbVrtVWKGae47THkNB8bljR/g4cijvPAkvuIAku6mg+dgjVHAhZ/t+J
xu8+Gy2sWHUH2gmoSXuoNyppOvYpPIRd5RB16PizMH3bw+sMad2K8/rfOKnmF1/r
HTM2jZ5bDcHVDjSuVI6u2m/mQX+PmPXUTffreaFXuSI75YpT0dqN3nponTX4EgFI
Ad9ZBQvdg8w1LGDsNxIAaqrGx4Q87RxqfUV4W/wo6N8gKsp+I2y4GtYMeD/CEKyi
wncKg10YwoMXZj7cBAkWgPlgrOBYCPwYZc/1DVRHvqrHo/m13SJrWDKkNKVvoXzH
2y4Tfi5w1WDRUT7yeoyK
=TA1A
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream
There are few important bug fixes for LogFS
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
Logfs: Allow NULL block_isbad() methods
logfs: Grow inode in delete path
logfs: Free areas before calling generic_shutdown_super()
logfs: remove useless BUG_ON
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
logfs: Propagate page parameter to __logfs_write_inode
logfs: set superblock shutdown flag after generic sb shutdown
logfs: take write mutex lock during fsync and sync
logfs: Prevent memory corruption
logfs: update page reference count for pined pages
Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
"mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL
block_isbad() methods" clashing with the abstraction changes in the
commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".
This resolution takes the semantics from commit f2933e86ad, and just
makes mtd_block_isbad() return zero (false) if the 'block_isbad'
function is NULL. But that also means that now "mtd_can_have_bb()"
always returns 0.
Now, "mtd_block_markbad()" will obviously return an error if the
low-level driver doesn't support bad blocks, so this is somewhat
non-symmetric, but it actually makes sense if a NULL "block_isbad"
function is considered to mean "I assume that all my blocks are always
good".
Fix printk format warnings for ssize_t variables:
fs/cifs/connect.c:2145:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2152:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2160:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
fs/cifs/connect.c:2170:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: linux-cifs@vger.kernel.org
We should check that we're not copying memory from beyond the end of the
blob.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk8jKW4ACgkQMUfUDdst+ynAUwCfVWwHJxpb4DSSMVZhGOnHMQrL
ZjIAn00gPeSs5u8y1nPvFrFikbon4FDs
=bzVy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Here are some patches for the 3.3-rc1 tree.
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
stable: update documentation to ask for kernel version
base/core.c:fix typo in comment in function device_add
Documentation: devres: add allocation functions to list of supported calls
Documentation update for the driver model core
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warnings in device.h
driver core: remove drivers/base/sys.c and include/linux/sysdev.h
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c
Can be necessary if an inode gets deleted (through -ENOSPC) before being
written. Might be better to move this into logfs_write_rec(), but for
now go with the stupid&safe patch.
Signed-off-by: Joern Engel <joern@logfs.org>
This is a bad one. I wonder whether we were so far protected by
no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.
Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch.
Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
LogFS sets PG_private flag to indicate a pined page. We assumed that
marking a page as private is enough to ensure its existence. But
instead it is necessary to hold a reference count to the page.
The change resolves the following BUG
BUG: Bad page state in process flush-253:16 pfn:6a6d0
page flags: 0x100000000000808(uptodate|private)
Suggested-and-Acked-by: Joern Engel <joern@logfs.org>
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.
This makes sure we only release if we were able to reserve.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superficial searching. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state
3399 if ((mask & GFP_NOFS) == GFP_NOFS)
3400 mask = GFP_NOFS;
will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963
[<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
[<000000000024c810>] kmem_cache_alloc+0x204/0x294
[<00000000001fd3c2>] mempool_alloc+0x52/0x170
[<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
[<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
[<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
[<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
[<0000000000210d84>] shrink_page_list+0x6a0/0x724
[<0000000000211394>] shrink_inactive_list+0x230/0x578
[<0000000000211bb8>] shrink_list+0x6c/0x120
[<0000000000211e4e>] shrink_zone+0x1e2/0x228
[<0000000000211f24>] shrink_zones+0x90/0x254
[<0000000000213410>] do_try_to_free_pages+0xac/0x420
[<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
[<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
[<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we did sysbench test for inline files, enospc error happened easily though
there was lots of free disk space which could be allocated for new chunks.
Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
# mount <test partition> /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run
<soon later, BUG_ON() was triggered by enospc error>
The reason of this bug is:
Now, we can reserve space which is larger than the free space in the chunks if
we have enough free disk space which can be used for new chunks. By this way,
the space allocator should allocate a new chunk by force if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.
One is
if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
even we fail to allocate free space by minimum allocable size.
The other is
if (space_info->force_alloc)
force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.
Fix these two wrong checks. Especially the second one, we fix it by changing
the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
higher priority. And if the value which is passed in by the caller is greater
than ->force_alloc, use the passed value.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2
To fix this, we need to set max_to_defrag count properly.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There have been 4 warnings on 32-bit build, they are herewith fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).
After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c
for (i = start_i; i < num_pages; i++) {
page = extent_buffer_page(eb, i);
wait_on_page_locked(page);
if (!PageUptodate(page))
ret = -EIO;
}
This patch leaves the issue with the locked page yet to be resolved.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We have only been checking for min_bytes available in bitmap entries, but we
won't successfully setup a bitmap cluster unless it has at least bytes in the
bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2mb's in them, we'll
search all them anyway, which is suboptimal. Fix this check. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Quoth Ben Myers:
"Please pull in the following bugfix for xfs. We forgot to drop a lock on
error in xfs_readlink. It hasn't been through -next yet, but there is no
-next tree tomorrow. The fix is clear so I'm sending this request today."
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()
The data encryption was moved from ecryptfs_write_end into
ecryptfs_writepage, this patch moves the corresponding function
comments to be consistent with the modification.
Signed-off-by: Li Wang <liwang@nudt.edu.cn>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Says Tyler:
"Tim's logging message update will be really helpful to users when
they're trying to locate a problematic file in the lower filesystem
with filename encryption enabled.
You'll recognize the fix from Li, as you commented on that.
You should also be familiar with my setattr/truncate improvements,
since you were the one that pointed them out to us (thanks again!).
Andrew noted the /dev/ecryptfs write count sanitization needed to be
improved, so I've got a fix in there for that along with some other
less important cleanups of the /dev/ecryptfs read/write code."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Fix oops when printing debug info in extent crypto functions
eCryptfs: Remove unused ecryptfs_read()
eCryptfs: Check inode changes in setattr
eCryptfs: Make truncate path killable
eCryptfs: Infinite loop due to overflow in ecryptfs_write()
eCryptfs: Replace miscdev read/write magic numbers
eCryptfs: Report errors in writes to /dev/ecryptfs
eCryptfs: Sanitize write counts of /dev/ecryptfs
ecryptfs: Remove unnecessary variable initialization
ecryptfs: Improve metadata read failure logging
MAINTAINERS: Update eCryptfs maintainer address
If pages passed to the eCryptfs extent-based crypto functions are not
mapped and the module parameter ecryptfs_verbosity=1 was specified at
loading time, a NULL pointer dereference will occur.
Note that this wouldn't happen on a production system, as you wouldn't
pass ecryptfs_verbosity=1 on a production system. It leaks private
information to the system logs and is for debugging only.
The debugging info printed in these messages is no longer very useful
and rather than doing a kmap() in these debugging paths, it will be
better to simply remove the debugging paths completely.
https://launchpad.net/bugs/913651
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Daniel DeFreez
Cc: <stable@vger.kernel.org>
ecryptfs_read() has been ifdef'ed out for years now and it was
apparently unused before then. It is time to get rid of it for good.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Most filesystems call inode_change_ok() very early in ->setattr(), but
eCryptfs didn't call it at all. It allowed the lower filesystem to make
the call in its ->setattr() function. Then, eCryptfs would copy the
appropriate inode attributes from the lower inode to the eCryptfs inode.
This patch changes that and actually calls inode_change_ok() on the
eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
would happen earlier in ecryptfs_setattr(), but there are some possible
inode initialization steps that must happen first.
Since the call was already being made on the lower inode, the change in
functionality should be minimal, except for the case of a file extending
truncate call. In that case, inode_newsize_ok() was never being
called on the eCryptfs inode. Rather than inode_newsize_ok() catching
maximum file size errors early on, eCryptfs would encrypt zeroed pages
and write them to the lower filesystem until the lower filesystem's
write path caught the error in generic_write_checks(). This patch
introduces a new function, called ecryptfs_inode_newsize_ok(), which
checks if the new lower file size is within the appropriate limits when
the truncate operation will be growing the lower file.
In summary this change prevents eCryptfs truncate operations (and the
resulting page encryptions), which would exceed the lower filesystem
limits or FSIZE rlimits, from ever starting.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Li Wang <liwang@nudt.edu.cn>
Cc: <stable@vger.kernel.org>
ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
page, zeroes out the appropriate portions, and then encrypts the page
before writing it to the lower filesystem. It was unkillable and due to
the lack of sparse file support could result in tying up a large portion
of system resources, while encrypting pages of zeros, with no way for
the truncate operation to be stopped from userspace.
This patch adds the ability for ecryptfs_write() to detect a pending
fatal signal and return as gracefully as possible. The intent is to
leave the lower file in a useable state, while still allowing a user to
break out of the encryption loop. If a pending fatal signal is detected,
the eCryptfs inode size is updated to reflect the modified inode size
and then -EINTR is returned.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: <stable@vger.kernel.org>
ecryptfs_write() can enter an infinite loop when truncating a file to a
size larger than 4G. This only happens on architectures where size_t is
represented by 32 bits.
This was caused by a size_t overflow due to it incorrectly being used to
store the result of a calculation which uses potentially large values of
type loff_t.
[tyhicks@canonical.com: rewrite subject and commit message]
Signed-off-by: Li Wang <liwang@nudt.edu.cn>
Signed-off-by: Yunchuan Wen <wenyunchuan@kylinos.com.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
ecryptfs_miscdev_read() and ecryptfs_miscdev_write() contained many
magic numbers for specifying packet header field sizes and offsets. This
patch defines those values and replaces the magic values.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Errors in writes to /dev/ecryptfs were being incorrectly reported by
returning 0 or the value of the original write count.
This patch clears up the return code assignment in error paths.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
A malicious count value specified when writing to /dev/ecryptfs may
result in a a very large kernel memory allocation.
This patch peeks at the specified packet payload size, adds that to the
size of the packet headers and compares the result with the write count
value. The resulting maximum memory allocation size is approximately 532
bytes.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: <stable@vger.kernel.org>
Removes unneeded variable initialization in ecryptfs_read_metadata(). Also adds
a small comment to help explain metadata reading logic.
[tyhicks@canonical.com: Pulled out of for-stable patch and wrote commit msg]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Print inode on metadata read failure. The only real
way of dealing with metadata read failures is to delete
the underlying file system file. Having the inode
allows one to 'find . -inum INODE`.
[tyhicks@canonical.com: Removed some minor not-for-stable parts]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
symplink and bailed out. Fix it by jumping to 'out' instead of doing return.
CC: stable@kernel.org
CC: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Alex Elder <elder@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Pass information that quota is stored in system file to userspace
ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
jbd: Issue cache flush after checkpointing
Recently an OOPS was observed from the usb serial io_ti driver when it tried to remove
sysfs directories. Upon investigation it turns out this driver was always buggy
and that a recent sysfs change had stopped guarding itself against removing attributes
from sysfs directories that had already been removed. :(
Historically we have been silent about attempting to files from nonexistent sysfs
directories and have politely returned error codes. That has resulted in people writing
broken code that ignores the error codes.
Issue a kernel WARNING and a stack backtrace to make it clear in no uncertain
terms that abusing sysfs is not ok, and the callers need to fix their code.
This change transforms the io_ti OOPS into a more comprehensible error message
and stack backtrace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Wolfgang Frisch <wfpub@roembden.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We should just return directly here, the goto causes a NULL dereference.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The usual kernel-doc fixups from Randy. Some of them David acked as
merged in his tree, this is the random left-overs.
* kernel-doc:
docbook: fix sched source file names in device-drivers book
docbook: change iomap source filename in deviceiobook
docbook: don't use serial_core.h in device-drivers book
kernel-doc: fix kernel-doc warnings in sched
kernel-doc: fix new warnings in cfg80211.h
kernel-doc: fix new warning in usb.h
kernel-doc: fix new warnings in device.h
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warning in regulator core
kernel-doc: fix new warnings in pci
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in auditsc.c
scripts/kernel-doc: fix fatal error caused by cfg80211.h
Quoth Andrew:
"Random fixes. And a simple new LED driver which I'm trying to sneak
in while you're not looking."
Sneaking successful.
* akpm:
score: fix off-by-one index into syscall table
mm: fix rss count leakage during migration
SHM_UNLOCK: fix Unevictable pages stranded after swap
SHM_UNLOCK: fix long unpreemptible section
kdump: define KEXEC_NOTE_BYTES arch specific for s390x
mm/hugetlb.c: undo change to page mapcount in fault handler
mm: memcg: update the correct soft limit tree during migration
proc: clear_refs: do not clear reserved pages
drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails
drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
kprobes: initialize before using a hlist
ipc/mqueue: simplify reading msgqueue limit
leds: add led driver for Bachmann's ot200
mm: __count_immobile_pages(): make sure the node is online
mm: fix NULL ptr dereference in __count_immobile_pages
mm: fix warnings regarding enum migrate_mode
* git://git.samba.org/sfrench/cifs-2.6:
CIFS: Rename *UCS* functions to *UTF16*
[CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
[CIFS] Fix build break with multiuser patch when LANMAN disabled
cifs: warn about impending deprecation of legacy MultiuserMount code
cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
cifs: sanitize username handling
keys: add a "logon" key type
cifs: lower default wsize when unix extensions are not used
cifs: better instrumentation for coalesce_t2
cifs: integer overflow in parse_dacl()
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.
On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.
To see this problem in action, just run:
$ echo 1 > /proc/self/clear_refs
on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37
This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reported-by: Moussa Ba <moussaba@micron.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/accounting, proc: Fix /proc/stat interrupts sum
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT
x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore
x86/kprobes: Fix typo transferred from Intel manual
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
x86/kconfig: Move the ZONE_DMA entry under a menu
x86/UV2: Add accounting for BAU strong nacks
x86/UV2: Ack BAU interrupt earlier
x86/UV2: Remove stale no-resources test for UV2 BAU
x86/UV2: Work around BAU bug
x86/UV2: Fix BAU destination timeout initialization
x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode
x86: Get rid of dubious one-bit signed bitfield
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qnx4: don't leak ->BitMap on late failure exits
qnx4: reduce the insane nesting in qnx4_checkroot()
qnx4: di_fname is an array, for crying out loud...
vfs: remove printk from set_nlink()
wake up s_wait_unfrozen when ->freeze_fs fails
to reflect the unicode encoding used by CIFS protocol.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@samba.org>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CIFS ACL support and FSCACHE support have been in long enough
to be no longer considered experimental. Remove obsolete Kconfig
dependency.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
We'll allow a grace period of 2 releases (3.3 and 3.4) and then remove
the legacy code in 3.5.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Fix up multiuser mounts to set the secType and set the username and
password from the key payload in the vol info for non-krb5 auth types.
Look for a key of type "secret" with a description of
"cifs🅰️<server address>" or "cifs:d:<domainname>". If that's found,
then scrape the username and password out of the key payload and use
that to create a new user session.
Finally, don't have the code enforce krb5 auth on multiuser mounts,
but do require a kernel with keys support.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently, it's not very clear whether you're allowed to have a NULL
vol->username or ses->user_name. Some places check for it and some don't.
Make it clear that a NULL pointer is OK in these fields, and ensure that
all the callers check for that.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We've had some reports of servers (namely, the Solaris in-kernel CIFS
server) that don't deal properly with writes that are "too large" even
though they set CAP_LARGE_WRITE_ANDX. Change the default to better
mirror what windows clients do.
Cc: stable@vger.kernel.org
Cc: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Nick Davis <phireph0x@yahoo.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
When coalesce_t2 returns an error, have it throw a cFYI message that
explains the reason. Also rename some variables to clarify what they
represent.
Reported-and-Tested-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
audit: no leading space in audit_log_d_path prefix
audit: treat s_id as an untrusted string
audit: fix signedness bug in audit_log_execve_info()
audit: comparison on interprocess fields
audit: implement all object interfield comparisons
audit: allow interfield comparison between gid and ogid
audit: complex interfield comparison helper
audit: allow interfield comparison in audit rules
Kernel: Audit Support For The ARM Platform
audit: do not call audit_getname on error
audit: only allow tasks to set their loginuid if it is -1
audit: remove task argument to audit_set_loginuid
audit: allow audit matching on inode gid
audit: allow matching on obj_uid
audit: remove audit_finish_fork as it can't be called
audit: reject entry,always rules
audit: inline audit_free to simplify the look of generic code
audit: drop audit_set_macxattr as it doesn't do anything
audit: inline checks for not needing to collect aux records
audit: drop some potentially inadvisable likely notations
...
Use evil merge to fix up grammar mistakes in Kconfig file.
Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: cleanup xfs_file_aio_write
xfs: always return with the iolock held from xfs_file_aio_write_checks
xfs: remove the i_new_size field in struct xfs_inode
xfs: remove the i_size field in struct xfs_inode
xfs: replace i_pin_wait with a bit waitqueue
xfs: replace i_flock with a sleeping bitlock
xfs: make i_flags an unsigned long
xfs: remove the if_ext_max field in struct xfs_ifork
xfs: remove the unused dm_attrs structure
xfs: cleanup xfs_iomap_eof_align_last_fsb
xfs: remove xfs_itruncate_data
* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: take allocation of ->tree_root into open_ctree()
btrfs: let ->s_fs_info point to fs_info, not root...
btrfs: consolidate failure exits in btrfs_mount() a bit
btrfs: make free_fs_info() call ->kill_sb() unconditional
btrfs: merge free_fs_info() calls on fill_super failures
btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
btrfs: make open_ctree() return int
btrfs: sanitizing ->fs_info, part 5
btrfs: sanitizing ->fs_info, part 4
btrfs: sanitizing ->fs_info, part 3
btrfs: sanitizing ->fs_info, part 2
btrfs: sanitizing ->fs_info, part 1
btrfs: fix a deadlock in btrfs_scan_one_device()
btrfs: fix mount/umount race
btrfs: get ->kill_sb() of its own
btrfs: preparation to fixing mount/umount race
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
Btrfs: use larger system chunks
Btrfs: add a delalloc mutex to inodes for delalloc reservations
Btrfs: space leak tracepoints
Btrfs: protect orphan block rsv with spin_lock
Btrfs: add allocator tracepoints
Btrfs: don't call btrfs_throttle in file write
Btrfs: release space on error in page_mkwrite
Btrfs: fix btrfsck error 400 when truncating a compressed
Btrfs: do not use btrfs_end_transaction_throttle everywhere
Btrfs: add balance progress reporting
Btrfs: allow for resuming restriper after it was paused
Btrfs: allow for canceling restriper
Btrfs: allow for pausing restriper
Btrfs: add skip_balance mount option
Btrfs: recover balance on mount
Btrfs: save balance parameters to disk
Btrfs: soft profile changing mode (aka soft convert)
Btrfs: implement online profile changing
Btrfs: do not reduce profile in do_chunk_alloc()
Btrfs: virtual address space subset filter
...
Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.
This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.
That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.
I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dd slept infinitely when fsfeeze failed because of EIO.
To fix this problem, if ->freeze_fs fails, freeze_super() wakes up
the tasks waiting for the filesystem to become unfrozen.
When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(),
the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen.
However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then
freeze_super() returns an error number. In this case, FITHAW ioctl returns
EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up
s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just a code cleanup really. We don't need to make a function call just for
it to return on error. This also makes the VFS function even easier to follow
and removes a conditional on a hot path.
Signed-off-by: Eric Paris <eparis@redhat.com>
At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset. We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd. Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.
Systemd has changed how userspace works and allowed us to make the kernel
work the way it should. With systemd users (even admins) are not supposed
to restart services directly. The system will restart the service for
them. Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.
If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used! Since we have old systems I make this a
Kconfig option.
Signed-off-by: Eric Paris <eparis@redhat.com>
The function always deals with current. Don't expose an option
pretending one can use it for something. You can't.
Signed-off-by: Eric Paris <eparis@redhat.com>
With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>