OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chandan Rajendra	30a9d7afe7	ext4: fix stack memory corruption with 64k block size The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: `c9de560ded` Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org	2016-11-14 21:26:26 -05:00
Chandan Rajendra	69e43e8cc9	ext4: fix mballoc breakage with 64k block size 'border' variable is set to a value of 2 times the block size of the underlying filesystem. With 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: `c9de560ded` Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org	2016-11-14 21:04:37 -05:00
Theodore Ts'o	1566a48aaa	ext4: don't lock buffer in ext4_commit_super if holding spinlock If there is an error reported in mballoc via ext4_grp_locked_error(), the code is holding a spinlock, so ext4_commit_super() must not try to lock the buffer head, or else it will trigger a BUG: BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358 in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166 ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153 ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0 Call Trace: [<ffffffff81318c89>] dump_stack+0x67/0x9e [<ffffffff810810b0>] ___might_sleep+0xf0/0x140 [<ffffffff81081153>] __might_sleep+0x53/0xb0 [<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290 [<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230 [<ffffffff81081153>] ? __might_sleep+0x53/0xb0 [<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320 Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0 (and it is the only caller which does so), avoid locking and unlocking the buffer in this case. This can result in races with ext4_commit_super() if there are other problems (which is what commit `4743f83990` was trying to address), but a Warning is better than BUG. Fixes: `4743f83990` Cc: stable@vger.kernel.org # 4.9 Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-11-13 22:02:29 -05:00
Theodore Ts'o	d0abb36db4	ext4: allow ext4_ext_truncate() to return an error Return errors to the caller instead of declaring the file system corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-11-13 22:02:28 -05:00
Theodore Ts'o	2c98eb5ea2	ext4: allow ext4_truncate() to return an error This allows us to properly propagate errors back up to ext4_truncate()'s callers. This also means we no longer have to silently ignore some errors (e.g., when trying to add the inode to the orphan inode list). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-11-13 22:02:26 -05:00
Theodore Ts'o	6da22013bb	Merge branch 'fscrypt' into origin	2016-11-13 22:02:22 -05:00
Theodore Ts'o	a2f6d9c4c0	Merge branch 'dax-4.10-iomap-pmd' into origin	2016-11-13 22:02:15 -05:00
David Gstir	9c4bb8a3a9	fscrypt: Let fs select encryption index/tweak Avoid re-use of page index as tweak for AES-XTS when multiple parts of same page are encrypted. This will happen on multiple (partial) calls of fscrypt_encrypt_page on same page. page->index is only valid for writeback pages. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-13 20:18:16 -05:00
David Gstir	7821d4dd45	fscrypt: Enable partial page encryption Not all filesystems work on full pages, thus we should allow them to hand partial pages to fscrypt for en/decryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-13 18:55:21 -05:00
David Gstir	b50f7b268b	fscrypt: Allow fscrypt_decrypt_page() to function with non-writeback pages Some filesystem might pass pages which do not have page->mapping->host set to the encrypted inode. We want the caller to explicitly pass the corresponding inode. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-13 18:53:10 -05:00
Ross Zwisler	547edce3ba	ext4: tell DAX the size of allocation holes When DAX calls _ext4_get_block() and the file offset points to a hole we currently don't set bh->b_size. This is current worked around via buffer_size_valid() in fs/dax.c. _ext4_get_block() has the hole size information from ext4_map_blocks(), so populate bh->b_size so we can remove buffer_size_valid() in a later patch. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-11-08 11:30:58 +11:00
Jan Kara	e64855c6cf	fs: Add helper to clean bdev aliases under a bh and use it Add a helper function that clears buffer heads from a block device aliasing passed bh. Use this helper function from filesystems instead of the original unmap_underlying_metadata() to save some boiler plate code and also have a better name for the functionalily since it is not unmapping anything for a long time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-04 14:34:47 -06:00
Jan Kara	64e1c57fa4	ext4: Use clean_bdev_aliases() instead of iteration Use clean_bdev_aliases() instead of iterating through blocks one by one. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-04 14:34:47 -06:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Joe Perches	d74f3d2528	ext4: add missing KERN_CONT to a few more debugging uses Recent commits require line continuing printks to always use pr_cont or KERN_CONT. Add these markings to a few more printks. Miscellaneaous: o Integrate the ea_idebug and ea_bdebug macros to use a single call to printk(KERN_DEBUG instead of 3 separate printks o Use the more common varargs macro style Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2016-10-15 09:57:31 -04:00
Eric Biggers	199625098a	ext4: correct endianness conversion in __xattr_check_inode() It should be cpu_to_le32(), not le32_to_cpu(). No change in behavior. Found with sparse, and this was the only endianness warning in fs/ext4/. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-10-15 09:39:31 -04:00
Eric Biggers	c4704a4fbe	ext4: do not advertise encryption support when disabled The sysfs file /sys/fs/ext4/features/encryption was present on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=n. This was misleading because such kernels do not actually support ext4 encryption. Therefore, only provide this file on kernels compiled with CONFIG_EXT4_FS_ENCRYPTION=y. Note: since the ext4 feature files are all hardcoded to have a contents of "supported", it really is the presence or absence of the file that is significant, not the contents (and this change reflects that). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-10-12 23:24:51 -04:00
Joe Perches	651e1c3b15	ext4: super.c: Update logging style using KERN_CONT Recent commit require line continuing printks to use PR_CONT. Update super.c to use KERN_CONT and use vsprintf extension %pV to avoid a printk/vprintk/printk("\n") sequence as well. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-10-12 23:12:53 -04:00
Michal Hocko	5114a97a8b	fs: use mapping_set_error instead of opencoded set_bit The mapping_set_error() helper sets the correct AS_ flag for the mapping so there is no reason to open code it. Use the helper directly. [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO] Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:33 -07:00
Linus Torvalds	101105b171	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning	2016-10-10 20:16:43 -07:00
Linus Torvalds	97d2116708	Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "xattr stuff from Andreas This completes the switch to xattr_handler ->get()/->set() from ->getxattr/->setxattr/->removexattr" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Remove {get,set,remove}xattr inode operations xattr: Stop calling {get,set,remove}xattr inode operations vfs: Check for the IOP_XATTR flag in listxattr xattr: Add __vfs_{get,set,remove}xattr helpers libfs: Use IOP_XATTR flag for empty directory handling vfs: Use IOP_XATTR flag for bad-inode handling vfs: Add IOP_XATTR inode operations flag vfs: Move xattr_resolve_name to the front of fs/xattr.c ecryptfs: Switch to generic xattr handlers sockfs: Get rid of getxattr iop sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names kernfs: Switch to generic xattr handlers hfs: Switch to generic xattr handlers jffs2: Remove jffs2_{get,set,remove}xattr macros xattr: Remove unnecessary NULL attribute name check	2016-10-10 17:11:50 -07:00
Linus Torvalds	abb5a14fa2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...	2016-10-10 13:04:49 -07:00
Al Viro	e55f1d1d13	Merge remote-tracking branch 'jk/vfs' into work.misc	2016-10-08 11:06:08 -04:00
Linus Torvalds	b66484cd74	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - fsnotify updates - ocfs2 updates - all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) console: don't prefer first registered if DT specifies stdout-path cred: simpler, 1D supplementary groups CREDITS: update Pavel's information, add GPG key, remove snail mail address mailmap: add Johan Hovold .gitattributes: set git diff driver for C source code files uprobes: remove function declarations from arch/{mips,s390} spelling.txt: "modeled" is spelt correctly nmi_backtrace: generate one-line reports for idle cpus arch/tile: adopt the new nmi_backtrace framework nmi_backtrace: do a local dump_stack() instead of a self-NMI nmi_backtrace: add more trigger__cpu_backtrace() methods min/max: remove sparse warnings when they're nested Documentation/filesystems/proc.txt: add more description for maps/smaps mm, proc: fix region lost in /proc/self/smaps proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self proc: add LSM hook checks to /proc/<tid>/timerslack_ns proc: relax /proc/<tid>/timerslack_ns capability requirements meminfo: break apart a very long seq_printf with #ifdefs seq/proc: modify seq_put_decimal_[u]ll to take a const char , not char proc: faster /proc/*/status ...	2016-10-07 21:38:00 -07:00
Andreas Gruenbacher	fd50ecaddf	vfs: Remove {get,set,remove}xattr inode operations These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-10-07 21:48:36 -04:00
Toshi Kani	dbe6ec8156	ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings To support DAX pmd mappings with unmodified applications, filesystems need to align an mmap address by the pmd size. Call thp_get_unmapped_area() from f_op->get_unmapped_area. Note, there is no change in behavior for a non-DAX file. Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-07 18:46:28 -07:00
Linus Torvalds	2eee010d09	Lots of bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJX9pA6AAoJEPL5WVaVDYGj7fwH/0YcdQWBg0O5d7iXFnTcimh9 fiYkqKniBWQhgBAOFPMoNPRIW4tyeQmTtu8Rywx2Hr+v4lzJvuOaT18NDANdq/pp u5eDrnJ4R+uqPJlgxVOzopLVJ6I2glgSSRdvAKYxwTYcv8F88ObzVfsJ4M415gPq cbEKF+JT3l5hTGENR5sqmYvHYaNfOFkOqt4gulPtgk1eshy+BH/05M+qBSeA5a6k srdon0pFRoUV68m+T4G8FqOZxdybeT5Yx6X0GJf0eQJoX7IaiQTPcDrXzlrbDBbN rrzbpwsDeDKtgSOckbarCBroZKdToHFekfnOJ7IPWYq8IwYTSnZKFCWIRKO6z38= =IvhS -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of bug fixes and cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: remove unused variable ext4: use journal inode to determine journal overhead ext4: create function to read journal inode ext4: unmap metadata when zeroing blocks ext4: remove plugging from ext4_file_write_iter() ext4: allow unlocked direct IO when pages are cached ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY fscrypto: use standard macros to compute length of fname ciphertext ext4: do not unnecessarily null-terminate encrypted symlink data ext4: release bh in make_indexed_dir ext4: Allow parallel DIO reads ext4: allow DAX writeback for hole punch jbd2: fix lockdep annotation in add_transaction_credits() blockgroup_lock.h: simplify definition of NR_BG_LOCKS blockgroup_lock.h: remove debris from bgl_lock_ptr() conversion fscrypto: make filename crypto functions return 0 on success fscrypto: rename completion callbacks to reflect usage fscrypto: remove unnecessary includes fscrypto: improved validation when loading inode encryption metadata ext4: fix memory leak when symlink decryption fails ...	2016-10-07 15:15:33 -07:00
Eric Engestrom	18017479ca	ext4: remove unused variable Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>	2016-09-30 02:14:56 -04:00
Eric Whitney	3c816ded78	ext4: use journal inode to determine journal overhead When a file system contains an internal journal that has not been loaded, use the journal inode's i_size field to determine its contribution to the file system's overhead. (The journal's j_maxlen field is normally used to determine its size, but it's unavailable when the journal has not been loaded.) Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 02:08:49 -04:00
Eric Whitney	c6cb7e776a	ext4: create function to read journal inode Factor out the code used in ext4_get_journal() to read a valid journal inode from storage, enabling its reuse in other functions. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 02:05:09 -04:00
Jan Kara	9b623df614	ext4: unmap metadata when zeroing blocks When zeroing blocks for DAX allocations, we also have to unmap aliases in the block device mappings. Otherwise writeback can overwrite zeros with stale data from block device page cache. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2016-09-30 02:02:29 -04:00
Jan Kara	51e8137b82	ext4: remove plugging from ext4_file_write_iter() do_blockdev_direct_IO() takes care of properly plugging direct IO so there's no need to plug again inside ext4_file_write_iter(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:57:41 -04:00
Jan Kara	4b0524aae0	ext4: allow unlocked direct IO when pages are cached Currently we do not allow unlocked (meaning without inode_lock) direct IO when the file has any pages cached. This check is not needed anymore as we keep inode lock until ext4_direct_IO_write() and thus can happily writeback and evict any pages conflicting with current direct IO write. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:55:32 -04:00
Richard Weinberger	9a200d075e	ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY ...otherwise an user can enable encryption for certain files even when the filesystem is unable to support it. Such a case would be a filesystem created by mkfs.ext4's default settings, 1KiB block size. Ext4 supports encyption only when block size is equal to PAGE_SIZE. But this constraint is only checked when the encryption feature flag is set. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:49:55 -04:00
Eric Biggers	cc91542ac8	ext4: do not unnecessarily null-terminate encrypted symlink data Null-terminating the fscrypt_symlink_data on read is unnecessary because it is not string data --- it contains binary ciphertext. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:44:17 -04:00
gmail	e81d44778d	ext4: release bh in make_indexed_dir The commit 6050d47adcad: "ext4: bail out from make_indexed_dir() on first error" could end up leaking bh2 in the error path. [ Also avoid renaming bh2 to bh, which just confuses things --tytso ] Cc: stable@vger.kernel.org Signed-off-by: yangsheng <yngsion@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:33:37 -04:00
Jan Kara	16c5468859	ext4: Allow parallel DIO reads We can easily support parallel direct IO reads. We only have to make sure we cannot expose uninitialized data by reading allocated block to which data was not written yet, or which was already truncated. That is easily achieved by holding inode_lock in shared mode - that excludes all writes, truncates, hole punches. We also have to guard against page writeback allocating blocks for delay-allocated pages - that race is handled by the fact that we writeback all the pages in the affected range and the lock protects us from new pages being created there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-30 01:03:17 -04:00
Miklos Szeredi	2773bf00ae	fs: rename "rename2" i_op to "rename" Generated patch: sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2` sed -i "s/\brename2\b/rename/g" `git grep -wl rename2` Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-09-27 11:03:58 +02:00
Ross Zwisler	cca32b7eeb	ext4: allow DAX writeback for hole punch Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-22 11:49:38 -04:00
Jan Kara	31051c85b5	fs: Give dentry to inode_change_ok() instead of inode inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2016-09-22 10:56:19 +02:00
Jan Kara	073931017b	posix_acl: Clear SGID bit when setting file permissions When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2016-09-22 10:55:32 +02:00
Eric Biggers	ef1eb3aa50	fscrypto: make filename crypto functions return 0 on success Several filename crypto functions: fname_decrypt(), fscrypt_fname_disk_to_usr(), and fscrypt_fname_usr_to_disk(), returned the output length on success or -errno on failure. However, the output length was redundant with the value written to 'oname->len'. It is also potentially error-prone to make callers have to check for '< 0' instead of '!= 0'. Therefore, make these functions return 0 instead of a length, and make the callers who cared about the return value being a length use 'oname->len' instead. For consistency also make other callers check for a nonzero result rather than a negative result. This change also fixes the inconsistency of fname_encrypt() actually already returning 0 on success, not a length like the other filename crypto functions and as documented in its function comment. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-15 17:25:55 -04:00
Eric Biggers	dcce7a46c6	ext4: fix memory leak when symlink decryption fails This bug was introduced in v4.8-rc1. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-09-15 13:13:13 -04:00
Fabian Frederick	be32197cd6	ext4: remove unused definition for MAX_32_NUM MAX_32_NUM isn't used in ext4 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-15 11:58:47 -04:00
Fabian Frederick	518eaa6387	ext4: create EXT4_MAX_BLOCKS() macro Create a macro to calculate length + offset -> maximum blocks This adds more readability. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-15 11:55:01 -04:00
Fabian Frederick	c3fe493ccd	ext4: remove unneeded test in ext4_alloc_file_blocks() ext4_alloc_file_blocks() is called from ext4_zero_range() and ext4_fallocate() both already testing EXT4_INODE_EXTENTS We can call ext_depth(inode) unconditionnally. [ Added BUG_ON check to make sure ext4_alloc_file_blocks() won't get called for a indirect-mapped inode in the future. -- tytso ] Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-15 11:52:07 -04:00
Fabian Frederick	edf15aa180	ext4: fix memory leak in ext4_insert_range() Running xfstests generic/013 with kmemleak gives the following: unreferenced object 0xffff8801d3d27de0 (size 96): comm "fsstress", pid 4941, jiffies 4294860168 (age 53.485s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff818eaaf3>] kmemleak_alloc+0x23/0x40 [<ffffffff81179805>] __kmalloc+0xf5/0x1d0 [<ffffffff8122ef5c>] ext4_find_extent+0x1ec/0x2f0 [<ffffffff8123530c>] ext4_insert_range+0x34c/0x4a0 [<ffffffff81235942>] ext4_fallocate+0x4e2/0x8b0 [<ffffffff81181334>] vfs_fallocate+0x134/0x210 [<ffffffff8118203f>] SyS_fallocate+0x3f/0x60 [<ffffffff818efa9b>] entry_SYSCALL_64_fastpath+0x13/0x8f [<ffffffffffffffff>] 0xffffffffffffffff Problem seems mitigated by dropping refs and freeing path when there's no path[depth].p_ext Cc: stable@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-15 11:39:52 -04:00
wangguang	4e800c0359	ext4: bugfix for mmaped pages in mpage_release_unused_pages() Pages clear buffers after ext4 delayed block allocation failed, However, it does not clean its pte_dirty flag. if the pages unmap ,in cording to the pte_dirty , unmap_page_range may try to call __set_page_dirty, which may lead to the bugon at mpage_prepare_extent_to_map:head = page_buffers(page);. This patch just call clear_page_dirty_for_io to clean pte_dirty at mpage_release_unused_pages for pages mmaped. Steps to reproduce the bug: （1） mmap a file in ext4 addr = (char *)mmap(NULL, 4096, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); （2） return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent which causes this log message to be print: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3）Unmap the addr cause warning at __set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute,then bugon happen. Cc: stable@vger.kernel.org Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-15 11:32:46 -04:00
Eric Biggers	ba63f23d69	fscrypto: require write access to mount to set encryption policy Since setting an encryption policy requires writing metadata to the filesystem, it should be guarded by mnt_want_write/mnt_drop_write. Otherwise, a user could cause a write to a frozen or readonly filesystem. This was handled correctly by f2fs but not by ext4. Make fscrypt_process_policy() handle it rather than relying on the filesystem to get it right. Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs} Signed-off-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-09-10 01:18:57 -04:00
Dmitry Monakhov	e22834f024	ext4: improve ext4lazyinit scalability ext4lazyinit is a global thread. This thread performs itable initalization under li_list_mtx mutex. It basically does the following: ext4_lazyinit_thread ->mutex_lock(&eli->li_list_mtx); ->ext4_run_li_request(elr) ->ext4_init_inode_table-> Do a lot of IO if the list is large And when new mount/umount arrive they have to block on ->li_list_mtx because lazy_thread holds it during full walk procedure. ext4_fill_super ->ext4_register_li_request ->mutex_lock(&ext4_li_info->li_list_mtx); ->list_add(&elr->lr_request, &ext4_li_info >li_request_list); In my case mount takes 40minutes on server with 36 * 4Tb HDD. Common user may face this in case of very slow dev ( /dev/mmcblkXXX) Even more. If one of filesystems was frozen lazyinit_thread will simply block on sb_start_write() so other mount/umount will be stuck forever. This patch changes logic like follows: - grab ->s_umount read sem before processing new li_request. After that it is safe to drop li_list_mtx because all callers of li_remove_request are holding ->s_umount for write. - li_thread skips frozen SB's Locking order: Mh KOrder is asserted by umount path like follows: s_umount ->li_list_mtx so the only way to to grab ->s_mount inside li_thread is via down_read_trylock xfstests:ext4/023 #PSBM-49658 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-05 23:38:36 -04:00
Jan Kara	6ae4c5a698	ext4: cleanup ext4_sync_parent() A condition !hlist_empty(&inode->i_dentry) is always true for open file. Just remove it. Also ext4_sync_parent() could use some explanation why races with rmdir() are not an issue - add a comment explaining that. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-05 23:21:43 -04:00
Kaho Ng	0b7b77791c	ext4: remove old feature helpers Use the ext4_{has,set,clear}_feature_* helpers to replace the old feature helpers. Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2016-09-05 23:11:58 -04:00
Jan Kara	49da939272	ext4: enable quota enforcement based on mount options When quota information is stored in quota files, we enable only quota accounting on mount and enforcement is enabled only in response to Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a possibility to enable quota enforcement on mount by specifying corresponding quota mount option (usrquota, grpquota, prjquota). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-05 23:08:16 -04:00
Daeho Jeong	93e3b4e663	ext4: reinforce check of i_dtime when clearing high fields of uid and gid Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Cc: stable@vger.kernel.org Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-09-05 22:56:10 -04:00
Eric Whitney	14fbd4aa61	ext4: enforce online defrag restriction for encrypted files Online defragging of encrypted files is not currently implemented. However, the move extent ioctl can still return successfully when called. For example, this occurs when xfstest ext4/020 is run on an encrypted file system, resulting in a corrupted test file and a corresponding test failure. Until the proper functionality is implemented, fail the move extent ioctl if either the original or donor file is encrypted. Cc: stable@vger.kernel.org Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:45:11 -04:00
Jan Kara	dfa2064b22	ext4: factor out loop for freeing inode xattr space Move loop to make enough space in the inode from ext4_expand_extra_isize_ea() into a separate function to make that function smaller and better readable and also to avoid delaration of variables inside a loop block. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:44:11 -04:00
Jan Kara	6e0cd088c0	ext4: remove (almost) unused variables from ext4_expand_extra_isize_ea() 'start' variable is completely unused in ext4_expand_extra_isize_ea(). Variable 'first' is used only once in one place. So just remove them. Variables 'entry' and 'last' are only really used later in the function inside a loop. Move their declarations there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:43:11 -04:00
Jan Kara	3f2571c1f9	ext4: factor out xattr moving Factor out function for moving xattrs from inode into external xattr block from ext4_expand_extra_isize_ea(). That function is already quite long and factoring out this rather standalone functionality helps readability. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:42:11 -04:00
Jan Kara	9440571388	ext4: replace bogus assertion in ext4_xattr_shift_entries() We were checking whether computed offsets do not exceed end of block in ext4_xattr_shift_entries(). However this does not make sense since we always only decrease offsets. So replace that assertion with a check whether we really decrease xattrs value offsets. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:41:11 -04:00
Jan Kara	1cba423707	ext4: remove checks for e_value_block Currently we don't support xattrs with e_value_block set. We don't allow them to pass initial xattr check so there's no point for checking for this later. Since these tests were untested, bugs were creeping in and not all places which should have checked were checking e_value_block anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:40:11 -04:00
Jan Kara	2de58f1102	ext4: Check that external xattr value block is zero Currently we don't support xattrs with values stored out of line. Check for that in ext4_xattr_check_names() to make sure we never work with such xattrs since not all the code counts with that resulting is possible weird corruption issues. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:39:11 -04:00
Jan Kara	e3014d14a8	ext4: fixup free space calculations when expanding inodes Conditions checking whether there is enough free space in an xattr block and when xattr is large enough to make enough space in the inode forgot to account for the fact that inode need not be completely filled up with xattrs. Thus we could move unnecessarily many xattrs out of inode or even falsely claim there is not enough space to expand the inode. We also forgot to update the amount of free space in xattr block when moving more xattrs and thus could decide to move too big xattr resulting in unexpected failure. Fix these problems by properly updating free space in the inode and xattr block as we move xattrs. To simplify the math, avoid shifting xattrs after removing each one xattr and instead just shift xattrs only once there is enough free space in the inode. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-29 15:38:11 -04:00
Linus Torvalds	b8927721ae	Fix bugs that could cause kernel deadlocks or file system corruption while moving xattrs to expand the extended inode. Also add some sanity checks to the block group descriptors to make sure we don't end up overwriting the superblock. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXw7i2AAoJEPL5WVaVDYGj96gH/A8rNgx7BoqPx3kanVEamblT tM0X9JcEGmKHN4enRts2b78EWbR0/U0SOP92+fg9SSq2MDJ0/kdaKLWmbUwx8jUi B7HMEqCprlCdigK7wwt3xF+6edyZRhtzlWy3bhxJ40f0KT5CuriSQbxogr931uKl hUKW2h5JtUqHtINzTt4oWjVm8xwrScxuYHYAcpw0G42ZzfO6xQOzQdowcx4m3cE9 PrtTbU5MwW8/wgsdLiClScQq30MK/GCbHh5heyRt1BcNo9+MDsZDOgdavh9StfnW Bl1N6zwRtRBJNcpKWfTfwU4NTIvStCTyA8BJgKgE95YIHDsstJVl4MO7ot25qbM= =pXe+ -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix bugs that could cause kernel deadlocks or file system corruption while moving xattrs to expand the extended inode. Also add some sanity checks to the block group descriptors to make sure we don't end up overwriting the superblock" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid deadlock when expanding inode size ext4: properly align shifted xattrs when expanding inodes ext4: fix xattr shifting when expanding inodes part 2 ext4: fix xattr shifting when expanding inodes ext4: validate that metadata blocks do not overlap superblock ext4: reserve xattr index for the Hurd	2016-08-29 12:37:11 -07:00
Jan Kara	2e81a4eeed	ext4: avoid deadlock when expanding inode size When we need to move xattrs into external xattr block, we call ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end up calling ext4_mark_inode_dirty() again which will recurse back into the inode expansion code leading to deadlocks. Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move its management into ext4_expand_extra_isize_ea() since its manipulation is safe there (due to xattr_sem) from possible races with ext4_xattr_set_handle() which plays with it as well. CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-11 12:38:55 -04:00
Jan Kara	443a8c41cd	ext4: properly align shifted xattrs when expanding inodes We did not count with the padding of xattr value when computing desired shift of xattrs in the inode when expanding i_extra_isize. As a result we could create unaligned start of inline xattrs. Account for alignment properly. CC: stable@vger.kernel.org # 4.4.x- Signed-off-by: Jan Kara <jack@suse.cz>	2016-08-11 12:00:01 -04:00
Jan Kara	418c12d08d	ext4: fix xattr shifting when expanding inodes part 2 When multiple xattrs need to be moved out of inode, we did not properly recompute total size of xattr headers in the inode and the new header position. Thus when moving the second and further xattr we asked ext4_xattr_shift_entries() to move too much and from the wrong place, resulting in possible xattr value corruption or general memory corruption. CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-11 11:58:32 -04:00
Jan Kara	d0141191a2	ext4: fix xattr shifting when expanding inodes The code in ext4_expand_extra_isize_ea() treated new_extra_isize argument sometimes as the desired target i_extra_isize and sometimes as the amount by which we need to grow current i_extra_isize. These happen to coincide when i_extra_isize is 0 which used to be the common case and so nobody noticed this until recently when we added i_projid to the inode and so i_extra_isize now needs to grow from 28 to 32 bytes. The result of these bugs was that we sometimes unnecessarily decided to move xattrs out of inode even if there was enough space and we often ended up corrupting in-inode xattrs because arguments to ext4_xattr_shift_entries() were just wrong. This could demonstrate itself as BUG_ON in ext4_xattr_shift_entries() triggering. Fix the problem by introducing new isize_diff variable and use it where appropriate. CC: stable@vger.kernel.org # 4.4.x Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-11 11:50:30 -04:00
Theodore Ts'o	829fa70ddd	ext4: validate that metadata blocks do not overlap superblock A number of fuzzing failures seem to be caused by allocation bitmaps or other metadata blocks being pointed at the superblock. This can cause kernel BUG or WARNings once the superblock is overwritten, so validate the group descriptor blocks to make sure this doesn't happen. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-08-01 00:51:02 -04:00
Theodore Ts'o	3980bd3b40	ext4: reserve xattr index for the Hurd The Hurd is using inode fields which restricts it from using more advanced ext4 file system features, due to design choices made over a decade ago. By giving the Hurd an extended attribute index field we allow it to move the translator and author fields out of the core inode fields, and hopefully we can get rid of ugly hacks such as EXT4_OS_HURD and EXT4_MOUNT2_HURD_COMPAT somday. For more information please see: https://summerofcode.withgoogle.com/projects/#5869799859027968 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-07-31 23:38:36 -04:00
Linus Torvalds	6784725ab0	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...	2016-07-28 12:59:05 -07:00
Linus Torvalds	0e06f5c0de	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...	2016-07-26 19:55:54 -07:00
Linus Torvalds	396d10993f	The major change this cycle is deleting ext4's copy of the file system encryption code and switching things over to using the copies in fs/crypto. I've updated the MAINTAINERS file to add an entry for fs/crypto listing Jaeguk Kim and myself as the maintainers. There are also a number of bug fixes, most notably for some problems found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also fixed is a writeback deadlock detected by generic/130, and some potential races in the metadata checksum code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXlbP9AAoJEPL5WVaVDYGjGxgIAJ9YIqme//yix63oHYLhDNea lY/TLqZrb9/TdDRvGyZa3jYaKaIejL53eEQS9nhEB/JI0sEiDpHmOrDOxdj8Hlsw fm7nJyh1u4vFKPyklCbIvLAje1vl8X/6OvqQiwh45gIxbbsFftaBWtccW+UtEkIP Fx65Vk7RehJ/sNrM0cRrwB79YAmDS8P6BPyzdMRk+vO/uFqyq7Auc+pkd+bTlw/m TDAEIunlk0Ovjx75ru1zaemL1JJx5ffehrJmGCcSUPHVbMObOEKIrlV50gAAKVhO qbZAri3mhDvyspSLuS/73L9skeCiWFLhvojCBGu4t2aa3JJolmItO7IpKi4HdRU= =bxGK -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The major change this cycle is deleting ext4's copy of the file system encryption code and switching things over to using the copies in fs/crypto. I've updated the MAINTAINERS file to add an entry for fs/crypto listing Jaeguk Kim and myself as the maintainers. There are also a number of bug fixes, most notably for some problems found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also fixed is a writeback deadlock detected by generic/130, and some potential races in the metadata checksum code" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) ext4: verify extent header depth ext4: short-cut orphan cleanup on error ext4: fix reference counting bug on block allocation error MAINTAINRES: fs-crypto maintainers update ext4 crypto: migrate into vfs's crypto engine ext2: fix filesystem deadlock while reading corrupted xattr block ext4: fix project quota accounting without quota limits enabled ext4: validate s_reserved_gdt_blocks on mount ext4: remove unused page_idx ext4: don't call ext4_should_journal_data() on the journal inode ext4: Fix WARN_ON_ONCE in ext4_commit_super() ext4: fix deadlock during page writeback ext4: correct error value of function verifying dx checksum ext4: avoid modifying checksum fields directly during checksum verification ext4: check for extents that wrap around jbd2: make journal y2038 safe jbd2: track more dependencies on transaction commit jbd2: move lockdep tracking to journal_s jbd2: move lockdep instrumentation for jbd2 handles ext4: respect the nobarrier mount option in nojournal mode ...	2016-07-26 18:35:55 -07:00
Michal Hocko	8a5c743e30	mm, memcg: use consistent gfp flags during readahead Vladimir has noticed that we might declare memcg oom even during readahead because read_pages only uses GFP_KERNEL (with mapping_gfp restriction) while __do_page_cache_readahead uses page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from OOMs. This gfp mask discrepancy is really unfortunate and easily fixable. Drop page_cache_alloc_readahead() which only has one user and outsource the gfp_mask logic into readahead_gfp_mask and propagate this mask from __do_page_cache_readahead down to read_pages. This alone would have only very limited impact as most filesystems are implementing ->readpages and the common implementation mpage_readpages does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to use readahead_gfp_mask instead as this function is called only during readahead as well. The same applies to read_cache_pages. ext4 has its own ext4_mpage_readpages but the path which has pages != NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are doing a very similar pattern to mpage_readpages so the same can be applied to them as well. [akpm@linux-foundation.org: coding-style fixes] [mhocko@suse.com: restrict gfp mask in mpage_alloc] Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Chris Mason <clm@fb.com> Cc: Steve French <sfrench@samba.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Changman Lee <cm224.lee@samsung.com> Cc: Chao Yu <yuchao0@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-26 16:19:19 -07:00
Ross Zwisler	6b524995a7	dax: remote unused fault wrappers Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and dax_pmd_fault() respectively, and update all callers. The dax_fault() and dax_pmd_fault() wrappers were initially intended to capture some filesystem independent functionality around page faults (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime and ctime). However, the following commits: `5726b27b09` ("ext2: Add locking for DAX faults") `ea3d7209ca` ("ext4: fix races between page faults and hole punching") added locking to the ext2 and ext4 filesystems after these common operations but before __dax_fault() and __dax_pmd_fault() were called. This means that these wrappers are no longer used, and are unlikely to be used in the future. XFS has had locking analogous to what was recently added to ext2 and ext4 since DAX support was initially introduced by: `6b698edeee` ("xfs: add DAX file operations support") Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-26 16:19:19 -07:00
Vegard Nossum	7bc9491645	ext4: verify extent header depth Although the extent tree depth of 5 should enough be for the worst case of 2*32 extents of length 1, the extent tree code does not currently to merge nodes which are less than half-full with a sibling node, or to shrink the tree depth if possible. So it's possible, at least in theory, for the tree depth to be greater than 5. However, even in the worst case, a tree depth of 32 is highly unlikely, and if the file system is maliciously corrupted, an insanely large eh_depth can cause memory allocation failures that will trigger kernel warnings (here, eh_depth = 65280): JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580 CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508 Stack: 604a8947 625badd8 0002fd09 00000000 60078643 00000000 62623910 601bf9bc 62623970 6002fc84 626239b0 900000125 Call Trace: [<6001c2dc>] show_stack+0xdc/0x1a0 [<601bf9bc>] dump_stack+0x2a/0x2e [<6002fc84>] __warn+0x114/0x140 [<6002fdff>] warn_slowpath_null+0x1f/0x30 [<60165829>] start_this_handle+0x569/0x580 [<60165d4e>] jbd2__journal_start+0x11e/0x220 [<60146690>] __ext4_journal_start_sb+0x60/0xa0 [<60120a81>] ext4_truncate+0x131/0x3a0 [<60123677>] ext4_setattr+0x757/0x840 [<600d5d0f>] notify_change+0x16f/0x2a0 [<600b2b16>] do_truncate+0x76/0xc0 [<600c3e56>] path_openat+0x806/0x1300 [<600c55c9>] do_filp_open+0x89/0xf0 [<600b4074>] do_sys_open+0x134/0x1e0 [<600b4140>] SyS_open+0x20/0x30 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 ---[ end trace 08b0b88b6387a244 ]--- [ Commit message modified and the extent tree depath check changed from 5 to 32 -- tytso ] Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-07-15 00:22:07 -04:00
Vegard Nossum	c65d5c6c81	ext4: short-cut orphan cleanup on error If we encounter a filesystem error during orphan cleanup, we should stop. Otherwise, we may end up in an infinite loop where the same inode is processed again and again. EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters Aborting journal on device loop0-8. EXT4-fs (loop0): Remounting filesystem read-only EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed! [...] See-also: `c9eb13a910` ("ext4: fix hang when processing corrupted orphaned inode list") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-07-14 23:21:35 -04:00
Vegard Nossum	554a5ccc4e	ext4: fix reference counting bug on block allocation error If we hit this error when mounted with errors=continue or errors=remount-ro: EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to continue. However, ext4_mb_release_context() is the wrong thing to call here since we are still actually using the allocation context. Instead, just error out. We could retry the allocation, but there is a possibility of getting stuck in an infinite loop instead, so this seems safer. [ Fixed up so we don't return EAGAIN to userspace. --tytso ] Fixes: `8556e8f3b6` ("ext4: Don't allow new groups to be added during block allocation") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org	2016-07-14 23:02:47 -04:00
Jaegeuk Kim	a7550b30ab	ext4 crypto: migrate into vfs's crypto engine This patch removes the most parts of internal crypto codes. And then, it modifies and adds some ext4-specific crypt codes to use the generic facility. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-07-10 14:01:03 -04:00
Wang Shilong	079788d01e	ext4: fix project quota accounting without quota limits enabled We should always transfer quota accounting, regardless of whether quota limits are enabled. Steps to reproduce: # mkfs.ext4 /dev/sda4 -O quota,project # mount /dev/sda4 /mnt/test # cp /bin/bash /mnt/test # chattr -p 123 /mnt/test/bash # quota -v -P 123 Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-07-05 21:33:52 -04:00
Theodore Ts'o	5b9554dc5b	ext4: validate s_reserved_gdt_blocks on mount If s_reserved_gdt_blocks is extremely large, it's possible for ext4_init_block_bitmap(), which is called when ext4 sets up an uninitialized block bitmap, to corrupt random kernel memory. Add the same checks which e2fsck has --- it must never be larger than blocksize / sizeof(__u32) --- and then add a backup check in ext4_init_block_bitmap() in case the superblock gets modified after the file system is mounted. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-07-05 20:01:52 -04:00
yalin wang	de9e9181bc	ext4: remove unused page_idx Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.com>	2016-07-05 16:32:32 -04:00
Vegard Nossum	6a7fd522a7	ext4: don't call ext4_should_journal_data() on the journal inode If ext4_fill_super() fails early, it's possible for ext4_evict_inode() to call ext4_should_journal_data() before superblock options and flags are fully set up. In that case, the iput() on the journal inode can end up causing a BUG(). Work around this problem by reordering the tests so we only call ext4_should_journal_data() after we know it's not the journal inode. Fixes: `2d859db3e4` ("ext4: fix data corruption in inodes with journalled data") Fixes: `2b405bfa84` ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-07-04 11:03:00 -04:00
Pranay Kr. Srivastava	4743f83990	ext4: Fix WARN_ON_ONCE in ext4_commit_super() If there are racing calls to ext4_commit_super() it's possible for another writeback of the superblock to result in the buffer being marked with an error after we check if the buffer is marked as having a write error and the buffer up-to-date flag is set again. If that happens mark_buffer_dirty() can end up throwing a WARN_ON_ONCE. Fix this by moving this check to write before we call write_buffer_dirty(), and keeping the buffer locked during this whole sequence. Signed-off-by: Pranay Kr. Srivastava <pranjas@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-07-04 10:24:52 -04:00
Jan Kara	646caa9c8e	ext4: fix deadlock during page writeback Commit `06bd3c36a7` (ext4: fix data exposure after a crash) uncovered a deadlock in ext4_writepages() which was previously much harder to hit. After this commit xfstest generic/130 reproduces the deadlock on small filesystems. The problem happens when ext4_do_update_inode() sets LARGE_FILE feature and marks current inode handle as synchronous. That subsequently results in ext4_journal_stop() called from ext4_writepages() to block waiting for transaction commit while still holding page locks, reference to io_end, and some prepared bio in mpd structure each of which can possibly block transaction commit from completing and thus results in deadlock. Fix the problem by releasing page locks, io_end reference, and submitting prepared bio before calling ext4_journal_stop(). [ Changed to defer the call to ext4_journal_stop() only if the handle is synchronous. --tytso ] Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2016-07-04 10:14:01 -04:00
Daeho Jeong	fa96454069	ext4: correct error value of function verifying dx checksum ext4_dx_csum_verify() returns the success return value in two checksum verification failure cases. We need to set the return values to zero as failure like ext4_dirent_csum_verify() returning zero when failing to find a checksum dirent at the tail. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2016-07-03 21:11:08 -04:00
Daeho Jeong	b47820edd1	ext4: avoid modifying checksum fields directly during checksum verification We temporally change checksum fields in buffers of some types of metadata into '0' for verifying the checksum values. By doing this without locking the buffer, some metadata's checksums, which are being committed or written back to the storage, could be damaged. In our test, several metadata blocks were found with damaged metadata checksum value during recovery process. When we only verify the checksum value, we have to avoid modifying checksum fields directly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2016-07-03 17:51:39 -04:00
Vegard Nossum	f70749ca42	ext4: check for extents that wrap around An extent with lblock = 4294967295 and len = 1 will pass the ext4_valid_extent() test: ext4_lblk_t last = lblock + len - 1; if (len == 0 \|\| lblock > last) return 0; since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end(). We can simplify it by removing the - 1 altogether and changing the test to use lblock + len <= lblock, since now if len = 0, then lblock + 0 == lblock and it fails, and if len > 0 then lblock + len > lblock in order to pass (i.e. it doesn't overflow). Fixes: `5946d0893` ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: `2f974865f` ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-06-30 11:53:46 -04:00
Theodore Ts'o	78d9625107	ext4: respect the nobarrier mount option in nojournal mode Also, if we are going to issue the barrier, we should do this after we write out the parent directories if necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-06-26 18:25:01 -04:00
Theodore Ts'o	d08854f5bc	ext4: optimize ext4_should_retry_alloc() to improve ENOSPC performance If there are no pending blocks to be released after a commit, forcing a journal commit has no hope of helping. It's possible that a commit had just completed, so if there are now free blocks available for allocation, it's worth retrying the commit. Reported-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-06-26 18:24:01 -04:00
Mike Christie	60a40096a3	ext4: use bio op helprs in ext4 crypto code This was missed from my last patchset. This patch has ext4 crypto code use the bio op helper to set the operation. The operation (discard, write, writesame, etc) is now defined seperately from the other REQ bits. They still share the bi_rw field to save space, so we use these helpers so modules do not have to worry about setting/overwriting info. Jens, I am not sure how you handle patches on top of patches in the next branches. If you merge patches that fix issues in previous patches in next, then this patch could be part of commit `95fe6c1a20` Author: Mike Christie <mchristi@redhat.com> Date: Sun Jun 5 14:31:48 2016 -0500 block, fs, mm, drivers: use bio set/get op accessors Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-08 15:01:08 -06:00
Mike Christie	95fe6c1a20	block, fs, mm, drivers: use bio set/get op accessors This patch converts the simple bi_rw use cases in the block, drivers, mm and fs code to set/get the bio operation using bio_set_op_attrs/bio_op These should be simple one or two liner cases, so I just did them in one patch. The next patches handle the more complicated cases in a module per patch. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	dfec8a14fc	fs: have ll_rw_block users pass in op and flags separately This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	2a222ca992	fs: have submit_bh users pass in op and flags separately This has submit_bh users pass in the operation and flags separately, so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	4e49ea4a3d	block/fs/drivers: remove rw argument from submit_bio This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Al Viro	84c60b1388	drop redundant ->owner initializations it's not needed for file_operations of inodes located on fs defined in the hosting module and for file_operations that go into procfs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-29 19:08:00 -04:00
Linus Torvalds	d102a56edb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking	2016-05-27 17:14:05 -07:00
Al Viro	5930122683	switch xattr_handler->set() to passing dentry and inode separately preparation for similar switch in ->setxattr() (see the next commit for rationale). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-27 15:39:43 -04:00
Linus Torvalds	315227f6da	DAX error handling for 4.7 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver. - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1. Other misc changes: - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail. - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXQ4GKAAoJEHr6Yb6juE3/zowP/iclIhgXXXMQJRUHJlePMXC8 15sGZ32JS1ak9g7vrsmNVEDNynfNtiMYdBxtUyRuj6xqgwdZvFk3F55KOCPtaeA1 +yADkgeRkTAcwzmHw9WQVEzBCqyzSisdrwtEfH817qdq9FJdH66x2Kos6i+HeAVr 5Q/e4gs7lKrjf384/QBl+wxNZOndJaQAPd2VRHQqx2A9F33v0ljdwRaUG1r4fjK2 dtmhcZCqdQyuAGXW3piTnZc5ZFc3DPqO4FkEfqkEK3lFOflK0fd8wMsAZRp/Jd0j GJsgnVSWSqG0Dz476djlG0w8t2p5Jv1g9cKChV+ZZEdFLKWHCOUFqXNj8uI8I4k5 cOEKCHyJ3IwfSHhNQqktEWrQN4T8ZXhWtuc9GuV4UZYuqJqHci6EdR/YsWsJjV+L lm/qvK4ipDS1pivxOy8KX/iN0z7Io8J9GXpStDx3g8iWjLlh4YYlbJLWeeRepo/z aPlV/QAKcHiGY6jzLExrZIyCWkzwo6O+0p1Kxerv9/7K/32HWbOodZ+tC8eD+N25 pV69nCGf+u50T2TtIx1+iann4NC1r7zg5yqnT9AgpyZpiwR5joCDzI5sXW+D0rcS vPtfM84Ccdeq/e6mvfIpZgR0/npQapKnrmUest0J7P2BFPHiFPji1KzZ7M+1aFOo 9R6JdrAj0Sc+FBa+cGzH =v6Of -----END PGP SIGNATURE----- Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull misc DAX updates from Vishal Verma: "DAX error handling for 4.7 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver. - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1. Other misc changes: - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail. - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks" * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: fix a comment in dax_zero_page_range and dax_truncate_page dax: for truncate/hole-punch, do zeroing through the driver if possible dax: export a low-level __dax_zero_page_range helper dax: use sb_issue_zerout instead of calling dax_clear_sectors dax: enable dax in the presence of known media errors (badblocks) dax: fallback from pmd to pte on error block: Update blkdev_dax_capable() for consistency xfs: Add alignment check for DAX mount ext2: Add alignment check for DAX mount ext4: Add alignment check for DAX mount block: Add bdev_dax_supported() for dax mount checks block: Add vfs_msg() interface dax: Remove redundant inode size checks dax: Remove pointless writeback from dax_do_io() dax: Remove zeroing from dax_io() dax: Remove dead zeroing code from fault handlers ext2: Avoid DAX zeroing to corrupt data ext2: Fix block zeroing in ext2_get_blocks() for DAX dax: Remove complete_unwritten argument DAX: move RADIX_DAX_ definitions to dax.c	2016-05-26 19:34:26 -07:00
Linus Torvalds	0e01df100b	Fix a number of bugs, most notably a potential stale data exposure after a crash and a potential BUG_ON crash if a file has the data journalling flag enabled while it has dirty delayed allocation blocks that haven't been written yet. Also fix a potential crash in the new project quota code and a maliciously corrupted file system. In addition, fix some DAX-specific bugs, including when there is a transient ENOSPC situation and races between writes via direct I/O and an mmap'ed segment that could lead to lost I/O. Finally the usual set of miscellaneous cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXQ40fAAoJEPL5WVaVDYGjnwMH+wXHASgPfzZgtRInsTG8W/2L jsmAcMlyMAYIATWMppNtPIq0td49z1dYO0YkKhtPVMwfzu230IFWhGWp93WqP9ve XYHMmaBorFlMAzWgMKn1K0ExWZlV+ammmcTKgU0kU4qyZp0G/NnMtlXIkSNv2amI 9Mn6R+v97c20gn8e9HWP/IVWkgPr+WBtEXaSGjC7dL6yI8hL+rJMqN82D76oU5ea vtwzrna/ISijy+etYmQzqHNYNaBKf40+B5HxQZw/Ta3FSHofBwXAyLaeEAr260Mf V3Eg2NDcKQxiZ3adBzIUvrRnrJV381OmHoguo8Frs8YHTTRiZ0T/s7FGr2Q0NYE= =7yIM -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Fix a number of bugs, most notably a potential stale data exposure after a crash and a potential BUG_ON crash if a file has the data journalling flag enabled while it has dirty delayed allocation blocks that haven't been written yet. Also fix a potential crash in the new project quota code and a maliciously corrupted file system. In addition, fix some DAX-specific bugs, including when there is a transient ENOSPC situation and races between writes via direct I/O and an mmap'ed segment that could lead to lost I/O. Finally the usual set of miscellaneous cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: pre-zero allocated blocks for DAX IO ext4: refactor direct IO code ext4: fix race in transient ENOSPC detection ext4: handle transient ENOSPC properly for DAX dax: call get_blocks() with create == 1 for write faults to unwritten extents ext4: remove unmeetable inconsisteny check from ext4_find_extent() jbd2: remove excess descriptions for handle_s ext4: remove unnecessary bio get/put ext4: silence UBSAN in ext4_mb_init() ext4: address UBSAN warning in mb_find_order_for_block() ext4: fix oops on corrupted filesystem ext4: fix check of dqget() return value in ext4_ioctl_setproject() ext4: clean up error handling when orphan list is corrupted ext4: fix hang when processing corrupted orphaned inode list ext4: remove trailing \n from ext4_warning/ext4_error calls ext4: fix races between changing inode journal mode and ext4_writepages ext4: handle unwritten or delalloc buffers before enabling data journaling ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() ext4: do not ask jbd2 to write data for delalloc buffers jbd2: add support for avoiding data writes during transaction commits ...	2016-05-24 12:55:26 -07:00
Andy Shevchenko	8da4b8c48e	lib/uuid.c: move generate_random_uuid() to uuid.c Let's gather the UUID related functions under one hood. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-20 17:58:30 -07:00
Linus Torvalds	c2e7b20705	Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs cleanups from Al Viro: "More cleanups from Christoph" * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: use RWF_SYNC fs: add RWF_DSYNC aand RWF_SYNC ceph: use generic_write_sync fs: simplify the generic_write_sync prototype fs: add IOCB_SYNC and IOCB_DSYNC direct-io: remove the offset argument to dio_complete direct-io: eliminate the offset argument to ->direct_IO xfs: eliminate the pos variable in xfs_file_dio_aio_write filemap: remove the pos argument to generic_file_direct_write filemap: remove pos variables in generic_file_read_iter	2016-05-17 15:05:23 -07:00
Toshi Kani	87eefeb4e8	ext4: Add alignment check for DAX mount When a partition is not aligned by 4KB, mount -o dax succeeds, but any read/write access to the filesystem fails, except for metadata update. Call bdev_dax_supported() to perform proper precondition checks which includes this partition alignment check. Reported-by: Micah Parrish <micah.parrish@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>	2016-05-17 00:44:11 -06:00
Al Viro	0e0162bb8c	Merge branch 'ovl-fixes' into for-linus Backmerge to resolve a conflict in ovl_lookup_real(); "ovl_lookup_real(): use lookup_one_len_unlocked()" instead, but it was too late in the cycle to rebase.	2016-05-17 02:17:59 -04:00
Jan Kara	02fbd13975	dax: Remove complete_unwritten argument Fault handlers currently take complete_unwritten argument to convert unwritten extents after PTEs are updated. However no filesystem uses this anymore as the code is racy. Remove the unused argument. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>	2016-05-16 18:11:51 -06:00
Jan Kara	12735f8819	ext4: pre-zero allocated blocks for DAX IO Currently ext4 treats DAX IO the same way as direct IO. I.e., it allocates unwritten extents before IO is done and converts unwritten extents afterwards. However this way DAX IO can race with page fault to the same area: ext4_ext_direct_IO() dax_fault() dax_io() get_block() - allocates unwritten extent copy_from_iter_pmem() get_block() - converts unwritten block to written and zeroes it out ext4_convert_unwritten_extents() So data written with DAX IO gets lost. Similarly dax_new_buf() called from dax_io() can overwrite data that has been already written to the block via mmap. Fix the problem by using pre-zeroed blocks for DAX IO the same way as we use them for DAX mmap. The downside of this solution is that every allocating write writes each block twice (once zeros, once data). Fixing the race with locking is possible as well however we would need to lock-out faults for the whole range written to by DAX IO. And that is not easy to do without locking-out faults for the whole file which seems too aggressive. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-13 00:51:15 -04:00
Jan Kara	914f82a32d	ext4: refactor direct IO code Currently ext4 direct IO handling is split between ext4_ext_direct_IO() and ext4_ind_direct_IO(). However the extent based function calls into the indirect based one for some cases and for example it is not able to handle file extending. Previously it was not also properly handling retries in case of ENOSPC errors. With DAX things would get even more contrieved so just refactor the direct IO code and instead of indirect / extent split do the split to read vs writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-13 00:44:16 -04:00
Jan Kara	dbc427ce40	ext4: fix race in transient ENOSPC detection When there are blocks to free in the running transaction, block allocator can return ENOSPC although the filesystem has some blocks to free. We use ext4_should_retry_alloc() to force commit of the current transaction and return whether anything was committed so that it makes sense to retry the allocation. However the transaction may get committed after block allocation fails but before we call ext4_should_retry_alloc(). So ext4_should_retry_alloc() returns false because there is nothing to commit and we wrongly return ENOSPC. Fix the race by unconditionally returning 1 from ext4_should_retry_alloc() when we tried to commit a transaction. This should not add any unnecessary retries since we had a transaction running a while ago when trying to allocate blocks and we want to retry the allocation once that transaction has committed anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-13 00:42:40 -04:00
Jan Kara	7cb476f834	ext4: handle transient ENOSPC properly for DAX ext4_dax_get_blocks() was accidentally omitted fixing get blocks handlers to properly handle transient ENOSPC errors. Fix it now to use ext4_get_blocks_trans() helper which takes care of these errors. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-13 00:38:16 -04:00
Al Viro	ae05327a00	ext4: switch to ->iterate_shared() Note that we need relax_dir() equivalent for directories locked shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-12 20:36:01 -04:00
Nicolai Stange	816cd71b0c	ext4: remove unmeetable inconsisteny check from ext4_find_extent() ext4_find_extent(), stripped down to the parts relevant to this patch, reads as ppos = 0; i = depth; while (i) { --i; ++ppos; if (unlikely(ppos > depth)) { ... ret = -EFSCORRUPTED; goto err; } } Due to the loop's bounds, the condition ppos > depth can never be met. Remove this dead code. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-05 22:43:04 -04:00
Jens Axboe	32157de29a	ext4: remove unnecessary bio get/put ext4_io_submit() used to check for EOPNOTSUPP after bio submission, which is why it had to get an extra reference to the bio before submitting it. But since we no longer touch the bio after submission, get rid of the redundant get/put of the bio. If we do get the extra reference, we enter the slower path of having to flag this bio as now having external references. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-05 22:09:49 -04:00
Nicolai Stange	935244cd54	ext4: silence UBSAN in ext4_mb_init() Currently, in ext4_mb_init(), there's a loop like the following: do { ... offset += 1 << (sb->s_blocksize_bits - i); i++; } while (i <= sb->s_blocksize_bits + 1); Note that the updated offset is used in the loop's next iteration only. However, at the last iteration, that is at i == sb->s_blocksize_bits + 1, the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15 shift exponent 4294967295 is too large for 32-bit type 'int' [...] Call Trace: [<ffffffff818c4d25>] dump_stack+0xbc/0x117 [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390 [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0 [<ffffffff814293c7>] ? create_cache+0x57/0x1f0 [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0 [<ffffffff821c2168>] ? mutex_lock+0x38/0x60 [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50 [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0 [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0 [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0 [...] Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1. Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of offset is never used again. Silence UBSAN by introducing another variable, offset_incr, holding the next increment to apply to offset and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-05 19:46:19 -04:00
Nicolai Stange	b5cb316cdf	ext4: address UBSAN warning in mb_find_order_for_block() Currently, in mb_find_order_for_block(), there's a loop like the following: while (order <= e4b->bd_blkbits + 1) { ... bb += 1 << (e4b->bd_blkbits - order); } Note that the updated bb is used in the loop's next iteration only. However, at the last iteration, that is at order == e4b->bd_blkbits + 1, the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11 shift exponent -1 is negative [...] Call Trace: [<ffffffff818c4d35>] dump_stack+0xbc/0x117 [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590 [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80 [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240 [...] Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of bb is never used again. Silence UBSAN by introducing another variable, bb_incr, holding the next increment to apply to bb and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-05 17:38:03 -04:00
Jan Kara	74177f55b7	ext4: fix oops on corrupted filesystem When filesystem is corrupted in the right way, it can happen ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we subsequently remove inode from the in-memory orphan list. However this deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we leave i_orphan list_head with a stale content. Later we can look at this content causing list corruption, oops, or other issues. The reported trace looked like: WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100() list_del corruption, 0000000061c1d6e0->next is LIST_POISON1 0000000000100100) CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250 Stack: 60462947 62219960 602ede24 62219960 602ede24 603ca293 622198f0 602f02eb 62219950 6002c12c 62219900 601b4d6b Call Trace: [<6005769c>] ? vprintk_emit+0x2dc/0x5c0 [<602ede24>] ? printk+0x0/0x94 [<600190bc>] show_stack+0xdc/0x1a0 [<602ede24>] ? printk+0x0/0x94 [<602ede24>] ? printk+0x0/0x94 [<602f02eb>] dump_stack+0x2a/0x2c [<6002c12c>] warn_slowpath_common+0x9c/0xf0 [<601b4d6b>] ? __list_del_entry+0x6b/0x100 [<6002c254>] warn_slowpath_fmt+0x94/0xa0 [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0 [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0 [<60023ebf>] ? set_signals+0x3f/0x50 [<600a205a>] ? kmem_cache_free+0x10a/0x180 [<602f4e88>] ? mutex_lock+0x18/0x30 [<601b4d6b>] __list_del_entry+0x6b/0x100 [<601177ec>] ext4_orphan_del+0x22c/0x2f0 [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0 [<6010b973>] ? ext4_truncate+0x383/0x390 [<6010bc8b>] ext4_write_begin+0x30b/0x4b0 [<6001bb50>] ? copy_from_user+0x0/0xb0 [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0 [<60072c4f>] generic_perform_write+0xaf/0x1e0 [<600c4166>] ? file_update_time+0x46/0x110 [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0 [<6010030f>] ext4_file_write_iter+0x15f/0x470 [<60094e10>] ? unlink_file_vma+0x0/0x70 [<6009b180>] ? unlink_anon_vmas+0x0/0x260 [<6008f169>] ? free_pgtables+0xb9/0x100 [<600a6030>] __vfs_write+0xb0/0x130 [<600a61d5>] vfs_write+0xa5/0x170 [<600a63d6>] SyS_write+0x56/0xe0 [<6029fcb0>] ? __libc_waitpid+0x0/0xa0 [<6001b698>] handle_syscall+0x68/0x90 [<6002633d>] userspace+0x4fd/0x600 [<6002274f>] ? save_registers+0x1f/0x40 [<60028bd7>] ? arch_prctl+0x177/0x1b0 [<60017bd5>] fork_handler+0x85/0x90 Fix the problem by using list_del_init() as we always should with i_orphan list. CC: stable@vger.kernel.org Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-05-05 11:10:15 -04:00
Seth Forshee	ff0bc08454	ext4: fix check of dqget() return value in ext4_ioctl_setproject() A failed call to dqget() returns an ERR_PTR() and not null. Fix the check in ext4_ioctl_setproject() to handle this correctly. Fixes: `9b7365fc1c` ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support") Cc: stable@vger.kernel.org # v4.5 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-05-05 10:52:38 -04:00
Al Viro	84695ffee7	Merge getxattr prototype change into work.lookups The rest of work.xattr stuff isn't needed for this branch	2016-05-02 19:45:47 -04:00
Christoph Hellwig	e259221763	fs: simplify the generic_write_sync prototype The kiocb already has the new position, so use that. The only interesting case is AIO, where we currently don't bother updating ki_pos. We're about to free the kiocb after we're done, so we might as well update it to make everyone's life simpler. While we're at it also return the bytes written argument passed in if we were successful so that the boilerplate error switch code in the callers can go away. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-01 19:58:39 -04:00
Christoph Hellwig	dde0c2e798	fs: add IOCB_SYNC and IOCB_DSYNC This will allow us to do per-I/O sync file writes, as required by a lot of fileservers or storage targets. XXX: Will need a few additional audits for O_DSYNC Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-01 19:58:39 -04:00
Christoph Hellwig	c8b8e32d70	direct-io: eliminate the offset argument to ->direct_IO Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superflous argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-01 19:58:39 -04:00
Theodore Ts'o	7827a7f6eb	ext4: clean up error handling when orphan list is corrupted Instead of just printing warning messages, if the orphan list is corrupted, declare the file system is corrupted. If there are any reserved inodes in the orphaned inode list, declare the file system corrupted and stop right away to avoid doing more potential damage to the file system. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-30 00:49:54 -04:00
Theodore Ts'o	c9eb13a910	ext4: fix hang when processing corrupted orphaned inode list If the orphaned inode list contains inode #5, ext4_iget() returns a bad inode (since the bootloader inode should never be referenced directly). Because of the bad inode, we end up processing the inode repeatedly and this hangs the machine. This can be reproduced via: mke2fs -t ext4 /tmp/foo.img 100 debugfs -w -R "ssv last_orphan 5" /tmp/foo.img mount -o loop /tmp/foo.img /mnt (But don't do this if you are using an unpatched kernel if you care about the system staying functional. :-) This bug was found by the port of American Fuzzy Lop into the kernel to find file system problems[1]. (Since it only happens if inode #5 shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not surprising that AFL needed two hours before it found it.) [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf Cc: stable@vger.kernel.org Reported by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-30 00:48:54 -04:00
Jakub Wilk	8d2ae1cbe8	ext4: remove trailing \n from ext4_warning/ext4_error calls Messages passed to ext4_warning() or ext4_error() don't need trailing newlines, because these function add the newlines themselves. Signed-off-by: Jakub Wilk <jwilk@jwilk.net>	2016-04-27 01:11:21 -04:00
Daeho Jeong	c8585c6fca	ext4: fix races between changing inode journal mode and ext4_writepages In ext4, there is a race condition between changing inode journal mode and ext4_writepages(). While ext4_writepages() is executed on a non-journalled mode inode, the inode's journal mode could be enabled by ioctl() and then, some pages dirtied after switching the journal mode will be still exposed to ext4_writepages() in non-journaled mode. To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan Kara's suggestion because we don't want to waste ext4_inode_info's space for this extra rare case. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-04-25 23:22:35 -04:00
Daeho Jeong	4c54659269	ext4: handle unwritten or delalloc buffers before enabling data journaling We already allocate delalloc blocks before changing the inode mode into "per-file data journal" mode to prevent delalloc blocks from remaining not allocated, but another issue concerned with "BH_Unwritten" status still exists. For example, by fallocate(), several buffers' status change into "BH_Unwritten", but these buffers cannot be processed by ext4_alloc_da_blocks(). So, they still remain in unwritten status after per-file data journaling is enabled and they cannot be changed into written status any more and, if they are journaled and eventually checkpointed, these unwritten buffer will cause a kernel panic by the below BUG_ON() function of submit_bh_wbc() when they are submitted during checkpointing. static int submit_bh_wbc(int rw, struct buffer_head *bh,... { ... BUG_ON(buffer_unwritten(bh)); Moreover, when "dioread_nolock" option is enabled, the status of a buffer is changed into "BH_Unwritten" after write_begin() completes and the "BH_Unwritten" status will be cleared after I/O is done. Therefore, if a buffer's status is changed into unwrutten but the buffer's I/O is not submitted and completed, it can cause the same problem after enabling per-file data journaling. You can easily generate this bug by executing the following command. ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269 To resolve these problems and define a boundary between the previous mode and per-file data journaling mode, we need to flush and wait all the I/O of buffers of a file before enabling per-file data journaling of the file. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-04-25 23:21:00 -04:00
Theodore Ts'o	7b8081912d	ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() The function jbd2_journal_extend() takes as its argument the number of new credits to be added to the handle. We weren't taking into account the currently unused handle credits; worse, we would try to extend the handle by N credits when it had N credits available. In the case where jbd2_journal_extend() fails because the transaction is too large, when jbd2_journal_restart() gets called, the N credits owned by the handle gets returned to the transaction, and the transaction commit is asynchronously requested, and then start_this_handle() will be able to successfully attach the handle to the current transaction since the required credits are now available. This is mostly harmless, but since ext4_ext_truncate_extend_restart() returns EAGAIN, the truncate machinery will once again try to call ext4_ext_truncate_extend_restart(), which will do the above sequence over and over again until the transaction has committed. This was found while I was debugging a lockup in caused by running xfstests generic/074 in the data=journal case. I'm still not sure why we ended up looping forever, which suggests there may still be another bug hiding in the transaction accounting machinery, but this commit prevents us from looping in the first place. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-25 23:13:17 -04:00
Jan Kara	ee0876bc69	ext4: do not ask jbd2 to write data for delalloc buffers Currently we ask jbd2 to write all dirty allocated buffers before committing a transaction when doing writeback of delay allocated blocks. However this is unnecessary since we move all pages to writeback state before dropping a transaction handle and then submit all the necessary IO. We still need the transaction commit to wait for all the outstanding writeback before flushing disk caches during transaction commit to avoid data exposure issues though. Use the new jbd2 capability and ask it to only wait for outstanding writeback during transaction commit when writing back data in ext4_writepages(). Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-24 00:56:08 -04:00
Jan Kara	41617e1a8d	jbd2: add support for avoiding data writes during transaction commits Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-24 00:56:07 -04:00
Jan Kara	3957ef53a5	ext4: remove EXT4_STATE_ORDERED_MODE This flag is just duplicating what ext4_should_order_data() tells you and is used in a single place. Furthermore it doesn't reflect changes to inode data journalling flag so it may be possibly misleading. Just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-24 00:56:05 -04:00
Jan Kara	06bd3c36a7	ext4: fix data exposure after a crash Huang has reported that in his powerfail testing he is seeing stale block contents in some of recently allocated blocks although he mounts ext4 in data=ordered mode. After some investigation I have found out that indeed when delayed allocation is used, we don't add inode to transaction's list of inodes needing flushing before commit. Originally we were doing that but commit `f3b59291a6` removed the logic with a flawed argument that it is not needed. The problem is that although for delayed allocated blocks we write their contents immediately after allocating them, there is no guarantee that the IO scheduler or device doesn't reorder things and thus transaction allocating blocks and attaching them to inode can reach stable storage before actual block contents. Actually whenever we attach freshly allocated blocks to inode using a written extent, we should add inode to transaction's ordered inode list to make sure we properly wait for block contents to be written before committing the transaction. So that is what we do in this patch. This also handles other cases where stale data exposure was possible - like filling hole via mmap in data=ordered,nodelalloc mode. The only exception to the above rule are extending direct IO writes where blkdev_direct_IO() waits for IO to complete before increasing i_size and thus stale data exposure is not possible. For now we don't complicate the code with optimizing this special case since the overhead is pretty low. In case this is observed to be a performance problem we can always handle it using a special flag to ext4_map_blocks(). CC: stable@vger.kernel.org Fixes: `f3b59291a6` Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-24 00:56:03 -04:00
Theodore Ts'o	1f60fbe727	ext4: allow readdir()'s of large empty directories to be interrupted If a directory has a large number of empty blocks, iterating over all of them can take a long time, leading to scheduler warnings and users getting irritated when they can't kill a process in the middle of one of these long-running readdir operations. Fix this by adding checks to ext4_readdir() and ext4_htree_fill_tree(). This was reverted earlier due to a typo in the original commit where I experimented with using signal_pending() instead of fatal_signal_pending(). The test was in the wrong place if we were going to return signal_pending() since we would end up returning duplicant entries. See `9f2394c9be` for a more detailed explanation. Added fix as suggested by Linus to check for signal_pending() in in the filldir() functions. Reported-by: Benjamin LaHaise <bcrl@kvack.org> Google-Bug-Id: 27880676 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-23 22:50:07 -04:00
Jaegeuk Kim	03a8bb0e53	ext4/fscrypto: avoid RCU lookup in d_revalidate As Al pointed, d_revalidate should return RCU lookup before using d_inode. This was originally introduced by: commit `34286d6662` ("fs: rcu-walk aware d_revalidate method"). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable <stable@vger.kernel.org>	2016-04-12 20:01:35 -07:00
Al Viro	b296821a7c	xattr_handler: pass dentry and inode as separate arguments of ->get() ... and do not assume they are already attached to each other Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-10 20:48:24 -04:00
Linus Torvalds	9f2394c9be	Revert "ext4: allow readdir()'s of large empty directories to be interrupted" This reverts commit `1028b55baf`. It's broken: it makes ext4 return an error at an invalid point, causing the readdir wrappers to write the the position of the last successful directory entry into the position field, which means that the next readdir will now return that last successful entry _again_. You can only return fatal errors (that terminate the readdir directory walk) from within the filesystem readdir functions, the "normal" errors (that happen when the readdir buffer fills up, for example) happen in the iterorator where we know the position of the actual failing entry. I do have a very different patch that does the "signal_pending()" handling inside the iterator function where it is allowable, but while that one passes all the sanity checks, I screwed up something like four times while emailing it out, so I'm not going to commit it today. So my track record is not good enough, and the stars will have to align better before that one gets committed. And it would be good to get some review too, of course, since celestial alignments are always an iffy debugging model. IOW, let's just revert the commit that caused the problem for now. Reported-by: Greg Thelen <gthelen@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-10 16:52:24 -07:00
Al Viro	fc64005c93	don't bother with ->d_inode->i_sb - it's always equal to ->d_sb ... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-10 17:11:51 -04:00
Linus Torvalds	93061f390f	These changes contains a fix for overlayfs interacting with some (badly behaved) dentry code in various file systems. These have been reviewed by Al and the respective file system mtinainers and are going through the ext4 tree for convenience. This also has a few ext4 encryption bug fixes that were discovered in Android testing (yes, we will need to get these sync'ed up with the fs/crypto code; I'll take care of that). It also has some bug fixes and a change to ignore the legacy quota options to allow for xfstests regression testing of ext4's internal quota feature and to be more consistent with how xfs handles this case. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXBn4aAAoJEPL5WVaVDYGjHWgH/2wXnlQnC2ndJhblBWtPzprz OQW4dawdnhxqbTEGUqWe942tZivSb/liu/lF+urCGbWsbgz9jNOCmEAg7JPwlccY mjzwDvtVq5U4d2rP+JDWXLy/Gi8XgUclhbQDWFVIIIea6fS7IuFWqoVBR+HPMhra 9tEygpiy5lNtJA/hqq3/z9x0AywAjwrYR491CuWreo2Uu1aeKg0YZsiDsuAcGioN Waa2TgbC/ZZyJuJcPBP8If+VOFAa0ea3F+C/o7Tb9bOqwuz0qSTcaMRgt6eQ2KUt P4b9Ecp1XLjJTC7IYOknUOScY3lCyREx/Xya9oGZfFNTSHzbOlLBoplCr3aUpYQ= =/HHR -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "These changes contains a fix for overlayfs interacting with some (badly behaved) dentry code in various file systems. These have been reviewed by Al and the respective file system mtinainers and are going through the ext4 tree for convenience. This also has a few ext4 encryption bug fixes that were discovered in Android testing (yes, we will need to get these sync'ed up with the fs/crypto code; I'll take care of that). It also has some bug fixes and a change to ignore the legacy quota options to allow for xfstests regression testing of ext4's internal quota feature and to be more consistent with how xfs handles this case" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: ignore quota mount options if the quota feature is enabled ext4 crypto: fix some error handling ext4: avoid calling dquot_get_next_id() if quota is not enabled ext4: retry block allocation for failed DIO and DAX writes ext4: add lockdep annotations for i_data_sem ext4: allow readdir()'s of large empty directories to be interrupted btrfs: fix crash/invalid memory access on fsync when using overlayfs ext4 crypto: use dget_parent() in ext4_d_revalidate() ext4: use file_dentry() ext4: use dget_parent() in ext4_file_open() nfs: use file_dentry() fs: add file_dentry() ext4 crypto: don't let data integrity writebacks fail with ENOMEM ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()	2016-04-07 17:22:20 -07:00
Kirill A. Shutemov	ea1754a084	mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Theodore Ts'o	c325a67c72	ext4: ignore quota mount options if the quota feature is enabled Previously, ext4 would fail the mount if the file system had the quota feature enabled and quota mount options (used for the older quota setups) were present. This broke xfstests, since xfs silently ignores the usrquote and grpquota mount options if they are specified. This commit changes things so that we are consistent with xfs; having the mount options specified is harmless, so no sense break users by forbidding them. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-03 17:03:37 -04:00
Dan Carpenter	4762cc3fbb	ext4 crypto: fix some error handling We should be testing for -ENOMEM but the minus sign is missing. Fixes: `c9af28fdd4` ('ext4 crypto: don't let data integrity writebacks fail with ENOMEM') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-02 18:13:38 -04:00
Theodore Ts'o	8f0e8746b4	ext4: avoid calling dquot_get_next_id() if quota is not enabled This should be fixed in the quota layer so we can test with the quota mutex held, but for now, we need this to avoid tests from crashing the kernel aborting the regression test suite. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-04-01 12:00:03 -04:00
Jan Kara	e84dfbe2bf	ext4: retry block allocation for failed DIO and DAX writes Currently if block allocation for DIO or DAX write fails due to ENOSPC, we just returned it to userspace. However these ENOSPC errors can be transient because the transaction freeing blocks has not yet committed. This demonstrates as failures of generic/102 test when the filesystem is mounted with 'dax' mount option. Fix the problem by properly retrying the allocation in case of ENOSPC error in get blocks functions used for direct IO. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>	2016-04-01 02:07:22 -04:00
Theodore Ts'o	daf647d2dd	ext4: add lockdep annotations for i_data_sem With the internal Quota feature, mke2fs creates empty quota inodes and quota usage tracking is enabled as soon as the file system is mounted. Since quotacheck is no longer preallocating all of the blocks in the quota inode that are likely needed to be written to, we are now seeing a lockdep false positive caused by needing to allocate a quota block from inside ext4_map_blocks(), while holding i_data_sem for a data inode. This results in this complaint: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); Google-Bug-Id: 27907753 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-04-01 01:31:28 -04:00
Andreas Gruenbacher	b8a7a3a667	posix_acl: Inode acl caching fixes When get_acl() is called for an inode whose ACL is not cached yet, the get_acl inode operation is called to fetch the ACL from the filesystem. The inode operation is responsible for updating the cached acl with set_cached_acl(). This is done without locking at the VFS level, so another task can call set_cached_acl() or forget_cached_acl() before the get_acl inode operation gets to calling set_cached_acl(), and then get_acl's call to set_cached_acl() results in caching an outdate ACL. Prevent this from happening by setting the cached ACL pointer to a task-specific sentinel value before calling the get_acl inode operation. Move the responsibility for updating the cached ACL from the get_acl inode operations to get_acl(). There, only set the cached ACL if the sentinel value hasn't changed. The sentinel values are chosen to have odd values. Likewise, the value of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have an even value (ACLs are aligned in memory). This allows to distinguish uncached ACLs values from ACL objects. In addition, switch from guarding inode->i_acl and inode->i_default_acl upates by the inode->i_lock spinlock to using xchg() and cmpxchg(). Filesystems that do not want ACLs returned from their get_acl inode operations to be cached must call forget_cached_acl() to prevent the VFS from doing so. (Patch written by Al Viro and Andreas Gruenbacher.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-31 00:30:15 -04:00
Theodore Ts'o	1028b55baf	ext4: allow readdir()'s of large empty directories to be interrupted If a directory has a large number of empty blocks, iterating over all of them can take a long time, leading to scheduler warnings and users getting irritated when they can't kill a process in the middle of one of these long-running readdir operations. Fix this by adding checks to ext4_readdir() and ext4_htree_fill_tree(). Reported-by: Benjamin LaHaise <bcrl@kvack.org> Google-Bug-Id: 27880676 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-30 22:36:24 -04:00
Theodore Ts'o	3d43bcfef5	ext4 crypto: use dget_parent() in ext4_d_revalidate() This avoids potential problems caused by a race where the inode gets renamed out from its parent directory and the parent directory is deleted while ext4_d_revalidate() is running. Fixes: `28b4c26396` Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-03-26 16:15:42 -04:00
Miklos Szeredi	c0a37d4878	ext4: use file_dentry() EXT4 may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: Daniel Axtens <dja@axtens.net> Fixes: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Fixes: `ff978b09f9` ("ext4 crypto: move context consistency check to ext4_file_open()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> # v4.5	2016-03-26 16:14:42 -04:00
Miklos Szeredi	9dd78d8c9a	ext4: use dget_parent() in ext4_file_open() In f_op->open() lock on parent is not held, so there's no guarantee that parent dentry won't go away at any time. Even after this patch there's no guarantee that 'dir' will stay the parent of 'inode', but at least it won't be freed while being used. Fixes: `ff978b09f9` ("ext4 crypto: move context consistency check to ext4_file_open()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.5	2016-03-26 16:14:41 -04:00
Theodore Ts'o	c9af28fdd4	ext4 crypto: don't let data integrity writebacks fail with ENOMEM We don't want the writeback triggered from the journal commit (in data=writeback mode) to cause the journal to abort due to generic_writepages() returning an ENOMEM error. In addition, if fsync() fails with ENOMEM, most applications will probably not do the right thing. So if we are doing a data integrity sync, and ext4_encrypt() returns ENOMEM, we will submit any queued I/O to date, and then retry the allocation using GFP_NOFAIL. Google-Bug-Id: 27641567 Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-26 16:14:34 -04:00
Andy Lutomirski	121cef8f17	ext4: in ext4_dir_llseek, check syscall bitness directly ext4 treats directory offsets differently for 32-bit and 64-bit callers. Check the caller type using in_compat_syscall, not is_compat_task. This changes behavior on SPARC slightly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-22 15:36:02 -07:00
Theodore Ts'o	9e92f48c34	ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() We aren't checking to see if the in-inode extended attribute is corrupted before we try to expand the inode's extra isize fields. This can lead to potential crashes caused by the BUG_ON() check in ext4_xattr_shift_entries(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-22 16:13:15 -04:00

1 2 3 4 5 ...

2946 Commits