Having automatic updates seems pointless for production system, and
even dangerous and thus counter-productive:
1. If we can mount pstore, or read files, we can as well read
/proc/kmsg. So, there's little point in duplicating the
functionality and present the same information but via another
userland ABI;
2. Expecting the kernel to behave sanely after oops/panic is naive.
It might work, but you'd rather not try it. Screwed up kernel
can do rather bad things, like recursive faults[1]; and pstore
rather provoking bad things to happen. It uses:
1. Timers (assumes sane interrupts state);
2. Workqueues and mutexes (assumes scheduler in a sane state);
3. kzalloc (a working slab allocator);
That's too much for a dead kernel, so the debugging facility
itself might just make debugging harder, which is not what
we want.
Maybe for non-oops message types it would make sense to re-enable
automatic updates, but so far I don't see any use case for this.
Even for tracing, it has its own run-time/normal ABI, so we're
only interested in pstore upon next boot, to retrieve what has
gone wrong with HW or SW.
So, let's disable the updates by default.
[1]
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff8104801b>] kthread_data+0xb/0x20
[...]
Process kworker/0:1 (pid: 14, threadinfo ffff8800072c0000, task ffff88000725b100)
[...
Call Trace:
[<ffffffff81043710>] wq_worker_sleeping+0x10/0xa0
[<ffffffff813687a8>] __schedule+0x568/0x7d0
[<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81087e22>] ? call_rcu_sched+0x12/0x20
[<ffffffff8102b596>] ? release_task+0x156/0x2d0
[<ffffffff8102b45e>] ? release_task+0x1e/0x2d0
[<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81368ac4>] schedule+0x24/0x70
[<ffffffff8102cba8>] do_exit+0x1f8/0x370
[<ffffffff810051e7>] oops_end+0x77/0xb0
[<ffffffff8135c301>] no_context+0x1a6/0x1b5
[<ffffffff8135c4de>] __bad_area_nosemaphore+0x1ce/0x1ed
[<ffffffff81053156>] ? ttwu_queue+0xc6/0xe0
[<ffffffff8135c50b>] bad_area_nosemaphore+0xe/0x10
[<ffffffff8101fa47>] do_page_fault+0x2c7/0x450
[<ffffffff8106e34b>] ? __lock_release+0x6b/0xe0
[<ffffffff8106bf21>] ? mark_held_locks+0x61/0x140
[<ffffffff810502fe>] ? __wake_up+0x4e/0x70
[<ffffffff81185f7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff8136a37f>] page_fault+0x1f/0x30
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff81185ab8>] ? memcpy+0x68/0x110
[<ffffffff8115875a>] ? pstore_get_records+0x3a/0x130
[<ffffffff811590f4>] ? persistent_ram_copy_old+0x64/0x90
[<ffffffff81158bf4>] ramoops_pstore_read+0x84/0x130
[<ffffffff81158799>] pstore_get_records+0x79/0x130
[<ffffffff81042536>] ? process_one_work+0x116/0x450
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff8115897e>] pstore_dowork+0xe/0x10
[<ffffffff81042594>] process_one_work+0x174/0x450
[<ffffffff81042536>] ? process_one_work+0x116/0x450
[<ffffffff81042e13>] worker_thread+0x123/0x2d0
[<ffffffff81042cf0>] ? manage_workers.isra.28+0x120/0x120
[<ffffffff81047d8e>] kthread+0x8e/0xa0
[<ffffffff8136ba74>] kernel_thread_helper+0x4/0x10
[<ffffffff8136a199>] ? retint_restore_args+0xe/0xe
[<ffffffff81047d00>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8136ba70>] ? gs_change+0xb/0xb
Code: be e2 00 00 00 48 c7 c7 d1 2a 4e 81 e8 bf fb fd ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 48 8b 87 08 02 00 00 55 48 89 e5 <48> 8b 40 f8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
RIP [<ffffffff8104801b>] kthread_data+0xb/0x20
RSP <ffff8800072c1888>
CR2: fffffffffffffff8
---[ end trace 996a332dc399111d ]---
Fixing recursive fault but reboot is needed!
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no behavioural change, the default value is still 60 seconds.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code tried to maintain the global list of persistent ram zones,
which isn't a great idea overall, plus since Android's ram_console
is no longer there, we can remove some unused functions.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since we use multiple regions, the messages are somewhat annoying.
We do print total mapped memory already, so no need to print the
information for each region in the library routines.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The console log size is configurable via ramoops.console_size
module option, and the log itself is available via
<pstore-mount>/console-ramoops file.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This will help make code clearer when we'll add support for other
message types.
The patch also changes return value from -EINVAL to 0 in case of
end-of-records. The exact value doesn't matter for pstore (it should
be just <= 0), but 0 feels more correct.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This will help make code clearer when we'll add support for other
message types.
This also makes probe() much shorter and understandable, plus
makes mem/record size checking a bit easier.
Implementation detail: we now use a paddr pointer, this will
be used for allocating persistent ram zones for other message
types.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're about to add support for other message types, so let's rename
some variables to not be confused later.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pstore doesn't support logging kernel messages in run-time, it only
dumps dmesg when kernel oopses/panics. This makes pstore useless for
debugging hangs caused by HW issues or improper use of HW (e.g.
weird device inserted -> driver tried to write a reserved bits ->
SoC hanged. In that case we don't get any messages in the pstore.
Therefore, let's add a runtime logging support: PSTORE_TYPE_CONSOLE.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's no reason to extern it. The patch fixes the annoying sparse
warning:
CHECK fs/pstore/inode.c
fs/pstore/inode.c:264:5: warning: symbol 'pstore_fill_super' was not
declared. Should it be static?
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise, unlinked file will reappear on the next boot.
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A handy function that we will use outside of ram_core soon. But
so far just factor it out and start using it in post_init().
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise, the files will survive just one reboot, and on a subsequent
boot they will disappear.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Without the update, we'll only see the new dmesg buffer after the
reboot, but previously we could see it right away. Making an oops
visible in pstore filesystem before reboot is a somewhat dubious
feature, but removing it wasn't an intentional change, so let's
restore it.
For this we have to make persistent_ram_save_old() safe for calling
multiple times, and also extern it.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge fixups for the mac NLS tables from Andrew.
* emailed from Andrew Morton, and one cleanup by me:
nls: fix (and rename) mac NLS table files and config options
fs/nls/Makefile: remove bogus CONFIG_ assignments
The config options in the Kconfig file (with _CODEPAGE_ in the name)
didn't match the config option name in the Makefile (no _CODEPAGE_).
And both of them were of the hard-to-read MACXYZZY variety, which made
them hard to parse for normal humans: MACROMAN easily reads as "macro
man", not as "Mac Roman".
So rename the options to be consistent, and be NLS_MAC_xyzzy. Rename
the files to be mac-xyzzy.c too, and drop the "nls" part entirely (it's
already in the directory name).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These were debug things which snuck through.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Vladimir Serbinenko <phcoder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
- Improve consistency of information about ECC strength in NAND devices
- Clean up partition handling of plat_nand
- Support NAND drivers without dedicated access to OOB area
- BCH hardware ECC support for OMAP
- Other fixes and cleanups, and a few new device IDs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEABECAAYFAk/JG1wACgkQdwG7hYl686M80wCglN4kutx20j+KJWuZofkr9Hog
weEAoI4jrqEWEdW9EcT2CIWQw7eG+1v+
=7tdo
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
Pull mtd update from David Woodhouse:
- More robust parsing especially of xattr data in JFFS2
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
- Improve consistency of information about ECC strength in NAND devices
- Clean up partition handling of plat_nand
- Support NAND drivers without dedicated access to OOB area
- BCH hardware ECC support for OMAP
- Other fixes and cleanups, and a few new device IDs
Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to
added include files next to each other.
* tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits)
mtd: mxc_nand: move ecc strengh setup before nand_scan_tail
mtd: block2mtd: fix recursive call of mtd_writev
mtd: gpmi-nand: define ecc.strength
mtd: of_parts: fix breakage in Kconfig
mtd: nand: fix scan_read_raw_oob
mtd: docg3 fix in-middle of blocks reads
mtd: cfi_cmdset_0002: Slight cleanup of fixup messages
mtd: add fixup for S29NS512P NOR flash.
jffs2: allow to complete xattr integrity check on first GC scan
jffs2: allow to discriminate between recoverable and non-recoverable errors
mtd: nand: omap: add support for hardware BCH ecc
ARM: OMAP3: gpmc: add BCH ecc api and modes
mtd: nand: check the return code of 'read_oob/read_oob_raw'
mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw'
mtd: m25p80: Add support for Winbond W25Q80BW
jffs2: get rid of jffs2_sync_super
jffs2: remove unnecessary GC pass on sync
jffs2: remove unnecessary GC pass on umount
jffs2: remove lock_super
mtd: gpmi: add gpmi support for mx6q
...
Pull third pile of signal handling patches from Al Viro:
"This time it's mostly helpers and conversions to them; there's a lot
of stuff remaining in the tree, but that'll either go in -rc2
(isolated bug fixes, ideally via arch maintainers' trees) or will sit
there until the next cycle."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
x86: get rid of calling do_notify_resume() when returning to kernel mode
blackfin: check __get_user() return value
whack-a-mole with TIF_FREEZE
FRV: Optimise the system call exit path in entry.S [ver #2]
FRV: Shrink TIF_WORK_MASK [ver #2]
FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
new helper: signal_delivered()
powerpc: get rid of restore_sigmask()
most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
set_restore_sigmask() is never called without SIGPENDING (and never should be)
TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
don't call try_to_freeze() from do_signal()
pull clearing RESTORE_SIGMASK into block_sigmask()
sh64: failure to build sigframe != signal without handler
openrisc: tracehook_signal_handler() is supposed to be called on success
new helper: sigmask_to_save()
new helper: restore_saved_sigmask()
new helpers: {clear,test,test_and_clear}_restore_sigmask()
HAVE_RESTORE_SIGMASK is defined on all architectures now
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...
The major new feature added in this update is Darrick J. Wong's
metadata checksum feature, which adds crc32 checksums to ext4's
metadata fields. There is also the usual set of cleanups and bug
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJPyNleAAoJENNvdpvBGATwtLMP/i3WsPyTvxmYP6HttHXQb8Jk
GYCoTQ5bZMuTbOwOGg3w137cXWBv5uuPpxIk79YVLHSWx6HuanlGIa7/VnPKIaLu
2ihuvVfnrDqpwQ4MJaSq4R1Eka9JCwZ7HbYYo+fYOVobxgw588JVV9VVI9EdKRGz
z11UkW8iHE0f6Xa5gOhdAMkR0uaPnxwJX/qHZYiHuognRivuwMglqWJSiMr8nQmo
A2GmeoLehhW+k65IqgTCmSW6ZgFTvZdk6bskQIij3fOYHW3hHn/gcLFtmLTIZ/B5
LZdg/lngPYve+R/UyypliGKi+pv1qNEiTiBm0rrBgsdZFkBdGj0soSvGZzeK+Mp4
Q1vAmOBPYPFzs6nVzPst2n/osryyykFCK6TgSGZ50dosJ0NO8cBeDdX/gh9JKD2R
yQUMUltOCCSj/eWU4iwqZ0T3FXRiH/+S3XMHznoKJiwUyGDBNQy4+Yg2k2WzUXrz
Cu5t5BwNG2WNP7y5Et/wmUIzpC7VPId4qYmGyHe7OwTxSJgW+6f7GVkHfjWcDMuv
pGgEUiInbMmLajP3v2/LKfVU4hXLZy4uJbhoBgDdeIpZrnPifJG/MwDOS4W+dLVT
tDzgO1SAh3/E4jATreZ5bjzD/HGsfe1OX09UH3Pbc1EcgkrLnyrQXFwdHshdVu4A
cxMoKNPVCQJySb1UrLkO
=SdJJ
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull Ext4 updates from Theodore Ts'o:
"The major new feature added in this update is Darrick J Wong's
metadata checksum feature, which adds crc32 checksums to ext4's
metadata fields.
There is also the usual set of cleanups and bug fixes."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: hole-punch use truncate_pagecache_range
jbd2: use kmem_cache_zalloc wrapper instead of flag
ext4: remove mb_groups before tearing down the buddy_cache
ext4: add ext4_mb_unload_buddy in the error path
ext4: don't trash state flags in EXT4_IOC_SETFLAGS
ext4: let getattr report the right blocks in delalloc+bigalloc
ext4: add missing save_error_info() to ext4_error()
ext4: add debugging trigger for ext4_error()
ext4: protect group inode free counting with group lock
ext4: use consistent ssize_t type in ext4_file_write()
ext4: fix format flag in ext4_ext_binsearch_idx()
ext4: cleanup in ext4_discard_allocated_blocks()
ext4: return ENOMEM when mounts fail due to lack of memory
ext4: remove redundundant "(char *) bh->b_data" casts
ext4: disallow hard-linked directory in ext4_lookup
ext4: fix potential integer overflow in alloc_flex_gd()
ext4: remove needs_recovery in ext4_mb_init()
ext4: force ro mount if ext4_setup_super() fails
ext4: fix potential NULL dereference in ext4_free_inodes_counts()
ext4/jbd2: add metadata checksumming to the list of supported features
...
Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h. Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not. It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:
# mount --bind /mnt/nfs /mnt/nfs-clone
# echo something > /mnt/nfs/tmp/bar
# echo x > /tmp/file
# mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
# cat /mnt/nfs/tmp/bar
cat: /mnt/nfs/tmp/bar: Not a directory
Which should, by any sane filesystem, result in "something" being printed.
So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFS optimizes away d_revalidates for last component of open. This means that
open itself can find the dentry stale.
This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.
If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost. In this case fall back to
non-RCU lookup. Currently this is not used since NFS will always leave RCU
mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If open fails, don't put the file. This allows it to be reused if open needs to
be retried.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move put_filp() out to __dentry_open(), the only caller now.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split __dentry_open() into two functions:
do_dentry_open() - does most of the actual work, doesn't put file on failure
open_check_o_direct() - after a successful open, checks direct_IO method
This will allow i_op->atomic_open to do just the file initialization and leave
the direct_IO checking to the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now the post lookup code can be shared between O_CREAT and plain opens since
they are essentially the same.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This allows this code to be shared between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This allows this code to be shared between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Check for ENOTDIR before finishing open. This allows this code to be shared
between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use helper variable instead of path->dentry->d_inode before complete_walk().
This will allow this code to be used in RCU mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and
"exit:" labels.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split do_lookup() into two functions:
lookup_fast() - does cached lookup without i_mutex
lookup_slow() - does lookup with i_mutex
Both follow managed dentries.
The new functions are needed by atomic_open.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Btrfs had been doing it's own file_update_time so we could catch ENOSPC
properly, so just update our btrfs_update_time to work with the new stuff and
then we'll be fancy later. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail. We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates. So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made. The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.
I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Pull btrfs updates from Chris Mason:
"This includes a fairly large change from Josef around data writeback
completion. Before, the writeback wasn't completed until the metadata
insertions for the extent were done, and this made for fairly large
latency spikes on the last page of each ordered extent.
We already had a separate mechanism for tracking pending metadata
insertions, so Josef just needed to tweak things a little to end
writeback earlier on the page. Overall it makes us much friendly to
memory reclaim and lowers latencies quite a lot for synchronous IO.
Jan Schmidt has finished some background work required to track btree
blocks as they go through changes in ownership. It's the missing
piece he needed for both btrfs send/receive and subvolume quotas.
Neither of those are ready yet, but the new tracking code is included
here. Most of the time, the new code is off. It is only used by
scrub and other backref walkers.
Stefan Behrens has added io failure tracking. This includes counters
for which drives are causing the most trouble so the admin (or an
automated tool) can choose to kick them out. We're tracking IO
errors, crc errors, and generation checks we do on each metadata
block.
RAID5/6 did miss the cut this time because I'm having trouble with
corruptions. I'll nail it down next week and post as a beta testing
before 3.6"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (58 commits)
Btrfs: fix tree mod log rewinded level and rewinding of moved keys
Btrfs: fix tree mod log del_ptr
Btrfs: add tree_mod_dont_log helper
Btrfs: add missing spin_lock for insertion into tree mod log
Btrfs: add inodes before dropping the extent lock in find_all_leafs
Btrfs: use delayed ref sequence numbers for all fs-tree updates
Btrfs: fix false positive in check-integrity on unmount
Btrfs: fix runtime warning in check-integrity check data mode
Btrfs: set ioprio of scrub readahead to idle
Btrfs: fix return code in drop_objectid_items
Btrfs: check to see if the inode is in the log before fsyncing
Btrfs: return value of btrfs_read_buffer is checked correctly
Btrfs: read device stats on mount, write modified ones during commit
Btrfs: add ioctl to get and reset the device stats
Btrfs: add device counters for detected IO and checksum errors
btrfs: Drop unused function btrfs_abort_devices()
Btrfs: fix the same inode id problem when doing auto defragment
Btrfs: fall back to non-inline if we don't have enough space
Btrfs: fix how we deal with the orphan block rsv
Btrfs: convert the inode bit field to use the actual bit operations
...
Pull the rest of the nfsd commits from Bruce Fields:
"... and then I cherry-picked the remainder of the patches from the
head of my previous branch"
This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.
I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.
* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
nfsd4: fix, consolidate client_has_state
nfsd4: don't remove rebooted client record until confirmation
nfsd4: remove some dprintk's and a comment
nfsd4: return "real" sequence id in confirmed case
nfsd4: fix exchange_id to return confirm flag
nfsd4: clarify that renewing expired client is a bug
nfsd4: simpler ordering of setclientid_confirm checks
nfsd4: setclientid: remove pointless assignment
nfsd4: fix error return in non-matching-creds case
nfsd4: fix setclientid_confirm same_cred check
nfsd4: merge 3 setclientid cases to 2
nfsd4: pull out common code from setclientid cases
nfsd4: merge last two setclientid cases
nfsd4: setclientid/confirm comment cleanup
nfsd4: setclientid remove unnecessary terms from a logical expression
nfsd4: move rq_flavor into svc_cred
nfsd4: stricter cred comparison for setclientid/exchange_id
nfsd4: move principal name into svc_cred
nfsd4: allow removing clients not holding state
nfsd4: rearrange exchange_id logic to simplify
...
This patch stops reiserfs using the VFS 'write_super()' method along with the
s_dirt flag, because they are on their way out.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The 'journal_mark_dirty()' function currently first marks the superblock as
dirty by setting 's_dirt' to 1, then does various sanity checks and returns,
then actuall does all the magic with the journal.
This is not an ideal order, though. It makes more sense to first do all the
checks, then do all the internal stuff, and at the end notify the VFS that the
superblock is now dirty.
This patch moves the 's_dirt = 1' assignment from the very beginning of this
function to the very end.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The 'reiserfs_resize()' function marks the superblock as dirty by assigning 1
to 's_dirt' and then calls 'journal_mark_dirty()' which does the same. Thus,
we can remove the assignment from 'reiserfs_resize()'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Turn 'reiserfs_flush_old_commits()' into a void function because the callers
do not cares about what it returns anyway.
We are going to remove the 'sb->s_dirt' field completely and this patch is a
small step towards this direction.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We have the reiserfs superblock pointer in the 'sbi' variable in this
function, no need to use the 'REISERFS_SB(s)' macro which is the same.
This is jut a small clean-up.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When truncating a file, we unmap pages from userspace first, as that's
usually more efficient than relying, page by page, on the fallback in
truncate_inode_page() - particularly if the file is mapped many times.
Do the same when punching a hole: 3.4 added truncate_pagecache_range()
to do the unmap and trunc, so use it in ext4_ext_punch_hole(), instead
of calling truncate_inode_pages_range() directly.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Use kmem_cache_zalloc wrapper instead of flag __GFP_ZERO.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>