* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: new helper - d_make_root()
dcache: use a dispose list in select_parent
ceph: d_alloc_root() may fail
ext4: fix failure exits
isofs: inode leak on mount failure
select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.
IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.
Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) leaking root dentry is bad
b) in case of failed ext4_mb_init() we don't want to do ext4_mb_release()
c) OTOH, in the same case we *do* want ext4_ext_release()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2/3/4: delete unneeded includes of module.h
ext{3,4}: Fix potential race when setversion ioctl updates inode
udf: Mark LVID buffer as uptodate before marking it dirty
ext3: Don't warn from writepage when readonly inode is spotted after error
jbd: Remove j_barrier mutex
reiserfs: Force inode evictions before umount to avoid crash
reiserfs: Fix quota mount option parsing
udf: Treat symlink component of type 2 as /
udf: Fix deadlock when converting file from in-ICB one to normal one
udf: Cleanup calling convention of inode_getblk()
ext2: Fix error handling on inode bitmap corruption
ext3: Fix error handling on inode bitmap corruption
ext3: replace ll_rw_block with other functions
ext3: NULL dereference in ext3_evict_inode()
jbd: clear revoked flag on buffers before a new transaction started
ext3: call ext3_mark_recovery_complete() when recovery is really needed
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
ore: Must support none-PAGE-aligned IO
ore: fix BUG_ON, too few sgs when reading
ore: Fix crash in case of an IO error.
ore: FIX breakage when MISC_FILESYSTEMS is not set
Delete any instances of include module.h that were not strictly
required. In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The EXT{3,4}_IOC_SETVERSION ioctl() updates i_ctime and i_generation
without i_mutex. This can lead to a race with the other operations that
update i_ctime. This is not a big issue but let's make the ioctl consistent
with how we handle e.g. other timestamp updates and use i_mutex to protect
inode changes.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when filesystem hits error and is
remounted read-only. This unnecessarily scares users (well, they should be
scared because of filesystem error, but the stack trace distracts them from the
right source of their fear ;-). We could as well just remove the WARN_ON but
it's not hard to fix it to not trip on filesystem with errors and not use more
cycles in the common case so that's what we do.
CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
j_barrier mutex is used for serializing different journal lock operations. The
problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
with j_barrier mutex held which makes lockdep freak out. Also hibernation code
wants to freeze filesystem but it cannot do so because it then cannot hibernate
the system because of mutex being locked.
So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
Since locking journal is a rare operation we don't have to care about fairness
or such things.
CC: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
This patch fixes a crash in reiserfs_delete_xattrs during umount.
When shrink_dcache_for_umount clears the dcache from
generic_shutdown_super, delayed evictions are forced to disk. If an
evicted inode has extended attributes associated with it, it will
need to walk the xattr tree to locate and remove them.
But since shrink_dcache_for_umount will BUG if it encounters active
dentries, the xattr tree must be released before it's called or it will
crash during every umount.
This patch forces the evictions to occur before generic_shutdown_super
by calling shrink_dcache_sb first. The additional evictions caused
by the removal of each associated xattr file and dir will be automatically
handled as they're added to the LRU list.
CC: reiserfs-devel@vger.kernel.org
CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When jqfmt mount option is not specified on remount, we mistakenly clear
s_jquota_fmt value stored in superblock. Fix the problem.
CC: stable@kernel.org
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Currently, we ignore symlink component of type 2. But mkisofs and other OS'
seem to treat it as / so do the same for compatibility.
Reported-by: "Gábor S." <otnaccess@hotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
During BKL removal in 2.6.38, conversion of files from in-ICB format to normal
format got broken. We call ->writepage with i_data_sem held but udf_get_block()
also acquires i_data_sem thus creating A-A deadlock.
We fix the problem by dropping i_data_sem before calling ->writepage() which is
safe since i_mutex still protects us against any changes in the file. Also fix
pagelock - i_data_sem lock inversion in udf_expand_file_adinicb() by dropping
i_data_sem before calling find_or_create_page().
CC: stable@kernel.org
Reported-by: Matthias Matiak <netzpython@mail-on.us>
Tested-by: Matthias Matiak <netzpython@mail-on.us>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
inode_getblk() always returned NULL and passed results in its parameters.
Make the function return something useful - found block number.
Signed-off-by: Jan Kara <jack@suse.cz>
When insert_inode_locked() fails in ext2_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by informing about filesystem error and
jumping to fail: (instead of fail_drop:) which doesn't call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
ll_rw_block() is deprecated. Thus we replace it with other functions.
CC: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Both ext3 and ext4 put the half-created symlink inode into the orphan list
for a while (see the comment in ext[34]_symlink() for gory details). Then,
if everything went fine, they pull it out of the orphan list and bump the
link count back to 1. The thing is, inc_nlink() is going to complain about
seeing somebody changing i_nlink from 0 to 1. With a good reason, since
normally something like that is a bug. Explicit set_nlink(inode, 1) does
the same thing as inc_nlink() here, but it does *not* complain - exactly
because it should be usable in strange situations like this one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We have already set ->s_root, so ->put_super() is going to be called.
Freeing ->s_fs_info is a bloody bad idea when it's going to be
dereferenced very shortly...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
devpts_kill_sb() is called even if devpts_fill_super() fails;
we should not do that kfree() in the latter, especially not
with ->s_fs_info left pointing to freed object. Double kfree()
is a Bad Thing(tm)...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...
Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
PM / Hibernate: Implement compat_ioctl for /dev/snapshot
PM / Freezer: fix return value of freezable_schedule_timeout_killable()
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
...
Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: local functions should be static
GFS2: We only need one ACL getting function
GFS2: Fix multi-block allocation
GFS2: decouple quota allocations from block allocations
GFS2: split function rgblk_search
GFS2: Fix up "off by one" in the previous patch
GFS2: move toward a generic multi-block allocator
GFS2: O_(D)SYNC support for fallocate
GFS2: remove vestigial al_alloced
GFS2: combine gfs2_alloc_block and gfs2_alloc_di
GFS2: Add non-try locks back to get_local_rgrp
GFS2: f_ra is always valid in dir readahead function
GFS2: Fix very unlikley memory leak in ACL xattr code
GFS2: More automated code analysis fixes
GFS2: Add readahead to sequential directory traversal
GFS2: Fix up REQ flags
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (22 commits)
xfs: mark the xfssyncd workqueue as non-reentrant
xfs: simplify xfs_qm_detach_gdquots
xfs: fix acl count validation in xfs_acl_from_disk()
xfs: remove unused XBT_FORCE_SLEEP bit
xfs: remove XFS_QMOPT_DQSUSER
xfs: kill xfs_qm_idtodq
xfs: merge xfs_qm_dqinit_core into the only caller
xfs: add a xfs_dqhold helper
xfs: simplify xfs_qm_dqattach_grouphint
xfs: nest qm_dqfrlist_lock inside the dquot qlock
xfs: flatten the dquot lock ordering
xfs: implement lazy removal for the dquot freelist
xfs: remove XFS_DQ_INACTIVE
xfs: cleanup xfs_qm_dqlookup
xfs: cleanup dquot locking helpers
xfs: remove the sync_mode argument to xfs_qm_dqflush_all
xfs: remove xfs_qm_sync
xfs: make sure to really flush all dquots in xfs_qm_quotacheck
xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush
xfs: remove the lid_size field in struct log_item_desc
...
Fix up trivial conflict in fs/xfs/xfs_sync.c
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...
NFS might send us offsets that are not PAGE aligned. So
we must read in the reminder of the first/last pages, in cases
we need it for Parity calculations.
We only add an sg segments to read the partial page. But
we don't mark it as read=true because it is a lock-for-write
page.
TODO: In some cases (IO spans a single unit) we can just
adjust the raid_unit offset/length, but this is left for
later Kernels.
[Bug in 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
Make reiserfs properly display mount options in /proc/mounts.
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted then prevent the
remounting the super block read-only.
Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new counter to the superblock that keeps track of unlinked but
not yet deleted inodes.
Do not WARN_ON if set_nlink is called with zero count, just do a
ratelimited printk. This happens on xfs and probably other
filesystems after an unclean shutdown when the filesystem reads inodes
which already have zero i_nlink. Reported by Christoph Hellwig.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently remouting superblock read-only is racy in a major way.
With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.
Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount. This indicates that remount is in
progress and no further write requests are allowed. If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.
If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed. Unfortunately hodling off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.
A later patch deals with delayed writes due to nlink going to zero.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Keep track of vfsmounts belonging to a superblock. List is protected
by vfsmount_lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
net: pack skb_shared_info more efficiently
net_sched: red: split red_parms into parms and vars
net_sched: sfq: extend limits
cnic: Improve error recovery on bnx2x devices
cnic: Re-init dev->stats_addr after chip reset
net_sched: Bug in netem reordering
bna: fix sparse warnings/errors
bna: make ethtool_ops and strings const
xgmac: cleanups
net: make ethtool_ops const
vmxnet3" make ethtool ops const
xen-netback: make ops structs const
virtio_net: Pass gfp flags when allocating rx buffers.
ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
igb: reset PHY after recovering from PHY power down
igb: add basic runtime PM support
igb: Add support for byte queue limits.
e1000: cleanup CE4100 MDIO registers access
e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
...
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...
Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches