fs: Remove ext3 filesystem driver
The functionality of ext3 is fully supported by the ext4 driver. Major
distributions (SUSE, RedHat) have already been using the ext4 driver to handle
ext3 filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning the page dirty bit,
and support for buffer bouncing in the block layer when stable pages are
required is there only because of jbd. So let's remove the ext3 driver. This
saves us some 28k lines of duplicated code.

Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
parent 82ff50b222
commit c290ea01ab

@@ -360,8 +360,8 @@ and are copied into the filesystem. If a transaction is incomplete at
the time of the crash, then there is no guarantee of consistency for
the blocks in that transaction so they are discarded (which means any
filesystem changes they represent are also lost).
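
To make the discard rule concrete, here is a toy model in C. It is an illustration under simplifying assumptions, not JBD's on-disk format or API: block updates are buffered per transaction and copied into the filesystem image only once that transaction's commit record has been read, so a transaction cut off by a crash leaves no trace. The record layout and the `toy_replay()` name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBLOCKS 8      /* blocks in the toy filesystem image */
#define TOY_MAX_PENDING 16 /* max block updates in one transaction */

/* Hypothetical record kinds, for this sketch only. */
enum toy_rec_kind { TOY_REC_BLOCK, TOY_REC_COMMIT };

struct toy_rec {
	enum toy_rec_kind kind;
	unsigned int tid; /* transaction the record belongs to */
	int block;        /* target block (TOY_REC_BLOCK only) */
	char data;        /* new block contents (TOY_REC_BLOCK only) */
};

/*
 * Replay jlog[0..n) onto fs[]. Transactions are assumed to be logged one
 * after another, not interleaved. Block records are buffered and copied
 * to fs[] only when the commit record of their transaction is reached;
 * a trailing transaction without a commit record is discarded.
 * Returns the number of transactions applied.
 */
static int toy_replay(const struct toy_rec *jlog, size_t n,
		      char fs[TOY_NBLOCKS])
{
	const struct toy_rec *pending[TOY_MAX_PENDING];
	size_t npending = 0;
	int applied = 0;

	for (size_t i = 0; i < n; i++) {
		if (jlog[i].kind == TOY_REC_BLOCK) {
			assert(npending < TOY_MAX_PENDING);
			assert(jlog[i].block >= 0 && jlog[i].block < TOY_NBLOCKS);
			pending[npending++] = &jlog[i];
			continue;
		}
		/* Commit record: the whole transaction is on disk, apply it. */
		for (size_t j = 0; j < npending; j++) {
			assert(pending[j]->tid == jlog[i].tid);
			fs[pending[j]->block] = pending[j]->data;
		}
		npending = 0;
		applied++;
	}
	/* Updates still in pending[] belong to an incomplete transaction. */
	return applied;
}
```

Replaying the log {block 0 = 'A', block 1 = 'B', commit of transaction 1, block 2 = 'C'} applies transaction 1 and leaves block 2 untouched, because transaction 2 never reached its commit record.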
Check Documentation/filesystems/ext3.txt if you want to read more about
ext3 and journaling.
Check Documentation/filesystems/ext4.txt if you want to read more about
ext4 and journaling.

References
==========

@@ -6,210 +6,7 @@ Ext3 was originally released in September 1999. Written by Stephen Tweedie
for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.

Ext3 is the ext2 filesystem enhanced with journalling capabilities.
Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
filesystem is a subset of the ext4 filesystem, so use the ext4 driver for
accessing ext3 filesystems.

Options
=======

When mounting an ext3 filesystem, the following options are accepted:
(*) == default

ro			Mount filesystem read only. Note that ext3 will replay
			the journal (and thus write to the partition) even when
			mounted "read only". Mount options "ro,noload" can be
			used to prevent writes to the filesystem.

journal=update		Update the ext3 file system's journal to the current
			format.

journal=inum		When a journal already exists, this option is ignored.
			Otherwise, it specifies the number of the inode which
			will represent the ext3 file system's journal file.

journal_path=path
journal_dev=devnum	When the external journal device's major/minor numbers
			have changed, these options allow the user to specify
			the new journal location. The journal device is
			identified through either its new major/minor numbers
			encoded in devnum, or via a path to the device.

norecovery		Don't load the journal on mounting. Note that this forces
noload			a mount of an inconsistent filesystem, which can lead to
			various problems.

data=journal		All data are committed into the journal prior to being
			written into the main file system.

data=ordered	(*)	All data are forced directly out to the main file
			system prior to its metadata being committed to the
			journal.

data=writeback		Data ordering is not preserved, data may be written
			into the main file system after its metadata has been
			committed to the journal.

commit=nrsec	(*)	Ext3 can be told to sync all its data and metadata
			every 'nrsec' seconds. The default value is 5 seconds.
			This means that if you lose your power, you will lose
			as much as the latest 5 seconds of work (your
			filesystem will not be damaged though, thanks to the
			journaling). This default value (or any low value)
			will hurt performance, but it's good for data-safety.
			Setting it to 0 will have the same effect as leaving
			it at the default (5 seconds).
			Setting it to very large values will improve
			performance.

barrier=<0|1(*)>	This enables/disables the use of write barriers in
barrier	(*)		the jbd code. barrier=0 disables, barrier=1 enables.
nobarrier		This also requires an IO stack which can support
			barriers, and if jbd gets an error on a barrier
			write, it will disable barriers again with a warning.
			Write barriers enforce proper on-disk ordering
			of journal commits, making volatile disk write caches
			safe to use, at some performance penalty. If
			your disks are battery-backed in one way or another,
			disabling barriers may safely improve performance.
			The mount options "barrier" and "nobarrier" can
			also be used to enable or disable barriers, for
			consistency with other ext3 mount options.

user_xattr		Enables Extended User Attributes. Additionally, you
			need to have extended attribute support enabled in the
			kernel configuration (CONFIG_EXT3_FS_XATTR). See the
			attr(5) manual page and http://acl.bestbits.at/ to
			learn more about extended attributes.

nouser_xattr		Disables Extended User Attributes.

acl			Enables POSIX Access Control Lists support.
			Additionally, you need to have ACL support enabled in
			the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
			See the acl(5) manual page and http://acl.bestbits.at/
			for more information.

noacl			This option disables POSIX Access Control List
			support.

reservation

noreservation

bsddf		(*)	Make 'df' act like BSD.
minixdf			Make 'df' act like Minix.

check=none		Don't do extra checking of bitmaps on mount.
nocheck

debug			Extra debugging information is sent to syslog.

errors=remount-ro	Remount the filesystem read-only on an error.
errors=continue		Keep going on a filesystem error.
errors=panic		Panic and halt the machine if an error occurs.
			(These mount options override the errors behavior
			specified in the superblock, which can be
			configured using tune2fs.)

data_err=ignore(*)	Just print an error message if an error occurs
			in a file data buffer in ordered mode.
data_err=abort		Abort the journal if an error occurs in a file
			data buffer in ordered mode.

grpid			Give objects the same group ID as their creator.
bsdgroups

nogrpid		(*)	New objects have the group ID of their creator.
sysvgroups

resgid=n		The group ID which may use the reserved blocks.

resuid=n		The user ID which may use the reserved blocks.

sb=n			Use alternate superblock at this location.

quota			These options are ignored by the filesystem. They
noquota			are used only by quota tools to recognize volumes
grpquota		where quota should be turned on. See documentation
usrquota		in the quota-tools package for more details
			(http://sourceforge.net/projects/linuxquota).

jqfmt=<quota type>	These options tell filesystem details about quota
usrjquota=<file>	so that quota information can be properly updated
grpjquota=<file>	during journal replay. They replace the above
			quota options. See documentation in the quota-tools
			package for more details
			(http://sourceforge.net/projects/linuxquota).
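
The defaults marked (*) above can be summarized as a small option parser. The following is a hypothetical user-space sketch, not the kernel's own option parsing: it understands only data=, commit= and the barrier options, starts from data=ordered, a 5-second commit interval and barriers enabled, and treats commit=0 like the default, as described for commit=nrsec. All type and function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical names, for this sketch only. */
enum toy_data_mode { TOY_DATA_JOURNAL, TOY_DATA_ORDERED, TOY_DATA_WRITEBACK };

#define TOY_DEFAULT_COMMIT 5 /* seconds, per the commit=nrsec text */

struct toy_opts {
	enum toy_data_mode data; /* default: data=ordered */
	long commit_secs;        /* default: 5 */
	int barrier;             /* default: barrier=1 */
};

/* True if the token t[0..len) is exactly the string lit. */
static int tok_is(const char *t, size_t len, const char *lit)
{
	return strlen(lit) == len && strncmp(t, lit, len) == 0;
}

/*
 * Parse a comma-separated string such as "data=writeback,commit=30,nobarrier"
 * into *o, starting from the documented defaults.
 * Returns 0 on success, -1 on an unknown, empty or malformed option.
 */
static int toy_parse_opts(const char *s, struct toy_opts *o)
{
	assert(s != NULL && o != NULL);
	o->data = TOY_DATA_ORDERED;
	o->commit_secs = TOY_DEFAULT_COMMIT;
	o->barrier = 1;

	while (*s) {
		size_t len = strcspn(s, ",");

		if (tok_is(s, len, "data=journal"))
			o->data = TOY_DATA_JOURNAL;
		else if (tok_is(s, len, "data=ordered"))
			o->data = TOY_DATA_ORDERED;
		else if (tok_is(s, len, "data=writeback"))
			o->data = TOY_DATA_WRITEBACK;
		else if (tok_is(s, len, "barrier") || tok_is(s, len, "barrier=1"))
			o->barrier = 1;
		else if (tok_is(s, len, "nobarrier") || tok_is(s, len, "barrier=0"))
			o->barrier = 0;
		else if (len > 7 && strncmp(s, "commit=", 7) == 0) {
			long v = 0;

			for (size_t i = 7; i < len; i++) {
				if (s[i] < '0' || s[i] > '9' || v > (LONG_MAX - 9) / 10)
					return -1;
				v = v * 10 + (s[i] - '0');
			}
			/* commit=0 behaves like the 5-second default. */
			o->commit_secs = v ? v : TOY_DEFAULT_COMMIT;
		} else
			return -1;

		s += len;
		if (*s == ',')
			s++;
	}
	return 0;
}
```

For example, "data=writeback,commit=0,nobarrier" yields writeback mode, a 5-second commit interval and barriers disabled, while an empty string leaves all three defaults in place.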

Specification
=============
Ext3 shares its entire on-disk format with the ext2 filesystem and adds
transaction capabilities to ext2. Journaling is done by the Journaling Block
Device layer.

Journaling Block Device layer
-----------------------------
The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed
to add journaling capabilities to a block device. The ext3 filesystem code
will inform the JBD of modifications it is performing (called a transaction).
The journal supports starting and stopping transactions, and in case of a
crash, the journal can replay the committed transactions to quickly put the
partition back into a consistent state.

Handles represent a single atomic update to a filesystem. JBD can handle an
external journal on a block device.

Data Mode
---------
There are 3 different data modes:

* writeback mode
In data=writeback mode, ext3 does not journal data at all. This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode - metadata journaling. A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash. This mode will
typically provide the best ext3 performance.

* ordered mode
In data=ordered mode, ext3 only officially journals metadata, but it logically
groups metadata and data blocks into a single unit called a transaction. When
it's time to write the new metadata out to disk, the associated data blocks
are written first. In general, this mode performs slightly slower than
writeback but significantly faster than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time, where it
outperforms all other modes.
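
One way to see why only writeback mode can expose incorrect data after a crash is a toy simulation. The sketch below is a hypothetical model, not JBD code: each mode performs the same three steps for a single appended block, in the worst-case order the descriptions above allow, and the crash may happen after any step. Real JBD batches many updates per transaction and checkpoints lazily; those details are deliberately left out, and all names are invented.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_WRITEBACK, TOY_ORDERED, TOY_JOURNAL };

/* Steps of a hypothetical "append one new block to a file" update. */
enum toy_step {
	STEP_DATA_IN_PLACE,   /* data block written to its final location */
	STEP_DATA_TO_JOURNAL, /* data block logged (data=journal only) */
	STEP_META_TO_JOURNAL, /* inode/bitmap changes logged */
	STEP_COMMIT           /* commit record written */
};

/* What a reader finds in the new block after crash + journal replay. */
enum toy_result { RES_ABSENT, RES_NEW_DATA, RES_STALE_DATA };

#define TOY_NSTEPS 3

/*
 * Worst-case step order per mode. In writeback mode the data write is
 * not ordered against the commit, so it is placed last to expose the
 * window the documentation warns about.
 */
static const enum toy_step toy_order[3][TOY_NSTEPS] = {
	[TOY_WRITEBACK] = { STEP_META_TO_JOURNAL, STEP_COMMIT,
			    STEP_DATA_IN_PLACE },
	[TOY_ORDERED]   = { STEP_DATA_IN_PLACE, STEP_META_TO_JOURNAL,
			    STEP_COMMIT },
	[TOY_JOURNAL]   = { STEP_DATA_TO_JOURNAL, STEP_META_TO_JOURNAL,
			    STEP_COMMIT },
};

/* Crash after the first `done` steps of mode m, then replay the journal. */
static enum toy_result toy_crash(enum toy_mode m, size_t done)
{
	int committed = 0, in_place = 0, logged = 0;

	assert(done <= TOY_NSTEPS);
	for (size_t i = 0; i < done; i++) {
		switch (toy_order[m][i]) {
		case STEP_COMMIT:          committed = 1; break;
		case STEP_DATA_IN_PLACE:   in_place = 1;  break;
		case STEP_DATA_TO_JOURNAL: logged = 1;    break;
		case STEP_META_TO_JOURNAL:                break;
		}
	}
	if (!committed)
		return RES_ABSENT;  /* uncommitted transaction is discarded */
	if (logged)
		in_place = 1;       /* replay copies the logged data in place */
	/* Committed metadata now points at the block; is the data there? */
	return in_place ? RES_NEW_DATA : RES_STALE_DATA;
}
```

With these orders, `toy_crash(TOY_WRITEBACK, 2)` reports stale data (metadata committed, data never written), while ordered and journal mode only ever yield either the old state or the new data.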

Compatibility
-------------

Ext2 partitions can easily be converted to ext3 with `tune2fs -j <dev>`.
Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as
Ext2.


External Tools
==============
See manual pages to learn more.

tune2fs: create an ext3 journal on an ext2 partition with the -j flag.
mke2fs: create an ext3 partition with the -j flag.
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer


References
==========

kernel source:	<file:fs/ext3/>
		<file:fs/jbd/>

programs:	http://e2fsprogs.sourceforge.net/
		http://ext2resize.sourceforge.net

useful links:	http://www.ibm.com/developerworks/library/l-fs7/index.html
		http://www.ibm.com/developerworks/library/l-fs8/index.html

@@ -769,7 +769,7 @@ struct address_space_operations {
to stall to allow flushers a chance to complete some IO. Ordinarily
it can use PageDirty and PageWriteback but some filesystems have
more complex state (unstable pages in NFS prevent reclaim) or
do not set those flags due to locking problems (jbd). This callback
do not set those flags due to locking problems. This callback
allows a filesystem to indicate to the VM if a page should be
treated as dirty or writeback for the purposes of stalling.

18
MAINTAINERS
@@ -4059,15 +4059,6 @@ F: Documentation/filesystems/ext2.txt
F: fs/ext2/
F: include/linux/ext2*

EXT3 FILE SYSTEM
M: Jan Kara <jack@suse.com>
M: Andrew Morton <akpm@linux-foundation.org>
M: Andreas Dilger <adilger.kernel@dilger.ca>
L: linux-ext4@vger.kernel.org
S: Maintained
F: Documentation/filesystems/ext3.txt
F: fs/ext3/

EXT4 FILE SYSTEM
M: "Theodore Ts'o" <tytso@mit.edu>
M: Andreas Dilger <adilger.kernel@dilger.ca>

@@ -5751,16 +5742,9 @@ S: Maintained
F: fs/jffs2/
F: include/uapi/linux/jffs2.h

JOURNALLING LAYER FOR BLOCK DEVICES (JBD)
M: Andrew Morton <akpm@linux-foundation.org>
M: Jan Kara <jack@suse.com>
L: linux-ext4@vger.kernel.org
S: Maintained
F: fs/jbd/
F: include/linux/jbd.h

JOURNALLING LAYER FOR BLOCK DEVICES (JBD2)
M: "Theodore Ts'o" <tytso@mit.edu>
M: Jan Kara <jack@suse.com>
L: linux-ext4@vger.kernel.org
S: Maintained
F: fs/jbd2/

@@ -11,18 +11,15 @@ config DCACHE_WORD_ACCESS
if BLOCK

source "fs/ext2/Kconfig"
source "fs/ext3/Kconfig"
source "fs/ext4/Kconfig"
source "fs/jbd/Kconfig"
source "fs/jbd2/Kconfig"

config FS_MBCACHE
# Meta block cache for Extended Attributes (ext2/ext3/ext4)
	tristate
	default y if EXT2_FS=y && EXT2_FS_XATTR
	default y if EXT3_FS=y && EXT3_FS_XATTR
	default y if EXT4_FS=y
	default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
	default m if EXT2_FS_XATTR || EXT4_FS

source "fs/reiserfs/Kconfig"
source "fs/jfs/Kconfig"

@@ -62,12 +62,10 @@ obj-$(CONFIG_DLM) += dlm/
# Do not add any filesystems before this line
obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
obj-$(CONFIG_EXT2_FS) += ext2/
# We place ext4 after ext2 so plain ext2 root fs's are mounted using ext2
# unless explicitly requested by rootfstype
obj-$(CONFIG_EXT4_FS) += ext4/
obj-$(CONFIG_JBD) += jbd/
obj-$(CONFIG_JBD2) += jbd2/
obj-$(CONFIG_CRAMFS) += cramfs/
obj-$(CONFIG_SQUASHFS) += squashfs/

@@ -1,89 +0,0 @@
config EXT3_FS
	tristate "Ext3 journalling file system support"
	select JBD
	help
	  This is the journalling version of the Second extended file system
	  (often called ext3), the de facto standard Linux file system
	  (method to organize files on a storage device) for hard disks.

	  The journalling code included in this driver means you do not have
	  to run e2fsck (file system checker) on your file systems after a
	  crash. The journal keeps track of any changes that were being made
	  at the time the system crashed, and can ensure that your file system
	  is consistent without the need for a lengthy check.

	  Other than adding the journal to the file system, the on-disk format
	  of ext3 is identical to ext2. It is possible to freely switch
	  between using the ext3 driver and the ext2 driver, as long as the
	  file system has been cleanly unmounted, or e2fsck is run on the file
	  system.

	  To add a journal on an existing ext2 file system or change the
	  behavior of ext3 file systems, you can use the tune2fs utility ("man
	  tune2fs"). To modify attributes of files and directories on ext3
	  file systems, use chattr ("man chattr"). You need to be using
	  e2fsprogs version 1.20 or later in order to create ext3 journals
	  (available at <http://sourceforge.net/projects/e2fsprogs/>).

	  To compile this file system support as a module, choose M here: the
	  module will be called ext3.

config EXT3_DEFAULTS_TO_ORDERED
	bool "Default to 'data=ordered' in ext3"
	depends on EXT3_FS
	default y
	help
	  The journal mode options for ext3 have different tradeoffs
	  between when data is guaranteed to be on disk and
	  performance. The use of "data=writeback" can cause
	  unwritten data to appear in files after a system crash or
	  power failure, which can be a security issue. However,
	  "data=ordered" mode can also result in major performance
	  problems, including seconds-long delays before an fsync()
	  call returns. For details, see:

	  http://ext4.wiki.kernel.org/index.php/Ext3_data_mode_tradeoffs

	  If you have been historically happy with ext3's performance,
	  data=ordered mode will be a safe choice and you should
	  answer 'y' here. If you understand the reliability and data
	  privacy issues of data=writeback and are willing to make
	  that trade off, answer 'n'.

config EXT3_FS_XATTR
	bool "Ext3 extended attributes"
	depends on EXT3_FS
	default y
	help
	  Extended attributes are name:value pairs associated with inodes by
	  the kernel or by users (see the attr(5) manual page, or visit
	  <http://acl.bestbits.at/> for details).

	  If unsure, say N.

	  You need this for POSIX ACL support on ext3.

config EXT3_FS_POSIX_ACL
	bool "Ext3 POSIX Access Control Lists"
	depends on EXT3_FS_XATTR
	select FS_POSIX_ACL
	help
	  Posix Access Control Lists (ACLs) support permissions for users and
	  groups beyond the owner/group/world scheme.

	  To learn more about Access Control Lists, visit the Posix ACLs for
	  Linux website <http://acl.bestbits.at/>.

	  If you don't know what Access Control Lists are, say N.

config EXT3_FS_SECURITY
	bool "Ext3 Security Labels"
	depends on EXT3_FS_XATTR
	help
	  Security labels support alternative access control models
	  implemented by security modules like SELinux. This option
	  enables an extended attribute handler for file security
	  labels in the ext3 filesystem.

	  If you are not using a security module that requires using
	  extended attributes for file security labels, say N.

@@ -1,12 +0,0 @@
#
# Makefile for the linux ext3-filesystem routines.
#

obj-$(CONFIG_EXT3_FS) += ext3.o

ext3-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
	ioctl.o namei.o super.o symlink.o hash.o resize.o ext3_jbd.o

ext3-$(CONFIG_EXT3_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
ext3-$(CONFIG_EXT3_FS_POSIX_ACL) += acl.o
ext3-$(CONFIG_EXT3_FS_SECURITY) += xattr_security.o

281
fs/ext3/acl.c
@@ -1,281 +0,0 @@
/*
 * linux/fs/ext3/acl.c
 *
 * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
 */

#include "ext3.h"
#include "xattr.h"
#include "acl.h"

/*
 * Convert from filesystem to in-memory representation.
 */
static struct posix_acl *
ext3_acl_from_disk(const void *value, size_t size)
{
	const char *end = (char *)value + size;
	int n, count;
	struct posix_acl *acl;

	if (!value)
		return NULL;
	if (size < sizeof(ext3_acl_header))
		return ERR_PTR(-EINVAL);
	if (((ext3_acl_header *)value)->a_version !=
	    cpu_to_le32(EXT3_ACL_VERSION))
		return ERR_PTR(-EINVAL);
	value = (char *)value + sizeof(ext3_acl_header);
	count = ext3_acl_count(size);
	if (count < 0)
		return ERR_PTR(-EINVAL);
	if (count == 0)
		return NULL;
	acl = posix_acl_alloc(count, GFP_NOFS);
	if (!acl)
		return ERR_PTR(-ENOMEM);
	for (n=0; n < count; n++) {
		ext3_acl_entry *entry =
			(ext3_acl_entry *)value;
		if ((char *)value + sizeof(ext3_acl_entry_short) > end)
			goto fail;
		acl->a_entries[n].e_tag = le16_to_cpu(entry->e_tag);
		acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
		switch(acl->a_entries[n].e_tag) {
			case ACL_USER_OBJ:
			case ACL_GROUP_OBJ:
			case ACL_MASK:
			case ACL_OTHER:
				value = (char *)value +
					sizeof(ext3_acl_entry_short);
				break;

			case ACL_USER:
				value = (char *)value + sizeof(ext3_acl_entry);
				if ((char *)value > end)
					goto fail;
				acl->a_entries[n].e_uid =
					make_kuid(&init_user_ns,
						  le32_to_cpu(entry->e_id));
				break;
			case ACL_GROUP:
				value = (char *)value + sizeof(ext3_acl_entry);
				if ((char *)value > end)
					goto fail;
				acl->a_entries[n].e_gid =
					make_kgid(&init_user_ns,
						  le32_to_cpu(entry->e_id));
				break;

			default:
				goto fail;
		}
	}
	if (value != end)
		goto fail;
	return acl;

fail:
	posix_acl_release(acl);
	return ERR_PTR(-EINVAL);
}
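
The layout that ext3_acl_from_disk() walks can also be decoded outside the kernel: a little-endian 4-byte version header, then 4-byte entries (tag, perm) for ACL_USER_OBJ, ACL_GROUP_OBJ, ACL_MASK and ACL_OTHER, and 8-byte entries (tag, perm, id) for ACL_USER and ACL_GROUP. The user-space sketch below assumes that layout; the tag values are the standard Linux POSIX ACL constants, while the struct and function names are hypothetical. Unlike the kernel function, it does not precompute the entry count with ext3_acl_count(), so it is slightly more permissive about unusual layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* POSIX ACL tag values (same numbers as the kernel's ACL_* constants). */
#define T_USER_OBJ  0x01
#define T_USER      0x02
#define T_GROUP_OBJ 0x04
#define T_GROUP     0x08
#define T_MASK      0x10
#define T_OTHER     0x20

#define T_ACL_VERSION 0x0001 /* value of EXT3_ACL_VERSION */

struct toy_acl_entry {
	uint16_t tag;
	uint16_t perm;
	uint32_t id; /* only meaningful for T_USER / T_GROUP */
};

static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode an ext3 ACL xattr value into out[0..max). Returns the number of
 * entries, or -1 if the value is malformed (bad version, truncated entry,
 * unknown tag, or more than max entries).
 */
static int toy_acl_decode(const unsigned char *buf, size_t size,
			  struct toy_acl_entry *out, int max)
{
	const unsigned char *p, *end = buf + size;
	int n = 0;

	assert(buf != NULL && out != NULL);
	if (size < 4 || get_le32(buf) != T_ACL_VERSION)
		return -1;
	p = buf + 4;
	while (p < end) {
		if (n == max || end - p < 4)
			return -1;
		out[n].tag = get_le16(p);
		out[n].perm = get_le16(p + 2);
		out[n].id = 0;
		switch (out[n].tag) {
		case T_USER_OBJ: case T_GROUP_OBJ: case T_MASK: case T_OTHER:
			p += 4; /* short entry: tag + perm */
			break;
		case T_USER: case T_GROUP:
			if (end - p < 8)
				return -1;
			out[n].id = get_le32(p + 4);
			p += 8; /* full entry: tag + perm + id */
			break;
		default:
			return -1;
		}
		n++;
	}
	return n;
}
```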

/*
 * Convert from in-memory to filesystem representation.
 */
static void *
ext3_acl_to_disk(const struct posix_acl *acl, size_t *size)
{
	ext3_acl_header *ext_acl;
	char *e;
	size_t n;

	*size = ext3_acl_size(acl->a_count);
	ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
			sizeof(ext3_acl_entry), GFP_NOFS);
	if (!ext_acl)
		return ERR_PTR(-ENOMEM);
	ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
	e = (char *)ext_acl + sizeof(ext3_acl_header);
	for (n=0; n < acl->a_count; n++) {
		const struct posix_acl_entry *acl_e = &acl->a_entries[n];
		ext3_acl_entry *entry = (ext3_acl_entry *)e;
		entry->e_tag = cpu_to_le16(acl_e->e_tag);
		entry->e_perm = cpu_to_le16(acl_e->e_perm);
		switch(acl_e->e_tag) {
			case ACL_USER:
				entry->e_id = cpu_to_le32(
					from_kuid(&init_user_ns, acl_e->e_uid));
				e += sizeof(ext3_acl_entry);
				break;
			case ACL_GROUP:
				entry->e_id = cpu_to_le32(
					from_kgid(&init_user_ns, acl_e->e_gid));
				e += sizeof(ext3_acl_entry);
				break;

			case ACL_USER_OBJ:
			case ACL_GROUP_OBJ:
			case ACL_MASK:
			case ACL_OTHER:
				e += sizeof(ext3_acl_entry_short);
				break;

			default:
				goto fail;
		}
	}
	return (char *)ext_acl;

fail:
	kfree(ext_acl);
	return ERR_PTR(-EINVAL);
}

/*
 * Inode operation get_posix_acl().
 *
 * inode->i_mutex: don't care
 */
struct posix_acl *
ext3_get_acl(struct inode *inode, int type)
{
	int name_index;
	char *value = NULL;
	struct posix_acl *acl;
	int retval;

	switch (type) {
	case ACL_TYPE_ACCESS:
		name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
		break;
	case ACL_TYPE_DEFAULT:
		name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
		break;
	default:
		BUG();
	}

	retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
	if (retval > 0) {
		value = kmalloc(retval, GFP_NOFS);
		if (!value)
			return ERR_PTR(-ENOMEM);
		retval = ext3_xattr_get(inode, name_index, "", value, retval);
	}
	if (retval > 0)
		acl = ext3_acl_from_disk(value, retval);
	else if (retval == -ENODATA || retval == -ENOSYS)
		acl = NULL;
	else
		acl = ERR_PTR(retval);
	kfree(value);

	if (!IS_ERR(acl))
		set_cached_acl(inode, type, acl);

	return acl;
}

/*
 * Set the access or default ACL of an inode.
 *
 * inode->i_mutex: down unless called from ext3_new_inode
 */
static int
__ext3_set_acl(handle_t *handle, struct inode *inode, int type,
	     struct posix_acl *acl)
{
	int name_index;
	void *value = NULL;
	size_t size = 0;
	int error;

	switch(type) {
		case ACL_TYPE_ACCESS:
			name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
			if (acl) {
				error = posix_acl_equiv_mode(acl, &inode->i_mode);
				if (error < 0)
					return error;
				else {
					inode->i_ctime = CURRENT_TIME_SEC;
					ext3_mark_inode_dirty(handle, inode);
					if (error == 0)
						acl = NULL;
				}
			}
			break;

		case ACL_TYPE_DEFAULT:
			name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
			if (!S_ISDIR(inode->i_mode))
				return acl ? -EACCES : 0;
			break;

		default:
			return -EINVAL;
	}
	if (acl) {
		value = ext3_acl_to_disk(acl, &size);
		if (IS_ERR(value))
			return (int)PTR_ERR(value);
	}

	error = ext3_xattr_set_handle(handle, inode, name_index, "",
				      value, size, 0);

	kfree(value);

	if (!error)
		set_cached_acl(inode, type, acl);

	return error;
}

int
ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
	handle_t *handle;
	int error, retries = 0;

retry:
	handle = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS(inode->i_sb));
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	error = __ext3_set_acl(handle, inode, type, acl);
	ext3_journal_stop(handle);
	if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
		goto retry;
	return error;
}

/*
 * Initialize the ACLs of a new inode. Called from ext3_new_inode.
 *
 * dir->i_mutex: down
 * inode->i_mutex: up (access to inode is still exclusive)
 */
int
ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
{
	struct posix_acl *default_acl, *acl;
	int error;

	error = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
	if (error)
		return error;

	if (default_acl) {
		error = __ext3_set_acl(handle, inode, ACL_TYPE_DEFAULT,
				       default_acl);
		posix_acl_release(default_acl);
	}
	if (acl) {
		if (!error)
			error = __ext3_set_acl(handle, inode, ACL_TYPE_ACCESS,
					       acl);
		posix_acl_release(acl);
	}
	return error;
}

@@ -1,72 +0,0 @@
/*
  File: fs/ext3/acl.h

  (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/

#include <linux/posix_acl_xattr.h>

#define EXT3_ACL_VERSION 0x0001

typedef struct {
	__le16 e_tag;
	__le16 e_perm;
	__le32 e_id;
} ext3_acl_entry;

typedef struct {
	__le16 e_tag;
	__le16 e_perm;
} ext3_acl_entry_short;

typedef struct {
	__le32 a_version;
} ext3_acl_header;

static inline size_t ext3_acl_size(int count)
{
	if (count <= 4) {
		return sizeof(ext3_acl_header) +
		       count * sizeof(ext3_acl_entry_short);
	} else {
		return sizeof(ext3_acl_header) +
		       4 * sizeof(ext3_acl_entry_short) +
		       (count - 4) * sizeof(ext3_acl_entry);
	}
}

static inline int ext3_acl_count(size_t size)
{
	ssize_t s;
	size -= sizeof(ext3_acl_header);
	s = size - 4 * sizeof(ext3_acl_entry_short);
	if (s < 0) {
		if (size % sizeof(ext3_acl_entry_short))
			return -1;
		return size / sizeof(ext3_acl_entry_short);
	} else {
		if (s % sizeof(ext3_acl_entry))
			return -1;
		return s / sizeof(ext3_acl_entry) + 4;
	}
}
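
The two helpers above encode the assumption that whenever an ACL has more than four entries, exactly four of them are short entries (ACL_USER_OBJ, ACL_GROUP_OBJ, ACL_MASK, ACL_OTHER) and every remaining entry is a full ACL_USER or ACL_GROUP entry. A user-space transcription, assuming the 4/4/8-byte sizes of the three structs above and using signed arithmetic instead of relying on size_t wrap-around, makes it easy to check that the two functions are inverses. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of ext3_acl_header, ext3_acl_entry_short and ext3_acl_entry. */
enum { HDR = 4, SHORT_ENT = 4, FULL_ENT = 8 };

/* Bytes needed to store `count` entries (mirrors ext3_acl_size()). */
static size_t toy_acl_size(int count)
{
	assert(count >= 0);
	if (count <= 4)
		return HDR + (size_t)count * SHORT_ENT;
	return HDR + 4 * SHORT_ENT + (size_t)(count - 4) * FULL_ENT;
}

/*
 * Number of entries stored in `size` bytes, or -1 if the size cannot
 * correspond to a valid layout (mirrors ext3_acl_count()).
 */
static int toy_acl_count(size_t size)
{
	long body, rest;

	if (size < HDR)
		return -1; /* ext3_acl_from_disk() checks this before counting */
	body = (long)(size - HDR);
	rest = body - 4 * SHORT_ENT;
	if (rest < 0)
		return body % SHORT_ENT ? -1 : (int)(body / SHORT_ENT);
	return rest % FULL_ENT ? -1 : (int)(rest / FULL_ENT + 4);
}
```

For example, `toy_acl_size(5)` is 28 bytes (4 + 4 * 4 + 8) and `toy_acl_count(28)` maps it back to 5, while a 22-byte value is rejected.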

#ifdef CONFIG_EXT3_FS_POSIX_ACL

/* acl.c */
extern struct posix_acl *ext3_get_acl(struct inode *inode, int type);
extern int ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type);
extern int ext3_init_acl (handle_t *, struct inode *, struct inode *);

#else /* CONFIG_EXT3_FS_POSIX_ACL */
#include <linux/sched.h>
#define ext3_get_acl NULL
#define ext3_set_acl NULL

static inline int
ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
{
	return 0;
}
#endif /* CONFIG_EXT3_FS_POSIX_ACL */

2158
fs/ext3/balloc.c
File diff suppressed because it is too large
@@ -1,20 +0,0 @@
/*
 * linux/fs/ext3/bitmap.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 */

#include "ext3.h"

#ifdef EXT3FS_DEBUG

unsigned long ext3_count_free (struct buffer_head * map, unsigned int numchars)
{
	return numchars * BITS_PER_BYTE - memweight(map->b_data, numchars);
}

#endif /* EXT3FS_DEBUG */
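
ext3_count_free() counts the zero (free) bits in the first numchars bytes of a bitmap block: the total number of bits minus the set bits that memweight() reports. A portable user-space equivalent, with a plain per-byte loop standing in for memweight(), could look like this (the function names are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the set bits in one byte. */
static unsigned int bits_set(unsigned char b)
{
	unsigned int n = 0;

	while (b) {
		b &= (unsigned char)(b - 1); /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Number of free (zero) bits in the first numchars bytes of map. */
static unsigned long toy_count_free(const unsigned char *map,
				    unsigned int numchars)
{
	unsigned long used = 0;

	assert(map != NULL || numchars == 0);
	for (unsigned int i = 0; i < numchars; i++)
		used += bits_set(map[i]);
	return (unsigned long)numchars * CHAR_BIT - used;
}
```

For the bytes 0xff, 0x0f, 0x00, `toy_count_free()` returns 12: 24 bits in total, 12 of them set.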
537
fs/ext3/dir.c
@@ -1,537 +0,0 @@
/*
 * linux/fs/ext3/dir.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 * from
 *
 * linux/fs/minix/dir.c
 *
 * Copyright (C) 1991, 1992 Linus Torvalds
 *
 * ext3 directory handling functions
 *
 * Big-endian to little-endian byte-swapping/bitmaps by
 * David S. Miller (davem@caip.rutgers.edu), 1995
 *
 * Hash Tree Directory indexing (c) 2001 Daniel Phillips
 *
 */

#include <linux/compat.h>
#include "ext3.h"

static unsigned char ext3_filetype_table[] = {
	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};

static int ext3_dx_readdir(struct file *, struct dir_context *);

static unsigned char get_dtype(struct super_block *sb, int filetype)
{
	if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_FILETYPE) ||
	    (filetype >= EXT3_FT_MAX))
		return DT_UNKNOWN;

	return (ext3_filetype_table[filetype]);
}

/**
 * Check if the given dir-inode refers to an htree-indexed directory
 * (or a directory which could potentially get converted to use htree
 * indexing).
 *
 * Return 1 if it is a dx dir, 0 if not
 */
static int is_dx_dir(struct inode *inode)
{
	struct super_block *sb = inode->i_sb;

	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
		     EXT3_FEATURE_COMPAT_DIR_INDEX) &&
	    ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
	     ((inode->i_size >> sb->s_blocksize_bits) == 1)))
		return 1;

	return 0;
}

int ext3_check_dir_entry (const char * function, struct inode * dir,
			  struct ext3_dir_entry_2 * de,
			  struct buffer_head * bh,
			  unsigned long offset)
{
	const char * error_msg = NULL;
	const int rlen = ext3_rec_len_from_disk(de->rec_len);

	if (unlikely(rlen < EXT3_DIR_REC_LEN(1)))
		error_msg = "rec_len is smaller than minimal";
	else if (unlikely(rlen % 4 != 0))
		error_msg = "rec_len % 4 != 0";
	else if (unlikely(rlen < EXT3_DIR_REC_LEN(de->name_len)))
		error_msg = "rec_len is too small for name_len";
	else if (unlikely((((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)))
		error_msg = "directory entry across blocks";
	else if (unlikely(le32_to_cpu(de->inode) >
			le32_to_cpu(EXT3_SB(dir->i_sb)->s_es->s_inodes_count)))
		error_msg = "inode out of bounds";

	if (unlikely(error_msg != NULL))
		ext3_error (dir->i_sb, function,
			"bad entry in directory #%lu: %s - "
			"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
			dir->i_ino, error_msg, offset,
			(unsigned long) le32_to_cpu(de->inode),
			rlen, de->name_len);

	return error_msg == NULL ? 1 : 0;
}
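
The sanity checks in ext3_check_dir_entry() can be tried out in user space. This sketch re-implements them on plain integers, taking rec_len as already decoded from disk and offset as the entry's position inside its block; it assumes EXT3_DIR_REC_LEN(name_len) rounds 8 + name_len up to a multiple of 4, as in the ext3 headers (which are not part of this excerpt). The function name is hypothetical; the error strings are copied from the code above.

```c
#include <assert.h>
#include <stddef.h>

/* 8-byte fixed entry header + name, rounded up to a multiple of 4. */
#define TOY_DIR_REC_LEN(name_len) (((name_len) + 8 + 3) & ~3u)

/*
 * Validate one directory entry that starts `offset` bytes into its block.
 * Returns NULL if the entry is sane, otherwise the message the kernel
 * code above would log.
 */
static const char *toy_check_dir_entry(unsigned int rec_len,
				       unsigned int name_len,
				       unsigned long offset,
				       unsigned long blocksize,
				       unsigned long inode,
				       unsigned long inodes_count)
{
	assert(blocksize > 0);
	if (rec_len < TOY_DIR_REC_LEN(1))
		return "rec_len is smaller than minimal";
	if (rec_len % 4 != 0)
		return "rec_len % 4 != 0";
	if (rec_len < TOY_DIR_REC_LEN(name_len))
		return "rec_len is too small for name_len";
	if (offset + rec_len > blocksize)
		return "directory entry across blocks";
	if (inode > inodes_count)
		return "inode out of bounds";
	return NULL;
}
```

For a 4096-byte block, a 12-byte entry with a one-character name at offset 0 passes, while the same entry at offset 4092 is reported as "directory entry across blocks".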

static int ext3_readdir(struct file *file, struct dir_context *ctx)
{
	unsigned long offset;
	int i;
	struct ext3_dir_entry_2 *de;
	int err;
	struct inode *inode = file_inode(file);
	struct super_block *sb = inode->i_sb;
	int dir_has_error = 0;

	if (is_dx_dir(inode)) {
		err = ext3_dx_readdir(file, ctx);
		if (err != ERR_BAD_DX_DIR)
			return err;
		/*
		 * We don't set the inode dirty flag since it's not
		 * critical that it get flushed back to the disk.
		 */
		EXT3_I(inode)->i_flags &= ~EXT3_INDEX_FL;
	}
	offset = ctx->pos & (sb->s_blocksize - 1);

	while (ctx->pos < inode->i_size) {
		unsigned long blk = ctx->pos >> EXT3_BLOCK_SIZE_BITS(sb);
		struct buffer_head map_bh;
		struct buffer_head *bh = NULL;

		map_bh.b_state = 0;
		err = ext3_get_blocks_handle(NULL, inode, blk, 1, &map_bh, 0);
		if (err > 0) {
			pgoff_t index = map_bh.b_blocknr >>
					(PAGE_CACHE_SHIFT - inode->i_blkbits);
			if (!ra_has_index(&file->f_ra, index))
				page_cache_sync_readahead(
					sb->s_bdev->bd_inode->i_mapping,
					&file->f_ra, file,
					index, 1);
			file->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
			bh = ext3_bread(NULL, inode, blk, 0, &err);
		}

		/*
		 * We ignore I/O errors on directories so users have a chance
		 * of recovering data when there's a bad sector
		 */
		if (!bh) {
			if (!dir_has_error) {
				ext3_error(sb, __func__, "directory #%lu "
					"contains a hole at offset %lld",
					inode->i_ino, ctx->pos);
				dir_has_error = 1;
			}
			/* corrupt size? Maybe no more blocks to read */
			if (ctx->pos > inode->i_blocks << 9)
				break;
			ctx->pos += sb->s_blocksize - offset;
			continue;
		}

		/* If the dir block has changed since the last call to
		 * readdir(2), then we might be pointing to an invalid
		 * dirent right now. Scan from the start of the block
		 * to make sure. */
		if (offset && file->f_version != inode->i_version) {
			for (i = 0; i < sb->s_blocksize && i < offset; ) {
				de = (struct ext3_dir_entry_2 *)
					(bh->b_data + i);
				/* It's too expensive to do a full
				 * dirent test each time round this
				 * loop, but we do have to test at
				 * least that it is non-zero. A
				 * failure will be detected in the
				 * dirent test below. */
				if (ext3_rec_len_from_disk(de->rec_len) <
						EXT3_DIR_REC_LEN(1))
					break;
				i += ext3_rec_len_from_disk(de->rec_len);
			}
			offset = i;
			ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
				| offset;
			file->f_version = inode->i_version;
		}

		while (ctx->pos < inode->i_size
		       && offset < sb->s_blocksize) {
			de = (struct ext3_dir_entry_2 *) (bh->b_data + offset);
			if (!ext3_check_dir_entry ("ext3_readdir", inode, de,
						   bh, offset)) {
				/* On error, skip to the
				   next block. */
				ctx->pos = (ctx->pos |
						(sb->s_blocksize - 1)) + 1;
				break;
			}
			offset += ext3_rec_len_from_disk(de->rec_len);
			if (le32_to_cpu(de->inode)) {
				if (!dir_emit(ctx, de->name, de->name_len,
					      le32_to_cpu(de->inode),
					      get_dtype(sb, de->file_type))) {
					brelse(bh);
					return 0;
				}
			}
			ctx->pos += ext3_rec_len_from_disk(de->rec_len);
		}
		offset = 0;
		brelse (bh);
		if (ctx->pos < inode->i_size)
			if (!dir_relax(inode))
				return 0;
	}
	return 0;
}
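When the directory changed since the previous readdir(2) call (f_version differs from i_version), ext3_readdir() cannot trust the saved in-block offset, so it walks the rec_len chain from the start of the block. A hypothetical user-space sketch of that walk, in which the block is modelled as a plain array of rec_len values (an assumption made to keep it self-contained; `resync_offset()` and MIN_REC_LEN are illustrative names):

```c
#include <stddef.h>

/* Minimal rec_len for a one-character name, as EXT3_DIR_REC_LEN(1). */
#define MIN_REC_LEN 12

/*
 * rec_lens[] lists the rec_len of each dirent in the block, in order.
 * Walk from byte 0 until we reach or pass the saved offset, stopping early
 * on an implausibly small rec_len (the full dirent check happens later,
 * as in the kernel loop). The k < n guard only protects the array.
 * Returns the resynchronised offset.
 */
static unsigned long resync_offset(const unsigned int *rec_lens, size_t n,
				   unsigned long blocksize,
				   unsigned long offset)
{
	unsigned long i = 0;
	size_t k = 0;

	while (i < blocksize && i < offset && k < n) {
		if (rec_lens[k] < MIN_REC_LEN)
			break;
		i += rec_lens[k++];
	}
	return i;
}
```

An offset that already sits on an entry boundary comes back unchanged; one that points into the middle of an entry is moved forward to the next boundary rather than being parsed from a bogus position.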

static inline int is_32bit_api(void)
{
#ifdef CONFIG_COMPAT
	return is_compat_task();
#else
	return (BITS_PER_LONG == 32);
#endif
}

/*
 * These functions convert from the major/minor hash to an f_pos
 * value for dx directories
 *
 * Upper layer (for example NFS) should specify FMODE_32BITHASH or
 * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
 * directly on both 32-bit and 64-bit nodes; in that case, neither
 * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
 */
static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
{
	if ((filp->f_mode & FMODE_32BITHASH) ||
	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
		return major >> 1;
	else
		return ((__u64)(major >> 1) << 32) | (__u64)minor;
}

static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
{
	if ((filp->f_mode & FMODE_32BITHASH) ||
	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
		return (pos << 1) & 0xffffffff;
	else
		return ((pos >> 32) << 1) & 0xffffffff;
}

static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
{
	if ((filp->f_mode & FMODE_32BITHASH) ||
	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
		return 0;
	else
		return pos & 0xffffffff;
}
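hash2pos() and the pos2*_hash() helpers pack a (major, minor) hash pair into the directory f_pos. In this simplified sketch the FMODE_*/is_32bit_api() decision is replaced by a hypothetical `use32` flag, and the `toy_*` names are illustrative. It shows what survives a round trip: the low bit of the major hash is always lost (ext3fs_dirhash() clears it anyway), and in 32-bit mode the minor hash is dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* use32 stands in for (FMODE_32BITHASH || (!FMODE_64BITHASH && 32-bit API)). */
static int64_t toy_hash2pos(bool use32, uint32_t major, uint32_t minor)
{
	if (use32)
		return major >> 1;
	/* major >> 1 fits in 31 bits, so the result stays non-negative */
	return (int64_t)(((uint64_t)(major >> 1) << 32) | minor);
}

static uint32_t toy_pos2maj(bool use32, int64_t pos)
{
	if (use32)
		return (uint32_t)((pos << 1) & 0xffffffff);
	return (uint32_t)(((pos >> 32) << 1) & 0xffffffff);
}

static uint32_t toy_pos2min(bool use32, int64_t pos)
{
	if (use32)
		return 0;
	return (uint32_t)(pos & 0xffffffff);
}
```

Shifting the major hash right by one keeps every 64-bit position below 2^63, so it remains a valid positive loff_t.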

/*
 * Return 32- or 64-bit end-of-file for dx directories
 */
static inline loff_t ext3_get_htree_eof(struct file *filp)
{
	if ((filp->f_mode & FMODE_32BITHASH) ||
	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
		return EXT3_HTREE_EOF_32BIT;
	else
		return EXT3_HTREE_EOF_64BIT;
}


/*
 * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
 * non-htree and htree directories, where the "offset" is in terms
 * of the filename hash value instead of the byte offset.
 *
 * Because we may return a 64-bit hash that is well beyond s_maxbytes,
 * we need to pass the max hash as the maximum allowable offset in
 * the htree directory case.
 *
 * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
 *       will be invalid once the directory was converted into a dx directory
 */
static loff_t ext3_dir_llseek(struct file *file, loff_t offset, int whence)
{
	struct inode *inode = file->f_mapping->host;
	int dx_dir = is_dx_dir(inode);
	loff_t htree_max = ext3_get_htree_eof(file);

	if (likely(dx_dir))
		return generic_file_llseek_size(file, offset, whence,
						htree_max, htree_max);
	else
		return generic_file_llseek(file, offset, whence);
}

/*
 * This structure holds the nodes of the red-black tree used to store
 * the directory entry in hash order.
 */
struct fname {
	__u32		hash;
	__u32		minor_hash;
	struct rb_node	rb_hash;
	struct fname	*next;
	__u32		inode;
	__u8		name_len;
	__u8		file_type;
	char		name[0];
};

/*
 * This function implements a non-recursive way of freeing all of the
 * nodes in the red-black tree.
 */
static void free_rb_tree_fname(struct rb_root *root)
{
	struct fname *fname, *next;

	rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
		do {
			struct fname *old = fname;
			fname = fname->next;
			kfree(old);
		} while (fname);

	*root = RB_ROOT;
}

static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
							    loff_t pos)
{
	struct dir_private_info *p;

	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
	if (!p)
		return NULL;
	p->curr_hash = pos2maj_hash(filp, pos);
	p->curr_minor_hash = pos2min_hash(filp, pos);
	return p;
}

void ext3_htree_free_dir_info(struct dir_private_info *p)
{
	free_rb_tree_fname(&p->root);
	kfree(p);
}

/*
 * Given a directory entry, enter it into the fname rb tree.
 */
int ext3_htree_store_dirent(struct file *dir_file, __u32 hash,
			     __u32 minor_hash,
			     struct ext3_dir_entry_2 *dirent)
{
	struct rb_node **p, *parent = NULL;
	struct fname * fname, *new_fn;
	struct dir_private_info *info;
	int len;

	info = (struct dir_private_info *) dir_file->private_data;
	p = &info->root.rb_node;

	/* Create and allocate the fname structure */
	len = sizeof(struct fname) + dirent->name_len + 1;
	new_fn = kzalloc(len, GFP_KERNEL);
	if (!new_fn)
		return -ENOMEM;
	new_fn->hash = hash;
	new_fn->minor_hash = minor_hash;
	new_fn->inode = le32_to_cpu(dirent->inode);
	new_fn->name_len = dirent->name_len;
	new_fn->file_type = dirent->file_type;
	memcpy(new_fn->name, dirent->name, dirent->name_len);
	new_fn->name[dirent->name_len] = 0;

	while (*p) {
		parent = *p;
		fname = rb_entry(parent, struct fname, rb_hash);

		/*
		 * If the hash and minor hash match up, then we put
		 * them on a linked list. This rarely happens...
		 */
		if ((new_fn->hash == fname->hash) &&
		    (new_fn->minor_hash == fname->minor_hash)) {
			new_fn->next = fname->next;
			fname->next = new_fn;
			return 0;
		}

		if (new_fn->hash < fname->hash)
			p = &(*p)->rb_left;
		else if (new_fn->hash > fname->hash)
			p = &(*p)->rb_right;
		else if (new_fn->minor_hash < fname->minor_hash)
			p = &(*p)->rb_left;
		else /* if (new_fn->minor_hash > fname->minor_hash) */
			p = &(*p)->rb_right;
	}

	rb_link_node(&new_fn->rb_hash, parent, p);
	rb_insert_color(&new_fn->rb_hash, &info->root);
	return 0;
}
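ext3_htree_store_dirent() orders entries by hash, then minor_hash, and chains exact (hash, minor_hash) duplicates off the existing node instead of adding them to the tree. The sketch below reproduces that insertion rule with a plain, unbalanced binary search tree in place of the kernel rbtree (a deliberate simplification; `toy_fname`, `toy_insert`, `toy_walk` and `toy_free` are hypothetical names) and records the order in which a readdir would visit entries.

```c
#include <stdint.h>
#include <stdlib.h>

struct toy_fname {
	uint32_t hash, minor_hash;
	int id;				/* stands in for the name/inode payload */
	struct toy_fname *left, *right;
	struct toy_fname *next;		/* collision chain, as in struct fname */
};

/* Insert with the same comparison sequence as ext3_htree_store_dirent(). */
static int toy_insert(struct toy_fname **root, uint32_t hash,
		      uint32_t minor_hash, int id)
{
	struct toy_fname **p = root;
	struct toy_fname *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;
	n->hash = hash;
	n->minor_hash = minor_hash;
	n->id = id;

	while (*p) {
		struct toy_fname *f = *p;

		if (hash == f->hash && minor_hash == f->minor_hash) {
			n->next = f->next;	/* splice in right after the head */
			f->next = n;
			return 0;
		}
		if (hash < f->hash ||
		    (hash == f->hash && minor_hash < f->minor_hash))
			p = &f->left;
		else
			p = &f->right;
	}
	*p = n;
	return 0;
}

/* In-order walk: write ids into out[] in hash order, chains included. */
static size_t toy_walk(const struct toy_fname *f, int *out, size_t pos)
{
	if (!f)
		return pos;
	pos = toy_walk(f->left, out, pos);
	for (const struct toy_fname *c = f; c; c = c->next)
		out[pos++] = c->id;
	return toy_walk(f->right, out, pos);
}

static void toy_free(struct toy_fname *f)
{
	if (!f)
		return;
	toy_free(f->left);
	toy_free(f->right);
	while (f) {			/* free the head and its chain */
		struct toy_fname *nx = f->next;

		free(f);
		f = nx;
	}
}
```

Since each duplicate is spliced in directly behind the head, the first-inserted entry stays first and later duplicates follow in reverse insertion order.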


/*
 * This is a helper function for ext3_dx_readdir. It calls filldir
 * for all entries on the fname linked list. (Normally there is only
 * one entry on the linked list, unless there are 62 bit hash collisions.)
 */
static bool call_filldir(struct file *file, struct dir_context *ctx,
			struct fname *fname)
{
	struct dir_private_info *info = file->private_data;
	struct inode *inode = file_inode(file);
	struct super_block *sb = inode->i_sb;

	if (!fname) {
		printk("call_filldir: called with null fname?!?\n");
		return true;
	}
	ctx->pos = hash2pos(file, fname->hash, fname->minor_hash);
	while (fname) {
		if (!dir_emit(ctx, fname->name, fname->name_len,
				fname->inode,
				get_dtype(sb, fname->file_type))) {
			info->extra_fname = fname;
			return false;
		}
		fname = fname->next;
	}
	return true;
}

static int ext3_dx_readdir(struct file *file, struct dir_context *ctx)
{
	struct dir_private_info *info = file->private_data;
	struct inode *inode = file_inode(file);
	struct fname *fname;
	int ret;

	if (!info) {
		info = ext3_htree_create_dir_info(file, ctx->pos);
		if (!info)
			return -ENOMEM;
		file->private_data = info;
	}

	if (ctx->pos == ext3_get_htree_eof(file))
		return 0;	/* EOF */

	/* Someone has messed with f_pos; reset the world */
	if (info->last_pos != ctx->pos) {
		free_rb_tree_fname(&info->root);
		info->curr_node = NULL;
		info->extra_fname = NULL;
		info->curr_hash = pos2maj_hash(file, ctx->pos);
		info->curr_minor_hash = pos2min_hash(file, ctx->pos);
	}

	/*
	 * If there are any leftover names on the hash collision
	 * chain, return them first.
	 */
	if (info->extra_fname) {
		if (!call_filldir(file, ctx, info->extra_fname))
			goto finished;
		info->extra_fname = NULL;
		goto next_node;
	} else if (!info->curr_node)
		info->curr_node = rb_first(&info->root);

	while (1) {
		/*
		 * Fill the rbtree if we have no more entries,
		 * or the inode has changed since we last read in the
		 * cached entries.
		 */
		if ((!info->curr_node) ||
		    (file->f_version != inode->i_version)) {
			info->curr_node = NULL;
			free_rb_tree_fname(&info->root);
			file->f_version = inode->i_version;
			ret = ext3_htree_fill_tree(file, info->curr_hash,
						   info->curr_minor_hash,
						   &info->next_hash);
			if (ret < 0)
				return ret;
			if (ret == 0) {
				ctx->pos = ext3_get_htree_eof(file);
				break;
			}
			info->curr_node = rb_first(&info->root);
		}

		fname = rb_entry(info->curr_node, struct fname, rb_hash);
		info->curr_hash = fname->hash;
		info->curr_minor_hash = fname->minor_hash;
		if (!call_filldir(file, ctx, fname))
			break;
	next_node:
		info->curr_node = rb_next(info->curr_node);
		if (info->curr_node) {
			fname = rb_entry(info->curr_node, struct fname,
					 rb_hash);
			info->curr_hash = fname->hash;
			info->curr_minor_hash = fname->minor_hash;
		} else {
			if (info->next_hash == ~0) {
				ctx->pos = ext3_get_htree_eof(file);
				break;
			}
			info->curr_hash = info->next_hash;
			info->curr_minor_hash = 0;
		}
	}
finished:
	info->last_pos = ctx->pos;
	return 0;
}

static int ext3_release_dir (struct inode * inode, struct file * filp)
{
	if (filp->private_data)
		ext3_htree_free_dir_info(filp->private_data);

	return 0;
}

const struct file_operations ext3_dir_operations = {
	.llseek		= ext3_dir_llseek,
	.read		= generic_read_dir,
	.iterate	= ext3_readdir,
	.unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
	.compat_ioctl	= ext3_compat_ioctl,
#endif
	.fsync		= ext3_sync_file,
	.release	= ext3_release_dir,
};

1332
fs/ext3/ext3.h
File diff suppressed because it is too large
@ -1,59 +0,0 @@
/*
 * Interface between ext3 and JBD
 */

#include "ext3.h"

int __ext3_journal_get_undo_access(const char *where, handle_t *handle,
				     struct buffer_head *bh)
{
	int err = journal_get_undo_access(handle, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}

int __ext3_journal_get_write_access(const char *where, handle_t *handle,
				     struct buffer_head *bh)
{
	int err = journal_get_write_access(handle, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}

int __ext3_journal_forget(const char *where, handle_t *handle,
				struct buffer_head *bh)
{
	int err = journal_forget(handle, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}

int __ext3_journal_revoke(const char *where, handle_t *handle,
				unsigned long blocknr, struct buffer_head *bh)
{
	int err = journal_revoke(handle, blocknr, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}

int __ext3_journal_get_create_access(const char *where,
				handle_t *handle, struct buffer_head *bh)
{
	int err = journal_get_create_access(handle, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}

int __ext3_journal_dirty_metadata(const char *where,
				handle_t *handle, struct buffer_head *bh)
{
	int err = journal_dirty_metadata(handle, bh);
	if (err)
		ext3_journal_abort_handle(where, __func__, bh, handle,err);
	return err;
}
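Each __ext3_journal_*() helper takes a `where` string so that a failure is reported against the calling function. As far as we recall, the public names are macros in ext3_jbd.h (not part of this diff) that pass __func__ as `where`. A hypothetical user-space sketch of the same pattern, with stub functions standing in for JBD and for ext3_journal_abort_handle(); every `fake_*` name and the -5 error value are illustrative assumptions:

```c
#include <stdio.h>
#include <string.h>

static const char *last_abort_where;	/* records who triggered an abort */

/* Stub standing in for a JBD call such as journal_get_write_access(). */
static int fake_journal_op(int fail)
{
	return fail ? -5 : 0;	/* -5 plays the role of -EIO */
}

/* Stub standing in for ext3_journal_abort_handle(). */
static void fake_abort_handle(const char *caller, const char *fn, int err)
{
	last_abort_where = caller;
	fprintf(stderr, "%s: aborting transaction in %s: error %d\n",
		caller, fn, err);
}

static int do_fake_get_write_access(const char *where, int fail)
{
	int err = fake_journal_op(fail);

	if (err)
		fake_abort_handle(where, __func__, err);
	return err;
}

/* Callers use the macro, so `where` is filled in automatically. */
#define fake_get_write_access(fail) do_fake_get_write_access(__func__, (fail))

static int some_caller(int fail)
{
	return fake_get_write_access(fail);
}

static int aborted_in(const char *fn)
{
	return last_abort_where != NULL && strcmp(last_abort_where, fn) == 0;
}
```

The error message then names both the high-level caller (`where`) and the wrapper that saw the failure (`__func__` inside the wrapper), which is the information the ext3 helpers forward.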

@ -1,79 +0,0 @@
/*
 * linux/fs/ext3/file.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 * from
 *
 * linux/fs/minix/file.c
 *
 * Copyright (C) 1991, 1992 Linus Torvalds
 *
 * ext3 fs regular file handling primitives
 *
 * 64-bit file support on 64-bit platforms by Jakub Jelinek
 * (jj@sunsite.ms.mff.cuni.cz)
 */

#include <linux/quotaops.h>
#include "ext3.h"
#include "xattr.h"
#include "acl.h"

/*
 * Called when an inode is released. Note that this is different
 * from ext3_file_open: open gets called at every open, but release
 * gets called only when /all/ the files are closed.
 */
static int ext3_release_file (struct inode * inode, struct file * filp)
{
	if (ext3_test_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE)) {
		filemap_flush(inode->i_mapping);
		ext3_clear_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE);
	}
	/* if we are the last writer on the inode, drop the block reservation */
	if ((filp->f_mode & FMODE_WRITE) &&
			(atomic_read(&inode->i_writecount) == 1))
	{
		mutex_lock(&EXT3_I(inode)->truncate_mutex);
		ext3_discard_reservation(inode);
		mutex_unlock(&EXT3_I(inode)->truncate_mutex);
	}
	if (is_dx(inode) && filp->private_data)
		ext3_htree_free_dir_info(filp->private_data);

	return 0;
}

const struct file_operations ext3_file_operations = {
	.llseek		= generic_file_llseek,
	.read_iter	= generic_file_read_iter,
	.write_iter	= generic_file_write_iter,
	.unlocked_ioctl	= ext3_ioctl,
#ifdef CONFIG_COMPAT
	.compat_ioctl	= ext3_compat_ioctl,
#endif
	.mmap		= generic_file_mmap,
	.open		= dquot_file_open,
	.release	= ext3_release_file,
	.fsync		= ext3_sync_file,
	.splice_read	= generic_file_splice_read,
	.splice_write	= iter_file_splice_write,
};

const struct inode_operations ext3_file_inode_operations = {
	.setattr	= ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
	.setxattr	= generic_setxattr,
	.getxattr	= generic_getxattr,
	.listxattr	= ext3_listxattr,
	.removexattr	= generic_removexattr,
#endif
	.get_acl	= ext3_get_acl,
	.set_acl	= ext3_set_acl,
	.fiemap		= ext3_fiemap,
};

109
fs/ext3/fsync.c
@ -1,109 +0,0 @@
/*
 * linux/fs/ext3/fsync.c
 *
 * Copyright (C) 1993 Stephen Tweedie (sct@redhat.com)
 * from
 * Copyright (C) 1992 Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 * from
 * linux/fs/minix/truncate.c Copyright (C) 1991, 1992 Linus Torvalds
 *
 * ext3fs fsync primitive
 *
 * Big-endian to little-endian byte-swapping/bitmaps by
 * David S. Miller (davem@caip.rutgers.edu), 1995
 *
 * Removed unnecessary code duplication for little endian machines
 * and excessive __inline__s.
 * Andi Kleen, 1997
 *
 * Major simplifications and cleanup - we only need to do the metadata, because
 * we can depend on generic_block_fdatasync() to sync the data blocks.
 */

#include <linux/blkdev.h>
#include <linux/writeback.h>
#include "ext3.h"

/*
 * akpm: A new design for ext3_sync_file().
 *
 * This is only called from sys_fsync(), sys_fdatasync() and sys_msync().
 * There cannot be a transaction open by this task.
 * Another task could have dirtied this inode. Its data can be in any
 * state in the journalling system.
 *
 * What we do is just kick off a commit and wait on it. This will snapshot the
 * inode to disk.
 */

int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
	struct inode *inode = file->f_mapping->host;
	struct ext3_inode_info *ei = EXT3_I(inode);
	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
	int ret, needs_barrier = 0;
	tid_t commit_tid;

	trace_ext3_sync_file_enter(file, datasync);

	if (inode->i_sb->s_flags & MS_RDONLY) {
		/* Make sure that we read updated state */
		smp_rmb();
		if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
			return -EROFS;
		return 0;
	}
	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
	if (ret)
		goto out;

	J_ASSERT(ext3_journal_current_handle() == NULL);

	/*
	 * data=writeback,ordered:
	 *  The caller's filemap_fdatawrite()/wait will sync the data.
	 *  Metadata is in the journal, we wait for a proper transaction
	 *  to commit here.
	 *
	 * data=journal:
	 *  filemap_fdatawrite won't do anything (the buffers are clean).
	 *  ext3_force_commit will write the file data into the journal and
	 *  will wait on that.
	 *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
	 *  (they were dirtied by commit). But that's OK - the blocks are
	 *  safe in-journal, which is all fsync() needs to ensure.
	 */
	if (ext3_should_journal_data(inode)) {
		ret = ext3_force_commit(inode->i_sb);
		goto out;
	}

	if (datasync)
		commit_tid = atomic_read(&ei->i_datasync_tid);
	else
		commit_tid = atomic_read(&ei->i_sync_tid);

	if (test_opt(inode->i_sb, BARRIER) &&
	    !journal_trans_will_send_data_barrier(journal, commit_tid))
		needs_barrier = 1;
	log_start_commit(journal, commit_tid);
	ret = log_wait_commit(journal, commit_tid);

	/*
	 * In case we didn't commit a transaction, we have to flush
	 * disk caches manually so that data really is on persistent
	 * storage
	 */
	if (needs_barrier) {
		int err;

		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
		if (!ret)
			ret = err;
	}
out:
	trace_ext3_sync_file_exit(inode, ret);
	return ret;
}

206
fs/ext3/hash.c
@ -1,206 +0,0 @@
/*
 * linux/fs/ext3/hash.c
 *
 * Copyright (C) 2002 by Theodore Ts'o
 *
 * This file is released under the GPL v2.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License.
 */

#include "ext3.h"
#include <linux/cryptohash.h>

#define DELTA 0x9E3779B9

static void TEA_transform(__u32 buf[4], __u32 const in[])
{
	__u32	sum = 0;
	__u32	b0 = buf[0], b1 = buf[1];
	__u32	a = in[0], b = in[1], c = in[2], d = in[3];
	int	n = 16;

	do {
		sum += DELTA;
		b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
		b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
	} while(--n);

	buf[0] += b0;
	buf[1] += b1;
}


/* The old legacy hash */
static __u32 dx_hack_hash_unsigned(const char *name, int len)
{
	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
	const unsigned char *ucp = (const unsigned char *) name;

	while (len--) {
		hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));

		if (hash & 0x80000000)
			hash -= 0x7fffffff;
		hash1 = hash0;
		hash0 = hash;
	}
	return hash0 << 1;
}

static __u32 dx_hack_hash_signed(const char *name, int len)
{
	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
	const signed char *scp = (const signed char *) name;

	while (len--) {
		hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));

		if (hash & 0x80000000)
			hash -= 0x7fffffff;
		hash1 = hash0;
		hash0 = hash;
	}
	return hash0 << 1;
}
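The two legacy hashes differ only in how each name byte is widened to int: through `unsigned char` or through `signed char`. For pure ASCII names the results coincide; for bytes of 0x80 and above they diverge, which is why separate DX_HASH_LEGACY and DX_HASH_LEGACY_UNSIGNED variants exist. A user-space copy of the loop, merged into one function with a hypothetical `is_signed` switch but otherwise unchanged, makes this checkable:

```c
#include <stdint.h>

/* Same loop as dx_hack_hash_{unsigned,signed}(); is_signed selects the
 * char type used to widen each byte before multiplying. */
static uint32_t legacy_hash(const char *name, int len, int is_signed)
{
	uint32_t hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;

	while (len--) {
		int c = is_signed ? (int)(signed char)*name
				  : (int)(unsigned char)*name;

		name++;
		hash = hash1 + (hash0 ^ (uint32_t)(c * 7152373));
		if (hash & 0x80000000)
			hash -= 0x7fffffff;
		hash1 = hash0;
		hash0 = hash;
	}
	return hash0 << 1;	/* low bit is always 0 */
}
```

A byte such as 0xe9 widens to 233 in one variant and to -23 in the other, so the first multiplication already produces different values and the final hashes differ.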

static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
{
	__u32	pad, val;
	int	i;
	const signed char *scp = (const signed char *) msg;

	pad = (__u32)len | ((__u32)len << 8);
	pad |= pad << 16;

	val = pad;
	if (len > num*4)
		len = num * 4;
	for (i = 0; i < len; i++) {
		if ((i % 4) == 0)
			val = pad;
		val = ((int) scp[i]) + (val << 8);
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}

static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
{
	__u32	pad, val;
	int	i;
	const unsigned char *ucp = (const unsigned char *) msg;

	pad = (__u32)len | ((__u32)len << 8);
	pad |= pad << 16;

	val = pad;
	if (len > num*4)
		len = num * 4;
	for (i=0; i < len; i++) {
		if ((i % 4) == 0)
			val = pad;
		val = ((int) ucp[i]) + (val << 8);
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)
		*buf++ = val;
	while (--num >= 0)
		*buf++ = pad;
}
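str2hashbuf_{signed,unsigned}() pack up to num*4 name bytes, big-endian, into 32-bit words and fill every unused byte position with a pad derived from the name length (for names shorter than 256 bytes, the length byte repeated four times). A standalone copy of the unsigned variant, renamed to the hypothetical `pack_name()`, allows the layout to be checked directly:

```c
#include <stdint.h>

/* Copy of str2hashbuf_unsigned(): pack msg into num 32-bit words. */
static void pack_name(const char *msg, int len, uint32_t *buf, int num)
{
	const unsigned char *ucp = (const unsigned char *) msg;
	uint32_t pad, val;
	int i;

	pad = (uint32_t)len | ((uint32_t)len << 8);
	pad |= pad << 16;	/* e.g. len 3 -> 0x03030303 */

	val = pad;
	if (len > num * 4)
		len = num * 4;
	for (i = 0; i < len; i++) {
		if ((i % 4) == 0)
			val = pad;
		val = ((int) ucp[i]) + (val << 8);	/* shift next byte in */
		if ((i % 4) == 3) {
			*buf++ = val;
			val = pad;
			num--;
		}
	}
	if (--num >= 0)		/* partially filled word (or pad) */
		*buf++ = val;
	while (--num >= 0)	/* remaining words are pure padding */
		*buf++ = pad;
}
```

In ext3fs_dirhash() this packing is applied to successive 32-byte chunks of the name for half-MD4 (num = 8) and 16-byte chunks for TEA (num = 4).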

/*
 * Returns the hash of a filename. If len is 0 and name is NULL, then
 * this function can be used to test whether or not a hash version is
 * supported.
 *
 * The seed is a 4-longword (32 bits each) "secret" which can be used to
 * uniquify a hash. If the seed is all zeros, then some default seed
 * may be used.
 *
 * A particular hash version specifies whether or not the seed is
 * represented, and whether or not the returned hash is 32 bits or 64
 * bits. 32 bit hashes will return 0 for the minor hash.
 */
int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
{
	__u32	hash;
	__u32	minor_hash = 0;
	const char	*p;
	int		i;
	__u32		in[8], buf[4];
	void		(*str2hashbuf)(const char *, int, __u32 *, int) =
				str2hashbuf_signed;

	/* Initialize the default seed for the hash checksum functions */
	buf[0] = 0x67452301;
	buf[1] = 0xefcdab89;
	buf[2] = 0x98badcfe;
	buf[3] = 0x10325476;

	/* Check to see if the seed is all zeros */
	if (hinfo->seed) {
		for (i=0; i < 4; i++) {
			if (hinfo->seed[i])
				break;
		}
		if (i < 4)
			memcpy(buf, hinfo->seed, sizeof(buf));
	}

	switch (hinfo->hash_version) {
	case DX_HASH_LEGACY_UNSIGNED:
		hash = dx_hack_hash_unsigned(name, len);
		break;
	case DX_HASH_LEGACY:
		hash = dx_hack_hash_signed(name, len);
		break;
	case DX_HASH_HALF_MD4_UNSIGNED:
		str2hashbuf = str2hashbuf_unsigned;
	case DX_HASH_HALF_MD4:
		p = name;
		while (len > 0) {
			(*str2hashbuf)(p, len, in, 8);
			half_md4_transform(buf, in);
			len -= 32;
			p += 32;
		}
		minor_hash = buf[2];
		hash = buf[1];
		break;
	case DX_HASH_TEA_UNSIGNED:
		str2hashbuf = str2hashbuf_unsigned;
	case DX_HASH_TEA:
		p = name;
		while (len > 0) {
			(*str2hashbuf)(p, len, in, 4);
			TEA_transform(buf, in);
			len -= 16;
			p += 16;
		}
		hash = buf[0];
		minor_hash = buf[1];
		break;
	default:
		hinfo->hash = 0;
		return -1;
	}
	hash = hash & ~1;
	if (hash == (EXT3_HTREE_EOF_32BIT << 1))
		hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
	hinfo->hash = hash;
	hinfo->minor_hash = minor_hash;
	return 0;
}
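The tail of ext3fs_dirhash() keeps every real major hash away from the 32-bit end-of-directory marker. Assuming EXT3_HTREE_EOF_32BIT is 0x7fffffff (our reading of ext3.h, which this diff does not show), a major hash of 0xfffffffe or 0xffffffff would map through the 32-bit hash2pos() path (major >> 1) to exactly the EOF position, so it is moved down to 0xfffffffc. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Assumed value of EXT3_HTREE_EOF_32BIT: ((1UL << (32 - 1)) - 1). */
#define HTREE_EOF_32BIT 0x7fffffffu

/* Final adjustment ext3fs_dirhash() applies to the major hash. */
static uint32_t finish_major_hash(uint32_t hash)
{
	hash &= ~1u;					/* clear the low bit */
	if (hash == (HTREE_EOF_32BIT << 1))		/* 0xfffffffe */
		hash = (HTREE_EOF_32BIT - 1) << 1;	/* 0xfffffffc */
	return hash;
}

/* 32-bit f_pos as produced by hash2pos() when the 32-bit API is in use. */
static uint32_t pos32(uint32_t major)
{
	return major >> 1;
}
```

Without the clamp, readdir could hand out an f_pos equal to EOF for a live entry, and ext3_dx_readdir() would then stop early at that entry.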

706
fs/ext3/ialloc.c
@ -1,706 +0,0 @@
/*
 * linux/fs/ext3/ialloc.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 * BSD ufs-inspired inode and directory allocation by
 * Stephen Tweedie (sct@redhat.com), 1993
 * Big-endian to little-endian byte-swapping/bitmaps by
 * David S. Miller (davem@caip.rutgers.edu), 1995
 */

#include <linux/quotaops.h>
#include <linux/random.h>

#include "ext3.h"
#include "xattr.h"
#include "acl.h"

/*
 * ialloc.c contains the inode allocation and deallocation routines
 */

/*
 * The free inodes are managed by bitmaps. A file system contains several
 * block groups. Each group contains 1 bitmap block for blocks, 1 bitmap
 * block for inodes, N blocks for the inode table and data blocks.
 *
 * The file system contains group descriptors which are located after the
 * super block. Each descriptor contains the number of the bitmap block and
 * the free blocks count in the block.
 */


/*
 * Read the inode allocation bitmap for a given block_group, reading
 * into the specified slot in the superblock's bitmap cache.
 *
 * Return buffer_head of bitmap on success or NULL.
 */
static struct buffer_head *
read_inode_bitmap(struct super_block * sb, unsigned long block_group)
{
	struct ext3_group_desc *desc;
	struct buffer_head *bh = NULL;

	desc = ext3_get_group_desc(sb, block_group, NULL);
	if (!desc)
		goto error_out;

	bh = sb_bread(sb, le32_to_cpu(desc->bg_inode_bitmap));
	if (!bh)
		ext3_error(sb, "read_inode_bitmap",
			    "Cannot read inode bitmap - "
			    "block_group = %lu, inode_bitmap = %u",
			    block_group, le32_to_cpu(desc->bg_inode_bitmap));
error_out:
	return bh;
}

/*
 * NOTE! When we get the inode, we're the only people
 * that have access to it, and as such there are no
 * race conditions we have to worry about. The inode
 * is not on the hash-lists, and it cannot be reached
 * through the filesystem because the directory entry
 * has been deleted earlier.
 *
 * HOWEVER: we must make sure that we get no aliases,
 * which means that we have to call "clear_inode()"
 * _before_ we mark the inode not in use in the inode
 * bitmaps. Otherwise a newly created file might use
 * the same inode number (not actually the same pointer
 * though), and then we'd have two inodes sharing the
 * same inode number and space on the hard disk.
 */
void ext3_free_inode (handle_t *handle, struct inode * inode)
{
	struct super_block * sb = inode->i_sb;
	int is_directory;
	unsigned long ino;
	struct buffer_head *bitmap_bh = NULL;
	struct buffer_head *bh2;
	unsigned long block_group;
	unsigned long bit;
	struct ext3_group_desc * gdp;
	struct ext3_super_block * es;
	struct ext3_sb_info *sbi;
	int fatal = 0, err;

	if (atomic_read(&inode->i_count) > 1) {
		printk ("ext3_free_inode: inode has count=%d\n",
					atomic_read(&inode->i_count));
		return;
	}
	if (inode->i_nlink) {
		printk ("ext3_free_inode: inode has nlink=%d\n",
			inode->i_nlink);
		return;
	}
	if (!sb) {
		printk("ext3_free_inode: inode on nonexistent device\n");
		return;
	}
	sbi = EXT3_SB(sb);

	ino = inode->i_ino;
	ext3_debug ("freeing inode %lu\n", ino);
	trace_ext3_free_inode(inode);

	is_directory = S_ISDIR(inode->i_mode);

	es = EXT3_SB(sb)->s_es;
	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
		ext3_error (sb, "ext3_free_inode",
			    "reserved or nonexistent inode %lu", ino);
		goto error_return;
	}
	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
	bitmap_bh = read_inode_bitmap(sb, block_group);
	if (!bitmap_bh)
		goto error_return;

	BUFFER_TRACE(bitmap_bh, "get_write_access");
	fatal = ext3_journal_get_write_access(handle, bitmap_bh);
	if (fatal)
		goto error_return;

	/* Ok, now we can actually update the inode bitmaps.. */
	if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
					bit, bitmap_bh->b_data))
		ext3_error (sb, "ext3_free_inode",
			      "bit already cleared for inode %lu", ino);
	else {
		gdp = ext3_get_group_desc (sb, block_group, &bh2);

		BUFFER_TRACE(bh2, "get_write_access");
		fatal = ext3_journal_get_write_access(handle, bh2);
		if (fatal) goto error_return;

		if (gdp) {
			spin_lock(sb_bgl_lock(sbi, block_group));
			le16_add_cpu(&gdp->bg_free_inodes_count, 1);
			if (is_directory)
				le16_add_cpu(&gdp->bg_used_dirs_count, -1);
			spin_unlock(sb_bgl_lock(sbi, block_group));
			percpu_counter_inc(&sbi->s_freeinodes_counter);
			if (is_directory)
				percpu_counter_dec(&sbi->s_dirs_counter);

		}
		BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
		err = ext3_journal_dirty_metadata(handle, bh2);
		if (!fatal) fatal = err;
	}
	BUFFER_TRACE(bitmap_bh, "call ext3_journal_dirty_metadata");
	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
	if (!fatal)
		fatal = err;

error_return:
	brelse(bitmap_bh);
	ext3_std_error(sb, fatal);
}
|
||||
|
||||
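ext3_free_inode() splits a 1-based inode number into a block group and a bit within that group's inode bitmap. ext3_new_inode() reverses the split once it has claimed a bit. The standalone sketch below models that arithmetic with a plain integer in place of EXT3_INODES_PER_GROUP(sb). The helper names are hypothetical and not part of ext3.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the arithmetic in ext3_free_inode() and
 * ext3_new_inode(); inodes_per_group stands in for EXT3_INODES_PER_GROUP(sb). */

/* Inode numbers start at 1, so subtract 1 before dividing. */
unsigned long ino_to_group(unsigned long ino, unsigned long inodes_per_group)
{
	assert(ino >= 1 && inodes_per_group > 0);
	return (ino - 1) / inodes_per_group;
}

/* Bit index of the inode inside its group's inode bitmap. */
unsigned long ino_to_bit(unsigned long ino, unsigned long inodes_per_group)
{
	assert(ino >= 1 && inodes_per_group > 0);
	return (ino - 1) % inodes_per_group;
}

/* Inverse mapping, as in "ino += group * EXT3_INODES_PER_GROUP(sb) + 1". */
unsigned long group_bit_to_ino(unsigned long group, unsigned long bit,
			       unsigned long inodes_per_group)
{
	return group * inodes_per_group + bit + 1;
}
```

For example, with 8192 inodes per group, inode 8193 maps to bit 0 of group 1, and mapping back yields 8193 again.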
/*
 * Orlov's allocator for directories.
 *
 * We always try to spread first-level directories.
 *
 * If there are blockgroups with both free inodes and free blocks counts
 * not worse than average we return one with smallest directory count.
 * Otherwise we simply return a random group.
 *
 * For the rest rules look so:
 *
 * It's OK to put directory into a group unless
 * it has too many directories already (max_dirs) or
 * it has too few free inodes left (min_inodes) or
 * it has too few free blocks left (min_blocks).
 * Parent's group is preferred, if it doesn't satisfy these
 * conditions we search cyclically through the rest. If none
 * of the groups look good we just look for a group with more
 * free inodes than average (starting at parent's group).
 *
 * Debt is incremented each time we allocate a directory and decremented
 * when we allocate an inode, within 0--255.
 */

static int find_group_orlov(struct super_block *sb, struct inode *parent)
{
	int parent_group = EXT3_I(parent)->i_block_group;
	struct ext3_sb_info *sbi = EXT3_SB(sb);
	int ngroups = sbi->s_groups_count;
	int inodes_per_group = EXT3_INODES_PER_GROUP(sb);
	unsigned int freei, avefreei;
	ext3_fsblk_t freeb, avefreeb;
	unsigned int ndirs;
	int max_dirs, min_inodes;
	ext3_grpblk_t min_blocks;
	int group = -1, i;
	struct ext3_group_desc *desc;

	freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
	avefreei = freei / ngroups;
	freeb = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
	avefreeb = freeb / ngroups;
	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

	if ((parent == d_inode(sb->s_root)) ||
	    (EXT3_I(parent)->i_flags & EXT3_TOPDIR_FL)) {
		int best_ndir = inodes_per_group;
		int best_group = -1;

		group = prandom_u32();
		parent_group = (unsigned)group % ngroups;
		for (i = 0; i < ngroups; i++) {
			group = (parent_group + i) % ngroups;
			desc = ext3_get_group_desc (sb, group, NULL);
			if (!desc || !desc->bg_free_inodes_count)
				continue;
			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
				continue;
			if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei)
				continue;
			if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
				continue;
			best_group = group;
			best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
		}
		if (best_group >= 0)
			return best_group;
		goto fallback;
	}

	max_dirs = ndirs / ngroups + inodes_per_group / 16;
	min_inodes = avefreei - inodes_per_group / 4;
	min_blocks = avefreeb - EXT3_BLOCKS_PER_GROUP(sb) / 4;

	for (i = 0; i < ngroups; i++) {
		group = (parent_group + i) % ngroups;
		desc = ext3_get_group_desc (sb, group, NULL);
		if (!desc || !desc->bg_free_inodes_count)
			continue;
		if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
			continue;
		if (le16_to_cpu(desc->bg_free_inodes_count) < min_inodes)
			continue;
		if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)
			continue;
		return group;
	}

fallback:
	for (i = 0; i < ngroups; i++) {
		group = (parent_group + i) % ngroups;
		desc = ext3_get_group_desc (sb, group, NULL);
		if (!desc || !desc->bg_free_inodes_count)
			continue;
		if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
			return group;
	}

	if (avefreei) {
		/*
		 * The free-inodes counter is approximate, and for really small
		 * filesystems the above test can fail to find any blockgroups
		 */
		avefreei = 0;
		goto fallback;
	}

	return -1;
}

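The Orlov policy described in the comment above find_group_orlov() can be tried outside the kernel. The sketch below is a simplified model, not the ext3 code, and rests on three assumptions:

- per-group counters live in a hypothetical in-memory array of struct group_stats rather than in group descriptors;
- averages are computed exactly, whereas the kernel reads approximate per-CPU counters;
- the random starting group is passed in as `start` in place of prandom_u32(), so results are reproducible.

```c
#include <assert.h>

/* Hypothetical per-group counters standing in for struct ext3_group_desc. */
struct group_stats {
	unsigned int free_inodes; /* bg_free_inodes_count */
	unsigned int free_blocks; /* bg_free_blocks_count */
	unsigned int used_dirs;   /* bg_used_dirs_count */
};

/*
 * Simplified model of find_group_orlov(). top_level replaces the "parent is
 * the root or has EXT3_TOPDIR_FL" test; start replaces prandom_u32().
 * Returns a group index, or -1 if no group has a free inode.
 */
int orlov_pick_group(const struct group_stats *g, int ngroups, int parent_group,
		     int top_level, unsigned int start,
		     long inodes_per_group, long blocks_per_group)
{
	unsigned long freei = 0, freeb = 0, ndirs = 0;
	long avefreei, avefreeb;
	int i, group;

	assert(ngroups > 0 && inodes_per_group > 0);
	for (i = 0; i < ngroups; i++) {
		freei += g[i].free_inodes;
		freeb += g[i].free_blocks;
		ndirs += g[i].used_dirs;
	}
	avefreei = (long)(freei / ngroups);
	avefreeb = (long)(freeb / ngroups);

	if (top_level) {
		/* Spread first-level directories: among groups with at least average
		 * free inodes and blocks, pick the one with the fewest directories. */
		long best_ndir = inodes_per_group;
		int best_group = -1;

		parent_group = (int)(start % (unsigned int)ngroups);
		for (i = 0; i < ngroups; i++) {
			group = (parent_group + i) % ngroups;
			if (!g[group].free_inodes ||
			    (long)g[group].used_dirs >= best_ndir ||
			    (long)g[group].free_inodes < avefreei ||
			    (long)g[group].free_blocks < avefreeb)
				continue;
			best_group = group;
			best_ndir = g[group].used_dirs;
		}
		if (best_group >= 0)
			return best_group;
	} else {
		/* Start at the parent's group; skip groups that are crowded with
		 * directories or low on free inodes or free blocks. */
		long max_dirs = (long)(ndirs / ngroups) + inodes_per_group / 16;
		long min_inodes = avefreei - inodes_per_group / 4;
		long min_blocks = avefreeb - blocks_per_group / 4;

		for (i = 0; i < ngroups; i++) {
			group = (parent_group + i) % ngroups;
			if (!g[group].free_inodes ||
			    (long)g[group].used_dirs >= max_dirs ||
			    (long)g[group].free_inodes < min_inodes ||
			    (long)g[group].free_blocks < min_blocks)
				continue;
			return group;
		}
	}

	/* Fallback: first group with at least the average number of free inodes;
	 * then retry with an average of 0, i.e. any group with a free inode. */
	for (;;) {
		for (i = 0; i < ngroups; i++) {
			group = (parent_group + i) % ngroups;
			if (g[group].free_inodes &&
			    (long)g[group].free_inodes >= avefreei)
				return group;
		}
		if (avefreei == 0)
			return -1;
		avefreei = 0;
	}
}
```

Spreading top-level directories keeps unrelated trees in different groups. The max_dirs, min_inodes and min_blocks limits let a subdirectory stay in its parent's group until that group becomes crowded.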
static int find_group_other(struct super_block *sb, struct inode *parent)
{
	int parent_group = EXT3_I(parent)->i_block_group;
	int ngroups = EXT3_SB(sb)->s_groups_count;
	struct ext3_group_desc *desc;
	int group, i;

	/*
	 * Try to place the inode in its parent directory
	 */
	group = parent_group;
	desc = ext3_get_group_desc (sb, group, NULL);
	if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
			le16_to_cpu(desc->bg_free_blocks_count))
		return group;

	/*
	 * We're going to place this inode in a different blockgroup from its
	 * parent. We want to cause files in a common directory to all land in
	 * the same blockgroup. But we want files which are in a different
	 * directory which shares a blockgroup with our parent to land in a
	 * different blockgroup.
	 *
	 * So add our directory's i_ino into the starting point for the hash.
	 */
	group = (group + parent->i_ino) % ngroups;

	/*
	 * Use a quadratic hash to find a group with a free inode and some free
	 * blocks.
	 */
	for (i = 1; i < ngroups; i <<= 1) {
		group += i;
		if (group >= ngroups)
			group -= ngroups;
		desc = ext3_get_group_desc (sb, group, NULL);
		if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
				le16_to_cpu(desc->bg_free_blocks_count))
			return group;
	}

	/*
	 * That failed: try linear search for a free inode, even if that group
	 * has no free blocks.
	 */
	group = parent_group;
	for (i = 0; i < ngroups; i++) {
		if (++group >= ngroups)
			group = 0;
		desc = ext3_get_group_desc (sb, group, NULL);
		if (desc && le16_to_cpu(desc->bg_free_inodes_count))
			return group;
	}

	return -1;
}

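find_group_other() tries three steps in order:

1. the parent's own group;
2. a probe sequence seeded with the parent directory's inode number;
3. a linear scan that accepts a group with free inodes even if it has no free blocks.

The sketch below runs the same search over a hypothetical in-memory array, with struct group_free standing in for the group descriptors. The kernel comment calls step 2 a quadratic hash, but the step doubles each round, so probes land at cumulative offsets 1, 3, 7, 15, and so on from the hashed start.

```c
#include <assert.h>

/* Hypothetical per-group counters standing in for struct ext3_group_desc. */
struct group_free {
	unsigned int free_inodes; /* bg_free_inodes_count */
	unsigned int free_blocks; /* bg_free_blocks_count */
};

/* A group is a good target if it has a free inode and some free blocks. */
static int has_room(const struct group_free *d)
{
	return d->free_inodes && d->free_blocks;
}

/* Model of find_group_other(); returns a group index or -1. */
int other_pick_group(const struct group_free *g, int ngroups, int parent_group,
		     unsigned long parent_ino)
{
	int group, i;

	assert(ngroups > 0 && parent_group >= 0 && parent_group < ngroups);

	/* 1. The parent directory's own group. */
	if (has_room(&g[parent_group]))
		return parent_group;

	/* 2. Seed the start with the parent's inode number, then probe with a
	 *    step that doubles each round (cumulative offsets 1, 3, 7, ...). */
	group = (int)((parent_group + parent_ino) % (unsigned long)ngroups);
	for (i = 1; i < ngroups; i <<= 1) {
		group += i;
		if (group >= ngroups)
			group -= ngroups;
		if (has_room(&g[group]))
			return group;
	}

	/* 3. Linear scan after the parent's group, accepting a free inode even
	 *    when the group has no free blocks. */
	group = parent_group;
	for (i = 0; i < ngroups; i++) {
		if (++group >= ngroups)
			group = 0;
		if (g[group].free_inodes)
			return group;
	}
	return -1;
}
```

Seeding the probe with parent_ino keeps files from one directory together while steering other directories that share the parent's group elsewhere, as the kernel comment explains.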
/*
 * There are two policies for allocating an inode. If the new inode is
 * a directory, then a forward search is made for a block group with both
 * free space and a low directory-to-inode ratio; if that fails, then of
 * the groups with above-average free space, that group with the fewest
 * directories already is chosen.
 *
 * For other inodes, search forward from the parent directory's block
 * group to find a free inode.
 */
struct inode *ext3_new_inode(handle_t *handle, struct inode * dir,
			     const struct qstr *qstr, umode_t mode)
{
	struct super_block *sb;
	struct buffer_head *bitmap_bh = NULL;
	struct buffer_head *bh2;
	int group;
	unsigned long ino = 0;
	struct inode * inode;
	struct ext3_group_desc * gdp = NULL;
	struct ext3_super_block * es;
	struct ext3_inode_info *ei;
	struct ext3_sb_info *sbi;
	int err = 0;
	struct inode *ret;
	int i;

	/* Cannot create files in a deleted directory */
	if (!dir || !dir->i_nlink)
		return ERR_PTR(-EPERM);

	sb = dir->i_sb;
	trace_ext3_request_inode(dir, mode);
	inode = new_inode(sb);
	if (!inode)
		return ERR_PTR(-ENOMEM);
	ei = EXT3_I(inode);

	sbi = EXT3_SB(sb);
	es = sbi->s_es;
	if (S_ISDIR(mode))
		group = find_group_orlov(sb, dir);
	else
		group = find_group_other(sb, dir);

	err = -ENOSPC;
	if (group == -1)
		goto out;

	for (i = 0; i < sbi->s_groups_count; i++) {
		err = -EIO;

		gdp = ext3_get_group_desc(sb, group, &bh2);
		if (!gdp)
			goto fail;

		brelse(bitmap_bh);
		bitmap_bh = read_inode_bitmap(sb, group);
		if (!bitmap_bh)
			goto fail;

		ino = 0;

repeat_in_this_group:
		ino = ext3_find_next_zero_bit((unsigned long *)
				bitmap_bh->b_data, EXT3_INODES_PER_GROUP(sb), ino);
		if (ino < EXT3_INODES_PER_GROUP(sb)) {

			BUFFER_TRACE(bitmap_bh, "get_write_access");
			err = ext3_journal_get_write_access(handle, bitmap_bh);
			if (err)
				goto fail;

			if (!ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
						ino, bitmap_bh->b_data)) {
				/* we won it */
				BUFFER_TRACE(bitmap_bh,
					"call ext3_journal_dirty_metadata");
				err = ext3_journal_dirty_metadata(handle,
								bitmap_bh);
				if (err)
					goto fail;
				goto got;
			}
			/* we lost it */
			journal_release_buffer(handle, bitmap_bh);

			if (++ino < EXT3_INODES_PER_GROUP(sb))
				goto repeat_in_this_group;
		}

		/*
		 * This case is possible in concurrent environment. It is very
		 * rare. We cannot repeat the find_group_xxx() call because
		 * that will simply return the same blockgroup, because the
		 * group descriptor metadata has not yet been updated.
		 * So we just go onto the next blockgroup.
		 */
		if (++group == sbi->s_groups_count)
			group = 0;
	}
	err = -ENOSPC;
	goto out;

got:
	ino += group * EXT3_INODES_PER_GROUP(sb) + 1;
	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
		ext3_error (sb, "ext3_new_inode",
			    "reserved inode or inode > inodes count - "
			    "block_group = %d, inode=%lu", group, ino);
		err = -EIO;
		goto fail;
	}

	BUFFER_TRACE(bh2, "get_write_access");
	err = ext3_journal_get_write_access(handle, bh2);
	if (err) goto fail;
	spin_lock(sb_bgl_lock(sbi, group));
	le16_add_cpu(&gdp->bg_free_inodes_count, -1);
	if (S_ISDIR(mode)) {
		le16_add_cpu(&gdp->bg_used_dirs_count, 1);
	}
	spin_unlock(sb_bgl_lock(sbi, group));
	BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
	err = ext3_journal_dirty_metadata(handle, bh2);
	if (err) goto fail;

	percpu_counter_dec(&sbi->s_freeinodes_counter);
	if (S_ISDIR(mode))
		percpu_counter_inc(&sbi->s_dirs_counter);


	if (test_opt(sb, GRPID)) {
		inode->i_mode = mode;
		inode->i_uid = current_fsuid();
		inode->i_gid = dir->i_gid;
	} else
		inode_init_owner(inode, dir, mode);

	inode->i_ino = ino;
	/* This is the optimal IO size (for stat), not the fs block size */
	inode->i_blocks = 0;
	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;

	memset(ei->i_data, 0, sizeof(ei->i_data));
	ei->i_dir_start_lookup = 0;
	ei->i_disksize = 0;

	ei->i_flags =
		ext3_mask_flags(mode, EXT3_I(dir)->i_flags & EXT3_FL_INHERITED);
#ifdef EXT3_FRAGMENTS
	ei->i_faddr = 0;
	ei->i_frag_no = 0;
	ei->i_frag_size = 0;
#endif
	ei->i_file_acl = 0;
	ei->i_dir_acl = 0;
	ei->i_dtime = 0;
	ei->i_block_alloc_info = NULL;
	ei->i_block_group = group;

	ext3_set_inode_flags(inode);
	if (IS_DIRSYNC(inode))
		handle->h_sync = 1;
	if (insert_inode_locked(inode) < 0) {
		/*
		 * Likely a bitmap corruption causing inode to be allocated
		 * twice.
		 */
		err = -EIO;
		goto fail;
	}
	spin_lock(&sbi->s_next_gen_lock);
	inode->i_generation = sbi->s_next_generation++;
	spin_unlock(&sbi->s_next_gen_lock);

	ei->i_state_flags = 0;
	ext3_set_inode_state(inode, EXT3_STATE_NEW);

	/* See comment in ext3_iget for explanation */
	if (ino >= EXT3_FIRST_INO(sb) + 1 &&
	    EXT3_INODE_SIZE(sb) > EXT3_GOOD_OLD_INODE_SIZE) {
		ei->i_extra_isize =
			sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE;
	} else {
		ei->i_extra_isize = 0;
	}

	ret = inode;
	dquot_initialize(inode);
	err = dquot_alloc_inode(inode);
	if (err)
		goto fail_drop;

	err = ext3_init_acl(handle, inode, dir);
	if (err)
		goto fail_free_drop;

	err = ext3_init_security(handle, inode, dir, qstr);
	if (err)
		goto fail_free_drop;

	err = ext3_mark_inode_dirty(handle, inode);
	if (err) {
		ext3_std_error(sb, err);
		goto fail_free_drop;
	}

	ext3_debug("allocating inode %lu\n", inode->i_ino);
	trace_ext3_allocate_inode(inode, dir, mode);
	goto really_out;
fail:
	ext3_std_error(sb, err);
out:
	iput(inode);
	ret = ERR_PTR(err);
really_out:
	brelse(bitmap_bh);
	return ret;

fail_free_drop:
	dquot_free_inode(inode);

fail_drop:
	dquot_drop(inode);
	inode->i_flags |= S_NOQUOTA;
	clear_nlink(inode);
	unlock_new_inode(inode);
	iput(inode);
	brelse(bitmap_bh);
	return ERR_PTR(err);
}

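In ext3_new_inode(), two tasks can find the same zero bit. Only the one whose ext3_set_bit_atomic() call sees the bit still clear "wins it"; the loser releases the buffer and moves on to the next bit. The sketch below models that retry loop in user space. It makes these substitutions, which are assumptions of the sketch rather than ext3 behavior:

- C11 atomic_fetch_or() replaces ext3_set_bit_atomic() and sb_bgl_lock();
- bits are stored in native word order, not the on-disk little-endian bitmap layout;
- claim_free_bit() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Bits per bitmap word in this user-space model. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr and report whether it was already set; stands in
 * for ext3_set_bit_atomic(sb_bgl_lock(sbi, group), ino, bitmap). */
static int test_and_set(atomic_ulong *map, unsigned long nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long old = atomic_fetch_or(&map[nr / BITS_PER_WORD], mask);

	return (old & mask) != 0;
}

static int bit_is_set(atomic_ulong *map, unsigned long nr)
{
	return (atomic_load(&map[nr / BITS_PER_WORD]) >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Claim the first clear bit at or after start in an nbits-wide bitmap.
 * Returns the claimed bit number, or -1 if no clear bit remains. */
long claim_free_bit(atomic_ulong *map, unsigned long nbits, unsigned long start)
{
	unsigned long bit;

	assert(map != NULL);
	for (bit = start; bit < nbits; bit++) {
		if (bit_is_set(map, bit))
			continue;               /* the find_next_zero_bit() step */
		if (!test_and_set(map, bit))
			return (long)bit;       /* "we won it" */
		/* "we lost it": another thread set the bit between the check
		 * and the test-and-set, so keep scanning from the next bit. */
	}
	return -1;
}
```

Because the test-and-set reports the previous value, the sketch needs no separate lock to decide who owns a bit. A loser resumes scanning at the next bit, much like the `goto repeat_in_this_group` path.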
/* Verify that we are loading a valid orphan from disk */
struct inode *ext3_orphan_get(struct super_block *sb, unsigned long ino)
{
	unsigned long max_ino = le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count);
	unsigned long block_group;
	int bit;
	struct buffer_head *bitmap_bh;
	struct inode *inode = NULL;
	long err = -EIO;

	/* Error cases - e2fsck has already cleaned up for us */
	if (ino > max_ino) {
		ext3_warning(sb, __func__,
			     "bad orphan ino %lu! e2fsck was run?", ino);
		goto error;
	}

	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
	bitmap_bh = read_inode_bitmap(sb, block_group);
	if (!bitmap_bh) {
		ext3_warning(sb, __func__,
			     "inode bitmap error for orphan %lu", ino);
		goto error;
	}

	/* Having the inode bit set should be a 100% indicator that this
	 * is a valid orphan (no e2fsck run on fs). Orphans also include
	 * inodes that were being truncated, so we can't check i_nlink==0.
	 */
	if (!ext3_test_bit(bit, bitmap_bh->b_data))
		goto bad_orphan;

	inode = ext3_iget(sb, ino);
	if (IS_ERR(inode))
		goto iget_failed;

	/*
	 * If the orphans has i_nlinks > 0 then it should be able to be
	 * truncated, otherwise it won't be removed from the orphan list
	 * during processing and an infinite loop will result.
	 */
	if (inode->i_nlink && !ext3_can_truncate(inode))
		goto bad_orphan;

	if (NEXT_ORPHAN(inode) > max_ino)
		goto bad_orphan;
	brelse(bitmap_bh);
	return inode;

iget_failed:
	err = PTR_ERR(inode);
	inode = NULL;
bad_orphan:
	ext3_warning(sb, __func__,
		     "bad orphan inode %lu! e2fsck was run?", ino);
	printk(KERN_NOTICE "ext3_test_bit(bit=%d, block=%llu) = %d\n",
	       bit, (unsigned long long)bitmap_bh->b_blocknr,
	       ext3_test_bit(bit, bitmap_bh->b_data));
	printk(KERN_NOTICE "inode=%p\n", inode);
	if (inode) {
		printk(KERN_NOTICE "is_bad_inode(inode)=%d\n",
		       is_bad_inode(inode));
		printk(KERN_NOTICE "NEXT_ORPHAN(inode)=%u\n",
		       NEXT_ORPHAN(inode));
		printk(KERN_NOTICE "max_ino=%lu\n", max_ino);
		printk(KERN_NOTICE "i_nlink=%u\n", inode->i_nlink);
		/* Avoid freeing blocks if we got a bad deleted inode */
		if (inode->i_nlink == 0)
			inode->i_blocks = 0;
		iput(inode);
	}
	brelse(bitmap_bh);
error:
	return ERR_PTR(err);
}

unsigned long ext3_count_free_inodes (struct super_block * sb)
{
	unsigned long desc_count;
	struct ext3_group_desc *gdp;
	int i;
#ifdef EXT3FS_DEBUG
	struct ext3_super_block *es;
	unsigned long bitmap_count, x;
	struct buffer_head *bitmap_bh = NULL;

	es = EXT3_SB(sb)->s_es;
	desc_count = 0;
	bitmap_count = 0;
	gdp = NULL;
	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
		gdp = ext3_get_group_desc (sb, i, NULL);
		if (!gdp)
			continue;
		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
		brelse(bitmap_bh);
		bitmap_bh = read_inode_bitmap(sb, i);
		if (!bitmap_bh)
			continue;

		x = ext3_count_free(bitmap_bh, EXT3_INODES_PER_GROUP(sb) / 8);
		printk("group %d: stored = %d, counted = %lu\n",
			i, le16_to_cpu(gdp->bg_free_inodes_count), x);
		bitmap_count += x;
	}
	brelse(bitmap_bh);
	printk("ext3_count_free_inodes: stored = %u, computed = %lu, %lu\n",
		le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count);
	return desc_count;
#else
	desc_count = 0;
	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
		gdp = ext3_get_group_desc (sb, i, NULL);
		if (!gdp)
			continue;
		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
		cond_resched();
	}
	return desc_count;
#endif
}

/* Called at mount-time, super-block is locked */
unsigned long ext3_count_dirs (struct super_block * sb)
{
	unsigned long count = 0;
	int i;

	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
		struct ext3_group_desc *gdp = ext3_get_group_desc (sb, i, NULL);
		if (!gdp)
			continue;
		count += le16_to_cpu(gdp->bg_used_dirs_count);
	}
	return count;
}

fs/ext3/inode.c (3574 lines deleted)
File diff suppressed because it is too large

fs/ext3/ioctl.c (327 lines deleted)
@@ -1,327 +0,0 @@
/*
 * linux/fs/ext3/ioctl.c
 *
 * Copyright (C) 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 */

#include <linux/mount.h>
#include <linux/compat.h>
#include <asm/uaccess.h>
#include "ext3.h"

long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	struct inode *inode = file_inode(filp);
	struct ext3_inode_info *ei = EXT3_I(inode);
	unsigned int flags;
	unsigned short rsv_window_size;

	ext3_debug ("cmd = %u, arg = %lu\n", cmd, arg);

	switch (cmd) {
	case EXT3_IOC_GETFLAGS:
		ext3_get_inode_flags(ei);
		flags = ei->i_flags & EXT3_FL_USER_VISIBLE;
		return put_user(flags, (int __user *) arg);
	case EXT3_IOC_SETFLAGS: {
		handle_t *handle = NULL;
		int err;
		struct ext3_iloc iloc;
		unsigned int oldflags;
		unsigned int jflag;

		if (!inode_owner_or_capable(inode))
			return -EACCES;

		if (get_user(flags, (int __user *) arg))
			return -EFAULT;

		err = mnt_want_write_file(filp);
		if (err)
			return err;

		flags = ext3_mask_flags(inode->i_mode, flags);

		mutex_lock(&inode->i_mutex);

		/* Is it quota file? Do not allow user to mess with it */
		err = -EPERM;
		if (IS_NOQUOTA(inode))
			goto flags_out;

		oldflags = ei->i_flags;

		/* The JOURNAL_DATA flag is modifiable only by root */
		jflag = flags & EXT3_JOURNAL_DATA_FL;

		/*
		 * The IMMUTABLE and APPEND_ONLY flags can only be changed by
		 * the relevant capability.
		 *
		 * This test looks nicer. Thanks to Pauline Middelink
		 */
		if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
			if (!capable(CAP_LINUX_IMMUTABLE))
				goto flags_out;
		}

		/*
		 * The JOURNAL_DATA flag can only be changed by
		 * the relevant capability.
		 */
		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
			if (!capable(CAP_SYS_RESOURCE))
				goto flags_out;
		}

		handle = ext3_journal_start(inode, 1);
		if (IS_ERR(handle)) {
			err = PTR_ERR(handle);
			goto flags_out;
		}
		if (IS_SYNC(inode))
			handle->h_sync = 1;
		err = ext3_reserve_inode_write(handle, inode, &iloc);
		if (err)
			goto flags_err;

		flags = flags & EXT3_FL_USER_MODIFIABLE;
		flags |= oldflags & ~EXT3_FL_USER_MODIFIABLE;
		ei->i_flags = flags;

		ext3_set_inode_flags(inode);
		inode->i_ctime = CURRENT_TIME_SEC;

		err = ext3_mark_iloc_dirty(handle, inode, &iloc);
flags_err:
		ext3_journal_stop(handle);
		if (err)
			goto flags_out;

		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL))
			err = ext3_change_inode_journal_flag(inode, jflag);
flags_out:
		mutex_unlock(&inode->i_mutex);
		mnt_drop_write_file(filp);
		return err;
	}
	case EXT3_IOC_GETVERSION:
	case EXT3_IOC_GETVERSION_OLD:
		return put_user(inode->i_generation, (int __user *) arg);
	case EXT3_IOC_SETVERSION:
	case EXT3_IOC_SETVERSION_OLD: {
		handle_t *handle;
		struct ext3_iloc iloc;
		__u32 generation;
		int err;

		if (!inode_owner_or_capable(inode))
			return -EPERM;

		err = mnt_want_write_file(filp);
		if (err)
			return err;
		if (get_user(generation, (int __user *) arg)) {
			err = -EFAULT;
			goto setversion_out;
		}

		mutex_lock(&inode->i_mutex);
		handle = ext3_journal_start(inode, 1);
		if (IS_ERR(handle)) {
			err = PTR_ERR(handle);
			goto unlock_out;
		}
		err = ext3_reserve_inode_write(handle, inode, &iloc);
		if (err == 0) {
			inode->i_ctime = CURRENT_TIME_SEC;
			inode->i_generation = generation;
			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
		}
		ext3_journal_stop(handle);

unlock_out:
		mutex_unlock(&inode->i_mutex);
setversion_out:
		mnt_drop_write_file(filp);
		return err;
	}
	case EXT3_IOC_GETRSVSZ:
		if (test_opt(inode->i_sb, RESERVATION)
			&& S_ISREG(inode->i_mode)
			&& ei->i_block_alloc_info) {
			rsv_window_size = ei->i_block_alloc_info->rsv_window_node.rsv_goal_size;
			return put_user(rsv_window_size, (int __user *)arg);
		}
		return -ENOTTY;
	case EXT3_IOC_SETRSVSZ: {
		int err;

		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
			return -ENOTTY;

		err = mnt_want_write_file(filp);
		if (err)
			return err;

		if (!inode_owner_or_capable(inode)) {
			err = -EACCES;
			goto setrsvsz_out;
		}

		if (get_user(rsv_window_size, (int __user *)arg)) {
			err = -EFAULT;
			goto setrsvsz_out;
		}

		if (rsv_window_size > EXT3_MAX_RESERVE_BLOCKS)
			rsv_window_size = EXT3_MAX_RESERVE_BLOCKS;

		/*
		 * need to allocate reservation structure for this inode
		 * before set the window size
		 */
		mutex_lock(&ei->truncate_mutex);
		if (!ei->i_block_alloc_info)
			ext3_init_block_alloc_info(inode);

		if (ei->i_block_alloc_info){
			struct ext3_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
			rsv->rsv_goal_size = rsv_window_size;
		}
		mutex_unlock(&ei->truncate_mutex);
setrsvsz_out:
		mnt_drop_write_file(filp);
		return err;
	}
	case EXT3_IOC_GROUP_EXTEND: {
		ext3_fsblk_t n_blocks_count;
		struct super_block *sb = inode->i_sb;
		int err, err2;

		if (!capable(CAP_SYS_RESOURCE))
			return -EPERM;

		err = mnt_want_write_file(filp);
		if (err)
			return err;

		if (get_user(n_blocks_count, (__u32 __user *)arg)) {
			err = -EFAULT;
			goto group_extend_out;
		}
		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
		journal_lock_updates(EXT3_SB(sb)->s_journal);
		err2 = journal_flush(EXT3_SB(sb)->s_journal);
		journal_unlock_updates(EXT3_SB(sb)->s_journal);
		if (err == 0)
			err = err2;
group_extend_out:
		mnt_drop_write_file(filp);
		return err;
	}
	case EXT3_IOC_GROUP_ADD: {
		struct ext3_new_group_data input;
		struct super_block *sb = inode->i_sb;
		int err, err2;

		if (!capable(CAP_SYS_RESOURCE))
			return -EPERM;

		err = mnt_want_write_file(filp);
		if (err)
			return err;

		if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
				sizeof(input))) {
			err = -EFAULT;
			goto group_add_out;
		}

		err = ext3_group_add(sb, &input);
		journal_lock_updates(EXT3_SB(sb)->s_journal);
		err2 = journal_flush(EXT3_SB(sb)->s_journal);
		journal_unlock_updates(EXT3_SB(sb)->s_journal);
		if (err == 0)
			err = err2;
group_add_out:
		mnt_drop_write_file(filp);
		return err;
	}
	case FITRIM: {

		struct super_block *sb = inode->i_sb;
		struct fstrim_range range;
		int ret = 0;

		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;

		if (copy_from_user(&range, (struct fstrim_range __user *)arg,
				   sizeof(range)))
			return -EFAULT;

		ret = ext3_trim_fs(sb, &range);
		if (ret < 0)
			return ret;

		if (copy_to_user((struct fstrim_range __user *)arg, &range,
				 sizeof(range)))
			return -EFAULT;

		return 0;
	}

	default:
		return -ENOTTY;
	}
}

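The EXT3_IOC_SETFLAGS branch works in three steps:

1. it finds the bits that would change by XOR-ing the request against the old flags;
2. it demands a capability when a privileged bit is among them;
3. it merges only the user-modifiable bits into the inode.

The user-space sketch below models that logic. Its EX_* flag values are hypothetical, not the real EXT3_*_FL constants, and the hypothetical struct caller_caps stands in for capable(). The sketch omits ext3_mask_flags() and all journal handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; not the real EXT3_*_FL constants. */
#define EX_APPEND_FL        0x001u
#define EX_IMMUTABLE_FL     0x002u
#define EX_JOURNAL_DATA_FL  0x004u
#define EX_NODUMP_FL        0x008u
#define EX_INTERNAL_FL      0x100u  /* a bit users may not change */
#define EX_FL_USER_MODIFIABLE \
	(EX_APPEND_FL | EX_IMMUTABLE_FL | EX_JOURNAL_DATA_FL | EX_NODUMP_FL)

/* Stand-ins for the capable() checks in the kernel code. */
struct caller_caps {
	bool linux_immutable;   /* CAP_LINUX_IMMUTABLE */
	bool sys_resource;      /* CAP_SYS_RESOURCE */
};

/* Returns 0 and stores the new flags in *out, or -1 where the kernel
 * would fail with -EPERM. */
int merge_inode_flags(unsigned int oldflags, unsigned int requested,
		      struct caller_caps caps, unsigned int *out)
{
	unsigned int changed = requested ^ oldflags;

	assert(out != NULL);
	/* Setting or clearing APPEND or IMMUTABLE needs CAP_LINUX_IMMUTABLE. */
	if ((changed & (EX_APPEND_FL | EX_IMMUTABLE_FL)) && !caps.linux_immutable)
		return -1;
	/* Setting or clearing JOURNAL_DATA needs CAP_SYS_RESOURCE. */
	if ((changed & EX_JOURNAL_DATA_FL) && !caps.sys_resource)
		return -1;

	/* User-modifiable bits come from the request; all others are kept. */
	*out = (requested & EX_FL_USER_MODIFIABLE) |
	       (oldflags & ~EX_FL_USER_MODIFIABLE);
	return 0;
}
```

Testing the XOR rather than the requested value means that clearing a privileged flag is checked as strictly as setting it.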
#ifdef CONFIG_COMPAT
long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	/* These are just misnamed, they actually get/put from/to user an int */
	switch (cmd) {
	case EXT3_IOC32_GETFLAGS:
		cmd = EXT3_IOC_GETFLAGS;
		break;
	case EXT3_IOC32_SETFLAGS:
		cmd = EXT3_IOC_SETFLAGS;
		break;
	case EXT3_IOC32_GETVERSION:
		cmd = EXT3_IOC_GETVERSION;
		break;
	case EXT3_IOC32_SETVERSION:
		cmd = EXT3_IOC_SETVERSION;
		break;
	case EXT3_IOC32_GROUP_EXTEND:
		cmd = EXT3_IOC_GROUP_EXTEND;
		break;
	case EXT3_IOC32_GETVERSION_OLD:
		cmd = EXT3_IOC_GETVERSION_OLD;
		break;
	case EXT3_IOC32_SETVERSION_OLD:
		cmd = EXT3_IOC_SETVERSION_OLD;
		break;
#ifdef CONFIG_JBD_DEBUG
	case EXT3_IOC32_WAIT_FOR_READONLY:
		cmd = EXT3_IOC_WAIT_FOR_READONLY;
		break;
#endif
	case EXT3_IOC32_GETRSVSZ:
		cmd = EXT3_IOC_GETRSVSZ;
		break;
	case EXT3_IOC32_SETRSVSZ:
		cmd = EXT3_IOC_SETRSVSZ;
		break;
	case EXT3_IOC_GROUP_ADD:
		break;
	default:
		return -ENOIOCTLCMD;
	}
	return ext3_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}
#endif

fs/ext3/namei.c (2586 lines deleted)
File diff suppressed because it is too large

fs/ext3/namei.h (27 lines deleted)
@@ -1,27 +0,0 @@
/* linux/fs/ext3/namei.h
 *
 * Copyright (C) 2005 Simtec Electronics
 * Ben Dooks <ben@simtec.co.uk>
 *
*/

extern struct dentry *ext3_get_parent(struct dentry *child);

static inline struct buffer_head *ext3_dir_bread(handle_t *handle,
						 struct inode *inode,
						 int block, int create,
						 int *err)
{
	struct buffer_head *bh;

	bh = ext3_bread(handle, inode, block, create, err);

	if (!bh && !(*err)) {
		*err = -EIO;
		ext3_error(inode->i_sb, __func__,
			   "Directory hole detected on inode %lu\n",
			   inode->i_ino);
		return NULL;
	}
	return bh;
}

fs/ext3/resize.c (1117 lines deleted)
File diff suppressed because it is too large

fs/ext3/super.c (3165 lines deleted)
File diff suppressed because it is too large

fs/ext3/symlink.c (46 lines deleted)
@@ -1,46 +0,0 @@
/*
 * linux/fs/ext3/symlink.c
 *
 * Only fast symlinks left here - the rest is done by generic code. AV, 1999
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 * from
 *
 * linux/fs/minix/symlink.c
 *
 * Copyright (C) 1991, 1992 Linus Torvalds
 *
 * ext3 symlink handling code
 */

#include "ext3.h"
#include "xattr.h"

const struct inode_operations ext3_symlink_inode_operations = {
	.readlink = generic_readlink,
	.follow_link = page_follow_link_light,
	.put_link = page_put_link,
	.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
	.setxattr = generic_setxattr,
	.getxattr = generic_getxattr,
	.listxattr = ext3_listxattr,
	.removexattr = generic_removexattr,
#endif
};

const struct inode_operations ext3_fast_symlink_inode_operations = {
	.readlink = generic_readlink,
	.follow_link = simple_follow_link,
	.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
	.setxattr = generic_setxattr,
	.getxattr = generic_getxattr,
	.listxattr = ext3_listxattr,
	.removexattr = generic_removexattr,
#endif
};

fs/ext3/xattr.c (1330 lines deleted)
File diff suppressed because it is too large

fs/ext3/xattr.h (136 lines deleted)
@@ -1,136 +0,0 @@
/*
File: fs/ext3/xattr.h

On-disk format of extended attributes for the ext3 filesystem.

(C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/

#include <linux/xattr.h>

/* Magic value in attribute blocks */
#define EXT3_XATTR_MAGIC 0xEA020000

/* Maximum number of references to one attribute block */
#define EXT3_XATTR_REFCOUNT_MAX 1024

/* Name indexes */
#define EXT3_XATTR_INDEX_USER 1
#define EXT3_XATTR_INDEX_POSIX_ACL_ACCESS 2
#define EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT 3
#define EXT3_XATTR_INDEX_TRUSTED 4
#define EXT3_XATTR_INDEX_LUSTRE 5
#define EXT3_XATTR_INDEX_SECURITY 6

struct ext3_xattr_header {
	__le32 h_magic; /* magic number for identification */
	__le32 h_refcount; /* reference count */
	__le32 h_blocks; /* number of disk blocks used */
	__le32 h_hash; /* hash value of all attributes */
	__u32 h_reserved[4]; /* zero right now */
};

struct ext3_xattr_ibody_header {
	__le32 h_magic; /* magic number for identification */
};

struct ext3_xattr_entry {
	__u8 e_name_len; /* length of name */
	__u8 e_name_index; /* attribute name index */
	__le16 e_value_offs; /* offset in disk block of value */
	__le32 e_value_block; /* disk block attribute is stored on (n/i) */
	__le32 e_value_size; /* size of attribute value */
	__le32 e_hash; /* hash value of name and value */
	char e_name[0]; /* attribute name */
};

#define EXT3_XATTR_PAD_BITS 2
#define EXT3_XATTR_PAD (1<<EXT3_XATTR_PAD_BITS)
#define EXT3_XATTR_ROUND (EXT3_XATTR_PAD-1)
#define EXT3_XATTR_LEN(name_len) \
	(((name_len) + EXT3_XATTR_ROUND + \
	sizeof(struct ext3_xattr_entry)) & ~EXT3_XATTR_ROUND)
#define EXT3_XATTR_NEXT(entry) \
	( (struct ext3_xattr_entry *)( \
	  (char *)(entry) + EXT3_XATTR_LEN((entry)->e_name_len)) )
#define EXT3_XATTR_SIZE(size) \
	(((size) + EXT3_XATTR_ROUND) & ~EXT3_XATTR_ROUND)

# ifdef CONFIG_EXT3_FS_XATTR
|
||||
|
||||
extern const struct xattr_handler ext3_xattr_user_handler;
|
||||
extern const struct xattr_handler ext3_xattr_trusted_handler;
|
||||
extern const struct xattr_handler ext3_xattr_security_handler;
|
||||
|
||||
extern ssize_t ext3_listxattr(struct dentry *, char *, size_t);
|
||||
|
||||
extern int ext3_xattr_get(struct inode *, int, const char *, void *, size_t);
|
||||
extern int ext3_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
|
||||
extern int ext3_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
|
||||
|
||||
extern void ext3_xattr_delete_inode(handle_t *, struct inode *);
|
||||
extern void ext3_xattr_put_super(struct super_block *);
|
||||
|
||||
extern int init_ext3_xattr(void);
|
||||
extern void exit_ext3_xattr(void);
|
||||
|
||||
extern const struct xattr_handler *ext3_xattr_handlers[];
|
||||
|
||||
# else /* CONFIG_EXT3_FS_XATTR */
|
||||
|
||||
static inline int
|
||||
ext3_xattr_get(struct inode *inode, int name_index, const char *name,
|
||||
void *buffer, size_t size, int flags)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int
|
||||
ext3_xattr_set(struct inode *inode, int name_index, const char *name,
|
||||
const void *value, size_t size, int flags)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline int
|
||||
ext3_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
|
||||
const char *name, const void *value, size_t size, int flags)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline void
|
||||
ext3_xattr_delete_inode(handle_t *handle, struct inode *inode)
|
||||
{
|
||||
}
|
||||
|
||||
static inline void
|
||||
ext3_xattr_put_super(struct super_block *sb)
|
||||
{
|
||||
}
|
||||
|
||||
static inline int
|
||||
init_ext3_xattr(void)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline void
|
||||
exit_ext3_xattr(void)
|
||||
{
|
||||
}
|
||||
|
||||
#define ext3_xattr_handlers NULL
|
||||
|
||||
# endif /* CONFIG_EXT3_FS_XATTR */
|
||||
|
||||
#ifdef CONFIG_EXT3_FS_SECURITY
|
||||
extern int ext3_init_security(handle_t *handle, struct inode *inode,
|
||||
struct inode *dir, const struct qstr *qstr);
|
||||
#else
|
||||
static inline int ext3_init_security(handle_t *handle, struct inode *inode,
|
||||
struct inode *dir, const struct qstr *qstr)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
#endif
|
|
@@ -1,78 +0,0 @@
/*
 * linux/fs/ext3/xattr_security.c
 * Handler for storing security labels as extended attributes.
 */

#include <linux/security.h>
#include "ext3.h"
#include "xattr.h"

static size_t
ext3_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
			 const char *name, size_t name_len, int type)
{
	const size_t prefix_len = XATTR_SECURITY_PREFIX_LEN;
	const size_t total_len = prefix_len + name_len + 1;


	if (list && total_len <= list_size) {
		memcpy(list, XATTR_SECURITY_PREFIX, prefix_len);
		memcpy(list+prefix_len, name, name_len);
		list[prefix_len + name_len] = '\0';
	}
	return total_len;
}

static int
ext3_xattr_security_get(struct dentry *dentry, const char *name,
		void *buffer, size_t size, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
			      name, buffer, size);
}

static int
ext3_xattr_security_set(struct dentry *dentry, const char *name,
		const void *value, size_t size, int flags, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
			      name, value, size, flags);
}

static int ext3_initxattrs(struct inode *inode,
			   const struct xattr *xattr_array,
			   void *fs_info)
{
	const struct xattr *xattr;
	handle_t *handle = fs_info;
	int err = 0;

	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
		err = ext3_xattr_set_handle(handle, inode,
					    EXT3_XATTR_INDEX_SECURITY,
					    xattr->name, xattr->value,
					    xattr->value_len, 0);
		if (err < 0)
			break;
	}
	return err;
}

int
ext3_init_security(handle_t *handle, struct inode *inode, struct inode *dir,
		   const struct qstr *qstr)
{
	return security_inode_init_security(inode, dir, qstr,
					    &ext3_initxattrs, handle);
}

const struct xattr_handler ext3_xattr_security_handler = {
	.prefix = XATTR_SECURITY_PREFIX,
	.list = ext3_xattr_security_list,
	.get = ext3_xattr_security_get,
	.set = ext3_xattr_security_set,
};
@@ -1,54 +0,0 @@
/*
 * linux/fs/ext3/xattr_trusted.c
 * Handler for trusted extended attributes.
 *
 * Copyright (C) 2003 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
 */

#include "ext3.h"
#include "xattr.h"

static size_t
ext3_xattr_trusted_list(struct dentry *dentry, char *list, size_t list_size,
		const char *name, size_t name_len, int type)
{
	const size_t prefix_len = XATTR_TRUSTED_PREFIX_LEN;
	const size_t total_len = prefix_len + name_len + 1;

	if (!capable(CAP_SYS_ADMIN))
		return 0;

	if (list && total_len <= list_size) {
		memcpy(list, XATTR_TRUSTED_PREFIX, prefix_len);
		memcpy(list+prefix_len, name, name_len);
		list[prefix_len + name_len] = '\0';
	}
	return total_len;
}

static int
ext3_xattr_trusted_get(struct dentry *dentry, const char *name,
		void *buffer, size_t size, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED,
			      name, buffer, size);
}

static int
ext3_xattr_trusted_set(struct dentry *dentry, const char *name,
		const void *value, size_t size, int flags, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED, name,
			      value, size, flags);
}

const struct xattr_handler ext3_xattr_trusted_handler = {
	.prefix = XATTR_TRUSTED_PREFIX,
	.list = ext3_xattr_trusted_list,
	.get = ext3_xattr_trusted_get,
	.set = ext3_xattr_trusted_set,
};
@@ -1,58 +0,0 @@
/*
 * linux/fs/ext3/xattr_user.c
 * Handler for extended user attributes.
 *
 * Copyright (C) 2001 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
 */

#include "ext3.h"
#include "xattr.h"

static size_t
ext3_xattr_user_list(struct dentry *dentry, char *list, size_t list_size,
		const char *name, size_t name_len, int type)
{
	const size_t prefix_len = XATTR_USER_PREFIX_LEN;
	const size_t total_len = prefix_len + name_len + 1;

	if (!test_opt(dentry->d_sb, XATTR_USER))
		return 0;

	if (list && total_len <= list_size) {
		memcpy(list, XATTR_USER_PREFIX, prefix_len);
		memcpy(list+prefix_len, name, name_len);
		list[prefix_len + name_len] = '\0';
	}
	return total_len;
}

static int
ext3_xattr_user_get(struct dentry *dentry, const char *name, void *buffer,
		size_t size, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	if (!test_opt(dentry->d_sb, XATTR_USER))
		return -EOPNOTSUPP;
	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_USER,
			      name, buffer, size);
}

static int
ext3_xattr_user_set(struct dentry *dentry, const char *name,
		const void *value, size_t size, int flags, int type)
{
	if (strcmp(name, "") == 0)
		return -EINVAL;
	if (!test_opt(dentry->d_sb, XATTR_USER))
		return -EOPNOTSUPP;
	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_USER,
			      name, value, size, flags);
}

const struct xattr_handler ext3_xattr_user_handler = {
	.prefix = XATTR_USER_PREFIX,
	.list = ext3_xattr_user_list,
	.get = ext3_xattr_user_get,
	.set = ext3_xattr_user_set,
};
@@ -1,5 +1,38 @@
# Ext3 configs are here for backward compatibility with old configs which may
# have EXT3_FS set but not EXT4_FS set and thus would result in non-bootable
# kernels after the removal of ext3 driver.
config EXT3_FS
	tristate "The Extended 3 (ext3) filesystem"
	# These must match EXT4_FS selects...
	select EXT4_FS
	select JBD2
	select CRC16
	select CRYPTO
	select CRYPTO_CRC32C
	help
	  This config option is here only for backward compatibility. ext3
	  filesystem is now handled by the ext4 driver.

config EXT3_FS_POSIX_ACL
	bool "Ext3 POSIX Access Control Lists"
	depends on EXT3_FS
	select EXT4_FS_POSIX_ACL
	select FS_POSIX_ACL
	help
	  This config option is here only for backward compatibility. ext3
	  filesystem is now handled by the ext4 driver.

config EXT3_FS_SECURITY
	bool "Ext3 Security Labels"
	depends on EXT3_FS
	select EXT4_FS_SECURITY
	help
	  This config option is here only for backward compatibility. ext3
	  filesystem is now handled by the ext4 driver.

config EXT4_FS
	tristate "The Extended 4 (ext4) filesystem"
	# Please update EXT3_FS selects when changing these
	select JBD2
	select CRC16
	select CRYPTO
@@ -28,14 +61,14 @@ config EXT4_FS

	  If unsure, say N.

config EXT4_USE_FOR_EXT23
config EXT4_USE_FOR_EXT2
	bool "Use ext4 for ext2/ext3 file systems"
	depends on EXT4_FS
	depends on EXT3_FS=n || EXT2_FS=n
	depends on EXT2_FS=n
	default y
	help
	  Allow the ext4 file system driver code to be used for ext2 or
	  ext3 file system mounts. This allows users to reduce their
	  Allow the ext4 file system driver code to be used for ext2
	  file system mounts. This allows users to reduce their
	  compiled kernel size by using one file system driver for
	  ext2, ext3, and ext4 file systems.
@@ -84,7 +84,7 @@ static void ext4_unregister_li_request(struct super_block *sb);
static void ext4_clear_request_list(void);
static int ext4_reserve_clusters(struct ext4_sb_info *, ext4_fsblk_t);

#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
static struct file_system_type ext2_fs_type = {
	.owner = THIS_MODULE,
	.name = "ext2",
@@ -100,7 +100,6 @@ MODULE_ALIAS("ext2");
#endif


#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
static struct file_system_type ext3_fs_type = {
	.owner = THIS_MODULE,
	.name = "ext3",
@@ -111,9 +110,6 @@ static struct file_system_type ext3_fs_type = {
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
#define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
#else
#define IS_EXT3_SB(sb) (0)
#endif

static int ext4_verify_csum_type(struct super_block *sb,
				 struct ext4_super_block *es)
@@ -5500,7 +5496,7 @@ static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
	return mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);
}

#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
static inline void register_as_ext2(void)
{
	int err = register_filesystem(&ext2_fs_type);
@@ -5530,7 +5526,6 @@ static inline void unregister_as_ext2(void) { }
static inline int ext2_feature_set_ok(struct super_block *sb) { return 0; }
#endif

#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
static inline void register_as_ext3(void)
{
	int err = register_filesystem(&ext3_fs_type);
@@ -5556,11 +5551,6 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
		return 0;
	return 1;
}
#else
static inline void register_as_ext3(void) { }
static inline void unregister_as_ext3(void) { }
static inline int ext3_feature_set_ok(struct super_block *sb) { return 0; }
#endif

static struct file_system_type ext4_fs_type = {
	.owner = THIS_MODULE,
@@ -1,30 +0,0 @@
config JBD
	tristate
	help
	  This is a generic journalling layer for block devices. It is
	  currently used by the ext3 file system, but it could also be
	  used to add journal support to other file systems or block
	  devices such as RAID or LVM.

	  If you are using the ext3 file system, you need to say Y here.
	  If you are not using ext3 then you will probably want to say N.

	  To compile this device as a module, choose M here: the module will be
	  called jbd. If you are compiling ext3 into the kernel, you
	  cannot compile this code as a module.

config JBD_DEBUG
	bool "JBD (ext3) debugging support"
	depends on JBD && DEBUG_FS
	help
	  If you are using the ext3 journaled file system (or potentially any
	  other file system/device using JBD), this option allows you to
	  enable debugging output while the system is running, in order to
	  help track down any problems you are having. By default the
	  debugging output will be turned off.

	  If you select Y here, then you will be able to turn on debugging
	  with "echo N > /sys/kernel/debug/jbd/jbd-debug", where N is a
	  number between 1 and 5, the higher the number, the more debugging
	  output is generated. To turn debugging off again, do
	  "echo 0 > /sys/kernel/debug/jbd/jbd-debug".
@@ -1,7 +0,0 @@
#
# Makefile for the linux journaling routines.
#

obj-$(CONFIG_JBD) += jbd.o

jbd-objs := transaction.o commit.o recovery.o checkpoint.o revoke.o journal.o
@ -1,782 +0,0 @@
|
|||
/*
|
||||
* linux/fs/jbd/checkpoint.c
|
||||
*
|
||||
* Written by Stephen C. Tweedie <sct@redhat.com>, 1999
|
||||
*
|
||||
* Copyright 1999 Red Hat Software --- All Rights Reserved
|
||||
*
|
||||
* This file is part of the Linux kernel and is made available under
|
||||
* the terms of the GNU General Public License, version 2, or at your
|
||||
* option, any later version, incorporated herein by reference.
|
||||
*
|
||||
* Checkpoint routines for the generic filesystem journaling code.
|
||||
* Part of the ext2fs journaling system.
|
||||
*
|
||||
* Checkpointing is the process of ensuring that a section of the log is
|
||||
* committed fully to disk, so that that portion of the log can be
|
||||
* reused.
|
||||
*/
|
||||
|
||||
#include <linux/time.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/jbd.h>
|
||||
#include <linux/errno.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <trace/events/jbd.h>
|
||||
|
||||
/*
|
||||
* Unlink a buffer from a transaction checkpoint list.
|
||||
*
|
||||
* Called with j_list_lock held.
|
||||
*/
|
||||
static inline void __buffer_unlink_first(struct journal_head *jh)
|
||||
{
|
||||
transaction_t *transaction = jh->b_cp_transaction;
|
||||
|
||||
jh->b_cpnext->b_cpprev = jh->b_cpprev;
|
||||
jh->b_cpprev->b_cpnext = jh->b_cpnext;
|
||||
if (transaction->t_checkpoint_list == jh) {
|
||||
transaction->t_checkpoint_list = jh->b_cpnext;
|
||||
if (transaction->t_checkpoint_list == jh)
|
||||
transaction->t_checkpoint_list = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Unlink a buffer from a transaction checkpoint(io) list.
|
||||
*
|
||||
* Called with j_list_lock held.
|
||||
*/
|
||||
static inline void __buffer_unlink(struct journal_head *jh)
|
||||
{
|
||||
transaction_t *transaction = jh->b_cp_transaction;
|
||||
|
||||
__buffer_unlink_first(jh);
|
||||
if (transaction->t_checkpoint_io_list == jh) {
|
||||
transaction->t_checkpoint_io_list = jh->b_cpnext;
|
||||
if (transaction->t_checkpoint_io_list == jh)
|
||||
transaction->t_checkpoint_io_list = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Move a buffer from the checkpoint list to the checkpoint io list
|
||||
*
|
||||
* Called with j_list_lock held
|
||||
*/
|
||||
static inline void __buffer_relink_io(struct journal_head *jh)
|
||||
{
|
||||
transaction_t *transaction = jh->b_cp_transaction;
|
||||
|
||||
__buffer_unlink_first(jh);
|
||||
|
||||
if (!transaction->t_checkpoint_io_list) {
|
||||
jh->b_cpnext = jh->b_cpprev = jh;
|
||||
} else {
|
||||
jh->b_cpnext = transaction->t_checkpoint_io_list;
|
||||
jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
|
||||
jh->b_cpprev->b_cpnext = jh;
|
||||
jh->b_cpnext->b_cpprev = jh;
|
||||
}
|
||||
transaction->t_checkpoint_io_list = jh;
|
||||
}
|
||||
|
||||
/*
|
||||
* Try to release a checkpointed buffer from its transaction.
|
||||
* Returns 1 if we released it and 2 if we also released the
|
||||
* whole transaction.
|
||||
*
|
||||
* Requires j_list_lock
|
||||
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
|
||||
*/
|
||||
static int __try_to_free_cp_buf(struct journal_head *jh)
|
||||
{
|
||||
int ret = 0;
|
||||
struct buffer_head *bh = jh2bh(jh);
|
||||
|
||||
if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
|
||||
!buffer_dirty(bh) && !buffer_write_io_error(bh)) {
|
||||
/*
|
||||
* Get our reference so that bh cannot be freed before
|
||||
* we unlock it
|
||||
*/
|
||||
get_bh(bh);
|
||||
JBUFFER_TRACE(jh, "remove from checkpoint list");
|
||||
ret = __journal_remove_checkpoint(jh) + 1;
|
||||
jbd_unlock_bh_state(bh);
|
||||
BUFFER_TRACE(bh, "release");
|
||||
__brelse(bh);
|
||||
} else {
|
||||
jbd_unlock_bh_state(bh);
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* __log_wait_for_space: wait until there is space in the journal.
|
||||
*
|
||||
* Called under j-state_lock *only*. It will be unlocked if we have to wait
|
||||
* for a checkpoint to free up some space in the log.
|
||||
*/
|
||||
void __log_wait_for_space(journal_t *journal)
|
||||
{
|
||||
int nblocks, space_left;
|
||||
assert_spin_locked(&journal->j_state_lock);
|
||||
|
||||
nblocks = jbd_space_needed(journal);
|
||||
while (__log_space_left(journal) < nblocks) {
|
||||
if (journal->j_flags & JFS_ABORT)
|
||||
return;
|
||||
spin_unlock(&journal->j_state_lock);
|
||||
mutex_lock(&journal->j_checkpoint_mutex);
|
||||
|
||||
/*
|
||||
* Test again, another process may have checkpointed while we
|
||||
* were waiting for the checkpoint lock. If there are no
|
||||
* transactions ready to be checkpointed, try to recover
|
||||
* journal space by calling cleanup_journal_tail(), and if
|
||||
* that doesn't work, by waiting for the currently committing
|
||||
* transaction to complete. If there is absolutely no way
|
||||
* to make progress, this is either a BUG or corrupted
|
||||
* filesystem, so abort the journal and leave a stack
|
||||
* trace for forensic evidence.
|
||||
*/
|
||||
spin_lock(&journal->j_state_lock);
|
||||
spin_lock(&journal->j_list_lock);
|
||||
nblocks = jbd_space_needed(journal);
|
||||
space_left = __log_space_left(journal);
|
||||
if (space_left < nblocks) {
|
||||
int chkpt = journal->j_checkpoint_transactions != NULL;
|
||||
tid_t tid = 0;
|
||||
|
||||
if (journal->j_committing_transaction)
|
||||
tid = journal->j_committing_transaction->t_tid;
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
spin_unlock(&journal->j_state_lock);
|
||||
if (chkpt) {
|
||||
log_do_checkpoint(journal);
|
||||
} else if (cleanup_journal_tail(journal) == 0) {
|
||||
/* We were able to recover space; yay! */
|
||||
;
|
||||
} else if (tid) {
|
||||
log_wait_commit(journal, tid);
|
||||
} else {
|
||||
printk(KERN_ERR "%s: needed %d blocks and "
|
||||
"only had %d space available\n",
|
||||
__func__, nblocks, space_left);
|
||||
printk(KERN_ERR "%s: no way to get more "
|
||||
"journal space\n", __func__);
|
||||
WARN_ON(1);
|
||||
journal_abort(journal, 0);
|
||||
}
|
||||
spin_lock(&journal->j_state_lock);
|
||||
} else {
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
}
|
||||
mutex_unlock(&journal->j_checkpoint_mutex);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
|
||||
* The caller must restart a list walk. Wait for someone else to run
|
||||
* jbd_unlock_bh_state().
|
||||
*/
|
||||
static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
|
||||
__releases(journal->j_list_lock)
|
||||
{
|
||||
get_bh(bh);
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
jbd_lock_bh_state(bh);
|
||||
jbd_unlock_bh_state(bh);
|
||||
put_bh(bh);
|
||||
}
|
||||
|
||||
/*
|
||||
* Clean up transaction's list of buffers submitted for io.
|
||||
* We wait for any pending IO to complete and remove any clean
|
||||
* buffers. Note that we take the buffers in the opposite ordering
|
||||
* from the one in which they were submitted for IO.
|
||||
*
|
||||
* Return 0 on success, and return <0 if some buffers have failed
|
||||
* to be written out.
|
||||
*
|
||||
* Called with j_list_lock held.
|
||||
*/
|
||||
static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
|
||||
{
|
||||
struct journal_head *jh;
|
||||
struct buffer_head *bh;
|
||||
tid_t this_tid;
|
||||
int released = 0;
|
||||
int ret = 0;
|
||||
|
||||
this_tid = transaction->t_tid;
|
||||
restart:
|
||||
/* Did somebody clean up the transaction in the meanwhile? */
|
||||
if (journal->j_checkpoint_transactions != transaction ||
|
||||
transaction->t_tid != this_tid)
|
||||
return ret;
|
||||
while (!released && transaction->t_checkpoint_io_list) {
|
||||
jh = transaction->t_checkpoint_io_list;
|
||||
bh = jh2bh(jh);
|
||||
if (!jbd_trylock_bh_state(bh)) {
|
||||
jbd_sync_bh(journal, bh);
|
||||
spin_lock(&journal->j_list_lock);
|
||||
goto restart;
|
||||
}
|
||||
get_bh(bh);
|
||||
if (buffer_locked(bh)) {
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
jbd_unlock_bh_state(bh);
|
||||
wait_on_buffer(bh);
|
||||
/* the journal_head may have gone by now */
|
||||
BUFFER_TRACE(bh, "brelse");
|
||||
__brelse(bh);
|
||||
spin_lock(&journal->j_list_lock);
|
||||
goto restart;
|
||||
}
|
||||
if (unlikely(buffer_write_io_error(bh)))
|
||||
ret = -EIO;
|
||||
|
||||
/*
|
||||
* Now in whatever state the buffer currently is, we know that
|
||||
* it has been written out and so we can drop it from the list
|
||||
*/
|
||||
released = __journal_remove_checkpoint(jh);
|
||||
jbd_unlock_bh_state(bh);
|
||||
__brelse(bh);
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
#define NR_BATCH 64
|
||||
|
||||
static void
|
||||
__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
|
||||
{
|
||||
int i;
|
||||
struct blk_plug plug;
|
||||
|
||||
blk_start_plug(&plug);
|
||||
for (i = 0; i < *batch_count; i++)
|
||||
write_dirty_buffer(bhs[i], WRITE_SYNC);
|
||||
blk_finish_plug(&plug);
|
||||
|
||||
for (i = 0; i < *batch_count; i++) {
|
||||
struct buffer_head *bh = bhs[i];
|
||||
clear_buffer_jwrite(bh);
|
||||
BUFFER_TRACE(bh, "brelse");
|
||||
__brelse(bh);
|
||||
}
|
||||
*batch_count = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Try to flush one buffer from the checkpoint list to disk.
|
||||
*
|
||||
* Return 1 if something happened which requires us to abort the current
|
||||
* scan of the checkpoint list. Return <0 if the buffer has failed to
|
||||
* be written out.
|
||||
*
|
||||
* Called with j_list_lock held and drops it if 1 is returned
|
||||
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
|
||||
*/
|
||||
static int __process_buffer(journal_t *journal, struct journal_head *jh,
|
||||
struct buffer_head **bhs, int *batch_count)
|
||||
{
|
||||
struct buffer_head *bh = jh2bh(jh);
|
||||
int ret = 0;
|
||||
|
||||
if (buffer_locked(bh)) {
|
||||
get_bh(bh);
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
jbd_unlock_bh_state(bh);
|
||||
wait_on_buffer(bh);
|
||||
/* the journal_head may have gone by now */
|
||||
BUFFER_TRACE(bh, "brelse");
|
||||
__brelse(bh);
|
||||
ret = 1;
|
||||
} else if (jh->b_transaction != NULL) {
|
||||
transaction_t *t = jh->b_transaction;
|
||||
tid_t tid = t->t_tid;
|
||||
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
jbd_unlock_bh_state(bh);
|
||||
log_start_commit(journal, tid);
|
||||
log_wait_commit(journal, tid);
|
||||
ret = 1;
|
||||
} else if (!buffer_dirty(bh)) {
|
||||
ret = 1;
|
||||
if (unlikely(buffer_write_io_error(bh)))
|
||||
ret = -EIO;
|
||||
get_bh(bh);
|
||||
J_ASSERT_JH(jh, !buffer_jbddirty(bh));
|
||||
BUFFER_TRACE(bh, "remove from checkpoint");
|
||||
__journal_remove_checkpoint(jh);
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
jbd_unlock_bh_state(bh);
|
||||
__brelse(bh);
|
||||
} else {
|
||||
/*
|
||||
* Important: we are about to write the buffer, and
|
||||
* possibly block, while still holding the journal lock.
|
||||
* We cannot afford to let the transaction logic start
|
||||
* messing around with this buffer before we write it to
|
||||
* disk, as that would break recoverability.
|
||||
*/
|
||||
BUFFER_TRACE(bh, "queue");
|
||||
get_bh(bh);
|
||||
J_ASSERT_BH(bh, !buffer_jwrite(bh));
|
||||
set_buffer_jwrite(bh);
|
||||
bhs[*batch_count] = bh;
|
||||
__buffer_relink_io(jh);
|
||||
jbd_unlock_bh_state(bh);
|
||||
(*batch_count)++;
|
||||
if (*batch_count == NR_BATCH) {
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
__flush_batch(journal, bhs, batch_count);
|
||||
ret = 1;
|
||||
}
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* Perform an actual checkpoint. We take the first transaction on the
|
||||
* list of transactions to be checkpointed and send all its buffers
|
||||
* to disk. We submit larger chunks of data at once.
|
||||
*
|
||||
* The journal should be locked before calling this function.
|
||||
* Called with j_checkpoint_mutex held.
|
||||
*/
|
||||
int log_do_checkpoint(journal_t *journal)
|
||||
{
|
||||
transaction_t *transaction;
|
||||
tid_t this_tid;
|
||||
int result;
|
||||
|
||||
jbd_debug(1, "Start checkpoint\n");
|
||||
|
||||
/*
|
||||
* First thing: if there are any transactions in the log which
|
||||
* don't need checkpointing, just eliminate them from the
|
||||
* journal straight away.
|
||||
*/
|
||||
result = cleanup_journal_tail(journal);
|
||||
trace_jbd_checkpoint(journal, result);
|
||||
jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
|
||||
if (result <= 0)
|
||||
return result;
|
||||
|
||||
/*
|
||||
* OK, we need to start writing disk blocks. Take one transaction
|
||||
* and write it.
|
||||
*/
|
||||
result = 0;
|
||||
spin_lock(&journal->j_list_lock);
|
||||
if (!journal->j_checkpoint_transactions)
|
||||
goto out;
|
||||
transaction = journal->j_checkpoint_transactions;
|
||||
this_tid = transaction->t_tid;
|
||||
restart:
|
||||
/*
|
||||
* If someone cleaned up this transaction while we slept, we're
|
||||
* done (maybe it's a new transaction, but it fell at the same
|
||||
* address).
|
||||
*/
|
||||
if (journal->j_checkpoint_transactions == transaction &&
|
||||
transaction->t_tid == this_tid) {
|
||||
int batch_count = 0;
|
||||
struct buffer_head *bhs[NR_BATCH];
|
||||
struct journal_head *jh;
|
||||
int retry = 0, err;
|
||||
|
||||
while (!retry && transaction->t_checkpoint_list) {
|
||||
struct buffer_head *bh;
|
||||
|
||||
jh = transaction->t_checkpoint_list;
|
||||
bh = jh2bh(jh);
|
||||
if (!jbd_trylock_bh_state(bh)) {
|
||||
jbd_sync_bh(journal, bh);
|
||||
retry = 1;
|
||||
break;
|
||||
}
|
||||
retry = __process_buffer(journal, jh, bhs,&batch_count);
|
||||
if (retry < 0 && !result)
|
||||
result = retry;
|
||||
if (!retry && (need_resched() ||
|
||||
spin_needbreak(&journal->j_list_lock))) {
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
retry = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (batch_count) {
|
||||
if (!retry) {
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
retry = 1;
|
||||
}
|
||||
__flush_batch(journal, bhs, &batch_count);
|
||||
}
|
||||
|
||||
if (retry) {
|
||||
spin_lock(&journal->j_list_lock);
|
||||
goto restart;
|
||||
}
|
||||
/*
|
||||
* Now we have cleaned up the first transaction's checkpoint
|
||||
* list. Let's clean up the second one
|
||||
*/
|
||||
err = __wait_cp_io(journal, transaction);
|
||||
if (!result)
|
||||
result = err;
|
||||
}
|
||||
out:
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
if (result < 0)
|
||||
journal_abort(journal, result);
|
||||
else
|
||||
result = cleanup_journal_tail(journal);
|
||||
|
||||
return (result < 0) ? result : 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Check the list of checkpoint transactions for the journal to see if
|
||||
* we have already got rid of any since the last update of the log tail
|
||||
* in the journal superblock. If so, we can instantly roll the
|
||||
* superblock forward to remove those transactions from the log.
|
||||
*
|
||||
* Return <0 on error, 0 on success, 1 if there was nothing to clean up.
|
||||
*
|
||||
* This is the only part of the journaling code which really needs to be
|
||||
* aware of transaction aborts. Checkpointing involves writing to the
|
||||
* main filesystem area rather than to the journal, so it can proceed
|
||||
* even in abort state, but we must not update the super block if
|
||||
* checkpointing may have failed. Otherwise, we would lose some metadata
|
||||
* buffers which should be written-back to the filesystem.
|
||||
*/
|
||||
|
||||
int cleanup_journal_tail(journal_t *journal)
|
||||
{
|
||||
transaction_t * transaction;
|
||||
tid_t first_tid;
|
||||
unsigned int blocknr, freed;
|
||||
|
||||
if (is_journal_aborted(journal))
|
||||
return 1;
|
||||
|
||||
/*
|
||||
* OK, work out the oldest transaction remaining in the log, and
|
||||
* the log block it starts at.
|
||||
*
|
||||
* If the log is now empty, we need to work out which is the
|
||||
* next transaction ID we will write, and where it will
|
||||
* start.
|
||||
*/
|
||||
spin_lock(&journal->j_state_lock);
|
||||
spin_lock(&journal->j_list_lock);
|
||||
transaction = journal->j_checkpoint_transactions;
|
||||
if (transaction) {
|
||||
first_tid = transaction->t_tid;
|
||||
blocknr = transaction->t_log_start;
|
||||
} else if ((transaction = journal->j_committing_transaction) != NULL) {
|
||||
first_tid = transaction->t_tid;
|
||||
blocknr = transaction->t_log_start;
|
||||
} else if ((transaction = journal->j_running_transaction) != NULL) {
|
||||
first_tid = transaction->t_tid;
|
||||
blocknr = journal->j_head;
|
||||
} else {
|
||||
first_tid = journal->j_transaction_sequence;
|
||||
blocknr = journal->j_head;
|
||||
}
|
||||
spin_unlock(&journal->j_list_lock);
|
||||
J_ASSERT(blocknr != 0);
|
||||
|
||||
/* If the oldest pinned transaction is at the tail of the log
|
||||
already then there's not much we can do right now. */
|
||||
if (journal->j_tail_sequence == first_tid) {
|
||||
spin_unlock(&journal->j_state_lock);
|
||||
return 1;
|
||||
}
|
||||
spin_unlock(&journal->j_state_lock);
|
||||
|
||||
/*
|
||||
* We need to make sure that any blocks that were recently written out
|
||||
* --- perhaps by log_do_checkpoint() --- are flushed out before we
|
||||
* drop the transactions from the journal. Similarly we need to be sure
|
||||
* the superblock makes it to disk before the next transaction starts reusing
|
||||
* freed space (otherwise we could replay some blocks of the new
|
||||
* transaction thinking they belong to the old one). So we use
|
||||
* WRITE_FLUSH_FUA. It's unlikely this will be necessary, especially
|
||||
* with an appropriately sized journal, but we need this to guarantee
|
||||
* correctness. Fortunately cleanup_journal_tail() doesn't get called
|
||||
* all that often.
|
||||
*/
|
||||
journal_update_sb_log_tail(journal, first_tid, blocknr,
|
||||
WRITE_FLUSH_FUA);
|
||||
|
||||
spin_lock(&journal->j_state_lock);
|
||||
/* OK, update the superblock to recover the freed space.
|
||||
* Physical blocks come first: have we wrapped beyond the end of
|
||||
* the log? */
|
||||
freed = blocknr - journal->j_tail;
|
||||
if (blocknr < journal->j_tail)
|
||||
freed = freed + journal->j_last - journal->j_first;
|
||||
|
||||
trace_jbd_cleanup_journal_tail(journal, first_tid, blocknr, freed);
|
||||
jbd_debug(1,
|
||||
"Cleaning journal tail from %d to %d (offset %u), "
|
||||
"freeing %u\n",
|
||||
journal->j_tail_sequence, first_tid, blocknr, freed);
|
||||
|
||||
journal->j_free += freed;
|
||||
journal->j_tail_sequence = first_tid;
|
||||
journal->j_tail = blocknr;
|
||||
spin_unlock(&journal->j_state_lock);
|
||||
return 0;
|
||||
}
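The tail-advance arithmetic at the end of cleanup_journal_tail() is easy to misread because it relies on unsigned wrap-around. The helper below is a minimal, hypothetical restatement (log_blocks_freed() is not a jbd function), assuming the usable log occupies the half-open block range [first, last), as j_first/j_last do in the wrap() macro in recovery.c.

```c
/*
 * Hypothetical helper: number of log blocks freed when the journal tail
 * moves from old_tail to new_tail in a circular log spanning
 * [first, last). When the tail wraps, new_tail - old_tail underflows
 * modulo 2^32 and adding the log length brings it back into range.
 */
unsigned int log_blocks_freed(unsigned int new_tail, unsigned int old_tail,
			      unsigned int first, unsigned int last)
{
	unsigned int freed = new_tail - old_tail;

	if (new_tail < old_tail)	/* tail wrapped past the end of the log */
		freed += last - first;
	return freed;
}
```

With a log spanning blocks 1 to 1024 (first = 1, last = 1025), moving the tail from 1000 to 10 frees blocks 1000..1024 (25 blocks) plus blocks 1..9 (9 blocks), i.e. 34.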
|
||||
|
||||
|
||||
/* Checkpoint list management */
|
||||
|
||||
/*
|
||||
* journal_clean_one_cp_list
|
||||
*
|
||||
* Find all the written-back checkpoint buffers in the given list and release
|
||||
* them.
|
||||
*
|
||||
* Called with j_list_lock held.
|
||||
* Returns number of buffers reaped (for debug)
|
||||
*/
|
||||
|
||||
static int journal_clean_one_cp_list(struct journal_head *jh, int *released)
|
||||
{
|
||||
struct journal_head *last_jh;
|
||||
struct journal_head *next_jh = jh;
|
||||
int ret, freed = 0;
|
||||
|
||||
*released = 0;
|
||||
if (!jh)
|
||||
return 0;
|
||||
|
||||
last_jh = jh->b_cpprev;
|
||||
do {
|
||||
jh = next_jh;
|
||||
next_jh = jh->b_cpnext;
|
||||
/* Use trylock because of the ranking */
|
||||
if (jbd_trylock_bh_state(jh2bh(jh))) {
|
||||
ret = __try_to_free_cp_buf(jh);
|
||||
if (ret) {
|
||||
freed++;
|
||||
if (ret == 2) {
|
||||
*released = 1;
|
||||
return freed;
|
||||
}
|
||||
}
|
||||
}
|
||||
/*
|
||||
* This function only frees up some memory
|
||||
* if possible so we don't have an obligation
|
||||
* to finish processing. Bail out if preemption
|
||||
* requested:
|
||||
*/
|
||||
if (need_resched())
|
||||
return freed;
|
||||
} while (jh != last_jh);
|
||||
|
||||
return freed;
|
||||
}
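journal_clean_one_cp_list() uses a common idiom for walking a circular, doubly linked list whose nodes may be released during the walk: remember head->prev as the stopping point and fetch the successor before processing the current node. A hypothetical, self-contained sketch of the same loop shape follows; struct node, its written field and the budget parameter (standing in for the need_resched() bail-out) are illustrative assumptions, not jbd types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the b_cpnext/b_cpprev links of journal_head. */
struct node {
	struct node *next, *prev;
	int written;	/* stands in for "buffer already written back" */
};

/*
 * Visit each node of the ring once, starting at head. The successor is
 * saved before the current node is examined, so the loop would still be
 * safe if processing unlinked or freed the node. Stops early once
 * budget nodes have been visited.
 */
int count_written(struct node *head, int budget)
{
	struct node *last, *cur, *next = head;
	int found = 0;

	if (!head)
		return 0;
	last = head->prev;	/* the node just before head ends the walk */
	do {
		cur = next;
		next = cur->next;	/* save before cur might go away */
		if (cur->written)
			found++;
		if (--budget == 0)
			break;	/* voluntary early exit, like need_resched() */
	} while (cur != last);
	return found;
}
```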
|
||||
|
||||
/*
|
||||
* journal_clean_checkpoint_list
|
||||
*
|
||||
* Find all the written-back checkpoint buffers in the journal and release them.
|
||||
*
|
||||
* Called with the journal locked.
|
||||
* Called with j_list_lock held.
|
||||
* Returns number of buffers reaped (for debug)
|
||||
*/
|
||||
|
||||
int __journal_clean_checkpoint_list(journal_t *journal)
|
||||
{
|
||||
transaction_t *transaction, *last_transaction, *next_transaction;
|
||||
int ret = 0;
|
||||
int released;
|
||||
|
||||
transaction = journal->j_checkpoint_transactions;
|
||||
if (!transaction)
|
||||
goto out;
|
||||
|
||||
last_transaction = transaction->t_cpprev;
|
||||
next_transaction = transaction;
|
||||
do {
|
||||
transaction = next_transaction;
|
||||
next_transaction = transaction->t_cpnext;
|
||||
ret += journal_clean_one_cp_list(transaction->
|
||||
t_checkpoint_list, &released);
|
||||
/*
|
||||
* This function only frees up some memory if possible so we
|
||||
* don't have an obligation to finish processing. Bail out if
|
||||
* preemption requested:
|
||||
*/
|
||||
if (need_resched())
|
||||
goto out;
|
||||
if (released)
|
||||
continue;
|
||||
/*
|
||||
* It is essential that we are as careful as in the case of
|
||||
* t_checkpoint_list with removing the buffer from the list as
|
||||
* we can possibly see not yet submitted buffers on io_list
|
||||
*/
|
||||
ret += journal_clean_one_cp_list(transaction->
|
||||
t_checkpoint_io_list, &released);
|
||||
if (need_resched())
|
||||
goto out;
|
||||
} while (transaction != last_transaction);
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* journal_remove_checkpoint: called after a buffer has been committed
|
||||
* to disk (either by being write-back flushed to disk, or being
|
||||
* committed to the log).
|
||||
*
|
||||
* We cannot safely clean a transaction out of the log until all of the
|
||||
* buffer updates committed in that transaction have safely been stored
|
||||
* elsewhere on disk. To achieve this, all of the buffers in a
|
||||
* transaction need to be maintained on the transaction's checkpoint
|
||||
* lists until they have been rewritten, at which point this function is
|
||||
* called to remove the buffer from the existing transaction's
|
||||
* checkpoint lists.
|
||||
*
|
||||
* The function returns 1 if it frees the transaction, 0 otherwise.
|
||||
* The function can free jh and bh.
|
||||
*
|
||||
* This function is called with j_list_lock held.
|
||||
* This function is called with jbd_lock_bh_state(jh2bh(jh)) held.
|
||||
*/
|
||||
|
||||
int __journal_remove_checkpoint(struct journal_head *jh)
|
||||
{
|
||||
transaction_t *transaction;
|
||||
journal_t *journal;
|
||||
int ret = 0;
|
||||
|
||||
JBUFFER_TRACE(jh, "entry");
|
||||
|
||||
if ((transaction = jh->b_cp_transaction) == NULL) {
|
||||
JBUFFER_TRACE(jh, "not on transaction");
|
||||
goto out;
|
||||
}
|
||||
journal = transaction->t_journal;
|
||||
|
||||
JBUFFER_TRACE(jh, "removing from transaction");
|
||||
__buffer_unlink(jh);
|
||||
jh->b_cp_transaction = NULL;
|
||||
journal_put_journal_head(jh);
|
||||
|
||||
if (transaction->t_checkpoint_list != NULL ||
|
||||
transaction->t_checkpoint_io_list != NULL)
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* There is one special case to worry about: if we have just pulled the
|
||||
* buffer off a running or committing transaction's checkpoint list,
|
||||
* then even if the checkpoint list is empty, the transaction obviously
|
||||
* cannot be dropped!
|
||||
*
|
||||
* The locking here around t_state is a bit sleazy.
|
||||
* See the comment at the end of journal_commit_transaction().
|
||||
*/
|
||||
if (transaction->t_state != T_FINISHED)
|
||||
goto out;
|
||||
|
||||
/* OK, that was the last buffer for the transaction: we can now
|
||||
safely remove this transaction from the log */
|
||||
|
||||
__journal_drop_transaction(journal, transaction);
|
||||
|
||||
/* Just in case anybody was waiting for more transactions to be
|
||||
checkpointed... */
|
||||
wake_up(&journal->j_wait_logspace);
|
||||
ret = 1;
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
|
||||
* list so that we know when it is safe to clean the transaction out of
|
||||
* the log.
|
||||
*
|
||||
* Called with the journal locked.
|
||||
* Called with j_list_lock held.
|
||||
*/
|
||||
void __journal_insert_checkpoint(struct journal_head *jh,
|
||||
transaction_t *transaction)
|
||||
{
|
||||
JBUFFER_TRACE(jh, "entry");
|
||||
J_ASSERT_JH(jh, buffer_dirty(jh2bh(jh)) || buffer_jbddirty(jh2bh(jh)));
|
||||
J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
|
||||
|
||||
/* Get reference for checkpointing transaction */
|
||||
journal_grab_journal_head(jh2bh(jh));
|
||||
jh->b_cp_transaction = transaction;
|
||||
|
||||
if (!transaction->t_checkpoint_list) {
|
||||
jh->b_cpnext = jh->b_cpprev = jh;
|
||||
} else {
|
||||
jh->b_cpnext = transaction->t_checkpoint_list;
|
||||
jh->b_cpprev = transaction->t_checkpoint_list->b_cpprev;
|
||||
jh->b_cpprev->b_cpnext = jh;
|
||||
jh->b_cpnext->b_cpprev = jh;
|
||||
}
|
||||
transaction->t_checkpoint_list = jh;
|
||||
}
|
||||
|
||||
/*
|
||||
* We've finished with this transaction structure: adios...
|
||||
*
|
||||
* The transaction must have no links except for the checkpoint by this
|
||||
* point.
|
||||
*
|
||||
* Called with the journal locked.
|
||||
* Called with j_list_lock held.
|
||||
*/
|
||||
|
||||
void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
|
||||
{
|
||||
assert_spin_locked(&journal->j_list_lock);
|
||||
if (transaction->t_cpnext) {
|
||||
transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
|
||||
transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
|
||||
if (journal->j_checkpoint_transactions == transaction)
|
||||
journal->j_checkpoint_transactions =
|
||||
transaction->t_cpnext;
|
||||
if (journal->j_checkpoint_transactions == transaction)
|
||||
journal->j_checkpoint_transactions = NULL;
|
||||
}
|
||||
|
||||
J_ASSERT(transaction->t_state == T_FINISHED);
|
||||
J_ASSERT(transaction->t_buffers == NULL);
|
||||
J_ASSERT(transaction->t_sync_datalist == NULL);
|
||||
J_ASSERT(transaction->t_forget == NULL);
|
||||
J_ASSERT(transaction->t_iobuf_list == NULL);
|
||||
J_ASSERT(transaction->t_shadow_list == NULL);
|
||||
J_ASSERT(transaction->t_log_list == NULL);
|
||||
J_ASSERT(transaction->t_checkpoint_list == NULL);
|
||||
J_ASSERT(transaction->t_checkpoint_io_list == NULL);
|
||||
J_ASSERT(transaction->t_updates == 0);
|
||||
J_ASSERT(journal->j_committing_transaction != transaction);
|
||||
J_ASSERT(journal->j_running_transaction != transaction);
|
||||
|
||||
trace_jbd_drop_transaction(journal, transaction);
|
||||
jbd_debug(1, "Dropping transaction %d, all done\n", transaction->t_tid);
|
||||
kfree(transaction);
|
||||
}
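__journal_insert_checkpoint() and __journal_drop_transaction() maintain circular, doubly linked lists in which the list head pointer simply designates one member of the ring. The sketch below, using a hypothetical struct cp_node rather than journal_head or transaction_t, reproduces both pointer manipulations: insertion just before the current head followed by making the new node the head, and unlinking with the double head check that turns a one-element ring into an empty list.

```c
#include <stddef.h>

/* Hypothetical node type; next == NULL means "not on any list". */
struct cp_node {
	struct cp_node *next, *prev;
	int id;
};

/* Link n into the ring just before *head, then make it the new head. */
void cp_insert_head(struct cp_node **head, struct cp_node *n)
{
	if (!*head) {
		n->next = n->prev = n;	/* single-element ring */
	} else {
		n->next = *head;
		n->prev = (*head)->prev;
		n->prev->next = n;
		n->next->prev = n;
	}
	*head = n;
}

/* Unlink n from the ring, fixing up *head as the drop path does. */
void cp_unlink(struct cp_node **head, struct cp_node *n)
{
	if (!n->next)
		return;			/* not on the list */
	n->next->prev = n->prev;
	n->prev->next = n->next;
	if (*head == n)
		*head = n->next;	/* advance head past n */
	if (*head == n)
		*head = NULL;		/* n was the only element */
	n->next = n->prev = NULL;
}
```

The second `*head == n` test works because, for a one-element ring, n->next points back at n, so advancing the head leaves it unchanged.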
|
1021
fs/jbd/commit.c
File diff suppressed because it is too large
2145
fs/jbd/journal.c
File diff suppressed because it is too large
|
@ -1,594 +0,0 @@
|
|||
/*
|
||||
* linux/fs/jbd/recovery.c
|
||||
*
|
||||
* Written by Stephen C. Tweedie <sct@redhat.com>, 1999
|
||||
*
|
||||
* Copyright 1999-2000 Red Hat Software --- All Rights Reserved
|
||||
*
|
||||
* This file is part of the Linux kernel and is made available under
|
||||
* the terms of the GNU General Public License, version 2, or at your
|
||||
* option, any later version, incorporated herein by reference.
|
||||
*
|
||||
* Journal recovery routines for the generic filesystem journaling code;
|
||||
* part of the ext2fs journaling system.
|
||||
*/
|
||||
|
||||
#ifndef __KERNEL__
|
||||
#include "jfs_user.h"
|
||||
#else
|
||||
#include <linux/time.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/jbd.h>
|
||||
#include <linux/errno.h>
|
||||
#include <linux/blkdev.h>
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Maintain information about the progress of the recovery job, so that
|
||||
* the different passes can carry information between them.
|
||||
*/
|
||||
struct recovery_info
|
||||
{
|
||||
tid_t start_transaction;
|
||||
tid_t end_transaction;
|
||||
|
||||
int nr_replays;
|
||||
int nr_revokes;
|
||||
int nr_revoke_hits;
|
||||
};
|
||||
|
||||
enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
|
||||
static int do_one_pass(journal_t *journal,
|
||||
struct recovery_info *info, enum passtype pass);
|
||||
static int scan_revoke_records(journal_t *, struct buffer_head *,
|
||||
tid_t, struct recovery_info *);
|
||||
|
||||
#ifdef __KERNEL__
|
||||
|
||||
/* Release readahead buffers after use */
|
||||
static void journal_brelse_array(struct buffer_head *b[], int n)
|
||||
{
|
||||
while (--n >= 0)
|
||||
brelse (b[n]);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* When reading from the journal, we are going through the block device
|
||||
* layer directly and so there is no readahead being done for us. We
|
||||
* need to implement any readahead ourselves if we want it to happen at
|
||||
* all. Recovery is basically one long sequential read, so make sure we
|
||||
* do the IO in reasonably large chunks.
|
||||
*
|
||||
* This is not so critical that we need to be enormously clever about
|
||||
* the readahead size, though. 128K is a purely arbitrary, good-enough
|
||||
* fixed value.
|
||||
*/
|
||||
|
||||
#define MAXBUF 8
|
||||
static int do_readahead(journal_t *journal, unsigned int start)
|
||||
{
|
||||
int err;
|
||||
unsigned int max, nbufs, next;
|
||||
unsigned int blocknr;
|
||||
struct buffer_head *bh;
|
||||
|
||||
struct buffer_head * bufs[MAXBUF];
|
||||
|
||||
/* Do up to 128K of readahead */
|
||||
max = start + (128 * 1024 / journal->j_blocksize);
|
||||
if (max > journal->j_maxlen)
|
||||
max = journal->j_maxlen;
|
||||
|
||||
/* Do the readahead itself. We'll submit MAXBUF buffer_heads at
|
||||
* a time to the block device IO layer. */
|
||||
|
||||
nbufs = 0;
|
||||
|
||||
for (next = start; next < max; next++) {
|
||||
err = journal_bmap(journal, next, &blocknr);
|
||||
|
||||
if (err) {
|
||||
printk (KERN_ERR "JBD: bad block at offset %u\n",
|
||||
next);
|
||||
goto failed;
|
||||
}
|
||||
|
||||
bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
|
||||
if (!bh) {
|
||||
err = -ENOMEM;
|
||||
goto failed;
|
||||
}
|
||||
|
||||
if (!buffer_uptodate(bh) && !buffer_locked(bh)) {
|
||||
bufs[nbufs++] = bh;
|
||||
if (nbufs == MAXBUF) {
|
||||
ll_rw_block(READ, nbufs, bufs);
|
||||
journal_brelse_array(bufs, nbufs);
|
||||
nbufs = 0;
|
||||
}
|
||||
} else
|
||||
brelse(bh);
|
||||
}
|
||||
|
||||
if (nbufs)
|
||||
ll_rw_block(READ, nbufs, bufs);
|
||||
err = 0;
|
||||
|
||||
failed:
|
||||
if (nbufs)
|
||||
journal_brelse_array(bufs, nbufs);
|
||||
return err;
|
||||
}
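do_readahead() bounds its window to 128 KiB worth of journal blocks (clamped to j_maxlen) and submits reads in groups of MAXBUF buffer heads. The arithmetic can be sketched with two hypothetical helpers; the batch count assumes every block in the window is neither uptodate nor locked, which is the worst case for the real loop.

```c
#define RA_BYTES (128u * 1024u)	/* readahead window, as in do_readahead() */
#define MAXBUF 8u		/* buffer heads per ll_rw_block() call */

/* Hypothetical: one past the last journal block the window covers. */
unsigned int readahead_end(unsigned int start, unsigned int blocksize,
			   unsigned int maxlen)
{
	unsigned int max = start + RA_BYTES / blocksize;

	return max > maxlen ? maxlen : max;
}

/* Hypothetical: submissions needed if every block must be read. */
unsigned int readahead_batches(unsigned int start, unsigned int end)
{
	unsigned int n = end > start ? end - start : 0;

	return (n + MAXBUF - 1) / MAXBUF;	/* last batch may be partial */
}
```

With 4 KiB blocks the window is 32 blocks, i.e. four full batches; near the end of a 1000-block journal the window shrinks and the final batch is partial.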
|
||||
|
||||
#endif /* __KERNEL__ */
|
||||
|
||||
|
||||
/*
|
||||
* Read a block from the journal
|
||||
*/
|
||||
|
||||
static int jread(struct buffer_head **bhp, journal_t *journal,
|
||||
unsigned int offset)
|
||||
{
|
||||
int err;
|
||||
unsigned int blocknr;
|
||||
struct buffer_head *bh;
|
||||
|
||||
*bhp = NULL;
|
||||
|
||||
if (offset >= journal->j_maxlen) {
|
||||
printk(KERN_ERR "JBD: corrupted journal superblock\n");
|
||||
return -EIO;
|
||||
}
|
||||
|
||||
err = journal_bmap(journal, offset, &blocknr);
|
||||
|
||||
if (err) {
|
||||
printk (KERN_ERR "JBD: bad block at offset %u\n",
|
||||
offset);
|
||||
return err;
|
||||
}
|
||||
|
||||
bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
|
||||
if (!bh)
|
||||
return -ENOMEM;
|
||||
|
||||
if (!buffer_uptodate(bh)) {
|
||||
/* If this is a brand new buffer, start readahead.
|
||||
Otherwise, we assume we are already reading it. */
|
||||
if (!buffer_req(bh))
|
||||
do_readahead(journal, offset);
|
||||
wait_on_buffer(bh);
|
||||
}
|
||||
|
||||
if (!buffer_uptodate(bh)) {
|
||||
printk (KERN_ERR "JBD: Failed to read block at offset %u\n",
|
||||
offset);
|
||||
brelse(bh);
|
||||
return -EIO;
|
||||
}
|
||||
|
||||
*bhp = bh;
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Count the number of in-use tags in a journal descriptor block.
|
||||
*/
|
||||
|
||||
static int count_tags(struct buffer_head *bh, int size)
|
||||
{
|
||||
char * tagp;
|
||||
journal_block_tag_t * tag;
|
||||
int nr = 0;
|
||||
|
||||
tagp = &bh->b_data[sizeof(journal_header_t)];
|
||||
|
||||
while ((tagp - bh->b_data + sizeof(journal_block_tag_t)) <= size) {
|
||||
tag = (journal_block_tag_t *) tagp;
|
||||
|
||||
nr++;
|
||||
tagp += sizeof(journal_block_tag_t);
|
||||
if (!(tag->t_flags & cpu_to_be32(JFS_FLAG_SAME_UUID)))
|
||||
tagp += 16;
|
||||
|
||||
if (tag->t_flags & cpu_to_be32(JFS_FLAG_LAST_TAG))
|
||||
break;
|
||||
}
|
||||
|
||||
return nr;
|
||||
}
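count_tags() walks the variable-length tag array of a descriptor block: each tag is followed by a 16-byte UUID unless JFS_FLAG_SAME_UUID is set, and the walk ends at JFS_FLAG_LAST_TAG or at the end of the block. The standalone sketch below parses a raw byte buffer instead of a buffer_head. The sizes and flag values (12-byte header, 8-byte tag with t_flags at offset 4, SAME_UUID = 2, LAST_TAG = 8) are assumptions meant to mirror jbd's on-disk format and should be checked against include/linux/jbd.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, mirroring jbd's journal_header_t/journal_block_tag_t. */
#define HDR_SIZE 12u		/* h_magic, h_blocktype, h_sequence */
#define TAG_SIZE 8u		/* t_blocknr, t_flags */
#define UUID_SIZE 16u
#define FLAG_SAME_UUID 2u
#define FLAG_LAST_TAG 8u

uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Count in-use tags in a descriptor block of size bytes. */
int count_tags_sketch(const unsigned char *block, size_t size)
{
	size_t off = HDR_SIZE;
	int nr = 0;

	while (off + TAG_SIZE <= size) {
		uint32_t flags = get_be32(block + off + 4);

		nr++;
		off += TAG_SIZE;
		if (!(flags & FLAG_SAME_UUID))
			off += UUID_SIZE;	/* skip the per-tag UUID */
		if (flags & FLAG_LAST_TAG)
			break;
	}
	return nr;
}
```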
|
||||
|
||||
|
||||
/* Make sure we wrap around the log correctly! */
|
||||
#define wrap(journal, var) \
|
||||
do { \
|
||||
if (var >= (journal)->j_last) \
|
||||
var -= ((journal)->j_last - (journal)->j_first); \
|
||||
} while (0)
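The wrap() macro folds a log position that has run past j_last back to the start of the log. The same logic as a function, with hypothetical names: it performs a single subtraction, so like the macro it assumes the position has not run more than one full log length past the end.

```c
/* Hypothetical function form of wrap() for a log spanning [first, last). */
unsigned int log_wrap(unsigned int pos, unsigned int first, unsigned int last)
{
	if (pos >= last)
		pos -= last - first;	/* subtract the log length once */
	return pos;
}

/* Advance n blocks from pos, wrapping at the end of the log. */
unsigned int log_advance(unsigned int pos, unsigned int n,
			 unsigned int first, unsigned int last)
{
	return log_wrap(pos + n, first, last);
}
```

For a log spanning blocks 1 to 1024, advancing 10 blocks from 1020 passes 1021..1024 and then continues at 1, ending at block 6.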
|
||||
|
||||
/**
|
||||
* journal_recover - recovers an on-disk journal
|
||||
* @journal: the journal to recover
|
||||
*
|
||||
* The primary function for recovering the log contents when mounting a
|
||||
* journaled device.
|
||||
*
|
||||
* Recovery is done in three passes. In the first pass, we look for the
|
||||
* end of the log. In the second, we assemble the list of revoke
|
||||
* blocks. In the third and final pass, we replay any un-revoked blocks
|
||||
* in the log.
|
||||
*/
|
||||
int journal_recover(journal_t *journal)
|
||||
{
|
||||
int err, err2;
|
||||
journal_superblock_t * sb;
|
||||
|
||||
struct recovery_info info;
|
||||
|
||||
memset(&info, 0, sizeof(info));
|
||||
sb = journal->j_superblock;
|
||||
|
||||
/*
|
||||
* The journal superblock's s_start field (the current log head)
|
||||
* is always zero if, and only if, the journal was cleanly
|
||||
* unmounted.
|
||||
*/
|
||||
|
||||
if (!sb->s_start) {
|
||||
jbd_debug(1, "No recovery required, last transaction %d\n",
|
||||
be32_to_cpu(sb->s_sequence));
|
||||
journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
err = do_one_pass(journal, &info, PASS_SCAN);
|
||||
if (!err)
|
||||
err = do_one_pass(journal, &info, PASS_REVOKE);
|
||||
if (!err)
|
||||
err = do_one_pass(journal, &info, PASS_REPLAY);
|
||||
|
||||
jbd_debug(1, "JBD: recovery, exit status %d, "
|
||||
"recovered transactions %u to %u\n",
|
||||
err, info.start_transaction, info.end_transaction);
|
||||
jbd_debug(1, "JBD: Replayed %d and revoked %d/%d blocks\n",
|
||||
info.nr_replays, info.nr_revoke_hits, info.nr_revokes);
|
||||
|
||||
/* Restart the log at the next transaction ID, thus invalidating
|
||||
* any existing commit records in the log. */
|
||||
journal->j_transaction_sequence = ++info.end_transaction;
|
||||
|
||||
journal_clear_revoke(journal);
|
||||
err2 = sync_blockdev(journal->j_fs_dev);
|
||||
if (!err)
|
||||
err = err2;
|
||||
/* Flush disk caches to get replayed data on the permanent storage */
|
||||
if (journal->j_flags & JFS_BARRIER) {
|
||||
err2 = blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
|
||||
if (!err)
|
||||
err = err2;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
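journal_recover() runs do_one_pass() three times over the same log. The toy model below, with an in-memory array of records instead of journal blocks and hypothetical type and field names, follows the same control flow: the scan pass finds the first transaction ID without a commit record, the revoke pass records the latest revoking transaction per block, and the replay pass copies a logged block only if no revoke from the same or a later transaction covers it. Transaction IDs are compared with plain `>=` here, whereas jbd uses wrap-safe helpers such as tid_geq().

```c
#include <stddef.h>

/* Hypothetical, simplified log: one record per journal block. */
enum rec_type { REC_DATA, REC_REVOKE, REC_COMMIT };
enum passtype { PASS_SCAN, PASS_REVOKE, PASS_REPLAY };

struct rec {
	enum rec_type type;
	unsigned int seq;	/* transaction ID stamped on the block */
	unsigned int block;	/* target filesystem block (DATA/REVOKE) */
};

#define NBLOCKS 16	/* toy filesystem; block numbers must be < NBLOCKS */

struct recovery_model {
	unsigned int end_transaction;		/* first ID without a commit */
	int revoked[NBLOCKS];			/* block has a revoke record */
	unsigned int revoke_seq[NBLOCKS];	/* latest revoking transaction */
	unsigned int replayed_from[NBLOCKS];	/* 0 = not replayed (IDs start at 1) */
};

static void do_pass(struct recovery_model *r, const struct rec *log,
		    size_t n, unsigned int first, enum passtype pass)
{
	unsigned int next = first;

	for (size_t i = 0; i < n; i++) {
		const struct rec *p = &log[i];

		/* Later passes stop where the scan pass found the end. */
		if (pass != PASS_SCAN && next >= r->end_transaction)
			break;
		/* A block from an unexpected transaction ends the valid log. */
		if (p->seq != next)
			break;

		switch (p->type) {
		case REC_COMMIT:
			next++;
			break;
		case REC_REVOKE:
			if (pass == PASS_REVOKE &&
			    (!r->revoked[p->block] ||
			     p->seq > r->revoke_seq[p->block])) {
				r->revoked[p->block] = 1;
				r->revoke_seq[p->block] = p->seq;
			}
			break;
		case REC_DATA:
			/* Skip if revoked by this or a later transaction. */
			if (pass == PASS_REPLAY &&
			    !(r->revoked[p->block] &&
			      r->revoke_seq[p->block] >= p->seq))
				r->replayed_from[p->block] = p->seq;
			break;
		}
	}
	if (pass == PASS_SCAN)
		r->end_transaction = next;
}

void recover_model(struct recovery_model *r, const struct rec *log,
		   size_t n, unsigned int first)
{
	*r = (struct recovery_model){ 0 };
	do_pass(r, log, n, first, PASS_SCAN);
	do_pass(r, log, n, first, PASS_REVOKE);
	do_pass(r, log, n, first, PASS_REPLAY);
}
```

In a log where transaction 1 logs block 5, transaction 2 revokes block 5 and logs block 7, and transaction 3 logs block 9 without a commit record, the model skips the stale copy of block 5, replays block 7, and ignores block 9.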
|
||||
|
||||
/**
|
||||
* journal_skip_recovery - Start journal and wipe existing records
|
||||
* @journal: journal to startup
|
||||
*
|
||||
* Locate any valid recovery information from the journal and set up the
|
||||
* journal structures in memory to ignore it (presumably because the
|
||||
* caller has evidence that it is out of date).
|
||||
* This function doesn't appear to be exported.
|
||||
*
|
||||
* We perform one pass over the journal to allow us to tell the user how
|
||||
* much recovery information is being erased, and to let us initialise
|
||||
* the journal transaction sequence numbers to the next unused ID.
|
||||
*/
|
||||
int journal_skip_recovery(journal_t *journal)
|
||||
{
|
||||
int err;
|
||||
struct recovery_info info;
|
||||
|
||||
memset (&info, 0, sizeof(info));
|
||||
|
||||
err = do_one_pass(journal, &info, PASS_SCAN);
|
||||
|
||||
if (err) {
|
||||
printk(KERN_ERR "JBD: error %d scanning journal\n", err);
|
||||
++journal->j_transaction_sequence;
|
||||
} else {
|
||||
#ifdef CONFIG_JBD_DEBUG
|
||||
int dropped = info.end_transaction -
|
||||
be32_to_cpu(journal->j_superblock->s_sequence);
|
||||
jbd_debug(1,
|
||||
"JBD: ignoring %d transaction%s from the journal.\n",
|
||||
dropped, (dropped == 1) ? "" : "s");
|
||||
#endif
|
||||
journal->j_transaction_sequence = ++info.end_transaction;
|
||||
}
|
||||
|
||||
journal->j_tail = 0;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int do_one_pass(journal_t *journal,
|
||||
struct recovery_info *info, enum passtype pass)
|
||||
{
|
||||
unsigned int first_commit_ID, next_commit_ID;
|
||||
unsigned int next_log_block;
|
||||
int err, success = 0;
|
||||
journal_superblock_t * sb;
|
||||
journal_header_t * tmp;
|
||||
struct buffer_head * bh;
|
||||
unsigned int sequence;
|
||||
int blocktype;
|
||||
|
||||
/*
|
||||
* First thing is to establish what we expect to find in the log
|
||||
* (in terms of transaction IDs), and where (in terms of log
|
||||
* block offsets): query the superblock.
|
||||
*/
|
||||
|
||||
sb = journal->j_superblock;
|
||||
next_commit_ID = be32_to_cpu(sb->s_sequence);
|
||||
next_log_block = be32_to_cpu(sb->s_start);
|
||||
|
||||
first_commit_ID = next_commit_ID;
|
||||
if (pass == PASS_SCAN)
|
||||
info->start_transaction = first_commit_ID;
|
||||
|
||||
jbd_debug(1, "Starting recovery pass %d\n", pass);
|
||||
|
||||
/*
|
||||
* Now we walk through the log, transaction by transaction,
|
||||
* making sure that each transaction has a commit block in the
|
||||
* expected place. Each complete transaction gets replayed back
|
||||
* into the main filesystem.
|
||||
*/
|
||||
|
||||
while (1) {
|
||||
int flags;
|
||||
char * tagp;
|
||||
journal_block_tag_t * tag;
|
||||
struct buffer_head * obh;
|
||||
struct buffer_head * nbh;
|
||||
|
||||
cond_resched();
|
||||
|
||||
/* If we already know where to stop the log traversal,
|
||||
* check right now that we haven't gone past the end of
|
||||
* the log. */
|
||||
|
||||
if (pass != PASS_SCAN)
|
||||
if (tid_geq(next_commit_ID, info->end_transaction))
|
||||
break;
|
||||
|
||||
jbd_debug(2, "Scanning for sequence ID %u at %u/%u\n",
|
||||
next_commit_ID, next_log_block, journal->j_last);
|
||||
|
||||
/* Skip over each chunk of the transaction looking
|
||||
* for either the next descriptor block or the final commit
|
||||
* record. */
|
||||
|
||||
jbd_debug(3, "JBD: checking block %u\n", next_log_block);
|
||||
err = jread(&bh, journal, next_log_block);
|
||||
if (err)
|
||||
goto failed;
|
||||
|
||||
next_log_block++;
|
||||
wrap(journal, next_log_block);
|
||||
|
||||
/* What kind of buffer is it?
|
||||
*
|
||||
* If it is a descriptor block, check that it has the
|
||||
* expected sequence number. Otherwise, we're all done
|
||||
* here. */
|
||||
|
||||
tmp = (journal_header_t *)bh->b_data;
|
||||
|
||||
if (tmp->h_magic != cpu_to_be32(JFS_MAGIC_NUMBER)) {
|
||||
brelse(bh);
|
||||
break;
|
||||
}
|
||||
|
||||
blocktype = be32_to_cpu(tmp->h_blocktype);
|
||||
sequence = be32_to_cpu(tmp->h_sequence);
|
||||
jbd_debug(3, "Found magic %d, sequence %d\n",
|
||||
blocktype, sequence);
|
||||
|
||||
if (sequence != next_commit_ID) {
|
||||
brelse(bh);
|
||||
break;
|
||||
}
|
||||
|
||||
/* OK, we have a valid descriptor block which matches
|
||||
* all of the sequence number checks. What are we going
|
||||
* to do with it? That depends on the pass... */
|
||||
|
||||
switch(blocktype) {
|
||||
case JFS_DESCRIPTOR_BLOCK:
|
||||
/* If it is a valid descriptor block, replay it
|
||||
* in pass REPLAY; otherwise, just skip over the
|
||||
* blocks it describes. */
|
||||
if (pass != PASS_REPLAY) {
|
||||
next_log_block +=
|
||||
count_tags(bh, journal->j_blocksize);
|
||||
wrap(journal, next_log_block);
|
||||
brelse(bh);
|
||||
continue;
|
||||
}
|
||||
|
||||
/* A descriptor block: we can now write all of
|
||||
* the data blocks. Yay, useful work is finally
|
||||
* getting done here! */
|
||||
|
||||
tagp = &bh->b_data[sizeof(journal_header_t)];
|
||||
while ((tagp - bh->b_data +sizeof(journal_block_tag_t))
|
||||
<= journal->j_blocksize) {
|
||||
unsigned int io_block;
|
||||
|
||||
tag = (journal_block_tag_t *) tagp;
|
||||
flags = be32_to_cpu(tag->t_flags);
|
||||
|
||||
io_block = next_log_block++;
|
||||
wrap(journal, next_log_block);
|
||||
err = jread(&obh, journal, io_block);
|
||||
if (err) {
|
||||
/* Recover what we can, but
|
||||
* report failure at the end. */
|
||||
success = err;
|
||||
printk (KERN_ERR
|
||||
"JBD: IO error %d recovering "
|
||||
"block %u in log\n",
|
||||
err, io_block);
|
||||
} else {
|
||||
unsigned int blocknr;
|
||||
|
||||
J_ASSERT(obh != NULL);
|
||||
blocknr = be32_to_cpu(tag->t_blocknr);
|
||||
|
||||
/* If the block has been
|
||||
* revoked, then we're all done
|
||||
* here. */
|
||||
if (journal_test_revoke
|
||||
(journal, blocknr,
|
||||
next_commit_ID)) {
|
||||
brelse(obh);
|
||||
++info->nr_revoke_hits;
|
||||
goto skip_write;
|
||||
}
|
||||
|
||||
/* Find a buffer for the new
|
||||
* data being restored */
|
||||
nbh = __getblk(journal->j_fs_dev,
|
||||
blocknr,
|
||||
journal->j_blocksize);
|
||||
if (nbh == NULL) {
|
||||
printk(KERN_ERR
|
||||
"JBD: Out of memory "
|
||||
"during recovery.\n");
|
||||
err = -ENOMEM;
|
||||
brelse(bh);
|
||||
brelse(obh);
|
||||
goto failed;
|
||||
}
|
||||
|
||||
lock_buffer(nbh);
|
||||
memcpy(nbh->b_data, obh->b_data,
|
||||
journal->j_blocksize);
|
||||
if (flags & JFS_FLAG_ESCAPE) {
|
||||
*((__be32 *)nbh->b_data) =
|
||||
cpu_to_be32(JFS_MAGIC_NUMBER);
|
||||
}
|
||||
|
||||
BUFFER_TRACE(nbh, "marking dirty");
|
||||
set_buffer_uptodate(nbh);
|
||||
mark_buffer_dirty(nbh);
|
||||
BUFFER_TRACE(nbh, "marking uptodate");
|
||||
++info->nr_replays;
|
||||
/* ll_rw_block(WRITE, 1, &nbh); */
|
||||
unlock_buffer(nbh);
|
||||
brelse(obh);
|
||||
brelse(nbh);
|
||||
}
|
||||
|
||||
skip_write:
|
||||
tagp += sizeof(journal_block_tag_t);
|
||||
if (!(flags & JFS_FLAG_SAME_UUID))
|
||||
tagp += 16;
|
||||
|
||||
if (flags & JFS_FLAG_LAST_TAG)
|
||||
break;
|
||||
}
|
||||
|
||||
brelse(bh);
|
||||
continue;
|
||||
|
||||
case JFS_COMMIT_BLOCK:
|
||||
/* Found an expected commit block: not much to
|
||||
* do other than move on to the next sequence
|
||||
* number. */
|
||||
brelse(bh);
|
||||
next_commit_ID++;
|
||||
continue;
|
||||
|
||||
case JFS_REVOKE_BLOCK:
|
||||
/* If we aren't in the REVOKE pass, then we can
|
||||
* just skip over this block. */
|
||||
if (pass != PASS_REVOKE) {
|
||||
brelse(bh);
|
||||
continue;
|
||||
}
|
||||
|
||||
err = scan_revoke_records(journal, bh,
|
||||
next_commit_ID, info);
|
||||
brelse(bh);
|
||||
if (err)
|
||||
goto failed;
|
||||
continue;
|
||||
|
||||
default:
|
||||
jbd_debug(3, "Unrecognised magic %d, end of scan.\n",
|
||||
blocktype);
|
||||
brelse(bh);
|
||||
goto done;
|
||||
}
|
||||
}
|
||||
|
||||
done:
|
||||
/*
|
||||
* We broke out of the log scan loop: either we came to the
|
||||
* known end of the log or we found an unexpected block in the
|
||||
* log. If the latter happened, then we know that the "current"
|
||||
* transaction marks the end of the valid log.
|
||||
*/
|
||||
|
||||
if (pass == PASS_SCAN)
|
||||
info->end_transaction = next_commit_ID;
|
||||
else {
|
||||
/* It's really bad news if different passes end up at
|
||||
* different places (but possible due to IO errors). */
|
||||
if (info->end_transaction != next_commit_ID) {
|
||||
printk (KERN_ERR "JBD: recovery pass %d ended at "
|
||||
"transaction %u, expected %u\n",
|
||||
pass, next_commit_ID, info->end_transaction);
|
||||
if (!success)
|
||||
success = -EIO;
|
||||
}
|
||||
}
|
||||
|
||||
return success;
|
||||
|
||||
failed:
|
||||
return err;
|
||||
}
|
||||
|
||||
|
||||
/* Scan a revoke record, marking all blocks mentioned as revoked. */
|
||||
|
||||
static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
|
||||
tid_t sequence, struct recovery_info *info)
|
||||
{
|
||||
journal_revoke_header_t *header;
|
||||
int offset, max;
|
||||
|
||||
header = (journal_revoke_header_t *) bh->b_data;
|
||||
offset = sizeof(journal_revoke_header_t);
|
||||
max = be32_to_cpu(header->r_count);
|
||||
|
||||
while (offset < max) {
|
||||
unsigned int blocknr;
|
||||
int err;
|
||||
|
||||
blocknr = be32_to_cpu(* ((__be32 *) (bh->b_data+offset)));
|
||||
offset += 4;
|
||||
err = journal_set_revoke(journal, blocknr, sequence);
|
||||
if (err)
|
||||
return err;
|
||||
++info->nr_revokes;
|
||||
}
|
||||
return 0;
|
||||
}
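scan_revoke_records() treats r_count as the number of bytes in use in the revoke block, header included, and reads one big-endian 32-bit block number every four bytes after the header. Below is a hypothetical standalone parser over a raw buffer; the 16-byte header size (a 12-byte journal_header_t followed by r_count) is an assumption mirroring jbd's journal_revoke_header_t. Unlike the code above, it clamps r_count to the buffer size and never reads a partial record, as a defensive measure.

```c
#include <stddef.h>
#include <stdint.h>

#define REVOKE_HDR_SIZE 16u	/* assumed: journal_header_t + __be32 r_count */

uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Extract up to max_out revoked block numbers; returns how many were read. */
size_t parse_revoke_block(const unsigned char *buf, size_t bufsize,
			  uint32_t *out, size_t max_out)
{
	size_t count = get_be32(buf + 12);	/* bytes in use, header included */
	size_t off = REVOKE_HDR_SIZE;
	size_t n = 0;

	if (count > bufsize)	/* defensive clamp; the jbd code does not check */
		count = bufsize;
	while (off + 4 <= count && n < max_out) {
		out[n++] = get_be32(buf + off);
		off += 4;
	}
	return n;
}
```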
|
733
fs/jbd/revoke.c
|
@ -1,733 +0,0 @@
|
|||
/*
|
||||
* linux/fs/jbd/revoke.c
|
||||
*
|
||||
* Written by Stephen C. Tweedie <sct@redhat.com>, 2000
|
||||
*
|
||||
* Copyright 2000 Red Hat corp --- All Rights Reserved
|
||||
*
|
||||
* This file is part of the Linux kernel and is made available under
|
||||
* the terms of the GNU General Public License, version 2, or at your
|
||||
* option, any later version, incorporated herein by reference.
|
||||
*
|
||||
* Journal revoke routines for the generic filesystem journaling code;
|
||||
* part of the ext2fs journaling system.
|
||||
*
|
||||
* Revoke is the mechanism used to prevent old log records for deleted
|
||||
* metadata from being replayed on top of newer data using the same
|
||||
* blocks. The revoke mechanism is used in two separate places:
|
||||
*
|
||||
* + Commit: during commit we write the entire list of the current
|
||||
* transaction's revoked blocks to the journal
|
||||
*
|
||||
* + Recovery: during recovery we record the transaction ID of all
|
||||
* revoked blocks. If there are multiple revoke records in the log
|
||||
* for a single block, only the last one counts, and if there is a log
|
||||
* entry for a block beyond the last revoke, then that log entry still
|
||||
* gets replayed.
|
||||
*
|
||||
* We can get interactions between revokes and new log data within a
|
||||
* single transaction:
|
||||
*
|
||||
* Block is revoked and then journaled:
|
||||
* The desired end result is the journaling of the new block, so we
|
||||
* cancel the revoke before the transaction commits.
|
||||
*
|
||||
* Block is journaled and then revoked:
|
||||
* The revoke must take precedence over the write of the block, so we
|
||||
* need either to cancel the journal entry or to write the revoke
|
||||
* later in the log than the log block. In this case, we choose the
|
||||
* latter: journaling a block cancels any revoke record for that block
|
||||
* in the current transaction, so any revoke for that block in the
|
||||
* transaction must have happened after the block was journaled and so
|
||||
* the revoke must take precedence.
|
||||
*
|
||||
* Block is revoked and then written as data:
|
||||
* The data write is allowed to succeed, but the revoke is _not_
|
||||
* cancelled. We still need to prevent old log records from
|
||||
* overwriting the new data. We don't even need to clear the revoke
|
||||
* bit here.
|
||||
*
|
||||
* We cache revoke status of a buffer in the current transaction in b_states
|
||||
* bits. As the name says, the RevokeValid flag indicates that the cached revoke
|
||||
* status of a buffer is valid and we can rely on the cached status.
|
||||
*
|
||||
* Revoke information on buffers is a tri-state value:
|
||||
*
|
||||
* RevokeValid clear: no cached revoke status, need to look it up
|
||||
* RevokeValid set, Revoked clear:
|
||||
* buffer has not been revoked, and cancel_revoke
|
||||
* need do nothing.
|
||||
* RevokeValid set, Revoked set:
|
||||
* buffer has been revoked.
|
||||
*
|
||||
* Locking rules:
|
||||
* We keep two hash tables of revoke records. One hashtable belongs to the
|
||||
* running transaction (is pointed to by journal->j_revoke), the other one
|
||||
* belongs to the committing transaction. Accesses to the second hash table
|
||||
* happen only from the kjournald and no other thread touches this table. Also
|
||||
* journal_switch_revoke_table() which switches which hashtable belongs to the
|
||||
* running and which to the committing transaction is called only from
|
||||
* kjournald. Therefore we need no locks when accessing the hashtable belonging
|
||||
* to the committing transaction.
|
||||
*
|
||||
* All users operating on the hash table belonging to the running transaction
|
||||
* have a handle to the transaction. Therefore they are safe from kjournald
|
||||
* switching hash tables under them. For operations on the lists of entries in
|
||||
* the hash table j_revoke_lock is used.
|
||||
*
|
||||
* Finally, also replay code uses the hash tables but at this moment no one else
|
||||
* can touch them (filesystem isn't mounted yet) and hence no locking is
|
||||
* needed.
|
||||
*/
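The three interaction rules above can be modelled in a few lines. The sketch below is a hypothetical user-space model of one transaction's revoke set (struct txn_model, txn_revoke(), txn_journal() and txn_data_write() are invented names, not jbd functions); it ignores hashing, locking and the on-disk format and only shows which operation adds a revoke and which cancels it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one transaction's revoke set; not the jbd API. */
#define MAX_REVOKES 16

struct txn_model {
	unsigned int revoked[MAX_REVOKES];
	size_t nr_revoked;
};

static bool txn_is_revoked(const struct txn_model *t, unsigned int blk)
{
	for (size_t i = 0; i < t->nr_revoked; i++)
		if (t->revoked[i] == blk)
			return true;
	return false;
}

/* Freeing metadata: record a revoke (at most one per block). */
static void txn_revoke(struct txn_model *t, unsigned int blk)
{
	if (!txn_is_revoked(t, blk)) {
		assert(t->nr_revoked < MAX_REVOKES);
		t->revoked[t->nr_revoked++] = blk;
	}
}

/* Journaling a metadata block cancels any revoke for it in this
 * transaction ("revoked then journaled": the new copy wins).  A later
 * txn_revoke() adds the record again, so "journaled then revoked"
 * ends with the revoke taking precedence. */
static void txn_journal(struct txn_model *t, unsigned int blk)
{
	for (size_t i = 0; i < t->nr_revoked; i++) {
		if (t->revoked[i] == blk) {
			/* swap-remove: move the last entry into slot i */
			t->revoked[i] = t->revoked[--t->nr_revoked];
			return;
		}
	}
}

/* Writing the block as ordinary data leaves the revoke in place, so
 * older log copies still cannot overwrite the new data on replay. */
static void txn_data_write(struct txn_model *t, unsigned int blk)
{
	(void)t;
	(void)blk;
}
```

In the jbd code this bookkeeping is split between journal_revoke(), which adds the record, and journal_cancel_revoke(), which drops it when the block is journaled again.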

#ifndef __KERNEL__
#include "jfs_user.h"
#else
#include <linux/time.h>
#include <linux/fs.h>
#include <linux/jbd.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/init.h>
#include <linux/bio.h>
#endif
#include <linux/log2.h>
#include <linux/hash.h>

static struct kmem_cache *revoke_record_cache;
static struct kmem_cache *revoke_table_cache;

/* Each revoke record represents one single revoked block.  During
   journal replay, this involves recording the transaction ID of the
   last transaction to revoke this block. */

struct jbd_revoke_record_s
{
	struct list_head  hash;
	tid_t		  sequence;	/* Used for recovery only */
	unsigned int	  blocknr;
};


/* The revoke table is just a simple hash table of revoke records. */
struct jbd_revoke_table_s
{
	/* It is conceivable that we might want a larger hash table
	 * for recovery.  Must be a power of two. */
	int		  hash_size;
	int		  hash_shift;
	struct list_head *hash_table;
};


#ifdef __KERNEL__
static void write_one_revoke_record(journal_t *, transaction_t *,
				    struct journal_head **, int *,
				    struct jbd_revoke_record_s *, int);
static void flush_descriptor(journal_t *, struct journal_head *, int, int);
#endif

/* Utility functions to maintain the revoke table */

static inline int hash(journal_t *journal, unsigned int block)
{
	struct jbd_revoke_table_s *table = journal->j_revoke;

	return hash_32(block, table->hash_shift);
}

static int insert_revoke_hash(journal_t *journal, unsigned int blocknr,
			      tid_t seq)
{
	struct list_head *hash_list;
	struct jbd_revoke_record_s *record;

repeat:
	record = kmem_cache_alloc(revoke_record_cache, GFP_NOFS);
	if (!record)
		goto oom;

	record->sequence = seq;
	record->blocknr = blocknr;
	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
	spin_lock(&journal->j_revoke_lock);
	list_add(&record->hash, hash_list);
	spin_unlock(&journal->j_revoke_lock);
	return 0;

oom:
	if (!journal_oom_retry)
		return -ENOMEM;
	jbd_debug(1, "ENOMEM in %s, retrying\n", __func__);
	yield();
	goto repeat;
}

/* Find a revoke record in the journal's hash table. */

static struct jbd_revoke_record_s *find_revoke_record(journal_t *journal,
						      unsigned int blocknr)
{
	struct list_head *hash_list;
	struct jbd_revoke_record_s *record;

	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];

	spin_lock(&journal->j_revoke_lock);
	record = (struct jbd_revoke_record_s *) hash_list->next;
	while (&(record->hash) != hash_list) {
		if (record->blocknr == blocknr) {
			spin_unlock(&journal->j_revoke_lock);
			return record;
		}
		record = (struct jbd_revoke_record_s *) record->hash.next;
	}
	spin_unlock(&journal->j_revoke_lock);
	return NULL;
}
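The revoke table is a chained hash keyed by block number. As a rough user-space illustration, assuming a singly linked chain instead of the kernel's struct list_head and no j_revoke_lock, inserting at the head of a bucket and scanning that bucket look like this. hash_bits() imitates the multiplicative style of hash_32(), but its constant is an assumption and may differ from the kernel version in this diff; struct rec, struct rtable and the rtable_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplicative hash in the style of hash_32(): keep the top "bits"
 * bits of the product.  The constant is an assumption. */
static uint32_t hash_bits(uint32_t val, unsigned int bits)
{
	return (val * 0x61C88647u) >> (32 - bits);
}

struct rec {
	struct rec *next;
	uint32_t sequence;
	uint32_t blocknr;
};

struct rtable {
	unsigned int shift;	/* log2 of the number of buckets */
	struct rec **buckets;	/* 1 << shift chains */
};

static int rtable_init(struct rtable *t, unsigned int shift)
{
	assert(shift >= 1 && shift <= 16);	/* keeps the shift in hash_bits() defined */
	t->shift = shift;
	t->buckets = calloc((size_t)1 << shift, sizeof(*t->buckets));
	return t->buckets ? 0 : -1;
}

/* Push a new record at the head of its chain, like list_add(). */
static int rtable_insert(struct rtable *t, uint32_t blocknr, uint32_t seq)
{
	struct rec *r = malloc(sizeof(*r));
	uint32_t h;

	if (!r)
		return -1;
	h = hash_bits(blocknr, t->shift);
	r->blocknr = blocknr;
	r->sequence = seq;
	r->next = t->buckets[h];
	t->buckets[h] = r;
	return 0;
}

/* Linear scan of one bucket, like find_revoke_record(). */
static struct rec *rtable_find(const struct rtable *t, uint32_t blocknr)
{
	for (struct rec *r = t->buckets[hash_bits(blocknr, t->shift)]; r; r = r->next)
		if (r->blocknr == blocknr)
			return r;
	return NULL;
}

static void rtable_destroy(struct rtable *t)
{
	for (size_t i = 0; i < ((size_t)1 << t->shift); i++) {
		struct rec *r = t->buckets[i];

		while (r) {
			struct rec *next = r->next;

			free(r);
			r = next;
		}
	}
	free(t->buckets);
	t->buckets = NULL;
}
```

Inserting at the head keeps insert_revoke_hash() O(1); lookups cost one bucket scan, which stays short as long as the table is sized for the expected number of revokes.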

void journal_destroy_revoke_caches(void)
{
	if (revoke_record_cache) {
		kmem_cache_destroy(revoke_record_cache);
		revoke_record_cache = NULL;
	}
	if (revoke_table_cache) {
		kmem_cache_destroy(revoke_table_cache);
		revoke_table_cache = NULL;
	}
}

int __init journal_init_revoke_caches(void)
{
	J_ASSERT(!revoke_record_cache);
	J_ASSERT(!revoke_table_cache);

	revoke_record_cache = kmem_cache_create("revoke_record",
					   sizeof(struct jbd_revoke_record_s),
					   0,
					   SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
					   NULL);
	if (!revoke_record_cache)
		goto record_cache_failure;

	revoke_table_cache = kmem_cache_create("revoke_table",
					   sizeof(struct jbd_revoke_table_s),
					   0, SLAB_TEMPORARY, NULL);
	if (!revoke_table_cache)
		goto table_cache_failure;

	return 0;

table_cache_failure:
	journal_destroy_revoke_caches();
record_cache_failure:
	return -ENOMEM;
}

static struct jbd_revoke_table_s *journal_init_revoke_table(int hash_size)
{
	int i;
	struct jbd_revoke_table_s *table;

	table = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
	if (!table)
		goto out;

	table->hash_size = hash_size;
	table->hash_shift = ilog2(hash_size);
	table->hash_table =
		kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
	if (!table->hash_table) {
		kmem_cache_free(revoke_table_cache, table);
		table = NULL;
		goto out;
	}

	for (i = 0; i < hash_size; i++)
		INIT_LIST_HEAD(&table->hash_table[i]);

out:
	return table;
}
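journal_init_revoke() asserts that hash_size is a power of two, and journal_init_revoke_table() stores hash_shift = ilog2(hash_size). The pairing matters because hash_32(block, hash_shift) yields exactly hash_shift bits, i.e. indices 0 to 2^hash_shift - 1, which covers every bucket only when hash_size equals 2^hash_shift. A minimal sketch of that sizing rule, where pow2(), log2_floor() and table_shift_for() are hypothetical stand-ins for is_power_of_2() and ilog2():

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for is_power_of_2(): exactly one bit set. */
static bool pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Stand-in for ilog2(): index of the highest set bit. */
static unsigned int log2_floor(unsigned int n)
{
	unsigned int r = 0;

	assert(n != 0);
	while (n >>= 1)
		r++;
	return r;
}

/* Hypothetical helper: derive hash_shift the way the table init does,
 * returning -1 for sizes that journal_init_revoke() would J_ASSERT on. */
static int table_shift_for(unsigned int hash_size)
{
	if (!pow2(hash_size))
		return -1;
	return (int)log2_floor(hash_size);
}
```

For a non-power-of-two size such as 100, ilog2() would give 6, so only 64 of the 100 buckets could ever be addressed; rejecting such sizes up front avoids that silent waste.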

static void journal_destroy_revoke_table(struct jbd_revoke_table_s *table)
{
	int i;
	struct list_head *hash_list;

	for (i = 0; i < table->hash_size; i++) {
		hash_list = &table->hash_table[i];
		J_ASSERT(list_empty(hash_list));
	}

	kfree(table->hash_table);
	kmem_cache_free(revoke_table_cache, table);
}

/* Initialise the revoke table for a given journal to a given size. */
int journal_init_revoke(journal_t *journal, int hash_size)
{
	J_ASSERT(journal->j_revoke_table[0] == NULL);
	J_ASSERT(is_power_of_2(hash_size));

	journal->j_revoke_table[0] = journal_init_revoke_table(hash_size);
	if (!journal->j_revoke_table[0])
		goto fail0;

	journal->j_revoke_table[1] = journal_init_revoke_table(hash_size);
	if (!journal->j_revoke_table[1])
		goto fail1;

	journal->j_revoke = journal->j_revoke_table[1];

	spin_lock_init(&journal->j_revoke_lock);

	return 0;

fail1:
	journal_destroy_revoke_table(journal->j_revoke_table[0]);
fail0:
	return -ENOMEM;
}
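journal_init_revoke() allocates two tables and unwinds with goto fail1 / goto fail0, so a failure on the second allocation frees the first one before returning. The sketch below reproduces that unwinding shape in user space. model_alloc(), model_free() and struct two_tables are hypothetical, the allocator is rigged to fail after a configurable number of successes so both paths can be exercised, and -1 stands in for -ENOMEM.

```c
#include <assert.h>
#include <stdlib.h>

static int allocs_left;		/* successful allocations still allowed */
static int live_allocs;		/* allocations not yet freed */

static void *model_alloc(size_t n)
{
	void *p;

	if (allocs_left == 0)
		return NULL;	/* simulated out-of-memory */
	allocs_left--;
	p = malloc(n);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct two_tables {
	void *t[2];
	void *running;		/* plays the role of j_revoke */
};

/* Same unwinding shape as journal_init_revoke(). */
static int init_two(struct two_tables *j)
{
	j->t[0] = model_alloc(64);
	if (!j->t[0])
		goto fail0;
	j->t[1] = model_alloc(64);
	if (!j->t[1])
		goto fail1;
	j->running = j->t[1];
	return 0;

fail1:
	model_free(j->t[0]);
	j->t[0] = NULL;		/* this sketch also drops the stale pointer */
fail0:
	return -1;
}
```

Each label undoes exactly the steps that succeeded before the jump, so the labels appear in reverse order of the allocations.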

/* Destroy a journal's revoke table.  The table must already be empty! */
void journal_destroy_revoke(journal_t *journal)
{
	journal->j_revoke = NULL;
	if (journal->j_revoke_table[0])
		journal_destroy_revoke_table(journal->j_revoke_table[0]);
	if (journal->j_revoke_table[1])
		journal_destroy_revoke_table(journal->j_revoke_table[1]);
}


#ifdef __KERNEL__

/*
 * journal_revoke: revoke a given buffer_head from the journal.  This
 * prevents the block from being replayed during recovery if we take a
 * crash after this current transaction commits.  Any subsequent
 * metadata writes of the buffer in this transaction cancel the
 * revoke.
 *
 * Note that this call may block --- it is up to the caller to make
 * sure that there are no further calls to journal_write_metadata
 * before the revoke is complete.  In ext3, this implies calling the
 * revoke before clearing the block bitmap when we are deleting
 * metadata.
 *
 * Revoke performs a journal_forget on any buffer_head passed in as a
 * parameter, but does _not_ forget the buffer_head if the bh was only
 * found implicitly.
 *
 * bh_in may not be a journalled buffer - it may have come off
 * the hash tables without an attached journal_head.
 *
 * If bh_in is non-zero, journal_revoke() will decrement its b_count
 * by one.
 */

int journal_revoke(handle_t *handle, unsigned int blocknr,
		   struct buffer_head *bh_in)
{
	struct buffer_head *bh = NULL;
	journal_t *journal;
	struct block_device *bdev;
	int err;

	might_sleep();
	if (bh_in)
		BUFFER_TRACE(bh_in, "enter");

	journal = handle->h_transaction->t_journal;
	if (!journal_set_features(journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)){
		J_ASSERT (!"Cannot set revoke feature!");
		return -EINVAL;
	}

	bdev = journal->j_fs_dev;
	bh = bh_in;

	if (!bh) {
		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
		if (bh)
			BUFFER_TRACE(bh, "found on hash");
	}
#ifdef JBD_EXPENSIVE_CHECKING
	else {
		struct buffer_head *bh2;

		/* If there is a different buffer_head lying around in
		 * memory anywhere... */
		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
		if (bh2) {
			/* ... and it has RevokeValid status... */
			if (bh2 != bh && buffer_revokevalid(bh2))
				/* ...then it better be revoked too,
				 * since it's illegal to create a revoke
				 * record against a buffer_head which is
				 * not marked revoked --- that would
				 * risk missing a subsequent revoke
				 * cancel. */
				J_ASSERT_BH(bh2, buffer_revoked(bh2));
			put_bh(bh2);
		}
	}
#endif

	/* We really ought not ever to revoke twice in a row without
	   first having the revoke cancelled: it's illegal to free a
	   block twice without allocating it in between! */
	if (bh) {
		if (!J_EXPECT_BH(bh, !buffer_revoked(bh),
				 "inconsistent data on disk")) {
			if (!bh_in)
				brelse(bh);
			return -EIO;
		}
		set_buffer_revoked(bh);
		set_buffer_revokevalid(bh);
		if (bh_in) {
			BUFFER_TRACE(bh_in, "call journal_forget");
			journal_forget(handle, bh_in);
		} else {
			BUFFER_TRACE(bh, "call brelse");
			__brelse(bh);
		}
	}

	jbd_debug(2, "insert revoke for block %u, bh_in=%p\n", blocknr, bh_in);
	err = insert_revoke_hash(journal, blocknr,
				handle->h_transaction->t_tid);
	BUFFER_TRACE(bh_in, "exit");
	return err;
}

/*
 * Cancel an outstanding revoke.  For use only internally by the
 * journaling code (called from journal_get_write_access).
 *
 * We trust buffer_revoked() on the buffer if the buffer is already
 * being journaled: if there is no revoke pending on the buffer, then we
 * don't do anything here.
 *
 * This would break if it were possible for a buffer to be revoked and
 * discarded, and then reallocated within the same transaction.  In such
 * a case we would have lost the revoked bit, but when we arrived here
 * the second time we would still have a pending revoke to cancel.  So,
 * do not trust the Revoked bit on buffers unless RevokeValid is also
 * set.
 */
int journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
{
	struct jbd_revoke_record_s *record;
	journal_t *journal = handle->h_transaction->t_journal;
	int need_cancel;
	int did_revoke = 0;	/* akpm: debug */
	struct buffer_head *bh = jh2bh(jh);

	jbd_debug(4, "journal_head %p, cancelling revoke\n", jh);

	/* Is the existing Revoke bit valid?  If so, we trust it, and
	 * only perform the full cancel if the revoke bit is set.  If
	 * not, we can't trust the revoke bit, and we need to do the
	 * full search for a revoke record. */
	if (test_set_buffer_revokevalid(bh)) {
		need_cancel = test_clear_buffer_revoked(bh);
	} else {
		need_cancel = 1;
		clear_buffer_revoked(bh);
	}

	if (need_cancel) {
		record = find_revoke_record(journal, bh->b_blocknr);
		if (record) {
			jbd_debug(4, "cancelled existing revoke on "
				  "blocknr %llu\n", (unsigned long long)bh->b_blocknr);
			spin_lock(&journal->j_revoke_lock);
			list_del(&record->hash);
			spin_unlock(&journal->j_revoke_lock);
			kmem_cache_free(revoke_record_cache, record);
			did_revoke = 1;
		}
	}

#ifdef JBD_EXPENSIVE_CHECKING
	/* There better not be one left behind by now! */
	record = find_revoke_record(journal, bh->b_blocknr);
	J_ASSERT_JH(jh, record == NULL);
#endif

	/* Finally, have we just cleared revoke on an unhashed
	 * buffer_head?  If so, we'd better make sure we clear the
	 * revoked status on any hashed alias too, otherwise the revoke
	 * state machine will get very upset later on. */
	if (need_cancel) {
		struct buffer_head *bh2;
		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
		if (bh2) {
			if (bh2 != bh)
				clear_buffer_revoked(bh2);
			__brelse(bh2);
		}
	}
	return did_revoke;
}
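The first branch of journal_cancel_revoke() decides, from the two cached bits alone, whether the revoke hash has to be searched. The following non-atomic model shows that decision. The bit values and the helpers test_set(), test_clear() and need_cancel() are illustrative assumptions; the real code uses atomic bit operations on bh->b_state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for BH_Revoked / BH_RevokeValid. */
enum { REVOKED = 1u << 0, REVOKEVALID = 1u << 1 };

/* Non-atomic stand-ins for the test-and-set / test-and-clear helpers. */
static bool test_set(unsigned int *state, unsigned int bit)
{
	bool old = *state & bit;

	*state |= bit;
	return old;
}

static bool test_clear(unsigned int *state, unsigned int bit)
{
	bool old = *state & bit;

	*state &= ~bit;
	return old;
}

/* Mirrors the top of journal_cancel_revoke(): returns true when the
 * revoke hash must be searched for a record to cancel.  Either way the
 * buffer ends up with RevokeValid set and Revoked clear. */
static bool need_cancel(unsigned int *state)
{
	if (test_set(state, REVOKEVALID))
		return test_clear(state, REVOKED);	/* cache trusted */
	*state &= ~REVOKED;				/* cache unknown */
	return true;
}
```

A buffer whose cached state is "valid, not revoked" skips the hash lookup entirely, which is the common case the cache is meant to speed up.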

/*
 * journal_clear_buffer_revoked_flags clears the revoked flag of buffers in
 * the revoke table to reflect that there are no revoked buffers in the next
 * transaction, which is about to be started.
 */
void journal_clear_buffer_revoked_flags(journal_t *journal)
{
	struct jbd_revoke_table_s *revoke = journal->j_revoke;
	int i = 0;

	for (i = 0; i < revoke->hash_size; i++) {
		struct list_head *hash_list;
		struct list_head *list_entry;
		hash_list = &revoke->hash_table[i];

		list_for_each(list_entry, hash_list) {
			struct jbd_revoke_record_s *record;
			struct buffer_head *bh;
			record = (struct jbd_revoke_record_s *)list_entry;
			bh = __find_get_block(journal->j_fs_dev,
					      record->blocknr,
					      journal->j_blocksize);
			if (bh) {
				clear_buffer_revoked(bh);
				__brelse(bh);
			}
		}
	}
}

/* journal_switch_revoke_table selects j_revoke for the next transaction;
 * we do not want to suspend any processing until all revokes are
 * written -bzzz
 */
void journal_switch_revoke_table(journal_t *journal)
{
	int i;

	if (journal->j_revoke == journal->j_revoke_table[0])
		journal->j_revoke = journal->j_revoke_table[1];
	else
		journal->j_revoke = journal->j_revoke_table[0];

	for (i = 0; i < journal->j_revoke->hash_size; i++)
		INIT_LIST_HEAD(&journal->j_revoke->hash_table[i]);
}
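journal_init_revoke() points j_revoke at j_revoke_table[1], journal_switch_revoke_table() flips it and reinitialises the newly running table, and journal_write_revoke_records() drains whichever table is not j_revoke. A toy model of that ping-pong, where each table is reduced to a counter of pending records (struct journal_model, struct table and the model_*() helpers are hypothetical):

```c
#include <assert.h>

/* Hypothetical: a "table" is reduced to a count of pending revokes. */
struct table {
	int pending;
};

struct journal_model {
	struct table tables[2];
	struct table *running;	/* plays the role of j_revoke */
};

static void model_init(struct journal_model *j)
{
	j->tables[0].pending = 0;
	j->tables[1].pending = 0;
	j->running = &j->tables[1];	/* as in journal_init_revoke() */
}

/* Like journal_switch_revoke_table(): flip, then reset the new running table. */
static void model_switch(struct journal_model *j)
{
	j->running = (j->running == &j->tables[0]) ? &j->tables[1] : &j->tables[0];
	j->running->pending = 0;
}

/* Like the selection in journal_write_revoke_records(): the committing
 * table is whichever one is not currently running. */
static struct table *model_committing(struct journal_model *j)
{
	return (j->running == &j->tables[0]) ? &j->tables[1] : &j->tables[0];
}
```

New revokes can keep landing in the running table while the committing table is written out, which is the point of the "-bzzz" comment. Resetting the list heads without freeing entries appears to rely on the previous commit having already drained that table through journal_write_revoke_records(); the model simply zeroes the counter.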

/*
 * Write revoke records to the journal for all entries in the current
 * revoke hash, deleting the entries as we go.
 */
void journal_write_revoke_records(journal_t *journal,
				  transaction_t *transaction, int write_op)
{
	struct journal_head *descriptor;
	struct jbd_revoke_record_s *record;
	struct jbd_revoke_table_s *revoke;
	struct list_head *hash_list;
	int i, offset, count;

	descriptor = NULL;
	offset = 0;
	count = 0;

	/* select revoke table for committing transaction */
	revoke = journal->j_revoke == journal->j_revoke_table[0] ?
		journal->j_revoke_table[1] : journal->j_revoke_table[0];

	for (i = 0; i < revoke->hash_size; i++) {
		hash_list = &revoke->hash_table[i];

		while (!list_empty(hash_list)) {
			record = (struct jbd_revoke_record_s *)
				hash_list->next;
			write_one_revoke_record(journal, transaction,
						&descriptor, &offset,
						record, write_op);
			count++;
			list_del(&record->hash);
			kmem_cache_free(revoke_record_cache, record);
		}
	}
	if (descriptor)
		flush_descriptor(journal, descriptor, offset, write_op);
	jbd_debug(1, "Wrote %d revoke records\n", count);
}

/*
 * Write out one revoke record.  We need to create a new descriptor
 * block if the old one is full or if we have not already created one.
 */

static void write_one_revoke_record(journal_t *journal,
				    transaction_t *transaction,
				    struct journal_head **descriptorp,
				    int *offsetp,
				    struct jbd_revoke_record_s *record,
				    int write_op)
{
	struct journal_head *descriptor;
	int offset;
	journal_header_t *header;

	/* If we are already aborting, this all becomes a noop.  We
	   still need to go round the loop in
	   journal_write_revoke_records in order to free all of the
	   revoke records: only the IO to the journal is omitted. */
	if (is_journal_aborted(journal))
		return;

	descriptor = *descriptorp;
	offset = *offsetp;

	/* Make sure we have a descriptor with space left for the record */
	if (descriptor) {
		if (offset == journal->j_blocksize) {
			flush_descriptor(journal, descriptor, offset, write_op);
			descriptor = NULL;
		}
	}

	if (!descriptor) {
		descriptor = journal_get_descriptor_buffer(journal);
		if (!descriptor)
			return;
		header = (journal_header_t *) &jh2bh(descriptor)->b_data[0];
		header->h_magic = cpu_to_be32(JFS_MAGIC_NUMBER);
		header->h_blocktype = cpu_to_be32(JFS_REVOKE_BLOCK);
		header->h_sequence = cpu_to_be32(transaction->t_tid);

		/* Record it so that we can wait for IO completion later */
		JBUFFER_TRACE(descriptor, "file as BJ_LogCtl");
		journal_file_buffer(descriptor, transaction, BJ_LogCtl);

		offset = sizeof(journal_revoke_header_t);
		*descriptorp = descriptor;
	}

	* ((__be32 *)(&jh2bh(descriptor)->b_data[offset])) =
		cpu_to_be32(record->blocknr);
	offset += 4;
	*offsetp = offset;
}

/*
 * Flush a revoke descriptor out to the journal.  If we are aborting,
 * this is a noop; otherwise we are generating a buffer which needs to
 * be waited for during commit, so it has to go onto the appropriate
 * journal buffer list.
 */

static void flush_descriptor(journal_t *journal,
			     struct journal_head *descriptor,
			     int offset, int write_op)
{
	journal_revoke_header_t *header;
	struct buffer_head *bh = jh2bh(descriptor);

	if (is_journal_aborted(journal)) {
		put_bh(bh);
		return;
	}

	header = (journal_revoke_header_t *) jh2bh(descriptor)->b_data;
	header->r_count = cpu_to_be32(offset);
	set_buffer_jwrite(bh);
	BUFFER_TRACE(bh, "write");
	set_buffer_dirty(bh);
	write_dirty_buffer(bh, write_op);
}
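write_one_revoke_record() and flush_descriptor() lay a revoke descriptor out as a journal header (magic, block type, transaction ID), a 32-bit r_count, and then one big-endian 32-bit block number per revoke, starting a new descriptor whenever the current one is full. The sketch below packs block numbers the same way into a caller-supplied byte buffer. It assumes a 16-byte revoke header (three 32-bit header fields plus r_count) and uses the jbd magic 0xc03b3998 and revoke block type 5; pack_revokes(), put_be32() and get_be32() are hypothetical, and journal I/O, abort handling and buffer lists are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAGIC        0xc03b3998u	/* JFS_MAGIC_NUMBER */
#define MODEL_REVOKE_BLOCK 5u		/* JFS_REVOKE_BLOCK */
#define MODEL_HDR_SIZE     16u		/* magic, blocktype, sequence, r_count */

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Pack block numbers into consecutive blocksize-byte descriptors in out[].
 * Returns the number of descriptors used, or SIZE_MAX if out[] is too
 * small.  As in flush_descriptor(), r_count is the byte offset reached,
 * header included. */
static size_t pack_revokes(const uint32_t *blocknrs, size_t n, size_t blocksize,
			   uint32_t tid, uint8_t *out, size_t max_blocks)
{
	uint8_t *desc = NULL;
	size_t offset = 0, used = 0;

	assert(blocksize % 4 == 0 && blocksize > MODEL_HDR_SIZE);
	for (size_t i = 0; i < n; i++) {
		if (desc && offset == blocksize) {	/* full: "flush" it */
			put_be32(desc + 12, (uint32_t)offset);
			desc = NULL;
		}
		if (!desc) {				/* start a new descriptor */
			if (used == max_blocks)
				return SIZE_MAX;
			desc = out + used++ * blocksize;
			memset(desc, 0, blocksize);
			put_be32(desc + 0, MODEL_MAGIC);
			put_be32(desc + 4, MODEL_REVOKE_BLOCK);
			put_be32(desc + 8, tid);
			offset = MODEL_HDR_SIZE;
		}
		put_be32(desc + offset, blocknrs[i]);
		offset += 4;
	}
	if (desc)
		put_be32(desc + 12, (uint32_t)offset);
	return used;
}
```

With a 32-byte block, each descriptor holds four records after the header, so six revokes need two descriptors whose r_count values are 32 and 24.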
#endif

/*
 * Revoke support for recovery.
 *
 * Recovery needs to be able to:
 *
 *  record all revoke records, including the tid of the latest instance
 *  of each revoke in the journal
 *
 *  check whether a given block in a given transaction should be replayed
 *  (i.e. has not been revoked by a revoke record in that or a subsequent
 *  transaction)
 *
 *  empty the revoke table after recovery.
 */

/*
 * First, setting revoke records.  We create a new revoke record for
 * every block ever revoked in the log as we scan it for recovery, and
 * we update the existing records if we find multiple revokes for a
 * single block.
 */

int journal_set_revoke(journal_t *journal,
		       unsigned int blocknr,
		       tid_t sequence)
{
	struct jbd_revoke_record_s *record;

	record = find_revoke_record(journal, blocknr);
	if (record) {
		/* If we have multiple occurrences, only record the
		 * latest sequence number in the hashed record */
		if (tid_gt(sequence, record->sequence))
			record->sequence = sequence;
		return 0;
	}
	return insert_revoke_hash(journal, blocknr, sequence);
}

/*
 * Test revoke records.  For a given block referenced in the log, has
 * that block been revoked?  A revoke record with a given transaction
 * sequence number revokes all blocks in that transaction and earlier
 * ones, but later transactions still need to be replayed.
 */

int journal_test_revoke(journal_t *journal,
			unsigned int blocknr,
			tid_t sequence)
{
	struct jbd_revoke_record_s *record;

	record = find_revoke_record(journal, blocknr);
	if (!record)
		return 0;
	if (tid_gt(sequence, record->sequence))
		return 0;
	return 1;
}
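journal_set_revoke() keeps only the latest revoking transaction per block, and journal_test_revoke() suppresses a logged copy unless it comes from a strictly later transaction. Both rely on tid_gt(), which compares 32-bit transaction IDs modulo wraparound by looking at the sign of their difference. A self-contained model for a single block follows; tid_later(), struct revoke_rec, set_revoke() and is_revoked() are hypothetical names, and the signed-difference trick assumes two's-complement conversion to int32_t, as the compilers the kernel supports provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_model_t;	/* a 32-bit unsigned transaction counter */

/* Wraparound-safe "x is later than y", in the style of tid_gt(). */
static bool tid_later(tid_model_t x, tid_model_t y)
{
	return (int32_t)(x - y) > 0;
}

/* Hypothetical per-block record: the latest transaction revoking it. */
struct revoke_rec {
	bool present;
	tid_model_t sequence;
};

/* Like journal_set_revoke(): remember only the latest revoking tid. */
static void set_revoke(struct revoke_rec *r, tid_model_t seq)
{
	if (!r->present || tid_later(seq, r->sequence)) {
		r->present = true;
		r->sequence = seq;
	}
}

/* Like journal_test_revoke(): a copy logged in transaction seq is
 * skipped if a revoke exists in the same or a later transaction. */
static bool is_revoked(const struct revoke_rec *r, tid_model_t seq)
{
	return r->present && !tid_later(seq, r->sequence);
}
```

The wraparound case matters in practice: transaction 2 counts as later than 0xFFFFFFFE because the difference, taken modulo 2^32, is a small positive number.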

/*
 * Finally, once recovery is over, we need to clear the revoke table so
 * that it can be reused by the running filesystem.
 */

void journal_clear_revoke(journal_t *journal)
{
	int i;
	struct list_head *hash_list;
	struct jbd_revoke_record_s *record;
	struct jbd_revoke_table_s *revoke;

	revoke = journal->j_revoke;

	for (i = 0; i < revoke->hash_size; i++) {
		hash_list = &revoke->hash_table[i];
		while (!list_empty(hash_list)) {
			record = (struct jbd_revoke_record_s*) hash_list->next;
			list_del(&record->hash);
			kmem_cache_free(revoke_record_cache, record);
		}
	}
}
2237
fs/jbd/transaction.c
File diff suppressed because it is too large
1047
include/linux/jbd.h
File diff suppressed because it is too large
@@ -29,6 +29,7 @@
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/slab.h>
#include <linux/bit_spinlock.h>
#include <crypto/hash.h>
#endif

@@ -336,7 +337,45 @@ BUFFER_FNS(Freed, freed)
BUFFER_FNS(Shadow, shadow)
BUFFER_FNS(Verified, verified)

#include <linux/jbd_common.h>
static inline struct buffer_head *jh2bh(struct journal_head *jh)
{
	return jh->b_bh;
}

static inline struct journal_head *bh2jh(struct buffer_head *bh)
{
	return bh->b_private;
}

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
	bit_spin_lock(BH_State, &bh->b_state);
}

static inline int jbd_trylock_bh_state(struct buffer_head *bh)
{
	return bit_spin_trylock(BH_State, &bh->b_state);
}

static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
	return bit_spin_is_locked(BH_State, &bh->b_state);
}

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
	bit_spin_unlock(BH_State, &bh->b_state);
}

static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
	bit_spin_lock(BH_JournalHead, &bh->b_state);
}

static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
{
	bit_spin_unlock(BH_JournalHead, &bh->b_state);
}

#define J_ASSERT(assert) BUG_ON(!(assert))

@@ -1,46 +0,0 @@
#ifndef _LINUX_JBD_STATE_H
#define _LINUX_JBD_STATE_H

#include <linux/bit_spinlock.h>

static inline struct buffer_head *jh2bh(struct journal_head *jh)
{
	return jh->b_bh;
}

static inline struct journal_head *bh2jh(struct buffer_head *bh)
{
	return bh->b_private;
}

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
	bit_spin_lock(BH_State, &bh->b_state);
}

static inline int jbd_trylock_bh_state(struct buffer_head *bh)
{
	return bit_spin_trylock(BH_State, &bh->b_state);
}

static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
	return bit_spin_is_locked(BH_State, &bh->b_state);
}

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
	bit_spin_unlock(BH_State, &bh->b_state);
}

static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
	bit_spin_lock(BH_JournalHead, &bh->b_state);
}

static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
{
	bit_spin_unlock(BH_JournalHead, &bh->b_state);
}

#endif
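The jbd_common.h helpers above take bit_spin_lock() on single bits (BH_State, BH_JournalHead) of bh->b_state, so a lock costs no extra memory in the buffer_head and the other bits of the word remain ordinary flags. A user-space approximation with C11 atomics follows. The bit numbers and function names are hypothetical, and unlike the kernel primitive this sketch neither disables preemption nor relaxes the CPU while spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for BH_State / BH_JournalHead. */
enum { MODEL_BH_STATE = 16, MODEL_BH_JOURNALHEAD = 17 };

/* Try to take the lock bit: succeeds if the bit was previously clear.
 * Acquire ordering keeps the critical section after the lock. */
static bool bit_trylock(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return !(atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask);
}

static void bit_lock(atomic_ulong *word, unsigned int bit)
{
	while (!bit_trylock(word, bit))
		;	/* busy-wait until the holder clears the bit */
}

static bool bit_is_locked(atomic_ulong *word, unsigned int bit)
{
	return atomic_load_explicit(word, memory_order_relaxed) & (1UL << bit);
}

/* Release ordering publishes the critical section before the bit clears;
 * fetch_and leaves every other bit of the word untouched. */
static void bit_unlock(atomic_ulong *word, unsigned int bit)
{
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Because lock and unlock use atomic read-modify-write operations, unrelated flag bits set in the same word (bit 0 in the test below) survive a lock/unlock cycle.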

@@ -1,866 +0,0 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM ext3

#if !defined(_TRACE_EXT3_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_EXT3_H

#include <linux/tracepoint.h>

TRACE_EVENT(ext3_free_inode,
	TP_PROTO(struct inode *inode),

	TP_ARGS(inode),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( umode_t, mode )
		__field( uid_t, uid )
		__field( gid_t, gid )
		__field( blkcnt_t, blocks )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->mode = inode->i_mode;
		__entry->uid = i_uid_read(inode);
		__entry->gid = i_gid_read(inode);
		__entry->blocks = inode->i_blocks;
	),

	TP_printk("dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %lu",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino,
		  __entry->mode, __entry->uid, __entry->gid,
		  (unsigned long) __entry->blocks)
);
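The events in this header print the device as "dev %d,%d" by applying MAJOR() and MINOR() to the kernel-internal dev_t, which keeps the minor number in the low 20 bits and the major above it. The sketch below reproduces ext3_free_inode's output line from an already-captured entry. The K_* macros restate that kernel-internal layout (it is not the userspace dev_t encoding from stat(2)), and struct free_inode_entry and format_free_inode() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal dev_t layout: 20-bit minor, major in the bits above. */
#define K_MINORBITS 20
#define K_MINORMASK ((1u << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((unsigned int)((dev) >> K_MINORBITS))
#define K_MINOR(dev) ((unsigned int)((dev) & K_MINORMASK))
#define K_MKDEV(ma, mi) (((ma) << K_MINORBITS) | (mi))

/* Hypothetical copy of the fields ext3_free_inode records. */
struct free_inode_entry {
	unsigned int dev;
	unsigned long ino;
	unsigned int mode, uid, gid;
	unsigned long blocks;
};

/* Same format string and argument order as the event's TP_printk. */
static int format_free_inode(char *buf, size_t len,
			     const struct free_inode_entry *e)
{
	return snprintf(buf, len,
			"dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %lu",
			(int)K_MAJOR(e->dev), (int)K_MINOR(e->dev), e->ino,
			e->mode, e->uid, e->gid, e->blocks);
}
```

For example, an entry with device 8:1, inode 12 and mode 0100644 is printed as "dev 8,1 ino 12 mode 0100644 uid 0 gid 0 blocks 8".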
|
||||
|
||||
TRACE_EVENT(ext3_request_inode,
|
||||
TP_PROTO(struct inode *dir, int mode),
|
||||
|
||||
TP_ARGS(dir, mode),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, dir )
|
||||
__field( umode_t, mode )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = dir->i_sb->s_dev;
|
||||
__entry->dir = dir->i_ino;
|
||||
__entry->mode = mode;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d dir %lu mode 0%o",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->dir, __entry->mode)
|
||||
);
|
||||
|
||||
TRACE_EVENT(ext3_allocate_inode,
|
||||
TP_PROTO(struct inode *inode, struct inode *dir, int mode),
|
||||
|
||||
TP_ARGS(inode, dir, mode),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field( ino_t, dir )
|
||||
__field( umode_t, mode )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->dir = dir->i_ino;
|
||||
__entry->mode = mode;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu dir %lu mode 0%o",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino,
|
||||
(unsigned long) __entry->dir, __entry->mode)
|
||||
);
|
||||
|
||||
TRACE_EVENT(ext3_evict_inode,
|
||||
TP_PROTO(struct inode *inode),
|
||||
|
||||
TP_ARGS(inode),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field( int, nlink )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->nlink = inode->i_nlink;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu nlink %d",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino, __entry->nlink)
|
||||
);
|
||||
|
||||
TRACE_EVENT(ext3_drop_inode,
|
||||
TP_PROTO(struct inode *inode, int drop),
|
||||
|
||||
TP_ARGS(inode, drop),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field( int, drop )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->drop = drop;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu drop %d",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino, __entry->drop)
|
||||
);
|
||||
|
||||
TRACE_EVENT(ext3_mark_inode_dirty,
|
||||
TP_PROTO(struct inode *inode, unsigned long IP),
|
||||
|
||||
TP_ARGS(inode, IP),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field(unsigned long, ip )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->ip = IP;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu caller %pS",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino, (void *)__entry->ip)
|
||||
);
|
||||
|
||||
TRACE_EVENT(ext3_write_begin,
|
||||
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
|
||||
unsigned int flags),
|
||||
|
||||
TP_ARGS(inode, pos, len, flags),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field( loff_t, pos )
|
||||
__field( unsigned int, len )
|
||||
__field( unsigned int, flags )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->pos = pos;
|
||||
__entry->len = len;
|
||||
__entry->flags = flags;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu pos %llu len %u flags %u",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino,
|
||||
(unsigned long long) __entry->pos, __entry->len,
|
||||
__entry->flags)
|
||||
);
|
||||
|
||||
DECLARE_EVENT_CLASS(ext3__write_end,
|
||||
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
|
||||
unsigned int copied),
|
||||
|
||||
TP_ARGS(inode, pos, len, copied),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( dev_t, dev )
|
||||
__field( ino_t, ino )
|
||||
__field( loff_t, pos )
|
||||
__field( unsigned int, len )
|
||||
__field( unsigned int, copied )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->dev = inode->i_sb->s_dev;
|
||||
__entry->ino = inode->i_ino;
|
||||
__entry->pos = pos;
|
||||
__entry->len = len;
|
||||
__entry->copied = copied;
|
||||
),
|
||||
|
||||
TP_printk("dev %d,%d ino %lu pos %llu len %u copied %u",
|
||||
MAJOR(__entry->dev), MINOR(__entry->dev),
|
||||
(unsigned long) __entry->ino,
|
||||
(unsigned long long) __entry->pos, __entry->len,
|
||||
__entry->copied)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(ext3__write_end, ext3_ordered_write_end,
|
||||
|
||||
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
|
||||
unsigned int copied),
|
||||
|
||||
TP_ARGS(inode, pos, len, copied)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(ext3__write_end, ext3_writeback_write_end,
|
||||
|
||||
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
|
||||
unsigned int copied),
|
||||
|
||||
TP_ARGS(inode, pos, len, copied)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(ext3__write_end, ext3_journalled_write_end,

	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
		unsigned int copied),

	TP_ARGS(inode, pos, len, copied)
);

DECLARE_EVENT_CLASS(ext3__page_op,
	TP_PROTO(struct page *page),

	TP_ARGS(page),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( pgoff_t, index )

	),

	TP_fast_assign(
		__entry->index = page->index;
		__entry->ino = page->mapping->host->i_ino;
		__entry->dev = page->mapping->host->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu page_index %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino, __entry->index)
);

DEFINE_EVENT(ext3__page_op, ext3_ordered_writepage,

	TP_PROTO(struct page *page),

	TP_ARGS(page)
);

DEFINE_EVENT(ext3__page_op, ext3_writeback_writepage,

	TP_PROTO(struct page *page),

	TP_ARGS(page)
);

DEFINE_EVENT(ext3__page_op, ext3_journalled_writepage,

	TP_PROTO(struct page *page),

	TP_ARGS(page)
);

DEFINE_EVENT(ext3__page_op, ext3_readpage,

	TP_PROTO(struct page *page),

	TP_ARGS(page)
);

DEFINE_EVENT(ext3__page_op, ext3_releasepage,

	TP_PROTO(struct page *page),

	TP_ARGS(page)
);

TRACE_EVENT(ext3_invalidatepage,
	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),

	TP_ARGS(page, offset, length),

	TP_STRUCT__entry(
		__field( pgoff_t, index )
		__field( unsigned int, offset )
		__field( unsigned int, length )
		__field( ino_t, ino )
		__field( dev_t, dev )

	),

	TP_fast_assign(
		__entry->index = page->index;
		__entry->offset = offset;
		__entry->length = length;
		__entry->ino = page->mapping->host->i_ino;
		__entry->dev = page->mapping->host->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->index, __entry->offset, __entry->length)
);

TRACE_EVENT(ext3_discard_blocks,
	TP_PROTO(struct super_block *sb, unsigned long blk,
		unsigned long count),

	TP_ARGS(sb, blk, count),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( unsigned long, blk )
		__field( unsigned long, count )

	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->blk = blk;
		__entry->count = count;
	),

	TP_printk("dev %d,%d blk %lu count %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->blk, __entry->count)
);

TRACE_EVENT(ext3_request_blocks,
	TP_PROTO(struct inode *inode, unsigned long goal,
		unsigned long count),

	TP_ARGS(inode, goal, count),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( unsigned long, count )
		__field( unsigned long, goal )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->count = count;
		__entry->goal = goal;
	),

	TP_printk("dev %d,%d ino %lu count %lu goal %lu ",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->count, __entry->goal)
);

TRACE_EVENT(ext3_allocate_blocks,
	TP_PROTO(struct inode *inode, unsigned long goal,
		unsigned long count, unsigned long block),

	TP_ARGS(inode, goal, count, block),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( unsigned long, block )
		__field( unsigned long, count )
		__field( unsigned long, goal )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->block = block;
		__entry->count = count;
		__entry->goal = goal;
	),

	TP_printk("dev %d,%d ino %lu count %lu block %lu goal %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->count, __entry->block,
		__entry->goal)
);

TRACE_EVENT(ext3_free_blocks,
	TP_PROTO(struct inode *inode, unsigned long block,
		unsigned long count),

	TP_ARGS(inode, block, count),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( umode_t, mode )
		__field( unsigned long, block )
		__field( unsigned long, count )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->mode = inode->i_mode;
		__entry->block = block;
		__entry->count = count;
	),

	TP_printk("dev %d,%d ino %lu mode 0%o block %lu count %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->mode, __entry->block, __entry->count)
);

TRACE_EVENT(ext3_sync_file_enter,
	TP_PROTO(struct file *file, int datasync),

	TP_ARGS(file, datasync),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( ino_t, parent )
		__field( int, datasync )
	),

	TP_fast_assign(
		struct dentry *dentry = file->f_path.dentry;

		__entry->dev = d_inode(dentry)->i_sb->s_dev;
		__entry->ino = d_inode(dentry)->i_ino;
		__entry->datasync = datasync;
		__entry->parent = d_inode(dentry->d_parent)->i_ino;
	),

	TP_printk("dev %d,%d ino %lu parent %ld datasync %d ",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		(unsigned long) __entry->parent, __entry->datasync)
);

TRACE_EVENT(ext3_sync_file_exit,
	TP_PROTO(struct inode *inode, int ret),

	TP_ARGS(inode, ret),

	TP_STRUCT__entry(
		__field( int, ret )
		__field( ino_t, ino )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->ret = ret;
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu ret %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->ret)
);

TRACE_EVENT(ext3_sync_fs,
	TP_PROTO(struct super_block *sb, int wait),

	TP_ARGS(sb, wait),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, wait )

	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->wait = wait;
	),

	TP_printk("dev %d,%d wait %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->wait)
);

TRACE_EVENT(ext3_rsv_window_add,
	TP_PROTO(struct super_block *sb,
		struct ext3_reserve_window_node *rsv_node),

	TP_ARGS(sb, rsv_node),

	TP_STRUCT__entry(
		__field( unsigned long, start )
		__field( unsigned long, end )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->start = rsv_node->rsv_window._rsv_start;
		__entry->end = rsv_node->rsv_window._rsv_end;
	),

	TP_printk("dev %d,%d start %lu end %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->start, __entry->end)
);

TRACE_EVENT(ext3_discard_reservation,
	TP_PROTO(struct inode *inode,
		struct ext3_reserve_window_node *rsv_node),

	TP_ARGS(inode, rsv_node),

	TP_STRUCT__entry(
		__field( unsigned long, start )
		__field( unsigned long, end )
		__field( ino_t, ino )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->start = rsv_node->rsv_window._rsv_start;
		__entry->end = rsv_node->rsv_window._rsv_end;
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu start %lu end %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long)__entry->ino, __entry->start,
		__entry->end)
);

TRACE_EVENT(ext3_alloc_new_reservation,
	TP_PROTO(struct super_block *sb, unsigned long goal),

	TP_ARGS(sb, goal),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( unsigned long, goal )
	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->goal = goal;
	),

	TP_printk("dev %d,%d goal %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->goal)
);

TRACE_EVENT(ext3_reserved,
	TP_PROTO(struct super_block *sb, unsigned long block,
		struct ext3_reserve_window_node *rsv_node),

	TP_ARGS(sb, block, rsv_node),

	TP_STRUCT__entry(
		__field( unsigned long, block )
		__field( unsigned long, start )
		__field( unsigned long, end )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->block = block;
		__entry->start = rsv_node->rsv_window._rsv_start;
		__entry->end = rsv_node->rsv_window._rsv_end;
		__entry->dev = sb->s_dev;
	),

	TP_printk("dev %d,%d block %lu, start %lu end %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->block, __entry->start, __entry->end)
);

TRACE_EVENT(ext3_forget,
	TP_PROTO(struct inode *inode, int is_metadata, unsigned long block),

	TP_ARGS(inode, is_metadata, block),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( umode_t, mode )
		__field( int, is_metadata )
		__field( unsigned long, block )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->mode = inode->i_mode;
		__entry->is_metadata = is_metadata;
		__entry->block = block;
	),

	TP_printk("dev %d,%d ino %lu mode 0%o is_metadata %d block %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->mode, __entry->is_metadata, __entry->block)
);

TRACE_EVENT(ext3_read_block_bitmap,
	TP_PROTO(struct super_block *sb, unsigned int group),

	TP_ARGS(sb, group),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( __u32, group )

	),

	TP_fast_assign(
		__entry->dev = sb->s_dev;
		__entry->group = group;
	),

	TP_printk("dev %d,%d group %u",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->group)
);

TRACE_EVENT(ext3_direct_IO_enter,
	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len, int rw),

	TP_ARGS(inode, offset, len, rw),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( loff_t, pos )
		__field( unsigned long, len )
		__field( int, rw )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
		__entry->pos = offset;
		__entry->len = len;
		__entry->rw = rw;
	),

	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		(unsigned long long) __entry->pos, __entry->len,
		__entry->rw)
);

TRACE_EVENT(ext3_direct_IO_exit,
	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len,
		int rw, int ret),

	TP_ARGS(inode, offset, len, rw, ret),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( loff_t, pos )
		__field( unsigned long, len )
		__field( int, rw )
		__field( int, ret )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
		__entry->pos = offset;
		__entry->len = len;
		__entry->rw = rw;
		__entry->ret = ret;
	),

	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d ret %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		(unsigned long long) __entry->pos, __entry->len,
		__entry->rw, __entry->ret)
);

TRACE_EVENT(ext3_unlink_enter,
	TP_PROTO(struct inode *parent, struct dentry *dentry),

	TP_ARGS(parent, dentry),

	TP_STRUCT__entry(
		__field( ino_t, parent )
		__field( ino_t, ino )
		__field( loff_t, size )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->parent = parent->i_ino;
		__entry->ino = d_inode(dentry)->i_ino;
		__entry->size = d_inode(dentry)->i_size;
		__entry->dev = d_inode(dentry)->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu size %lld parent %ld",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		(unsigned long long)__entry->size,
		(unsigned long) __entry->parent)
);

TRACE_EVENT(ext3_unlink_exit,
	TP_PROTO(struct dentry *dentry, int ret),

	TP_ARGS(dentry, ret),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( int, ret )
	),

	TP_fast_assign(
		__entry->ino = d_inode(dentry)->i_ino;
		__entry->dev = d_inode(dentry)->i_sb->s_dev;
		__entry->ret = ret;
	),

	TP_printk("dev %d,%d ino %lu ret %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->ret)
);

DECLARE_EVENT_CLASS(ext3__truncate,
	TP_PROTO(struct inode *inode),

	TP_ARGS(inode),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( blkcnt_t, blocks )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
		__entry->blocks = inode->i_blocks;
	),

	TP_printk("dev %d,%d ino %lu blocks %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino, (unsigned long) __entry->blocks)
);

DEFINE_EVENT(ext3__truncate, ext3_truncate_enter,

	TP_PROTO(struct inode *inode),

	TP_ARGS(inode)
);

DEFINE_EVENT(ext3__truncate, ext3_truncate_exit,

	TP_PROTO(struct inode *inode),

	TP_ARGS(inode)
);

TRACE_EVENT(ext3_get_blocks_enter,
	TP_PROTO(struct inode *inode, unsigned long lblk,
		unsigned long len, int create),

	TP_ARGS(inode, lblk, len, create),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( unsigned long, lblk )
		__field( unsigned long, len )
		__field( int, create )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
		__entry->lblk = lblk;
		__entry->len = len;
		__entry->create = create;
	),

	TP_printk("dev %d,%d ino %lu lblk %lu len %lu create %u",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->lblk, __entry->len, __entry->create)
);

TRACE_EVENT(ext3_get_blocks_exit,
	TP_PROTO(struct inode *inode, unsigned long lblk,
		unsigned long pblk, unsigned long len, int ret),

	TP_ARGS(inode, lblk, pblk, len, ret),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
		__field( unsigned long, lblk )
		__field( unsigned long, pblk )
		__field( unsigned long, len )
		__field( int, ret )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
		__entry->lblk = lblk;
		__entry->pblk = pblk;
		__entry->len = len;
		__entry->ret = ret;
	),

	TP_printk("dev %d,%d ino %lu lblk %lu pblk %lu len %lu ret %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino,
		__entry->lblk, __entry->pblk,
		__entry->len, __entry->ret)
);

TRACE_EVENT(ext3_load_inode,
	TP_PROTO(struct inode *inode),

	TP_ARGS(inode),

	TP_STRUCT__entry(
		__field( ino_t, ino )
		__field( dev_t, dev )
	),

	TP_fast_assign(
		__entry->ino = inode->i_ino;
		__entry->dev = inode->i_sb->s_dev;
	),

	TP_printk("dev %d,%d ino %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		(unsigned long) __entry->ino)
);

#endif /* _TRACE_EXT3_H */

/* This part must be outside protection */
#include <trace/define_trace.h>

@@ -1,194 +0,0 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM jbd

#if !defined(_TRACE_JBD_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_JBD_H

#include <linux/jbd.h>
#include <linux/tracepoint.h>

TRACE_EVENT(jbd_checkpoint,

	TP_PROTO(journal_t *journal, int result),

	TP_ARGS(journal, result),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, result )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->result = result;
	),

	TP_printk("dev %d,%d result %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->result)
);

DECLARE_EVENT_CLASS(jbd_commit,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, transaction )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->transaction = commit_transaction->t_tid;
	),

	TP_printk("dev %d,%d transaction %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->transaction)
);

DEFINE_EVENT(jbd_commit, jbd_start_commit,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction)
);

DEFINE_EVENT(jbd_commit, jbd_commit_locking,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction)
);

DEFINE_EVENT(jbd_commit, jbd_commit_flushing,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction)
);

DEFINE_EVENT(jbd_commit, jbd_commit_logging,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction)
);

TRACE_EVENT(jbd_drop_transaction,

	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, transaction )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->transaction = commit_transaction->t_tid;
	),

	TP_printk("dev %d,%d transaction %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->transaction)
);

TRACE_EVENT(jbd_end_commit,
	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, transaction )
		__field( int, head )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->transaction = commit_transaction->t_tid;
		__entry->head = journal->j_tail_sequence;
	),

	TP_printk("dev %d,%d transaction %d head %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->transaction, __entry->head)
);

TRACE_EVENT(jbd_do_submit_data,
	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),

	TP_ARGS(journal, commit_transaction),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, transaction )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->transaction = commit_transaction->t_tid;
	),

	TP_printk("dev %d,%d transaction %d",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->transaction)
);

TRACE_EVENT(jbd_cleanup_journal_tail,

	TP_PROTO(journal_t *journal, tid_t first_tid,
		unsigned long block_nr, unsigned long freed),

	TP_ARGS(journal, first_tid, block_nr, freed),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( tid_t, tail_sequence )
		__field( tid_t, first_tid )
		__field(unsigned long, block_nr )
		__field(unsigned long, freed )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->tail_sequence = journal->j_tail_sequence;
		__entry->first_tid = first_tid;
		__entry->block_nr = block_nr;
		__entry->freed = freed;
	),

	TP_printk("dev %d,%d from %u to %u offset %lu freed %lu",
		MAJOR(__entry->dev), MINOR(__entry->dev),
		__entry->tail_sequence, __entry->first_tid,
		__entry->block_nr, __entry->freed)
);

TRACE_EVENT(journal_write_superblock,
	TP_PROTO(journal_t *journal, int write_op),

	TP_ARGS(journal, write_op),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( int, write_op )
	),

	TP_fast_assign(
		__entry->dev = journal->j_fs_dev->bd_dev;
		__entry->write_op = write_op;
	),

	TP_printk("dev %d,%d write_op %x", MAJOR(__entry->dev),
		MINOR(__entry->dev), __entry->write_op)
);

#endif /* _TRACE_JBD_H */

/* This part must be outside protection */
#include <trace/define_trace.h>