Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.
CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in
/sbin. The source for the daemon can be downloaded from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.c
And an example configuration from:
http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf
The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.
CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communication with the daemon. Only one thing may have this open at once,
and whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.
CacheFiles is currently limited to a single cache.
CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.
============
REQUIREMENTS
============
The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:
- dnotify.
- extended attributes (xattrs).
- openat() and friends.
- bmap() support on files in the filesystem (FIBMAP ioctl).
- The use of bmap() to detect a partial page at the end of the file.
It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.
=============
CONFIGURATION
=============
The cache is configured by a script in /etc/cachefilesd.conf. These commands
set up cache ready for use. The following script commands are available:
(*) brun <N>%
(*) bcull <N>%
(*) bstop <N>%
(*) frun <N>%
(*) fcull <N>%
(*) fstop <N>%
Configure the culling limits. Optional. See the section on culling
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
The commands beginning with a 'b' are file space (block) limits, those
beginning with an 'f' are file count limits.
(*) dir <path>
Specify the directory containing the root of the cache. Mandatory.
(*) tag <name>
Specify a tag to FS-Cache to use in distinguishing multiple caches.
Optional. The default is "CacheFiles".
(*) debug <mask>
Specify a numeric bitmask to control debugging in the kernel module.
Optional. The default is zero (all off). The following values can be
OR'd into the mask to collect various information:
1 Turn on trace of function entry (_enter() macros)
2 Turn on trace of function exit (_leave() macros)
4 Turn on trace of internal debug points (_debug())
This mask can also be set through sysfs, eg:
echo 5 >/sys/modules/cachefiles/parameters/debug
==================
STARTING THE CACHE
==================
The cache is started by running the daemon. The daemon opens the cache device,
configures the cache and tells it to begin caching. At that point the cache
binds to fscache and the cache becomes live.
The daemon is run as follows:
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
The flags are:
(*) -d
Increase the debugging level. This can be specified multiple times and
is cumulative with itself.
(*) -s
Send messages to stderr instead of syslog.
(*) -n
Don't daemonise and go into background.
(*) -f <configfile>
Use an alternative configuration file rather than the default one.
===============
THINGS TO AVOID
===============
Do not mount other things within the cache as this will cause problems. The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.
Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.
Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).
Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.
Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.
Do not chmod files in the cache. The module creates things with minimal
permissions to prevent random users being able to access them directly.
=============
CACHE CULLING
=============
The cache may need culling occasionally to make space. This involves
discarding objects from the cache that have been used less recently than
anything else. Culling is based on the access time of data objects. Empty
directories are culled if not in use.
Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem. There are six
"limits":
(*) brun
(*) frun
If the amount of free space and the number of available files in the cache
rises above both these limits, then culling is turned off.
(*) bcull
(*) fcull
If the amount of available space or the number of available files in the
cache falls below either of these limits, then culling is started.
(*) bstop
(*) fstop
If the amount of available space or the number of available files in the
cache falls below either of these limits, then no further allocation of
disk space or files is permitted until culling has raised things above
these limits again.
These must be configured thusly:
0 <= bstop < bcull < brun < 100
0 <= fstop < fcull < frun < 100
Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.
The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order. A new scan of the cache is
started as soon as space is made in the table. Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.
===============
CACHE STRUCTURE
===============
The CacheFiles module will create two directories in the directory it was
given:
(*) cache/
(*) graveyard/
The active cache objects all reside in the first directory. The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.
The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.
The module represents index objects as directories with the filename "I..." or
"J...". Note that the "cache/" directory is itself a special index.
Data objects are represented as files if they have no children, or directories
if they do. Their filenames all begin "D..." or "E...". If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.
Special objects are similar to data objects, except their filenames begin
"S..." or "T...".
If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended. Into
this directory, if possible, will be placed the representations of the child
objects:
INDEX INDEX INDEX DATA FILES
========= ========== ================================= ================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory. The names of the intermediate directories will have
'+' prepended:
J1223/@23/+xy...z/+kl...m/Epqr
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.
To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable. The two versions of
object filenames indicate the encoding:
OBJECT TYPE PRINTABLE ENCODED
=============== =============== ===============
Index "I..." "J..."
Data "D..." "E..."
Special "S..." "T..."
Intermediate directories are always "@" or "+" as appropriate.
Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs. The latter is used to detect stale objects in the cache and update
or retire them.
Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).
==========================
SECURITY MODEL AND SELINUX
==========================
CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.
One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.
The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it the target of an operation performed by
some other process (so signalling and suchlike still work correctly).
When the CacheFiles module is asked to bind to its cache, it:
(1) Finds the security label attached to the root cache directory and uses
that as the security label with which it will create files. By default,
this is:
cachefiles_var_t
(2) Finds the security label of the process which issued the bind request
(presumed to be the cachefilesd daemon), which by default will be:
cachefilesd_t
and asks LSM to supply a security ID as which it should act given the
daemon's label. By default, this will be:
cachefiles_kernel_t
SELinux transitions the daemon's security ID to the module's security ID
based on a rule of this form in the policy.
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
For instance:
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.
The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories. It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.
There are policy source files available in:
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
and later versions. In that tarball, see the files:
cachefilesd.te
cachefilesd.fc
cachefilesd.if
They are built and installed directly by the RPM.
If a non-RPM based system is being used, then copy the above files to their own
directory and run:
make -f /usr/share/selinux/devel/Makefile
semodule -i cachefilesd.pp
You will need checkpolicy and selinux-policy-devel installed prior to the
build.
By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, than either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.
For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:
/usr/share/doc/cachefilesd-*/move-cache.txt
When the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.
==================
A NOTE ON SECURITY
==================
CacheFiles makes use of the split security in the task_struct. It allocates
its own task_security structure, and redirects current->act_as to point to it
when it acts on behalf of another process, in that process's context.
The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly. Therefore the VFS and LSM
may deny the CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.
Furthermore, should CacheFiles create a file or directory, the security
parameters with that object is created (UID, GID, security label) would be
derived from that process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).
What is required is to temporarily override the security of the process that
issued the system call. We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.
So CacheFiles makes use of a logical split in the security between the
objective security (task->sec) and the subjective security (task->act_as). The
objective security holds the intrinsic security properties of a process and is
never overridden. This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).
The subjective security holds the active security properties of a process, and
may be overridden. This is not seen externally, and is used whan a process
acts upon another object, for example SIGKILLing another process or opening a
file.
LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.
This documentation is added by the patch to:
Documentation/filesystems/caching/cachefiles.txt
Signed-Off-By: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
Add the main configuration option, allowing FS-Cache to be selected; the
module entry and exit functions and the debugging stuff used by these patches.
The two configuration options added are:
CONFIG_FSCACHE
CONFIG_FSCACHE_DEBUG
The first enables the facility, and the second makes the debugging statements
enableable through the "debug" module parameter. The value of this parameter
is a bitmask as described in:
Documentation/filesystems/caching/fscache.txt
The module can be loaded at this point, but all it will do at this point in
the patch series is to start up the slow work facility and shut it down again.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (864 commits)
Btrfs: explicitly mark the tree log root for writeback
Btrfs: Drop the hardware crc32c asm code
Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook
Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies
Btrfs: tree logging checksum fixes
Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents
Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation
Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code
Btrfs: drop EXPORT symbols from extent_io.c
Btrfs: Fix checkpatch.pl warnings
Btrfs: Fix free block discard calls down to the block layer
Btrfs: avoid orphan inode caused by log replay
Btrfs: avoid potential super block corruption
Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_super
Btrfs: fix a memory leak in btrfs_get_sb
Btrfs: Fix typo in clear_state_cb
Btrfs: Fix memset length in btrfs_file_write
Btrfs: update directory's size when creating subvol/snapshot
Btrfs: add permission checks to the ioctls
...
Have one option to control Miscellaneous filesystems. This makes it easy
to disable all of them at one time.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is going to be a new version of quota format having 64-bit
quota limits and a new quota format for OCFS2. They are both
going to use the same tree structure as VFSv0 quota format. So
split out tree handling into a separate file and make size of
leaf blocks, amount of space usable in each block (needed for
checksumming) and structures contained in them configurable
so that the code can be shared.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
and mount options "acl" to enable acls in Ocfs2.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Creating a generic filesystem notification interface, fsnotify, which will be
used by inotify, dnotify, and eventually fanotify is really starting to
clutter the fs directory. This patch simply moves inotify and dnotify into
fs/notify/inotify and fs/notify/dnotify respectively to make both current fs/
and future notification tidier.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a large change for adding compression on reading and writing,
both for inline and regular extents. It does some fairly large
surgery to the writeback paths.
Compression is off by default and enabled by mount -o compress. Even
when the -o compress mount option is not used, it is possible to read
compressed extents off the disk.
If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later.
* While finding delalloc extents, the pages are locked before being sent down
to the delalloc handler. This allows the delalloc handler to do complex things
such as cleaning the pages, marking them writeback and starting IO on their
behalf.
* Inline extents are inserted at delalloc time now. This allows us to compress
the data before inserting the inline extent, and it allows us to insert
an inline extent that spans multiple pages.
* All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
are changed to record both an in-memory size and an on disk size, as well
as a flag for compression.
From a disk format point of view, the extent pointers in the file are changed
to record the on disk size of a given extent and some encoding flags.
Space in the disk format is allocated for compression encoding, as well
as encryption and a generic 'other' field. Neither the encryption or the
'other' field are currently used.
In order to limit the amount of data read for a single random read in the
file, the size of a compressed extent is limited to 128k. This is a
software only limit, the disk format supports u64 sized compressed extents.
In order to limit the ram consumed while processing extents, the uncompressed
size of a compressed extent is limited to 256k. This is a software only limit
and will be subject to tuning later.
Checksumming is still done on compressed extents, and it is done on the
uncompressed version of the data. This way additional encodings can be
layered on without having to figure out which encoding to checksum.
Compression happens at delalloc time, which is basically singled threaded because
it is usually done by a single pdflush thread. This makes it tricky to
spread the compression load across all the cpus on the box. We'll have to
look at parallel pdflush walks of dirty inodes at a later time.
Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Assume you have:
- one or more of ext2/3/4 statically built into your kernel
- none of these with extended attributes enabled and
- want to add onother one of ext2/3/4 modular and with
extended attributes enabled
then you currently have to reboot to use it since this results in
CONFIG_FS_MBCACHE=y.
That's not a common issue, but I just ran into it and since there's no
reason to get a built-in mbcache in this case this patch fixes it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use fs/*/Kconfig more, which is good because everything related to one
filesystem is in one place and fs/Kconfig is quite fat.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/mtd-2.6: (69 commits)
Revert "[MTD] m25p80.c code cleanup"
[MTD] [NAND] GPIO driver depends on ARM... for now.
[MTD] [NAND] sh_flctl: fix compile error
[MTD] [NOR] AT49BV6416 has swapped erase regions
[MTD] [NAND] GPIO NAND flash driver
[MTD] cmdlineparts documentation change - explain where mtd-id comes from
[MTD] cfi_cmdset_0002.c: Add Macronix CFI V1.0 TopBottom detection
[MTD] [NAND] Fix compilation warnings in drivers/mtd/nand/cs553x_nand.c
[JFFS2] Write buffer offset adjustment for NOR-ECC (Sibley) flash
[MTD] mtdoops: Fix a bug where block may not be erased
[MTD] mtdoops: Add a magic number to logged kernel oops
[MTD] mtdoops: Fix an off by one error
[JFFS2] Correct parameter names of jffs2_compress() in comments
[MTD] [NAND] sh_flctl: add support for Renesas SuperH FLCTL
[MTD] [NAND] Bug on atmel_nand HW ECC : OOB info not correctly written
[MTD] [MAPS] Remove unused variable after ROM API cleanup.
[MTD] m25p80.c extended jedec support (v2)
[MTD] remove unused mtd parameter in of_mtd_parse_partitions()
[MTD] [NAND] remove dead Kconfig associated with !CONFIG_PPC_MERGE
[MTD] [NAND] driver extension to support NAND on TQM85xx modules
...
Make the short description of the FUSE_FS config option clearer.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (56 commits)
ocfs2: Make cached block reads the common case.
ocfs2: Kill the last naked wait_on_buffer() for cached reads.
ocfs2: Move ocfs2_bread() into dir.c
ocfs2: Simplify ocfs2_read_block()
ocfs2: Require an inode for ocfs2_read_block(s)().
ocfs2: Separate out sync reads from ocfs2_read_blocks()
ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().
ocfs2: Calculate EA hash only by its suffix.
ocfs2: Move trusted and user attribute support into xattr.c
ocfs2: Uninline ocfs2_xattr_name_hash()
ocfs2: Don't check for NULL before brelse()
ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache
ocfs2: Documentation update for user_xattr / nouser_xattr mount options
ocfs2: make la_debug_mutex static
ocfs2: Remove pointless !!
ocfs2: Add empty bucket support in xattr.
ocfs2/xattr.c: Fix a bug when inserting xattr.
ocfs2: Add xattr mount option in ocfs2_show_options()
ocfs2: Switch over to JBD2.
ocfs2: Add the 'inode64' mount option.
...
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits)
svcrdma: Fix IRD/ORD polarity
svcrdma: Update svc_rdma_send_error to use DMA LKEY
svcrdma: Modify the RPC reply path to use FRMR when available
svcrdma: Modify the RPC recv path to use FRMR when available
svcrdma: Add support to svc_rdma_send to handle chained WR
svcrdma: Modify post recv path to use local dma key
svcrdma: Add a service to register a Fast Reg MR with the device
svcrdma: Query device for Fast Reg support during connection setup
svcrdma: Add FRMR get/put services
NLM: Remove unused argument from svc_addsock() function
NLM: Remove "proto" argument from lockd_up()
NLM: Always start both UDP and TCP listeners
lockd: Remove unused fields in the nlm_reboot structure
lockd: Add helper to sanity check incoming NOTIFY requests
lockd: change nlmclnt_grant() to take a "struct sockaddr *"
lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses
lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET
lockd: Support non-AF_INET addresses in nlm_lookup_host()
NLM: Convert nlm_lookup_host() to use a single argument
svcrdma: Add Fast Reg MR Data Types
...
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.
It's a pretty trivial change. Most functions are just renamed. The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.
Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem. It can even interact with JBD-based ocfs2 as long
as the journal is formated for JBD.
We provide a compatibility option so that paranoid people can still use
JBD for the time being. This will go away shortly.
[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
ocfs2_truncate_for_delete(). --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Looks like there is one more instance where ext4dev should be changed
to ext4 because the module name will be "ext4" unless EXT4DEV_COMPAT
is selected.
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The ext4 filesystem is getting stable enough that it's time to drop
the "dev" prefix. Also remove the requirement for the TEST_FILESYS
flag.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In order to advertise NFS-related services on IPv6 interfaces via
rpcbind, the kernel RPC server implementation must use
rpcb_v4_register() instead of rpcb_register().
A new kernel build option allows distributions to use the legacy
v2 call until they integrate an appropriate user-space rpcbind
daemon that can support IPv6 RPC services.
I tried adding some automatic logic to fall back if registering
with a v4 protocol request failed, but there are too many corner
cases. So I just made it a compile-time switch that distributions
can throw when they've replaced portmapper with rpcbind.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch adds the CONFIG_FILE_LOCKING option which allows to remove
support for advisory locks. With this patch enabled, the flock()
system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl()
and NFS support are disabled. These features are not necessarly needed
on embedded systems. It allows to save ~11 Kb of kernel code and data:
text data bss dec hex filename
1125436 118764 212992 1457192 163c28 vmlinux.old
1114299 118564 212992 1445855 160fdf vmlinux
-11137 -200 0 -11337 -2C49 +/-
This patch has originally been written by Matt Mackall
<mpm@selenic.com>, and is part of the Linux Tiny project.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: matthew@wil.cx
Cc: linux-fsdevel@vger.kernel.org
Cc: mpm@selenic.com
Cc: akpm@linux-foundation.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Adds OMFS to the fs Kconfig and Makefile
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While fixing CONFIG_ leakages to the userspace kernel headers I ran into
CODA_FS_OLD_API.
After five years, are there still people using the old API left?
Especially considering that you have to choose at compile time which API
to support in the kernel (and distributions tend to offer the new API for
some time).
Jan: "The old API can definitely go. Around the time the new
interface went in there were some non-Coda userspace file system
implementations that took a while longer to convert to the new API,
but by now they all switched to the new interface or in some cases
to a FUSE-based solution."
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] ocfs2: fix oops in mmap_truncate testing
configfs: call drop_link() to cleanup after create_link() failure
configfs: Allow ->make_item() and ->make_group() to return detailed errors.
configfs: Fix failing mkdir() making racing rmdir() fail
configfs: Fix deadlock with racing rmdir() and rename()
configfs: Make configfs_new_dirent() return error code instead of NULL
configfs: Protect configfs_dirent s_links list mutations
configfs: Introduce configfs_dirent_lock
ocfs2: Don't snprintf() without a format.
ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs
ocfs2/net: Silence build warnings on sparc64
ocfs2: Handle error during journal load
ocfs2: Silence an error message in ocfs2_file_aio_read()
ocfs2: use simple_read_from_buffer()
ocfs2: fix printk format warnings with OCFS2_FS_STATS=n
[PATCH 2/2] ocfs2: Instrument fs cluster locks
[PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
UBIFS: include to compilation
UBIFS: add new flash file system
UBIFS: add brief documentation
MAINTAINERS: add UBIFS section
do_mounts: allow UBI root device name
VFS: export sync_sb_inodes
VFS: move inode_lock into sync_sb_inodes
Add UBIFS to Makefile and Kbuild.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
This patch adds config option CONFIG_OCFS2_FS_STATS to allow building
the fs with instrumentation enabled. An upcoming patch will provide
support to instrument cluster locking, which is a crucial overhead in
a cluster file system. This config option allows users to avoid the cpu
and memory overhead that is involved in gathering such statistics.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Some server vendors support the higher versions of rpcbind only for
AF_INET6. The kernel doesn't need to use v3 or v4 for AF_INET anyway,
so change the kernel's rpcbind client to query AF_INET servers over
rpcbind v2 only.
This has a few interesting benefits:
1. If the rpcbind request is going over TCP, and the server doesn't
support rpcbind versions 3 or 4, the client reduces by two the number
of ephemeral ports left in TIME_WAIT for each rpcbind request. This
will help during NFS mount storms.
2. The rpcbind interaction with servers that don't support rpcbind
versions 3 or 4 will use less network traffic. Also helpful
during mount storms.
3. We can eliminate the kernel build option that controls whether the
kernel's rpcbind client uses rpcbind version 3 and 4 for AF_INET
servers. Less complicated kernel configuration...
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: refresh the help text for Kconfig items related to the NFS
client. Remove obsolete URLs, and make the language consistent among
the options.
Also move the ROOT_NFS config option next to the options related to the
NFS client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I would suggest to remove the "experimental" status from Kdump.
Kdump is now in the kernel since a long time and used by Enterprise
distributions. I don't think that "experimental" is true any more.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: vgoyal@redhat.com
Cc: kexec@lists.infradead.org
Cc: Bernhard Walle <bwalle@suse.de>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The url in the help text for ntfs should be updated.
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
make nfs_automount_list static
NFS: remove duplicate flags assignment from nfs_validate_mount_data
NFS - fix potential NULL pointer dereference v2
SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
SUNRPC: Fix a race in gss_refresh_upcall()
SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
SUNRPC: Remove the unused export of xprt_force_disconnect
SUNRPC: remove XS_SENDMSG_RETRY
SUNRPC: Protect creds against early garbage collection
NFSv4: Attempt to use machine credentials in SETCLIENTID calls
NFSv4: Reintroduce machine creds
NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
nfs: fix printout of multiword bitfields
nfs: return negative error value from nfs{,4}_stat_to_errno
NLM/lockd: Ensure client locking calls use correct credentials
NFS: Remove the buggy lock-if-signalled case from do_setlk()
NLM/lockd: Fix a race when cancelling a blocking lock
NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
...
Clean up: Because NFSD_V4 "depends on" NFSD_V3, it appears as a child of
the NFSD_V3 menu entry, and is not visible if NFSD_V3 is unselected.
Replace the dependency on NFSD_V3 with a "select NFSD_V3". This makes
NFSD_V4 look and work just like NFS_V3, while ensuring that NFSD_V3 is
enabled if NFSD_V4 is.
Sam Ravnborg adds:
"This use of select is questionable. In general it is bad to select
a symbol with dependencies.
In this case the dependencies of NFSD_V3 are duplicated for NFSD_V4
so we will not se erratic configurations but do you remember to
update NFSD_V4 when you add a depends on NFSD_V3?
But I see no other clean way to do it right now."
Later he said:
"My comment was more to say we have things to address in kconfig.
This is abuse in the acceptable range."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Recently, commit 440bcc59 added a reverse dependency to fs/Kconfig to
ensure that PROC_FS was enabled if SUNRPC_GSS was enabled.
Apparently this isn't necessary because the auth_gss components under
net/sunrpc will build correctly even if PROC_FS is disabled, though
RPCSEC_GSS will not work without /proc.
It also violates the guideline in Documentation/kbuild/kconfig-language.txt
that states "In general use select only for non-visible symbols (no prompts
anywhere) and for symbols with no dependencies."
To address these issues, remove the dependency.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Recently, commit 440bcc59 added a reverse dependency to fs/Kconfig to
ensure that PROC_FS was enabled if NFSD_V4 was enabled.
There is a guideline in Documentation/kbuild/kconfig-language.txt that
states "In general use select only for non-visible symbols (no prompts
anywhere) and for symbols with no dependencies."
A quick grep around other Kconfig files reveals that no entry currently
uses "select PROC_FS" -- every one uses "depends on". Thus CONFIG_NFSD_V4
should use "depends on PROC_FS" as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
As far as I can tell, selecting the CRYPTO and CRYPTO_MD5 entries under
CONFIG_NFSD is redundant, since CONFIG_NFSD_V4 already selects
RPCSEC_GSS_KRB5, which selects these entries.
Testing with "make menuconfig" shows that the entries under CRYPTO still
properly reflect "Y" or "M" based on the setting of CONFIG_NFSD after this
change is applied.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: since NFSD_V2_ACL is a boolean, it can be selected safely
under the NFSD_V3_ACL entry (also a boolean).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: since FS_POSIX_ACL is a non-visible boolean entry, it can be
selected safely under the NFSD_V4 entry (also a boolean).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: refresh the help text for Kconfig items related to the NFS
server. Remove obsolete URLs, and make the language consistent among
the options.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Likewise, distros usually leave CONFIG_NFSD_TCP enabled.
TCP support in the Linux NFS server is stable enough that we can leave it
on always. CONFIG_NFSD_TCP adds about 10 lines of code, and defaults to
"Y" anyway.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: (41 commits)
udf: use crc_itu_t from lib instead of udf_crc
udf: Fix compilation warnings when UDF debug is on
udf: Fix bug in VAT mapping code
udf: Add read-only support for 2.50 UDF media
udf: Fix handling of multisession media
udf: Mount filesystem read-only if it has pseudooverwrite partition
udf: Handle VAT packed inside inode properly
udf: Allow loading of VAT inode
udf: Fix detection of VAT version
udf: Silence warning about accesses beyond end of device
udf: Improve anchor block detection
udf: Cleanup anchor block detection.
udf: Move processing of virtual partitions
udf: Move filling of partition descriptor info into a separate function
udf: Improve error recovery on mount
udf: Cleanup volume descriptor sequence processing
udf: fix anchor point detection
udf: Remove declarations of arrays of size UDF_NAME_LEN (256 bytes)
udf: Remove checking of existence of filename in udf_add_entry()
udf: Mark udf_process_sequence() as noinline
...
ocfs2 now supports plug-ins for the classic O2CB stack as well as
userspace cluster stacks in conjunction with fs/dlm. This allows zero,
one, or both of the plug-ins to be selected in Kconfig. For local mounts
(non-clustered), neither plug-in is needed. Both plugins can be loaded
at one time, the runtime will select the one needed for the cluster
systme in use.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
As pointed out by Sergey Vlasov, UDF implements its own version of
the CRC ITU-T V.41. Convert it to use the one in the library.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Most distros will want support for rpcbind protocols 3 and 4 to default off
until they have integrated user-space support for the new rpcbind daemon
which supports IPv6 RPC services.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: refresh the help text for Kconfig items related to the sunrpc
module. Remove obsolete URLs, and make the language consistent among
the options.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>