2005-04-17 06:20:36 +08:00
|
|
|
#ifndef __SHMEM_FS_H
|
|
|
|
#define __SHMEM_FS_H
|
|
|
|
|
|
|
|
#include <linux/swap.h>
|
|
|
|
#include <linux/mempolicy.h>
|
2011-06-28 07:18:04 +08:00
|
|
|
#include <linux/pagemap.h>
|
2010-08-10 08:19:05 +08:00
|
|
|
#include <linux/percpu_counter.h>
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
/* inode in-kernel data */
|
|
|
|
|
|
|
|
struct shmem_inode_info {
|
|
|
|
spinlock_t lock;
|
|
|
|
unsigned long flags;
|
|
|
|
unsigned long alloced; /* data pages alloced to file */
|
tmpfs: implement generic xattr support
Implement generic xattrs for tmpfs filesystems. The Feodra project, while
trying to replace suid apps with file capabilities, realized that tmpfs,
which is used on the build systems, does not support file capabilities and
thus cannot be used to build packages which use file capabilities. Xattrs
are also needed for overlayfs.
The xattr interface is a bit odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks
and the running LSM can then implement some method for handling xattrs.
SELinux for example provides a method to support security.selinux but no
other security.* xattrs.
As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would loose the VFS/LSM helpers to support the running LSM. To make up
for that tmpfs had stub functions that did nothing but call into the LSM
hooks which implement the helpers.
This new patch does not use the LSM fallback functions and instead just
implements a native get/set/list xattr feature for the full security.* and
trusted.* namespace like a normal filesystem. This means that tmpfs can
now support both security.selinux and security.capability, which was not
previously possible.
The basic implementation is that I attach a:
struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};
Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well, except
some care needs to be taken to prevent large amounts of unswappable memory
being allocated for unprivileged users.
[mszeredi@suse.cz: new config option, suport trusted.*, support symlinks]
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:12:39 +08:00
|
|
|
union {
|
2011-08-04 07:21:26 +08:00
|
|
|
unsigned long swapped; /* subtotal assigned to swap */
|
|
|
|
char *symlink; /* unswappable short symlink */
|
tmpfs: implement generic xattr support
Implement generic xattrs for tmpfs filesystems. The Feodra project, while
trying to replace suid apps with file capabilities, realized that tmpfs,
which is used on the build systems, does not support file capabilities and
thus cannot be used to build packages which use file capabilities. Xattrs
are also needed for overlayfs.
The xattr interface is a bit odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks
and the running LSM can then implement some method for handling xattrs.
SELinux for example provides a method to support security.selinux but no
other security.* xattrs.
As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would loose the VFS/LSM helpers to support the running LSM. To make up
for that tmpfs had stub functions that did nothing but call into the LSM
hooks which implement the helpers.
This new patch does not use the LSM fallback functions and instead just
implements a native get/set/list xattr feature for the full security.* and
trusted.* namespace like a normal filesystem. This means that tmpfs can
now support both security.selinux and security.capability, which was not
previously possible.
The basic implementation is that I attach a:
struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};
Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well, except
some care needs to be taken to prevent large amounts of unswappable memory
being allocated for unprivileged users.
[mszeredi@suse.cz: new config option, suport trusted.*, support symlinks]
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:12:39 +08:00
|
|
|
};
|
2011-08-04 07:21:26 +08:00
|
|
|
struct shared_policy policy; /* NUMA memory alloc policy */
|
2005-04-17 06:20:36 +08:00
|
|
|
struct list_head swaplist; /* chain of maybes on swap */
|
tmpfs: implement generic xattr support
Implement generic xattrs for tmpfs filesystems. The Feodra project, while
trying to replace suid apps with file capabilities, realized that tmpfs,
which is used on the build systems, does not support file capabilities and
thus cannot be used to build packages which use file capabilities. Xattrs
are also needed for overlayfs.
The xattr interface is a bit odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks
and the running LSM can then implement some method for handling xattrs.
SELinux for example provides a method to support security.selinux but no
other security.* xattrs.
As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would loose the VFS/LSM helpers to support the running LSM. To make up
for that tmpfs had stub functions that did nothing but call into the LSM
hooks which implement the helpers.
This new patch does not use the LSM fallback functions and instead just
implements a native get/set/list xattr feature for the full security.* and
trusted.* namespace like a normal filesystem. This means that tmpfs can
now support both security.selinux and security.capability, which was not
previously possible.
The basic implementation is that I attach a:
struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};
Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well, except
some care needs to be taken to prevent large amounts of unswappable memory
being allocated for unprivileged users.
[mszeredi@suse.cz: new config option, suport trusted.*, support symlinks]
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:12:39 +08:00
|
|
|
struct list_head xattr_list; /* list of shmem_xattr */
|
2005-04-17 06:20:36 +08:00
|
|
|
struct inode vfs_inode;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct shmem_sb_info {
|
|
|
|
unsigned long max_blocks; /* How many blocks are allowed */
|
2010-08-10 08:19:05 +08:00
|
|
|
struct percpu_counter used_blocks; /* How many are allocated */
|
2005-04-17 06:20:36 +08:00
|
|
|
unsigned long max_inodes; /* How many inodes are allowed */
|
|
|
|
unsigned long free_inodes; /* How many are left for allocation */
|
2008-02-08 20:21:48 +08:00
|
|
|
spinlock_t stat_lock; /* Serialize shmem_sb_info changes */
|
|
|
|
uid_t uid; /* Mount uid for root directory */
|
|
|
|
gid_t gid; /* Mount gid for root directory */
|
2011-07-26 15:15:03 +08:00
|
|
|
umode_t mode; /* Mount mode for root directory */
|
mempolicy: use struct mempolicy pointer in shmem_sb_info
This patch replaces the mempolicy mode, mode_flags, and nodemask in the
shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.
This removes dependency on the details of mempolicy from shmem.c and hugetlbfs
inode.c and simplifies the interfaces.
mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a
pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the
returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context'
argument that causes the input nodemask to be stored in the w.user_nodemask of
the created mempolicy for use when the mempolicy is installed in a tmpfs inode
shared policy tree. At that time, any cpuset contextualization is applied to
the original input nodemask. This preserves the previous behavior where the
input nodemask was stored in the superblock. We can think of the returned
mempolicy as "context free".
Because mpol_parse_str() is now calling mpol_new(), we can remove from
mpol_to_str() the semantic checks that mpol_new() already performs.
Add 'no_context' parameter to mpol_to_str() to specify that it should format
the nodemask in w.user_nodemask for 'bind' and 'interleave' policies.
Change mpol_shared_policy_init() to take a pointer to a "context free" struct
mempolicy and to create a new, "contextualized" mempolicy using the mode,
mode_flags and user_nodemask from the input mempolicy.
Note: we know that the mempolicy passed to mpol_to_str() or
mpol_shared_policy_init() from a tmpfs superblock is "context free". This
is currently the only instance thereof. However, if we found more uses for
this concept, and introduced any ambiguity as to whether a mempolicy was
context free or not, we could add another internal mode flag to identify
context free mempolicies. Then, we could remove the 'no_context' argument
from mpol_to_str().
Added shmem_get_sbmpol() to return a reference counted superblock mempolicy,
if one exists, to pass to mpol_shared_policy_init(). We must add the
reference under the sb stat_lock to prevent races with replacement of the mpol
by remount. This reference is removed in mpol_shared_policy_init().
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: yet another build fix]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 17:13:26 +08:00
|
|
|
struct mempolicy *mpol; /* default memory policy for mappings */
|
2005-04-17 06:20:36 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
|
|
|
|
{
|
|
|
|
return container_of(inode, struct shmem_inode_info, vfs_inode);
|
|
|
|
}
|
|
|
|
|
2011-06-28 07:18:02 +08:00
|
|
|
/*
|
|
|
|
* Functions in mm/shmem.c called directly from elsewhere:
|
|
|
|
*/
|
2011-08-04 07:21:21 +08:00
|
|
|
extern int shmem_init(void);
|
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
very early at kernel initialization, before any driver-core device
is registered. Every device with a major/minor will provide a
device node in devtmpfs.
Devtmpfs can be changed and altered by userspace at any time,
and in any way needed - just like today's udev-mounted tmpfs.
Unmodified udev versions will run just fine on top of it, and will
recognize an already existing kernel-created device node and use it.
The default node permissions are root:root 0600. Proper permissions
and user/group ownership, meaningful symlinks, all other policy still
needs to be applied by userspace.
If a node is created by devtmps, devtmpfs will remove the device node
when the device goes away. If the device node was created by
userspace, or the devtmpfs created node was replaced by userspace, it
will no longer be removed by devtmpfs.
If it is requested to auto-mount it, it makes init=/bin/sh work
without any further userspace support. /dev will be fully populated
and dynamic, and always reflect the current device state of the kernel.
With the commonly used dynamic device numbers, it solves the problem
where static devices nodes may point to the wrong devices.
It is intended to make the initial bootup logic simpler and more robust,
by de-coupling the creation of the inital environment, to reliably run
userspace processes, from a complex userspace bootstrap logic to provide
a working /dev.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Tested-By: Harald Hoyer <harald@redhat.com>
Tested-By: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-30 21:23:42 +08:00
|
|
|
extern int shmem_fill_super(struct super_block *sb, void *data, int silent);
|
2011-06-28 07:18:02 +08:00
|
|
|
extern struct file *shmem_file_setup(const char *name,
|
|
|
|
loff_t size, unsigned long flags);
|
|
|
|
extern int shmem_zero_setup(struct vm_area_struct *);
|
|
|
|
extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
|
SHM_UNLOCK: fix Unevictable pages stranded after swap
Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in
find_get_pages") correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers to
mean end of tree) when it meets a run of nr_pages swap entries.
The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages(). The first is already commented, and
not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.
Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
swap) instead of pagevec_lookup().
But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(), looping
down an array of pages, oftentimes under the same lock.
Leave out the "rotate unevictable list" block: that's a leftover from
when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
handling involved looking at pages at tail of LRU.
Was there significance to the sequence first ClearPageUnevictable, then
test page_evictable, then SetPageUnevictable here? I think not, we're
under LRU lock, and have no barriers between those.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-21 06:34:21 +08:00
|
|
|
extern void shmem_unlock_mapping(struct address_space *mapping);
|
2011-06-28 07:18:04 +08:00
|
|
|
extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
|
|
|
|
pgoff_t index, gfp_t gfp_mask);
|
tmpfs: take control of its truncate_range
2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.
Slightly redefine the ->truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.
Let's rename shmem_notify_change() to shmem_setattr(). Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform its
own variant of truncate_inode_pages_range().
Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there is no need to call it a third time.
Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
that i915_gem_object_truncate() can call it explicitly in future; get
this patch in first, then update drm/i915 once this is available (until
then, i915 will just be doing the truncate_inode_pages() twice).
Though introduced five years ago, no other filesystem is implementing
->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon ->truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28 07:18:03 +08:00
|
|
|
extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
|
2011-06-28 07:18:02 +08:00
|
|
|
extern int shmem_unuse(swp_entry_t entry, struct page *page);
|
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
very early at kernel initialization, before any driver-core device
is registered. Every device with a major/minor will provide a
device node in devtmpfs.
Devtmpfs can be changed and altered by userspace at any time,
and in any way needed - just like today's udev-mounted tmpfs.
Unmodified udev versions will run just fine on top of it, and will
recognize an already existing kernel-created device node and use it.
The default node permissions are root:root 0600. Proper permissions
and user/group ownership, meaningful symlinks, all other policy still
needs to be applied by userspace.
If a node is created by devtmps, devtmpfs will remove the device node
when the device goes away. If the device node was created by
userspace, or the devtmpfs created node was replaced by userspace, it
will no longer be removed by devtmpfs.
If it is requested to auto-mount it, it makes init=/bin/sh work
without any further userspace support. /dev will be fully populated
and dynamic, and always reflect the current device state of the kernel.
With the commonly used dynamic device numbers, it solves the problem
where static devices nodes may point to the wrong devices.
It is intended to make the initial bootup logic simpler and more robust,
by de-coupling the creation of the inital environment, to reliably run
userspace processes, from a complex userspace bootstrap logic to provide
a working /dev.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Tested-By: Harald Hoyer <harald@redhat.com>
Tested-By: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-30 21:23:42 +08:00
|
|
|
|
2011-06-28 07:18:04 +08:00
|
|
|
static inline struct page *shmem_read_mapping_page(
|
|
|
|
struct address_space *mapping, pgoff_t index)
|
|
|
|
{
|
|
|
|
return shmem_read_mapping_page_gfp(mapping, index,
|
|
|
|
mapping_gfp_mask(mapping));
|
|
|
|
}
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
#endif
|