Commit Graph

551 Commits

Author SHA1 Message Date
Oleg Nesterov a872ff0cb2 [PATCH] simplify/fix first_tid()
first_tid:

	/* If nr exceeds the number of threads there is nothing todo */
	if (nr) {
		if (nr >= get_nr_threads(leader))
			goto done;
	}

This is not reliable: sub-threads can exit after this check, so the
'for' loop below can overlap and proc_task_readdir() can return an
already filldir'ed dirents.

	for (; pos && pid_alive(pos); pos = next_thread(pos)) {
		if (--nr > 0)
			continue;

Off-by-one error, will return 'leader' when nr == 1.

This patch tries to fix these problems and simplify the code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman cc288738c9 [PATCH] proc: Remove tasklist_lock from proc_task_readdir.
This is just like my previous removal of tasklist_lock from first_tgid, and
next_tgid.  It simply had to wait until it was rcu safe to walk the thread
list.

This should be the last instance of the tasklist_lock in proc.  So user
processes should not be able to influence the tasklist lock hold times.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman df26c40e56 [PATCH] proc: Cleanup proc_fd_access_allowed
In process of getting proc_fd_access_allowed to work it has developed a few
warts.  In particular the special case that always allows introspection and
the special case to allow inspection of kernel threads.

The special case for introspection is needed for /proc/self/mem.

The special case for kernel threads really should be overridable
by security modules.

So consolidate these checks into ptrace.c:may_attach().

The check to always allow introspection is trivial.

The check to allow access to kernel threads, and zombies is a little
trickier.  mem_read and mem_write already verify an mm exists so it isn't
needed twice.  proc_fd_access_allowed only doesn't want a check to verify
task->mm exits, s it prevents all access to kernel threads.  So just move
the task->mm check into ptrace_attach where it is needed for practical
reasons.

I did a quick audit and none of the security modules in the kernel seem to
care if they are passed a task without an mm into security_ptrace.  So the
above move should be safe and it allows security modules to come up with
more restrictive policy.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 778c114477 [PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks
Since 2.2 we have been doing a chroot check to see if it is appropriate to
return a read or follow one of these magic symlinks.  The chroot check was
asking a question about the visibility of files to the calling process and
it was actually checking the destination process, and not the files
themselves.  That test was clearly bogus.

In my first pass through I simply fixed the test to check the visibility of
the files themselves.  That naive approach to fixing the permissions was
too strict and resulted in cases where a task could not even see all of
it's file descriptors.

What has disturbed me about relaxing this check is that file descriptors
are per-process private things, and they are occasionaly used a user space
capability tokens.  Looking a little farther into the symlink path on /proc
I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so
there were permissions checking this.

But I was still concerned about privacy.  Besides /proc there is only one
other way to find out this kind of information, and that is ptrace.  ptrace
has been around for a long time and it has a well established security
model.

So after thinking about it I finally realized that the permission checks
that make sense are the permission checks applied to ptrace_attach.  The
checks are simple per process, and won't cause nasty surprises for people
coming from less capable unices.

Unfortunately there is one case that the current ptrace_attach test does
not cover: Zombies and kernel threads.  Single stepping those kinds of
processes is impossible.  Being able to see which file descriptors are open
on these tasks is important to lsof, fuser and friends.  So for these
special processes I made the rule you can't find out unless you have
CAP_SYS_PTRACE.

These proc permission checks should now conform to the principle of least
surprise.  As well as using much less code to implement :)

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 5b0c1dd38b [PATCH] proc: optimize proc_check_dentry_visible
The code doesn't need to sleep to when making this check so I can just do the
comparison and not worry about the reference counts.

TODO: While looking at this I realized that my original cleanup did not push
the permission check far enough down into the stack.  The call of
proc_check_dentry_visible needs to move out of the generic proc
readlink/follow link code and into the individual get_link instances.
Otherwise the shared resources checks are not quite correct (shared
files_struct does not require a shared fs_struct), and there are races with
unshare.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 13b41b0949 [PATCH] proc: Use struct pid not struct task_ref
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
that they work with struct pid instead of struct task_ref.

Mostly this is a straight 1-1 substitution.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 99f8955183 [PATCH] proc: don't lock task_structs indefinitely
Every inode in /proc holds a reference to a struct task_struct.  If a
directory or file is opened and remains open after the the task exits this
pinning continues.  With 8K stacks on a 32bit machine the amount pinned per
file descriptor is about 10K.

Normally I would figure a reasonable per user process limit is about 100
processes.  With 80 processes, with a 1000 file descriptors each I can trigger
the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless
data.

This patch replaces the struct task_struct pointer with a pointer to a struct
task_ref which has a struct task_struct pointer.  The so the pinning of dead
tasks does not happen.

The code now has to contend with the fact that the task may now exit at any
time.  Which is a little but not muh more complicated.

With this change it takes about 1000 processes each opening up 1000 file
descriptors before I can trigger the OOM killer.  Much better.

[mlp@google.com: task_mmu small fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Paul Jackson <pj@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Prasanna Meda <mlp@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 8578cea750 [PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings
Currently in /proc at several different places we define buffers to hold a
process id, or a file descriptor .  In most of them we use either a hard coded
number or a different define.  Modify them all to use PROC_NUMBUF, so the code
has a chance of being maintained.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 9cc8cbc7f8 [PATCH] simply fix first_tgid
Like the bug Oleg spotted in first_tid there was also a small off by one
error in first_tgid, when a seek was done on the /proc directory.  This
fixes that and changes the code structure to make it a little more obvious
what is going on.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman de7587343b [PATCH] proc: Remove tasklist_lock from proc_pid_lookup() and proc_task_lookup()
Since we no longer need the tasklist_lock for get_task_struct the lookup
methods no longer need the tasklist_lock.

This just depends on my previous patch that makes get_task_struct() rcu
safe.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 454cc105ef [PATCH] proc: Remove tasklist_lock from proc_pid_readdir
We don't need the tasklist_lock to safely iterate through processes
anymore.

This depends on my previous to task patches that make get_task_struct rcu
safe, and that make next_task() rcu safe.  I haven't gotten
first_tid/next_tid yet only because next_thread is missing an
rcu_dereference.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 0bc58a9102 [PATCH] proc: refactor reading directories of tasks
There are a couple of problems this patch addresses.
- /proc/<tgid>/task currently does not work correctly if you stop reading
  in the middle of a directory.

- /proc/ currently requires a full pass through the task list with
  the tasklist lock held, to determine there are no more processes to read.

- The hand rolled integer to string conversion does not properly running
  out of buffer space.

- We seem to be batching reading of pids from the tasklist without reason,
  and complicating the logic of the code.

This patch addresses that by changing how tasks are processed.  A
first_<task_type> function is built that handles restarts, and a
next_<task_type> function is built that just advances to the next task.

first_<task_type> when it detects a restart usually uses find_task_by_pid.  If
that doesn't work because there has been a seek on the directory, or we have
already given a complete directory listing, it first checks the number tasks
of that type, and only if we are under that count does it walk through all of
the tasks to find the one we are interested in.

The code that fills in the directory is simpler because there is only a single
for loop.

The hand rolled integer to string conversion is replaced by snprintf which
should handle the the out of buffer case correctly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman cd6a3ce9ec [PATCH] proc: Close the race of a process dying durning lookup
proc_lookup and task exiting are not synchronized, although some of the
previous code may have suggested that.  Every time before we reuse a dentry
namei.c calls d_op->derevalidate which prevents us from reusing a stale dcache
entry.  Unfortunately it does not prevent us from returning a stale dcache
entry.  This race has been explicitly plugged in proc_pid_lookup but there is
nothing to confine it to just that proc lookup function.

So to prevent the race I call revalidate explictily in all of the proc lookup
functions after I call d_add, and report an error if the revalidate does not
succeed.

Years ago Al Viro did something similar but those changes got lost in the
churn.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 48e6484d49 [PATCH] proc: Rewrite the proc dentry flush on exit optimization
To keep the dcache from filling up with dead /proc entries we flush them on
process exit.  However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit.  This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 662795deb8 [PATCH] proc: Move proc_maps_operations into task_mmu.c
All of the functions for proc_maps_operations are already defined in
task_mmu.c so move the operations structure to keep the functionality
together.

Since task_nommu.c implements a dummy version of /proc/<pid>/maps give it a
simplified version of proc_maps_operations that it can modify to best suit its
needs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 6e66b52bf5 [PATCH] proc: Fix the link count for /proc/<pid>/task
Use getattr to get an accurate link count when needed.  This is cheaper and
more accurate than trying to derive it by walking the thread list of a
process.

Especially as it happens when needed stat instead of at readdir time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 0f2fe20f55 [PATCH] proc: Properly filter out files that are not visible to a process
Long ago and far away in 2.2 we started checking to ensure the files we
displayed in /proc were visible to the current process.  It was an
unsophisticated time and no one was worried about functions full of FIXMES in
a stable kernel.  As time passed the function became sacred and was enshrined
in the shrine of how things have always been.  The fixes came in but only to
keep the function working no one really remembering or documenting why we did
things that way.

The intent and the functionality make a lot of sense.  Don't let /proc be an
access point for files a process can see no other way.  The implementation
however is completely wrong.

We are currently checking the root directories of the two processes, we are
not checking the actual file descriptors themselves.

We are strangely checking with a permission method instead of just when we use
the data.

This patch fixes the logic to actually check the file descriptors and make a
note that implementing a permission method for this part of /proc almost
certainly indicates a bug in the reasoning.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 22c2c5d75e [PATCH] proc: Kill proc_mem_inode_operations
The inode operations only exist to support the proc_permission function.
Currently mem_read and mem_write have all the same permission checks as
ptrace.  The fs check makes no sense in this context, and we can trivially get
around it by calling ptrace.

So simply the code by killing the strange weird case.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 68602066c3 [PATCH] proc: Remove bogus proc_task_permission
First we can access every /proc/<tgid>/task/<pid> directory as /proc/<pid> so
proc_task_permission is not usefully limiting visibility.

Second having related filesystems information should have nothing to do with
process visibility.  kill does not implement any checks like that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman aed7a6c476 [PATCH] proc: Replace proc_inode.type with proc_inode.fd
The sole renaming use of proc_inode.type is to discover the file descriptor
number, so just store the file descriptor number and don't wory about
processing this field.  This removes any /proc limits on the maximum number of
file descriptors, and clears the path to make the hard coded /proc inode
numbers go away.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 87bfbf679f [PATCH] proc: Simplify the ownership rules for /proc
Currently in /proc if the task is dumpable all of files are owned by the tasks
effective users.  Otherwise the files are owned by root.  Unless it is the
/proc/<tgid>/ or /proc/<tgid>/task/<pid> directory in that case we always make
the directory owned by the effective user.

However the special case for directories is pointless except as a way to read
the effective user, because the permissions on both of those directories are
world readable, and executable.

/proc/<tgid>/status provides a much better way to read a processes effecitve
userid, so it is silly to try to provide that on the directory.

So this patch simplifies the code by removing a pointless special case and
gets us one step closer to being able to remove the hard coded /proc inode
numbers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman 1679654951 [PATCH] proc: Remove unnecessary and misleading assignments from proc_pid_make_inode
The removed fields are already set by proc_alloc_inode.  Initializing them in
proc_alloc_inode implies they need it for proper cleanup.  At least ei->pde
was not set on all paths making it look like proc_alloc_inode was buggy.  So
just remove the redundant assignments.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman ff9724a3f7 [PATCH] proc: Remove useless BKL in proc_pid_readlink
We already call everything except do_proc_readlink outside of the BKL in
proc_pid_followlink, and there appears to be nothing in do_proc_readlink that
needs any special protection.

So remove this leftover from one of the BKL cleanup efforts.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman 5634708b5f [PATCH] proc: Fix the .. inode number on /proc/<pid>/fd
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Michael LeMay 4eb582cf1f [PATCH] keys: add a way to store the appropriate context for newly-created keys
Add a /proc/<pid>/attr/keycreate entry that stores the appropriate context for
newly-created keys.  Modify the selinux_key_alloc hook to make use of the new
entry.  Update the flask headers to include a new "setkeycreate" permission
for processes.  Update the flask headers to include a new "create" permission
for keys.  Use the create permission to restrict which SIDs each task can
assign to newly-created keys.  Add a new parameter to the security hook
"security_key_alloc" to indicate whether it is being invoked by the kernel, or
from userspace.  If it is being invoked by the kernel, the security hook
should never fail.  Update the documentation to reflect these changes.

Signed-off-by: Michael LeMay <mdlemay@epoch.ncsc.mil>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Al Viro e018290929 [PATCH] proc_loginuid_write() uses simple_strtoul() on non-terminated array
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20 05:25:24 -04:00
Dipankar Sarma ca99c1da08 [PATCH] Fix file lookup without ref
There are places in the kernel where we look up files in fd tables and
access the file structure without holding refereces to the file.  So, we
need special care to avoid the race between looking up files in the fd
table and tearing down of the file in another CPU.  Otherwise, one might
see a NULL f_dentry or such torn down version of the file.  This patch
fixes those special places where such a race may happen.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:51 -07:00
Herbert Poetzl e4e5d3fc80 [PATCH] cleanup in proc_check_chroot()
proc_check_chroot() does the check in a very unintuitive way (keeping a
copy of the argument, then modifying the argument), and has uncommented
sideeffects.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:59 -08:00
Chuck Lever b4629fe2f0 VFS: New /proc file /proc/self/mountstats
Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on).  Use a mechanism similar to /proc/mounts and s_ops->show_options.

This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.

Thanks to Mike Waychison for his review and comments.

Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:12 -05:00
Randy Dunlap 16f7e0fe2e [PATCH] capable/capability.h (fs/)
fs: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:13 -08:00
Al Viro 5addc5dd88 [PATCH] make /proc/mounts pollable
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 18:18:10 -08:00
Yoshinori Sato 63c6764ce4 [PATCH] nommu build error fix
"proc_smaps_operations" is not defined in case of "CONFIG_MMU=n".

Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-14 17:10:13 -07:00
Andrew Morton 0678e5feaa [PATCH] proc_task_root_link c99 fix
fs/proc/base.c: In function `proc_task_root_link':
fs/proc/base.c:364: warning: ISO C90 forbids mixed declarations and code

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Sripathi Kodi 66dcca0628 [PATCH] Fix invisible threads problem
When the main thread of a thread group has done pthread_exit() and died,
the other threads are still happily running, but will not be visible
under /proc because their leader is no longer accessible.

This fixes the access control so that we can see the sub-threads again.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 09:15:34 -07:00
Dipankar Sarma b835996f62 [PATCH] files: lock-free fd look-up
With the use of RCU in files structure, the look-up of files using fds can now
be lock-free.  The lookup is protected by rcu_read_lock()/rcu_read_unlock().
This patch changes the readers to use lock-free lookup.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Ravikiran Thirumalai <kiran_th@gmail.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Dipankar Sarma badf16621c [PATCH] files: break up files struct
In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically.  Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure.  This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded withing files_struct.  It also changes all
the users to refer to the file table using files_fdtable() macro.  Subsequent
applciation of RCU becomes easier after this.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Miklos Szeredi ab8d11beb4 [PATCH] remove duplicated code from proc and ptrace
Extract common code used by ptrace_attach() and may_ptrace_attach()
into a separate function.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi 5e21ccb136 [PATCH] fix enum pid_directory_inos in proc/base.c
This patch fixes wrongly placed elements in the pid_directory_inos
enum.  Also add comment so this mistake is not repeated.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Miklos Szeredi 0494f6ec5d [PATCH] use get_fs_struct() in proc
This patch cleans up proc_cwd_link() and proc_root_link() by factoring
out common code into get_fs_struct().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:43 -07:00
Mauricio Lin e070ad49f3 [PATCH] add /proc/pid/smaps
Add a "smaps" entry to /proc/pid: show howmuch memory is resident in each
mapping.

People that want to perform a memory consumption analysing can use it
mainly if someone needs to figure out which libraries can be reduced for
embedded systems.  So the new features are the physical size of shared and
clean [or dirty]; private and clean [or dirty].

Take a look the example below:

# cat /proc/4576/smaps

08048000-080dc000 r-xp /bin/bash
Size:               592 KB
Rss:                500 KB
Shared_Clean:       500 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
080dc000-080e2000 rw-p /bin/bash
Size:                24 KB
Rss:                 24 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:       24 KB
080e2000-08116000 rw-p
Size:               208 KB
Rss:                208 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:      208 KB
b7e2b000-b7e34000 r-xp /lib/tls/libnss_files-2.3.2.so
Size:                36 KB
Rss:                 12 KB
Shared_Clean:        12 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
...

(Includes a cleanup from "Richard Purdie" <rpurdie@rpsys.net>)

From: Torsten Foertsch <torsten.foertsch@gmx.net>

show_smap calls first show_map and then prints its additional information to
the seq_file.  show_map checks if all it has to print fits into the buffer and
if yes marks the current vma as written.  While that is correct for show_map
it is not for show_smap.  Here the vma should be marked as written only after
the additional information is also written.

The attached patch cures the problem.  It moves the functionality of the
show_map function to a new function show_map_internal that is called with an
additional struct mem_size_stats* argument.  Then show_map calls
show_map_internal with NULL as struct mem_size_stats* whereas show_smap calls
it with a real pointer.  Now the final

	if (m->count < m->size)  /* vma is copied successfully */
		m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;

is done only if the whole entry fits into the buffer.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:49 -07:00
Christoph Lameter 6e21c8f145 [PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside
This patch was recently discussed on linux-mm:
http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2

I inherited a large code base from Ray for page migration.  There was a
small patch in there that I find to be very useful since it allows the
display of the locality of the pages in use by a process.  I reworked that
patch and came up with a /proc/<pid>/numa_maps that gives more information
about the vma's of a process.  numa_maps is indexes by the start address
found in /proc/<pid>/maps.  F.e.  with this patch you can see the page use
of the "getty" process:

margin:/proc/12008 # cat maps
00000000-00004000 r--p 00000000 00:00 0
2000000000000000-200000000002c000 r-xp 00000000 08:04 516                /lib/ld-2.3.3.so
2000000000038000-2000000000040000 rw-p 00028000 08:04 516                /lib/ld-2.3.3.so
2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000260000-2000000000268000 ---p 00208000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842           /lib/tls/libc.so.6.1
2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923            /usr/lib/locale/en_US.utf8/LC_CTYPE
2000000000300000-2000000000308000 r--s 00000000 08:04 60071467           /usr/lib/gconv/gconv-modules.cache
2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399           /sbin/mingetty
6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399           /sbin/mingetty
6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0          [heap]
60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0          [stack]
a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]

cat numa_maps
2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1

getty uses ld.so.  The first vma is the code segment which is used by 43
other processes and the pages are evenly distributed over the 4 nodes.

The second vma is the process specific data portion for ld.so.  This is
only one page.

The display format is:

<startaddress>	 Links to information in /proc/<pid>/map
<memory policy>  This can be "default" "interleave={}", "prefer=<node>" or "bind={<zones>}"
MaxRef=		<maximum reference to a page in this vma>
Pages=		<Nr of pages in use>
Mapped=		<Nr of pages with mapcount >
Anon=		<nr of anonymous pages>
Nx=		<Nr of pages on Node x>

The content of the proc-file is self-evident.  If this would be tied into
the sparsemem system then the contents of this file would not be too
useful.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:43 -07:00
Al Viro 008b150a3c [PATCH] Fix up symlink function pointers
This fixes up the symlink functions for the calling convention change:

 * afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
   smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
 * befs, smbfs, xfs - same for ->put_link()

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-19 18:08:21 -07:00
Alan Cox d6e7114481 [PATCH] setuid core dump
Add a new `suid_dumpable' sysctl:

This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are

0 - (default) - traditional behaviour.  Any process which has changed
    privilege levels or is execute only will not be dumped

1 - (debug) - all processes dump core when possible.  The core dump is
    owned by the current user and no security is applied.  This is intended
    for system debugging situations only.  Ptrace is unchecked.

2 - (suidsafe) - any binary which normally would not be dumped is dumped
    readable by root only.  This allows the end user to remove such a dump but
    not access it directly.  For security reasons core dumps in this mode will
    not overwrite one another or other files.  This mode is appropriate when
    adminstrators are attempting to debug problems in a normal environment.

(akpm:

> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?

No problem to me.

> >  	if (current->euid == current->uid && current->egid == current->gid)
> >  		current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?

Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)

> Maybe this should be renamed to `dump_policy' or something.  Doing that
> would help us catch any code which isn't using the #defines, too.

Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.

)

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:26 -07:00
David Woodhouse 27b030d58c Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-03 08:14:09 +01:00
Martin Waitz 67be2dd1ba [PATCH] DocBook: fix some descriptions
Some KernelDoc descriptions are updated to match the current code.
No code changes.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:26 -07:00
Pavel Pisa 4dc3b16ba1 [PATCH] DocBook: changes and extensions to the kernel documentation
I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again.  The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels.  I
have tried to proceed with that task.  I have done that more times from 2.6.0
time and it gets boring to do same changes again and again.  Linux kernel
compiles after changes for i386 and ARM targets.  I have added references to
some more files into kernel-api book, I have added some section names as well.
 So please, check that changes do not break something and that categories are
not too much skewed.

I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention.  Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors.  Changed <pid> to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.

You can see result of the modified documentation build at
  http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz

Some more sources are ready to be included into kernel-doc generated
documentation.  Sources has been added into kernel-api for now.  Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.

Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:25 -07:00
Daniel Drake f246315e1a [PATCH] procfs: Fix hardlink counts for /proc/<PID>/task
The current logic assumes that a /proc/<PID>/task directory should have a
hardlink count of 3, probably counting ".", "..", and a directory for a
single child task.

It's fairly obvious that this doesn't work out correctly when a PID has
more than one child task, which is quite often the case.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:03 -07:00
Daniel Drake bcf88e1163 [PATCH] procfs: Fix hardlink counts
The pid directories in /proc/ currently return the wrong hardlink count - 3,
when there are actually 4 : ".", "..", "fd", and "task".

This is easy to notice using find(1):
	cd /proc/<pid>
	find

In the output, you'll see a message similar to:

find: WARNING: Hard link count is wrong for .: this may be a bug in your
filesystem driver.  Automatically turning on find's -noleaf option.
Earlier results may have failed to include directories that should have
been searched.

http://bugs.gentoo.org/show_bug.cgi?id=86031

I also noticed that CONFIG_SECURITY can add a 5th: attr, and performed a
similar fix on the task directories too.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:03 -07:00
Steve Grubb 456be6cd90 [AUDIT] LOGIN message credentials
Attached is a new patch that solves the issue of getting valid credentials 
into the LOGIN message. The current code was assuming that the audit context 
had already been copied. This is not always the case for LOGIN messages.

To solve the problem, the patch passes the task struct to the function that 
emits the message where it can get valid credentials.

Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2005-04-29 17:30:07 +01:00
Andrea Arcangeli 79befd0c08 [PATCH] oom-killer disable for iscsi/lvm2/multipath userland critical sections
iscsi/lvm2/multipath needs guaranteed protection from the oom-killer, so
make the magical value of -17 in /proc/<pid>/oom_adj defeat the oom-killer
altogether.

(akpm: we still need to document oom_adj and friends in
Documentation/filesystems/proc.txt!)

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:05 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00