OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	7e452baf6b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/message/fusion/mptlan.c drivers/net/sfc/ethtool.c net/mac80211/debugfs_sta.c	2008-11-11 15:43:02 -08:00
David Howells	1f8f5cf6e4	KEYS: Make request key instantiate the per-user keyrings Make request_key() instantiate the per-user keyrings so that it doesn't oops if it needs to get hold of the user session keyring because there isn't a session keyring in place. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve French <smfrench@gmail.com> Tested-by: Rutger Nijlunsing <rutger.nijlunsing@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-10 13:20:57 -08:00
David S. Miller	9eeda9abd1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath5k/base.c net/8021q/vlan_core.c	2008-11-06 22:43:03 -08:00
Linus Torvalds	0a6d2fac61	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: SELinux: properly handle empty tty_files list	2008-11-01 09:50:38 -07:00
Serge Hallyn	3318a386e4	file caps: always start with clear bprm->caps_* While Linux doesn't honor setuid on scripts. However, it mistakenly behaves differently for file capabilities. This patch fixes that behavior by making sure that get_file_caps() begins with empty bprm->caps_. That way when a script is loaded, its bprm->caps_ may be filled when binfmt_misc calls prepare_binprm(), but they will be cleared again when binfmt_elf calls prepare_binprm() next to read the interpreter's file capabilities. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-01 09:49:45 -07:00
Eric Paris	37dd0bd04a	SELinux: properly handle empty tty_files list SELinux has wrongly (since 2004) had an incorrect test for an empty tty->tty_files list. With an empty list selinux would be pointing to part of the tty struct itself and would then proceed to dereference that value and again dereference that result. An F10 change to plymouth on a ppc64 system is actually currently triggering this bug. This patch uses list_empty() to handle empty lists rather than looking at a meaningless location. [note, this fixes the oops reported in https://bugzilla.redhat.com/show_bug.cgi?id=469079] Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-01 09:38:48 +11:00
Harvey Harrison	3685f25de1	misc: replace NIPQUAD() Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:56:49 -07:00
David S. Miller	a1744d3bee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/p54/p54common.c	2008-10-31 00:17:34 -07:00
Alan Cox	731572d39f	nfsd: fix vm overcommit crash Junjiro R. Okajima reported a problem where knfsd crashes if you are using it to export shmemfs objects and run strict overcommit. In this situation the current->mm based modifier to the overcommit goes through a NULL pointer. We could simply check for NULL and skip the modifier but we've caught other real bugs in the past from mm being NULL here - cases where we did need a valid mm set up (eg the exec bug about a year ago). To preserve the checks and get the logic we want shuffle the checking around and add a new helper to the vm_ security wrappers Also fix a current->mm reference in nommu that should use the passed mm [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix build] Reported-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-30 11:38:47 -07:00
Harvey Harrison	5b095d9892	net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-29 12:52:50 -07:00
Harvey Harrison	1afa67f5e7	misc: replace NIP6_FMT with %p6 format specifier The iscsi_ibft.c changes are almost certainly a bugfix as the pointer 'ip' is a u8 *, so they never print the last 8 bytes of the IPv6 address, and the eight bytes they do print have a zero byte with them in each 16-bit word. Other than that, this should cause no difference in functionality. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 16:06:44 -07:00
Alexey Dobriyan	def8b4faff	net: reduce structures when XFRM=n ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 13:24:06 -07:00
Linus Torvalds	99ebcf8285	Merge branch 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) fix documentation of sysrq-q really Fix documentation of sysrq-q timer_list: add base address to clock base timer_list: print cpu number of clockevents device timer_list: print real timer address NOHZ: restart tick device from irq_enter() NOHZ: split tick_nohz_restart_sched_tick() NOHZ: unify the nohz function calls in irq_enter() timers: fix itimer/many thread hang, fix timers: fix itimer/many thread hang, v3 ntp: improve adjtimex frequency rounding timekeeping: fix rounding problem during clock update ntp: let update_persistent_clock() sleep hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds posix-timers: lock_timer: make it readable posix-timers: lock_timer: kill the bogus ->it_id check posix-timers: kill ->it_sigev_signo and ->it_sigev_value posix-timers: sys_timer_create: cleanup the error handling posix-timers: move the initialization of timer->sigq from send to create path posix-timers: sys_timer_create: simplify and s/tasklist/rcu/ ... Fix trivial conflicts due to sysrq-q description clahes in Documentation/sysrq.txt and drivers/char/sysrq.c	2008-10-20 13:19:56 -07:00
Lai Jiangshan	47c59803be	devcgroup: remove spin_lock() Since we introduced rcu for read side, spin_lock is used only for update. But we always hold cgroup_lock() when update, so spin_lock() is not need. Additional cleanup: 1) include linux/rcupdate.h explicitly 2) remove unused variable cur_devcgroup in devcgroup_update_access() Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Li Zefan	c012a54ae0	devcgroup: remove unused variable Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Li Zefan	2cdc7241a2	devcgroup: use kmemdup() This saves 40 bytes on my x86_32 box. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Steven Whitehouse	a447c09324	vfs: Use const for kernel parser table This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 10:10:37 -07:00
Linus Torvalds	8d71ff0bef	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (24 commits) integrity: special fs magic As pointed out by Jonathan Corbet, the timer must be deleted before ERROR: code indent should use tabs where possible The tpm_dev_release function is only called for platform devices, not pnp Protect tpm_chip_list when transversing it. Renames num_open to is_open, as only one process can open the file at a time. Remove the BKL calls from the TPM driver, which were added in the overall netlabel: Add configuration support for local labeling cipso: Add support for native local labeling and fixup mapping names netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts selinux: Cache NetLabel secattrs in the socket's security struct selinux: Set socket NetLabel based on connection endpoint netlabel: Add functionality to set the security attributes of a packet netlabel: Add network address selectors to the NetLabel/LSM domain mapping netlabel: Add a generic way to create ordered linked lists of network addrs netlabel: Replace protocol/NetLabel linking with refrerence counts smack: Fix missing calls to netlbl_skbuff_err() selinux: Fix missing calls to netlbl_skbuff_err() selinux: Fix a problem in security_netlbl_sid_to_secattr() selinux: Better local/forward check in selinux_ip_postroute() ...	2008-10-13 10:00:44 -07:00
Alan Cox	934e6ebf96	tty: Redo current tty locking Currently it is sometimes locked by the tty mutex and sometimes by the sighand lock. The latter is in fact correct and now we can hand back referenced objects we can fix this up without problems around sleeping functions. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 09:51:41 -07:00
Alan Cox	452a00d2ee	tty: Make get_current_tty use a kref We now return a kref covered tty reference. That ensures the tty structure doesn't go away when you have a return from get_current_tty. This is not enough to protect you from most of the resources being freed behind your back - yet. [Updated to include fixes for SELinux problems found by Andrew Morton and an s390 leak found while debugging the former] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 09:51:41 -07:00
Mimi Zohar	9256292782	integrity: special fs magic Discussion on the mailing list questioned the use of these magic values in userspace, concluding these values are already exported to userspace via statfs and their correct/incorrect usage is left up to the userspace application. - Move special fs magic number definitions to magic.h - Add magic.h include Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-13 09:47:43 +11:00
James Morris	0da939b005	Merge branch 'master' of git://git.infradead.org/users/pcmoore/lblnet-2.6_next into next	2008-10-11 09:26:14 +11:00
Paul Moore	8d75899d03	netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts This patch provides support for including the LSM's secid in addition to the LSM's MLS information in the NetLabel security attributes structure. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	6c5b3fc014	selinux: Cache NetLabel secattrs in the socket's security struct Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to mitigate some of that overhead by caching the NetLabel security attribute struct within the SELinux socket security structure. This should help eliminate the need to recreate the NetLabel secattr structure for each packet resulting in less overhead. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	014ab19a69	selinux: Set socket NetLabel based on connection endpoint Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to solve that by applying NetLabel socket labels when sockets are connect()'d. This should alleviate the per-packet NetLabel labeling for all connected sockets (yes, it even works for connected DGRAM sockets). Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:33 -04:00
Paul Moore	948bf85c1b	netlabel: Add functionality to set the security attributes of a packet This patch builds upon the new NetLabel address selector functionality by providing the NetLabel KAPI and CIPSO engine support needed to enable the new packet-based labeling. The only new addition to the NetLabel KAPI at this point is shown below: * int netlbl_skbuff_setattr(skb, family, secattr) ... and is designed to be called from a Netfilter hook after the packet's IP header has been populated such as in the FORWARD or LOCAL_OUT hooks. This patch also provides the necessary SELinux hooks to support this new functionality. Smack support is not currently included due to uncertainty regarding the permissions needed to expand the Smack network access controls. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:32 -04:00
Paul Moore	b1edeb1023	netlabel: Replace protocol/NetLabel linking with refrerence counts NetLabel has always had a list of backpointers in the CIPSO DOI definition structure which pointed to the NetLabel LSM domain mapping structures which referenced the CIPSO DOI struct. The rationale for this was that when an administrator removed a CIPSO DOI from the system all of the associated NetLabel LSM domain mappings should be removed as well; a list of backpointers made this a simple operation. Unfortunately, while the backpointers did make the removal easier they were a bit of a mess from an implementation point of view which was making further development difficult. Since the removal of a CIPSO DOI is a realtively rare event it seems to make sense to remove this backpointer list as the optimization was hurting us more then it was helping. However, we still need to be able to track when a CIPSO DOI definition is being used so replace the backpointer list with a reference count. In order to preserve the current functionality of removing the associated LSM domain mappings when a CIPSO DOI is removed we walk the LSM domain mapping table, removing the relevant entries. Signed-off-by: Paul Moore <paul.moore@hp.com> Reviewed-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:31 -04:00
Paul Moore	a8134296ba	smack: Fix missing calls to netlbl_skbuff_err() Smack needs to call netlbl_skbuff_err() to let NetLabel do the necessary protocol specific error handling. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com>	2008-10-10 10:16:31 -04:00
Paul Moore	dfaebe9825	selinux: Fix missing calls to netlbl_skbuff_err() At some point I think I messed up and dropped the calls to netlbl_skbuff_err() which are necessary for CIPSO to send error notifications to remote systems. This patch re-introduces the error handling calls into the SELinux code. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:31 -04:00
Paul Moore	99d854d231	selinux: Fix a problem in security_netlbl_sid_to_secattr() Currently when SELinux fails to allocate memory in security_netlbl_sid_to_secattr() the NetLabel LSM domain field is set to NULL which triggers the default NetLabel LSM domain mapping which may not always be the desired mapping. This patch fixes this by returning an error when the kernel is unable to allocate memory. This could result in more failures on a system with heavy memory pressure but it is the "correct" thing to do. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:30 -04:00
Paul Moore	d8395c876b	selinux: Better local/forward check in selinux_ip_postroute() It turns out that checking to see if skb->sk is NULL is not a very good indicator of a forwarded packet as some locally generated packets also have skb->sk set to NULL. Fix this by not only checking the skb->sk field but also the IP[6]CB(skb)->flags field for the IP[6]SKB_FORWARDED flag. While we are at it, we are calling selinux_parse_skb() much earlier than we really should resulting in potentially wasted cycles parsing packets for information we might no use; so shuffle the code around a bit to fix this. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:30 -04:00
Paul Moore	aa86290089	selinux: Correctly handle IPv4 packets on IPv6 sockets in all cases We did the right thing in a few cases but there were several areas where we determined a packet's address family based on the socket's address family which is not the right thing to do since we can get IPv4 packets on IPv6 sockets. This patch fixes these problems by either taking the address family directly from the packet. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:29 -04:00
Paul Moore	accc609322	selinux: Cleanup the NetLabel glue code We were doing a lot of extra work in selinux_netlbl_sock_graft() what wasn't necessary so this patch removes that code. It also removes the redundant second argument to selinux_netlbl_sock_setsid() which allows us to simplify a few other functions. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org>	2008-10-10 10:16:29 -04:00
Paul Moore	3040a6d5a2	selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() At some point during the 2.6.27 development cycle two new fields were added to the SELinux context structure, a string pointer and a length field. The code in selinux_secattr_to_sid() was not modified and as a result these two fields were left uninitialized which could result in erratic behavior, including kernel panics, when NetLabel is used. This patch fixes the problem by fully initializing the context in selinux_secattr_to_sid() before use and reducing the level of direct context manipulation done to help prevent future problems. Please apply this to the 2.6.27-rcX release stream. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-04 08:25:18 +10:00
Paul Moore	81990fbdd1	selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() At some point during the 2.6.27 development cycle two new fields were added to the SELinux context structure, a string pointer and a length field. The code in selinux_secattr_to_sid() was not modified and as a result these two fields were left uninitialized which could result in erratic behavior, including kernel panics, when NetLabel is used. This patch fixes the problem by fully initializing the context in selinux_secattr_to_sid() before use and reducing the level of direct context manipulation done to help prevent future problems. Please apply this to the 2.6.27-rcX release stream. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-10-04 08:18:18 +10:00
Stephen Smalley	ea6b184f7d	selinux: use default proc sid on symlinks As we are not concerned with fine-grained control over reading of symlinks in proc, always use the default proc SID for all proc symlinks. This should help avoid permission issues upon changes to the proc tree as in the /proc/net -> /proc/self/net example. This does not alter labeling of symlinks within /proc/pid directories. ls -Zd /proc/net output before and after the patch should show the difference. Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-30 00:26:53 +10:00
Serge E. Hallyn	de45e806a8	file capabilities: uninline cap_safe_nice This reduces the kernel size by 289 bytes. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-27 15:07:56 +10:00
James Morris	ab2b49518e	Merge branch 'master' into next Conflicts: MAINTAINERS Thanks for breaking my tree :-) Signed-off-by: James Morris <jmorris@namei.org>	2008-09-21 17:41:56 -07:00
Frank Mayhar	f06febc96b	timers: fix itimer/many thread hang Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-14 16:25:35 +02:00
Stephen Smalley	f058925b20	Update selinux info in MAINTAINERS and Kconfig help text Update the SELinux entry in MAINTAINERS and drop the obsolete information from the selinux Kconfig help text. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-12 00:44:08 +10:00
Eric Paris	8e531af90f	SELinux: memory leak in security_context_to_sid_core Fix a bug and a philosophical decision about who handles errors. security_context_to_sid_core() was leaking a context in the common case. This was causing problems on fedora systems which recently have started making extensive use of this function. In discussion it was decided that if string_to_context_struct() had an error it was its own responsibility to clean up any mess it created along the way. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-09-04 08:35:13 +10:00
Li Zefan	36fd71d293	devcgroup: fix race against rmdir() During the use of a dev_cgroup, we should guarantee the corresponding cgroup won't be deleted (i.e. via rmdir). This can be done through css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup under rcu_read_lock. And also remove checking NULL dev_cgroup, it won't be NULL since a task always belongs to a cgroup. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 19:21:38 -07:00
KaiGai Kohei	d9250dea3f	SELinux: add boundary support and thread context assignment The purpose of this patch is to assign per-thread security context under a constraint. It enables multi-threaded server application to kick a request handler with its fair security context, and helps some of userspace object managers to handle user's request. When we assign a per-thread security context, it must not have wider permissions than the original one. Because a multi-threaded process shares a single local memory, an arbitary per-thread security context also means another thread can easily refer violated information. The constraint on a per-thread security context requires a new domain has to be equal or weaker than its original one, when it tries to assign a per-thread security context. Bounds relationship between two types is a way to ensure a domain can never have wider permission than its bounds. We can define it in two explicit or implicit ways. The first way is using new TYPEBOUNDS statement. It enables to define a boundary of types explicitly. The other one expand the concept of existing named based hierarchy. If we defines a type with "." separated name like "httpd_t.php", toolchain implicitly set its bounds on "httpd_t". This feature requires a new policy version. The 24th version (POLICYDB_VERSION_BOUNDARY) enables to ship them into kernel space, and the following patch enables to handle it. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-29 00:33:33 +10:00
Eric Paris	da31894ed7	securityfs: do not depend on CONFIG_SECURITY Add a new Kconfig option SECURITYFS which will build securityfs support but does not require CONFIG_SECURITY. The only current user of securityfs does not depend on CONFIG_SECURITY and there is no reason the full LSM needs to be built to build this fs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-28 10:47:42 +10:00
James Morris	86d688984d	Merge branch 'master' into next	2008-08-28 10:47:34 +10:00
Randy Dunlap	3f23d815c5	security: add/fix security kernel-doc Add security/inode.c functions to the kernel-api docbook. Use '%' on constants in kernel-doc notation. Fix several typos/spellos in security function descriptions. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-20 20:16:32 +10:00
Vesa-Matti Kari	dbc74c65b3	selinux: Unify for- and while-loop style Replace "thing != NULL" comparisons with just "thing" to make the code look more uniform (mixed styles were used even in the same source file). Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-15 08:40:47 +10:00
David Howells	5cd9c58fbe	security: Fix setting of PF_SUPERPRIV by __capable() Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-14 22:59:43 +10:00
Vesa-Matti Kari	421fae06be	selinux: conditional expression type validation was off-by-one expr_isvalid() in conditional.c was off-by-one and allowed invalid expression type COND_LAST. However, it is this header file that needs to be fixed. That way the if-statement's disjunction's second component reads more naturally, "if expr type is greater than the last allowed value" ( rather than using ">=" in conditional.c): if (expr->expr_type <= 0 \|\| expr->expr_type > COND_LAST) Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-07 08:56:16 +10:00

1 2 3 4 5 ...

615 Commits