linux-sg2042/include/linux/poll.h

142 lines
3.6 KiB
C
Raw Normal View History

#ifndef _LINUX_POLL_H
#define _LINUX_POLL_H
#include <asm/poll.h>
#ifdef __KERNEL__
#include <linux/compiler.h>
#include <linux/ktime.h>
#include <linux/wait.h>
#include <linux/string.h>
#include <linux/fs.h>
#include <linux/sysctl.h>
#include <asm/uaccess.h>
extern struct ctl_table epoll_table[]; /* for sysctl */
[PATCH] Optimize select/poll by putting small data sets on the stack Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:56:33 +08:00
/* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
#define MAX_STACK_ALLOC 832
#define FRONTEND_STACK_ALLOC 256
#define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
#define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
#define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
#define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
#define DEFAULT_POLLMASK (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM)
struct poll_table_struct;
/*
* structures and helpers for f_op->poll implementations
*/
typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
typedef struct poll_table_struct {
poll_queue_proc qproc;
unsigned long key;
} poll_table;
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
if (p && wait_address)
p->qproc(filp, wait_address, p);
}
static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
{
pt->qproc = qproc;
pt->key = ~0UL; /* all events enabled */
}
[PATCH] Optimize select/poll by putting small data sets on the stack Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:56:33 +08:00
struct poll_table_entry {
poll: allow f_op->poll to sleep f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 06:40:59 +08:00
struct file *filp;
unsigned long key;
[PATCH] Optimize select/poll by putting small data sets on the stack Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:56:33 +08:00
wait_queue_t wait;
poll: allow f_op->poll to sleep f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 06:40:59 +08:00
wait_queue_head_t *wait_address;
[PATCH] Optimize select/poll by putting small data sets on the stack Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:56:33 +08:00
};
/*
* Structures and helpers for sys_poll/sys_poll
*/
struct poll_wqueues {
poll_table pt;
poll: allow f_op->poll to sleep f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 06:40:59 +08:00
struct poll_table_page *table;
struct task_struct *polling_task;
int triggered;
int error;
[PATCH] Optimize select/poll by putting small data sets on the stack Optimize select and poll by a using stack space for small fd sets This brings back an old optimization from Linux 2.0. Using the stack is faster than kmalloc. On a Intel P4 system it speeds up a select of a single pty fd by about 13% (~4000 cycles -> ~3500) It also saves memory because a daemon hanging in select or poll will usually save one or two less pages. This can add up - e.g. if you have 10 daemons blocking in poll/select you save 40KB of memory. I did a patch for this long ago, but it was never applied. This version is a reimplementation of the old patch that tries to be less intrusive. I only did the minimal changes needed for the stack allocation. The cut off point before external memory is allocated is currently at 832bytes. The system calls always allocate this much memory on the stack. These 832 bytes are divided into 256 bytes frontend data (for the select bitmaps of the pollfds) and the rest of the space for the wait queues used by the low level drivers. There are some extreme cases where this won't work out for select and it falls back to allocating memory too early - especially with very sparse large select bitmaps - but the majority of processes who only have a small number of file descriptors should be ok. [TBD: 832/256 might not be the best split for select or poll] I suspect more optimizations might be possible, but they would be more complicated. One way would be to cache the select/poll context over multiple system calls because typically the input values should be similar. Problem is when to flush the file descriptors out though. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 17:56:33 +08:00
int inline_index;
struct poll_table_entry inline_entries[N_INLINE_POLL_ENTRIES];
};
extern void poll_initwait(struct poll_wqueues *pwq);
extern void poll_freewait(struct poll_wqueues *pwq);
poll: allow f_op->poll to sleep f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 06:40:59 +08:00
extern int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
ktime_t *expires, unsigned long slack);
static inline int poll_schedule(struct poll_wqueues *pwq, int state)
{
return poll_schedule_timeout(pwq, state, NULL, 0);
}
/*
* Scaleable version of the fd_set.
*/
typedef struct {
unsigned long *in, *out, *ex;
unsigned long *res_in, *res_out, *res_ex;
} fd_set_bits;
/*
* How many longwords for "nr" bits?
*/
#define FDS_BITPERLONG (8*sizeof(long))
#define FDS_LONGS(nr) (((nr)+FDS_BITPERLONG-1)/FDS_BITPERLONG)
#define FDS_BYTES(nr) (FDS_LONGS(nr)*sizeof(long))
/*
* We do a VERIFY_WRITE here even though we are only reading this time:
* we'll write to it eventually..
*
* Use "unsigned long" accesses to let user-mode fd_set's be long-aligned.
*/
static inline
int get_fd_set(unsigned long nr, void __user *ufdset, unsigned long *fdset)
{
nr = FDS_BYTES(nr);
if (ufdset)
return copy_from_user(fdset, ufdset, nr) ? -EFAULT : 0;
memset(fdset, 0, nr);
return 0;
}
static inline unsigned long __must_check
set_fd_set(unsigned long nr, void __user *ufdset, unsigned long *fdset)
{
if (ufdset)
return __copy_to_user(ufdset, fdset, FDS_BYTES(nr));
return 0;
}
static inline
void zero_fd_set(unsigned long nr, unsigned long *fdset)
{
memset(fdset, 0, FDS_BYTES(nr));
}
[PATCH] Add pselect/ppoll system call implementation The following implementation of ppoll() and pselect() system calls depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the thread_info. These system calls have to change the signal mask during their operation, and signal handlers must be invoked using the new, temporary signal mask. The old signal mask must be restored either upon successful exit from the system call, or upon returning from the invoked signal handler if the system call is interrupted. We can't simply restore the original signal mask and return to userspace, since the restored signal mask may actually block the signal which interrupted the system call. The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit path to trap into do_signal() just as TIF_SIGPENDING does, and by causing do_signal() to use the saved signal mask instead of the current signal mask when setting up the stack frame for the signal handler -- or by causing do_signal() to simply restore the saved signal mask in the case where there is no handler to be invoked. The first patch implements the sys_pselect() and sys_ppoll() system calls, which are present only if TIF_RESTORE_SIGMASK is defined. That #ifdef should go away in time when all architectures have implemented it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC kernel (in the -mm tree), and the third patch then removes the arch-specific implementations of sys_rt_sigsuspend() and replaces them with generic versions using the same trick. The fourth and fifth patches, provided by David Howells, implement TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch adds the syscalls to the i386 syscall table. This patch: Add the pselect() and ppoll() system calls, providing core routines usable by the original select() and poll() system calls and also the new calls (with their semantics w.r.t timeouts). Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-19 09:44:05 +08:00
#define MAX_INT64_SECONDS (((s64)(~((u64)0)>>1)/HZ)-1)
extern int do_select(int n, fd_set_bits *fds, struct timespec *end_time);
[PATCH] Add pselect/ppoll system call implementation The following implementation of ppoll() and pselect() system calls depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the thread_info. These system calls have to change the signal mask during their operation, and signal handlers must be invoked using the new, temporary signal mask. The old signal mask must be restored either upon successful exit from the system call, or upon returning from the invoked signal handler if the system call is interrupted. We can't simply restore the original signal mask and return to userspace, since the restored signal mask may actually block the signal which interrupted the system call. The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit path to trap into do_signal() just as TIF_SIGPENDING does, and by causing do_signal() to use the saved signal mask instead of the current signal mask when setting up the stack frame for the signal handler -- or by causing do_signal() to simply restore the saved signal mask in the case where there is no handler to be invoked. The first patch implements the sys_pselect() and sys_ppoll() system calls, which are present only if TIF_RESTORE_SIGMASK is defined. That #ifdef should go away in time when all architectures have implemented it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC kernel (in the -mm tree), and the third patch then removes the arch-specific implementations of sys_rt_sigsuspend() and replaces them with generic versions using the same trick. The fourth and fifth patches, provided by David Howells, implement TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch adds the syscalls to the i386 syscall table. This patch: Add the pselect() and ppoll() system calls, providing core routines usable by the original select() and poll() system calls and also the new calls (with their semantics w.r.t timeouts). Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-19 09:44:05 +08:00
extern int do_sys_poll(struct pollfd __user * ufds, unsigned int nfds,
struct timespec *end_time);
extern int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timespec *end_time);
extern int poll_select_set_timeout(struct timespec *to, long sec, long nsec);
#endif /* KERNEL */
#endif /* _LINUX_POLL_H */