pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-15 01:07:11 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
|
|
|
/* User-mappable watch queue
|
|
|
|
*
|
|
|
|
* Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
|
|
|
|
* Written by David Howells (dhowells@redhat.com)
|
|
|
|
*
|
2022-06-26 17:10:56 +08:00
|
|
|
* See Documentation/core-api/watch_queue.rst
|
pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-15 01:07:11 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef _LINUX_WATCH_QUEUE_H
|
|
|
|
#define _LINUX_WATCH_QUEUE_H
|
|
|
|
|
|
|
|
#include <uapi/linux/watch_queue.h>
|
|
|
|
#include <linux/kref.h>
|
|
|
|
#include <linux/rcupdate.h>
|
|
|
|
|
|
|
|
#ifdef CONFIG_WATCH_QUEUE
|
|
|
|
|
|
|
|
struct cred;
|
|
|
|
|
|
|
|
struct watch_type_filter {
|
|
|
|
enum watch_notification_type type;
|
|
|
|
__u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */
|
|
|
|
__u32 info_filter; /* Filter on watch_notification::info */
|
|
|
|
__u32 info_mask; /* Mask of relevant bits in info_filter */
|
|
|
|
};
|
|
|
|
|
|
|
|
struct watch_filter {
|
|
|
|
union {
|
|
|
|
struct rcu_head rcu;
|
2022-03-11 21:23:31 +08:00
|
|
|
/* Bitmask of accepted types */
|
|
|
|
DECLARE_BITMAP(type_filter, WATCH_TYPE__NR);
|
pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-15 01:07:11 +08:00
|
|
|
};
|
|
|
|
u32 nr_filters; /* Number of filters */
|
|
|
|
struct watch_type_filter filters[];
|
|
|
|
};
|
|
|
|
|
|
|
|
struct watch_queue {
|
|
|
|
struct rcu_head rcu;
|
|
|
|
struct watch_filter __rcu *filter;
|
|
|
|
struct pipe_inode_info *pipe; /* The pipe we're using as a buffer */
|
|
|
|
struct hlist_head watches; /* Contributory watches */
|
|
|
|
struct page **notes; /* Preallocated notifications */
|
|
|
|
unsigned long *notes_bitmap; /* Allocation bitmap for notes */
|
|
|
|
struct kref usage; /* Object usage count */
|
|
|
|
spinlock_t lock;
|
|
|
|
unsigned int nr_notes; /* Number of notes */
|
|
|
|
unsigned int nr_pages; /* Number of pages in notes[] */
|
|
|
|
bool defunct; /* T when queues closed */
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Representation of a watch on an object.
|
|
|
|
*/
|
|
|
|
struct watch {
|
|
|
|
union {
|
|
|
|
struct rcu_head rcu;
|
|
|
|
u32 info_id; /* ID to be OR'd in to info field */
|
|
|
|
};
|
|
|
|
struct watch_queue __rcu *queue; /* Queue to post events to */
|
|
|
|
struct hlist_node queue_node; /* Link in queue->watches */
|
|
|
|
struct watch_list __rcu *watch_list;
|
|
|
|
struct hlist_node list_node; /* Link in watch_list->watchers */
|
|
|
|
const struct cred *cred; /* Creds of the owner of the watch */
|
|
|
|
void *private; /* Private data for the watched object */
|
|
|
|
u64 id; /* Internal identifier */
|
|
|
|
struct kref usage; /* Object usage count */
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* List of watches on an object.
|
|
|
|
*/
|
|
|
|
struct watch_list {
|
|
|
|
struct rcu_head rcu;
|
|
|
|
struct hlist_head watchers;
|
|
|
|
void (*release_watch)(struct watch *);
|
|
|
|
spinlock_t lock;
|
|
|
|
};
|
|
|
|
|
|
|
|
extern void __post_watch_notification(struct watch_list *,
|
|
|
|
struct watch_notification *,
|
|
|
|
const struct cred *,
|
|
|
|
u64);
|
|
|
|
extern struct watch_queue *get_watch_queue(int);
|
|
|
|
extern void put_watch_queue(struct watch_queue *);
|
|
|
|
extern void init_watch(struct watch *, struct watch_queue *);
|
|
|
|
extern int add_watch_to_object(struct watch *, struct watch_list *);
|
|
|
|
extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool);
|
|
|
|
extern long watch_queue_set_size(struct pipe_inode_info *, unsigned int);
|
|
|
|
extern long watch_queue_set_filter(struct pipe_inode_info *,
|
|
|
|
struct watch_notification_filter __user *);
|
|
|
|
extern int watch_queue_init(struct pipe_inode_info *);
|
|
|
|
extern void watch_queue_clear(struct watch_queue *);
|
|
|
|
|
|
|
|
static inline void init_watch_list(struct watch_list *wlist,
|
|
|
|
void (*release_watch)(struct watch *))
|
|
|
|
{
|
|
|
|
INIT_HLIST_HEAD(&wlist->watchers);
|
|
|
|
spin_lock_init(&wlist->lock);
|
|
|
|
wlist->release_watch = release_watch;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void post_watch_notification(struct watch_list *wlist,
|
|
|
|
struct watch_notification *n,
|
|
|
|
const struct cred *cred,
|
|
|
|
u64 id)
|
|
|
|
{
|
|
|
|
if (unlikely(wlist))
|
|
|
|
__post_watch_notification(wlist, n, cred, id);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void remove_watch_list(struct watch_list *wlist, u64 id)
|
|
|
|
{
|
|
|
|
if (wlist) {
|
|
|
|
remove_watch_from_object(wlist, NULL, id, true);
|
|
|
|
kfree_rcu(wlist, rcu);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* watch_sizeof - Calculate the information part of the size of a watch record,
|
|
|
|
* given the structure size.
|
|
|
|
*/
|
|
|
|
#define watch_sizeof(STRUCT) (sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT)
|
|
|
|
|
2020-10-01 20:50:55 +08:00
|
|
|
#else
|
|
|
|
static inline int watch_queue_init(struct pipe_inode_info *pipe)
|
|
|
|
{
|
|
|
|
return -ENOPKG;
|
|
|
|
}
|
|
|
|
|
pipe: Add general notification queue support
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
2020-01-15 01:07:11 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
#endif /* _LINUX_WATCH_QUEUE_H */
|