2018-11-13 18:19:29 +08:00
# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
#
# system call numbers and entry vectors for xtensa
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The <abi> is always "common" for this file
#
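The four-field row format described in the header comment can be read with a few lines of C. This is only a minimal sketch, not the tooling the kernel build uses for this table; `struct syscall_row` and `parse_syscall_row` are hypothetical names, and the buffer sizes are assumptions chosen to fit the rows shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of the table: "<number> <abi> <name> <entry point>". */
struct syscall_row {
    int number;
    char abi[16];
    char name[64];
    char entry[64];
};

/*
 * Parse one line of the table.
 * Returns 1 when a row was filled in, 0 for a comment or blank line,
 * and -1 when the line does not carry all four fields.
 */
int parse_syscall_row(const char *line, struct syscall_row *row)
{
    assert(line != NULL && row != NULL);

    /* Skip leading blanks so indented comments are still recognised. */
    while (*line == ' ' || *line == '\t')
        line++;

    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;

    /* Field widths are bounded to the sizes of the destination buffers. */
    if (sscanf(line, "%d %15s %63s %63s",
               &row->number, row->abi, row->name, row->entry) != 4)
        return -1;

    return 1;
}
```

A caller would feed the file to this function line by line (for example with `fgets`) and skip every line for which it returns 0.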
0 common spill sys_ni_syscall
1 common xtensa sys_ni_syscall
2 common available4 sys_ni_syscall
3 common available5 sys_ni_syscall
4 common available6 sys_ni_syscall
5 common available7 sys_ni_syscall
6 common available8 sys_ni_syscall
7 common available9 sys_ni_syscall
# File Operations
8 common open sys_open
9 common close sys_close
10 common dup sys_dup
11 common dup2 sys_dup2
12 common read sys_read
13 common write sys_write
14 common select sys_select
15 common lseek sys_lseek
16 common poll sys_poll
17 common _llseek sys_llseek
18 common epoll_wait sys_epoll_wait
19 common epoll_ctl sys_epoll_ctl
20 common epoll_create sys_epoll_create
21 common creat sys_creat
22 common truncate sys_truncate
23 common ftruncate sys_ftruncate
24 common readv sys_readv
25 common writev sys_writev
26 common fsync sys_fsync
27 common fdatasync sys_fdatasync
28 common truncate64 sys_truncate64
29 common ftruncate64 sys_ftruncate64
30 common pread64 sys_pread64
31 common pwrite64 sys_pwrite64
32 common link sys_link
33 common rename sys_rename
34 common symlink sys_symlink
35 common readlink sys_readlink
36 common mknod sys_mknod
37 common pipe sys_pipe
38 common unlink sys_unlink
39 common rmdir sys_rmdir
40 common mkdir sys_mkdir
41 common chdir sys_chdir
42 common fchdir sys_fchdir
43 common getcwd sys_getcwd
44 common chmod sys_chmod
45 common chown sys_chown
46 common stat sys_newstat
47 common stat64 sys_stat64
48 common lchown sys_lchown
49 common lstat sys_newlstat
50 common lstat64 sys_lstat64
51 common available51 sys_ni_syscall
52 common fchmod sys_fchmod
53 common fchown sys_fchown
54 common fstat sys_newfstat
55 common fstat64 sys_fstat64
56 common flock sys_flock
57 common access sys_access
58 common umask sys_umask
59 common getdents sys_getdents
60 common getdents64 sys_getdents64
61 common fcntl64 sys_fcntl64
62 common fallocate sys_fallocate
63 common fadvise64_64 xtensa_fadvise64_64
y2038: rename old time and utime syscalls
The time, stime, utime, utimes, and futimesat system calls are only
used on older architectures, and we do not provide y2038 safe variants
of them, as they are replaced by clock_gettime64, clock_settime64,
and utimensat_time64.
However, for consistency it seems better to have the 32-bit architectures
that still use them call the "time32" entry points (leaving the
traditional handlers for the 64-bit architectures), like we do for system
calls that now require two versions.
Note: We used to always define __ARCH_WANT_SYS_TIME and
__ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and
__ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is
reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while
we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat
mode. The resulting asm/unistd.h changes look a bit counterintuitive.
This is only a cleanup patch and it should not change any behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
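As a small illustration of why the "time32" entry points are not y2038 safe, the sketch below checks whether a seconds value still fits in a signed 32-bit field. The helper names `fits_in_time32` and `to_time32` are hypothetical and do not exist in the kernel; the only fact relied on is that a signed 32-bit seconds counter tops out at 2147483647, which corresponds to 2038-01-19 03:14:07 UTC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): report whether a seconds
 * value can be represented in a signed 32-bit time field.
 */
int fits_in_time32(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/*
 * Hypothetical helper: narrow a 64-bit seconds value to 32 bits.
 * The caller must have checked the range first; the assert documents that.
 */
int32_t to_time32(int64_t seconds)
{
    assert(fits_in_time32(seconds));
    return (int32_t)seconds;
}
```

Values one second past the limit no longer fit, which is the case the separate 64-bit time syscalls named in the message are meant to cover.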
2019-01-07 06:45:29 +08:00
64 common utime sys_utime32
65 common utimes sys_utimes_time32
2018-11-13 18:19:29 +08:00
66 common ioctl sys_ioctl
67 common fcntl sys_fcntl
68 common setxattr sys_setxattr
69 common getxattr sys_getxattr
70 common listxattr sys_listxattr
71 common removexattr sys_removexattr
72 common lsetxattr sys_lsetxattr
73 common lgetxattr sys_lgetxattr
74 common llistxattr sys_llistxattr
75 common lremovexattr sys_lremovexattr
76 common fsetxattr sys_fsetxattr
77 common fgetxattr sys_fgetxattr
78 common flistxattr sys_flistxattr
79 common fremovexattr sys_fremovexattr
# File Map / Shared Memory Operations
80 common mmap2 sys_mmap_pgoff
81 common munmap sys_munmap
82 common mprotect sys_mprotect
83 common brk sys_brk
84 common mlock sys_mlock
85 common munlock sys_munlock
86 common mlockall sys_mlockall
87 common munlockall sys_munlockall
88 common mremap sys_mremap
89 common msync sys_msync
90 common mincore sys_mincore
91 common madvise sys_madvise
92 common shmget sys_shmget
93 common shmat xtensa_shmat
ipc: rename old-style shmctl/semctl/msgctl syscalls
The behavior of these system calls is slightly different between
architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
symbol. Most architectures that implement the split IPC syscalls don't set
that symbol and only get the modern version, but alpha, arm, microblaze,
mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.
For the architectures that so far only implement sys_ipc(), i.e. m68k,
mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
when adding the split syscalls, so we need to distinguish between the
two groups of architectures.
The method I picked for this distinction is to have a separate system call
entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
does not. The system call tables of the five architectures are changed
accordingly.
As an additional benefit, we no longer need the configuration specific
definition for ipc_parse_version(), it always does the same thing now,
but simply won't get called on architectures with the modern interface.
A small downside is that on architectures that do set
ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
that are never called. They only add a few bytes of bloat, so it seems
better to keep them compared to adding yet another Kconfig symbol.
I considered adding new syscall numbers for the IPC_64 variants for
consistency, but decided against that for now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
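The distinction the message draws can be pictured with a short sketch of what version parsing does to the command argument: if the caller set the IPC_64 flag, the flag is stripped and the modern layout is selected, otherwise the old layout is used. This is an illustrative model, not the kernel source; the `SKETCH_` names are hypothetical, and the values 0 and 0x0100 are assumed to match IPC_OLD and IPC_64 in the uapi header.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror IPC_OLD and IPC_64 from the uapi ipc header. */
#define SKETCH_IPC_OLD 0
#define SKETCH_IPC_64  0x0100

/*
 * Model of the version parsing used by the sys_old_*ctl() entry points:
 * returns which structure layout the caller asked for and leaves the
 * plain command (flag removed) in *cmd.
 */
int parse_version_sketch(int *cmd)
{
    assert(cmd != NULL);

    if (*cmd & SKETCH_IPC_64) {
        *cmd ^= SKETCH_IPC_64;  /* strip the flag, keep the command */
        return SKETCH_IPC_64;
    }
    return SKETCH_IPC_OLD;
}
```

Under this model, an entry point with the modern interface would simply skip the call and always use the new layout.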
2019-01-01 05:22:40 +08:00
94 common shmctl sys_old_shmctl
2018-11-13 18:19:29 +08:00
95 common shmdt sys_shmdt
# Socket Operations
96 common socket sys_socket
97 common setsockopt sys_setsockopt
98 common getsockopt sys_getsockopt
99 common shutdown sys_shutdown
100 common bind sys_bind
101 common connect sys_connect
102 common listen sys_listen
103 common accept sys_accept
104 common getsockname sys_getsockname
105 common getpeername sys_getpeername
106 common sendmsg sys_sendmsg
107 common recvmsg sys_recvmsg
108 common send sys_send
109 common recv sys_recv
110 common sendto sys_sendto
111 common recvfrom sys_recvfrom
112 common socketpair sys_socketpair
113 common sendfile sys_sendfile
114 common sendfile64 sys_sendfile64
115 common sendmmsg sys_sendmmsg
# Process Operations
116 common clone sys_clone
117 common execve sys_execve
118 common exit sys_exit
119 common exit_group sys_exit_group
120 common getpid sys_getpid
121 common wait4 sys_wait4
122 common waitid sys_waitid
123 common kill sys_kill
124 common tkill sys_tkill
125 common tgkill sys_tgkill
126 common set_tid_address sys_set_tid_address
127 common gettid sys_gettid
128 common setsid sys_setsid
129 common getsid sys_getsid
130 common prctl sys_prctl
131 common personality sys_personality
132 common getpriority sys_getpriority
133 common setpriority sys_setpriority
134 common setitimer sys_setitimer
135 common getitimer sys_getitimer
136 common setuid sys_setuid
137 common getuid sys_getuid
138 common setgid sys_setgid
139 common getgid sys_getgid
140 common geteuid sys_geteuid
141 common getegid sys_getegid
142 common setreuid sys_setreuid
143 common setregid sys_setregid
144 common setresuid sys_setresuid
145 common getresuid sys_getresuid
146 common setresgid sys_setresgid
147 common getresgid sys_getresgid
148 common setpgid sys_setpgid
149 common getpgid sys_getpgid
150 common getppid sys_getppid
151 common getpgrp sys_getpgrp
# 152 was set_thread_area
152 common reserved152 sys_ni_syscall
# 153 was get_thread_area
153 common reserved153 sys_ni_syscall
154 common times sys_times
155 common acct sys_acct
156 common sched_setaffinity sys_sched_setaffinity
157 common sched_getaffinity sys_sched_getaffinity
158 common capget sys_capget
159 common capset sys_capset
160 common ptrace sys_ptrace
2019-01-01 08:13:32 +08:00
161 common semtimedop sys_semtimedop_time32
2018-11-13 18:19:29 +08:00
162 common semget sys_semget
163 common semop sys_semop
2019-01-01 05:22:40 +08:00
164 common semctl sys_old_semctl
2018-11-13 18:19:29 +08:00
165 common available165 sys_ni_syscall
166 common msgget sys_msgget
167 common msgsnd sys_msgsnd
168 common msgrcv sys_msgrcv
2019-01-01 05:22:40 +08:00
169 common msgctl sys_old_msgctl
2018-11-13 18:19:29 +08:00
170 common available170 sys_ni_syscall
# File System
171 common umount2 sys_umount
172 common mount sys_mount
173 common swapon sys_swapon
174 common chroot sys_chroot
175 common pivot_root sys_pivot_root
176 common umount sys_oldumount
177 common swapoff sys_swapoff
178 common sync sys_sync
179 common syncfs sys_syncfs
180 common setfsuid sys_setfsuid
181 common setfsgid sys_setfsgid
182 common sysfs sys_sysfs
183 common ustat sys_ustat
184 common statfs sys_statfs
185 common fstatfs sys_fstatfs
186 common statfs64 sys_statfs64
187 common fstatfs64 sys_fstatfs64
# System
188 common setrlimit sys_setrlimit
189 common getrlimit sys_getrlimit
190 common getrusage sys_getrusage
2019-01-01 08:13:32 +08:00
191 common futex sys_futex_time32
2018-11-13 18:19:29 +08:00
192 common gettimeofday sys_gettimeofday
193 common settimeofday sys_settimeofday
2019-01-01 08:13:32 +08:00
194 common adjtimex sys_adjtimex_time32
195 common nanosleep sys_nanosleep_time32
2018-11-13 18:19:29 +08:00
196 common getgroups sys_getgroups
197 common setgroups sys_setgroups
198 common sethostname sys_sethostname
199 common setdomainname sys_setdomainname
200 common syslog sys_syslog
201 common vhangup sys_vhangup
202 common uselib sys_uselib
203 common reboot sys_reboot
204 common quotactl sys_quotactl
# 205 was old nfsservctl
205 common nfsservctl sys_ni_syscall
2020-08-15 08:31:07 +08:00
206 common _sysctl sys_ni_syscall
2018-11-13 18:19:29 +08:00
207 common bdflush sys_bdflush
208 common uname sys_newuname
209 common sysinfo sys_sysinfo
210 common init_module sys_init_module
211 common delete_module sys_delete_module
212 common sched_setparam sys_sched_setparam
213 common sched_getparam sys_sched_getparam
214 common sched_setscheduler sys_sched_setscheduler
215 common sched_getscheduler sys_sched_getscheduler
216 common sched_get_priority_max sys_sched_get_priority_max
217 common sched_get_priority_min sys_sched_get_priority_min
2019-01-01 08:13:32 +08:00
218 common sched_rr_get_interval sys_sched_rr_get_interval_time32
2018-11-13 18:19:29 +08:00
219 common sched_yield sys_sched_yield
222 common available222 sys_ni_syscall
# Signal Handling
223 common restart_syscall sys_restart_syscall
224 common sigaltstack sys_sigaltstack
225 common rt_sigreturn xtensa_rt_sigreturn
226 common rt_sigaction sys_rt_sigaction
227 common rt_sigprocmask sys_rt_sigprocmask
228 common rt_sigpending sys_rt_sigpending
2019-01-01 08:13:32 +08:00
229 common rt_sigtimedwait sys_rt_sigtimedwait_time32
2018-11-13 18:19:29 +08:00
230 common rt_sigqueueinfo sys_rt_sigqueueinfo
231 common rt_sigsuspend sys_rt_sigsuspend
# Message
232 common mq_open sys_mq_open
233 common mq_unlink sys_mq_unlink
2019-01-01 08:13:32 +08:00
234 common mq_timedsend sys_mq_timedsend_time32
235 common mq_timedreceive sys_mq_timedreceive_time32
2018-11-13 18:19:29 +08:00
236 common mq_notify sys_mq_notify
237 common mq_getsetattr sys_mq_getsetattr
238 common available238 sys_ni_syscall
239 common io_setup sys_io_setup
# IO
240 common io_destroy sys_io_destroy
241 common io_submit sys_io_submit
2019-01-01 08:13:32 +08:00
242 common io_getevents sys_io_getevents_time32
2018-11-13 18:19:29 +08:00
243 common io_cancel sys_io_cancel
2019-01-01 08:13:32 +08:00
244 common clock_settime sys_clock_settime32
245 common clock_gettime sys_clock_gettime32
246 common clock_getres sys_clock_getres_time32
247 common clock_nanosleep sys_clock_nanosleep_time32
2018-11-13 18:19:29 +08:00
# Timer
248 common timer_create sys_timer_create
249 common timer_delete sys_timer_delete
2019-01-01 08:13:32 +08:00
250 common timer_settime sys_timer_settime32
251 common timer_gettime sys_timer_gettime32
2018-11-13 18:19:29 +08:00
252 common timer_getoverrun sys_timer_getoverrun
# System
253 common reserved253 sys_ni_syscall
254 common lookup_dcookie sys_lookup_dcookie
255 common available255 sys_ni_syscall
256 common add_key sys_add_key
257 common request_key sys_request_key
258 common keyctl sys_keyctl
259 common available259 sys_ni_syscall
260 common readahead sys_readahead
261 common remap_file_pages sys_remap_file_pages
262 common migrate_pages sys_migrate_pages
263 common mbind sys_mbind
264 common get_mempolicy sys_get_mempolicy
265 common set_mempolicy sys_set_mempolicy
266 common unshare sys_unshare
267 common move_pages sys_move_pages
268 common splice sys_splice
269 common tee sys_tee
270 common vmsplice sys_vmsplice
271 common available271 sys_ni_syscall
2019-01-01 08:13:32 +08:00
272 common pselect6 sys_pselect6_time32
273 common ppoll sys_ppoll_time32
2018-11-13 18:19:29 +08:00
274 common epoll_pwait sys_epoll_pwait
275 common epoll_create1 sys_epoll_create1
276 common inotify_init sys_inotify_init
277 common inotify_add_watch sys_inotify_add_watch
278 common inotify_rm_watch sys_inotify_rm_watch
279 common inotify_init1 sys_inotify_init1
280 common getcpu sys_getcpu
281 common kexec_load sys_ni_syscall
282 common ioprio_set sys_ioprio_set
283 common ioprio_get sys_ioprio_get
284 common set_robust_list sys_set_robust_list
285 common get_robust_list sys_get_robust_list
286 common available286 sys_ni_syscall
287 common available287 sys_ni_syscall
# Relative File Operations
288 common openat sys_openat
289 common mkdirat sys_mkdirat
290 common mknodat sys_mknodat
291 common unlinkat sys_unlinkat
292 common renameat sys_renameat
293 common linkat sys_linkat
294 common symlinkat sys_symlinkat
295 common readlinkat sys_readlinkat
2019-01-01 08:13:32 +08:00
296 common utimensat sys_utimensat_time32
2018-11-13 18:19:29 +08:00
297 common fchownat sys_fchownat
2019-01-07 06:45:29 +08:00
298 common futimesat sys_futimesat_time32
2018-11-13 18:19:29 +08:00
299 common fstatat64 sys_fstatat64
300 common fchmodat sys_fchmodat
301 common faccessat sys_faccessat
302 common available302 sys_ni_syscall
303 common available303 sys_ni_syscall
304 common signalfd sys_signalfd
# 305 was timerfd
306 common eventfd sys_eventfd
2019-01-01 08:13:32 +08:00
307 common recvmmsg sys_recvmmsg_time32
2018-11-13 18:19:29 +08:00
308 common setns sys_setns
309 common signalfd4 sys_signalfd4
310 common dup3 sys_dup3
311 common pipe2 sys_pipe2
312 common timerfd_create sys_timerfd_create
2019-01-01 08:13:32 +08:00
313 common timerfd_settime sys_timerfd_settime32
314 common timerfd_gettime sys_timerfd_gettime32
2018-11-13 18:19:29 +08:00
315 common available315 sys_ni_syscall
316 common eventfd2 sys_eventfd2
317 common preadv sys_preadv
318 common pwritev sys_pwritev
319 common available319 sys_ni_syscall
320 common fanotify_init sys_fanotify_init
321 common fanotify_mark sys_fanotify_mark
322 common process_vm_readv sys_process_vm_readv
323 common process_vm_writev sys_process_vm_writev
324 common name_to_handle_at sys_name_to_handle_at
325 common open_by_handle_at sys_open_by_handle_at
326 common sync_file_range2 sys_sync_file_range2
327 common perf_event_open sys_perf_event_open
328 common rt_tgsigqueueinfo sys_rt_tgsigqueueinfo
2019-01-01 08:13:32 +08:00
329 common clock_adjtime sys_clock_adjtime32
2018-11-13 18:19:29 +08:00
330 common prlimit64 sys_prlimit64
331 common kcmp sys_kcmp
332 common finit_module sys_finit_module
333 common accept4 sys_accept4
334 common sched_setattr sys_sched_setattr
335 common sched_getattr sys_sched_getattr
336 common renameat2 sys_renameat2
337 common seccomp sys_seccomp
338 common getrandom sys_getrandom
339 common memfd_create sys_memfd_create
340 common bpf sys_bpf
341 common execveat sys_execveat
342 common userfaultfd sys_userfaultfd
343 common membarrier sys_membarrier
344 common mlock2 sys_mlock2
345 common copy_file_range sys_copy_file_range
346 common preadv2 sys_preadv2
347 common pwritev2 sys_pwritev2
348 common pkey_mprotect sys_pkey_mprotect
349 common pkey_alloc sys_pkey_alloc
350 common pkey_free sys_pkey_free
351 common statx sys_statx
2019-01-01 06:12:32 +08:00
352 common rseq sys_rseq
2019-01-10 19:45:11 +08:00
# 353 through 402 are unassigned to sync up with generic numbers
403 common clock_gettime64 sys_clock_gettime
404 common clock_settime64 sys_clock_settime
405 common clock_adjtime64 sys_clock_adjtime
406 common clock_getres_time64 sys_clock_getres
407 common clock_nanosleep_time64 sys_clock_nanosleep
408 common timer_gettime64 sys_timer_gettime
409 common timer_settime64 sys_timer_settime
410 common timerfd_gettime64 sys_timerfd_gettime
411 common timerfd_settime64 sys_timerfd_settime
412 common utimensat_time64 sys_utimensat
413 common pselect6_time64 sys_pselect6
414 common ppoll_time64 sys_ppoll
416 common io_pgetevents_time64 sys_io_pgetevents
417 common recvmmsg_time64 sys_recvmmsg
418 common mq_timedsend_time64 sys_mq_timedsend
419 common mq_timedreceive_time64 sys_mq_timedreceive
420 common semtimedop_time64 sys_semtimedop
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
2019-02-28 20:59:19 +08:00
424 common pidfd_send_signal sys_pidfd_send_signal
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
2019-05-16 19:52:34 +08:00
428 common open_tree sys_open_tree
429 common move_mount sys_move_mount
430 common fsopen sys_fsopen
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
2019-05-24 18:44:59 +08:00
434 common pidfd_open sys_pidfd_open
2019-07-12 01:09:44 +08:00
435 common clone3 sys_clone3
2019-05-24 17:31:44 +08:00
436 common close_range sys_close_range
open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.
In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.
Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).
/* Syscall Prototype. */
/*
* open_how is an extensible structure (similar in interface to
* clone3(2) or sched_setattr(2)). The size parameter must be set to
* sizeof(struct open_how), to allow for future extensions. All future
* extensions will be appended to open_how, with their zero value
* acting as a no-op default.
*/
struct open_how { /* ... */ };
int openat2(int dfd, const char *pathname,
struct open_how *how, size_t size);
/* Description. */
The initial version of 'struct open_how' contains the following fields:
flags
Used to specify openat(2)-style flags. However, any unknown flag
bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
will result in -EINVAL. In addition, this field is 64-bits wide to
allow for more O_ flags than currently permitted with openat(2).
mode
The file mode for O_CREAT or O_TMPFILE.
Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.
resolve
Restrict path resolution (in contrast to O_* flags they affect all
path components). The current set of flags are as follows (at the
moment, all of the RESOLVE_ flags are implemented as just passing
the corresponding LOOKUP_ flag).
RESOLVE_NO_XDEV => LOOKUP_NO_XDEV
RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS
RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
RESOLVE_BENEATH => LOOKUP_BENEATH
RESOLVE_IN_ROOT => LOOKUP_IN_ROOT
open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
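The extensible-struct convention described above (caller passes a size, new fields are appended, zero means "no-op") can be modeled in plain userspace C. The sketch below is an assumption-laden illustration, not the kernel's implementation: `struct open_how_v0` is a hypothetical mirror of the three u64 fields listed in this message, `copy_extensible` is a made-up name, and rejecting non-zero trailing bytes with -E2BIG is my understanding of how an unsupported extension is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the initial open_how layout: three u64 fields. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/*
 * Userspace model of the size negotiation: copy a caller's struct of
 * usize bytes into a buffer of ksize bytes (the size this side knows).
 * Returns 0 on success, -E2BIG if the caller set a field we do not know.
 */
int copy_extensible(void *dst, size_t ksize, const void *src, size_t usize)
{
    const unsigned char *in = src;
    size_t common = usize < ksize ? usize : ksize;
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Newer caller: every byte past what we understand must be zero. */
    for (i = ksize; i < usize; i++)
        if (in[i] != 0)
            return -E2BIG;

    /* Older caller: fields it did not supply default to zero. */
    memset(dst, 0, ksize);
    memcpy(dst, src, common);
    return 0;
}
```

Because every field is a u64 and zero is the no-op value, an older caller's shorter struct and a newer caller's longer but zero-padded struct both reduce to the same known layout.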
Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).
After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.
/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.
In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).
/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).
Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.
Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb834 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 20:07:59 +08:00
437 common openat2 sys_openat2
2020-01-08 01:59:26 +08:00
438 common pidfd_getfd sys_pidfd_getfd
2020-05-14 22:44:25 +08:00
439 common faccessat2 sys_faccessat2
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.
The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.
To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vectored address ranges because an Android app has
thousands of vmas due to zygote, so it's a total waste of CPU and power if
we have to call the syscall one by one for each vma. (In testing, a 2000-vma
syscall loop vs. a 1-vector syscall showed a 15% performance improvement. I
think it would be bigger in real practice because the testing ran in a very
cache-friendly environment.)
Another potential use case for the vector range is to amortize the cost
of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more use cases for other advice values, so let's make it
part of the API since we are introducing a new syscall at this moment. With
that, existing madvise(2) users could replace it with process_madvise(2)
with their own pid if they want batched address range
support.
Since it could affect another process's address range, only a privileged
process (PTRACE_MODE_ATTACH_FSCREDS) or one that otherwise (e.g., by being the
same UID) has the right to ptrace the target process can use it successfully.
The flag argument is reserved for future use if we need to extend the API.
I think supporting every hint that madvise supports or will support in
process_madvise is rather risky, because we are not sure all hints make
sense from an external process, and the implementation of a hint may rely on
the caller being in the current context, so it could be error-prone. Thus,
I have limited the hints to MADV_[COLD|PAGEOUT] in this patch.
If someone wants to add other hints, we can hear the use case and review
it for each hint. That is safer for maintenance than introducing a
buggy syscall that is hard to fix later.
So finally, the API is as follows,
ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);
DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.
The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidfd_open(2) for further information.)
The pointer iovec points to an array of iovec structures, defined in
<sys/uio.h> as:
struct iovec {
    void *iov_base; /* starting address */
    size_t iov_len; /* number of bytes to be advised */
};
The iovec describes address ranges beginning at an address (iov_base)
and with a size in bytes (iov_len).
The vlen represents the number of elements in iovec.
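As an illustration of the iovec layout described above, here is a minimal
sketch that converts a list of [start, end) address ranges into an iovec
array. The addr_range type and the ranges_to_iovec() helper are hypothetical
names introduced only for this example; they are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper type: one half-open address range [start, end). */
struct addr_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Fill vec[0..n-1] from ranges[0..n-1] and return the total number of
 * bytes described. The caller provides a vec array with room for n
 * entries; n is the value that would later be passed as vlen.
 */
static size_t ranges_to_iovec(const struct addr_range *ranges, size_t n,
			      struct iovec *vec)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ranges[i].end >= ranges[i].start);
		vec[i].iov_base = (void *)ranges[i].start;
		vec[i].iov_len = ranges[i].end - ranges[i].start;
		total += vec[i].iov_len;
	}
	return total;
}
```

The returned total is the figure a caller would compare against the syscall's
return value to detect partial advice.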
The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.
MADV_COLD
MADV_PAGEOUT
Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
process_madvise supports every advice madvise(2) has if the target
process is in the same thread group as the calling process, so users could
use process_madvise(2) to extend existing madvise(2) usage to support
vector address ranges.
RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.
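The call and the partial-advice check described above could be sketched from
userspace roughly as follows. This is only an illustration under stated
assumptions: it goes through syscall(2) directly, the fallback number 440 is
taken from the table entry this commit adds, and the MADV_COLD fallback value
20 is assumed from the generic uapi headers (both are used only when the libc
headers do not define them). process_madvise_raw(), iovec_total() and
advise_all() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_process_madvise
#define __NR_process_madvise 440	/* assumption: number from this table */
#endif
#ifndef MADV_COLD
#define MADV_COLD 20			/* assumption: generic uapi value */
#endif

/* Thin wrapper around the raw syscall; returns bytes advised or -1. */
static ssize_t process_madvise_raw(int pidfd, const struct iovec *vec,
				   size_t vlen, int advice,
				   unsigned int flags)
{
	return syscall(__NR_process_madvise, pidfd, vec, vlen, advice, flags);
}

/* Sum of iov_len over the vector, i.e. the number of bytes requested. */
static size_t iovec_total(const struct iovec *vec, size_t vlen)
{
	size_t total = 0;

	assert(vec != NULL || vlen == 0);
	for (size_t i = 0; i < vlen; i++)
		total += vec[i].iov_len;
	return total;
}

/*
 * Returns 1 if every requested byte was advised, 0 if only part of the
 * request was advised, and -1 on error (errno is set by the syscall).
 */
static int advise_all(int pidfd, const struct iovec *vec, size_t vlen,
		      int advice)
{
	ssize_t done = process_madvise_raw(pidfd, vec, vlen, advice, 0);

	if (done < 0)
		return -1;
	return (size_t)done == iovec_total(vec, vlen) ? 1 : 0;
}
```

A caller that gets 0 back knows that some later ranges were skipped and can
decide whether to retry them or give up.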
FAQ:
Q.1 - Why does any external entity have better knowledge?
Quote from Sandeep
"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.
After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.
In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.
So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.
Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.
So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.
- ssp
Q.2 - How is the race (i.e., object validation) handled between an external
process giving a hint and the target process receiving it?
process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the
target process can run between the time the calling process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. The
same applies to other APIs such as move_pages and process_vm_writev.
The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using another API or
design on the user side. It shouldn't be part of the API itself. If someone
needs more fine-grained synchronization than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via the last reserved argument of the API, but I don't
think it's necessary right now since we already have ways to prevent
the race, so I don't want to add complexity with a more
fine-grained optimization model.
To keep the API extensible, it reserves an unsigned long as the last argument
so we could support it in future if someone really needs it.
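One of the caller-side synchronization options mentioned above, suspending the
target with SIGSTOP while the hint is given, could look roughly like the
sketch below. It is an illustration only: it assumes the target is a direct
child of the caller so that waitpid(2) can confirm the stop, and
with_child_stopped(), hint_fn and count_hint() are hypothetical names.
count_hint() merely stands in for a real process_madvise call.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback: gives the hint while the target is stopped. */
typedef int (*hint_fn)(pid_t target, void *arg);

/* Stand-in hint that only counts how often it was invoked. */
static int count_hint(pid_t target, void *arg)
{
	(void)target;
	++*(int *)arg;
	return 0;
}

/*
 * Stop a child, run the hint while it cannot change its address space,
 * then resume it. Returns the hint's result, or -1 if the child could
 * not be stopped or resumed. Only works for direct children because it
 * relies on waitpid(WUNTRACED) to observe the stop.
 */
static int with_child_stopped(pid_t child, hint_fn hint, void *arg)
{
	int status, ret;

	assert(hint != NULL);
	/* Refuse process-group targets and never stop ourselves. */
	if (child <= 0 || child == getpid())
		return -1;
	if (kill(child, SIGSTOP) < 0)
		return -1;
	if (waitpid(child, &status, WUNTRACED) < 0 || !WIFSTOPPED(status)) {
		kill(child, SIGCONT);	/* best effort; child may be gone */
		return -1;
	}
	ret = hint(child, arg);
	if (kill(child, SIGCONT) < 0)
		return -1;
	return ret;
}
```

For a target that is not a child, the caller would need another way to confirm
the stop, or one of the other options named above such as the freezer cgroup.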
Q.3 - Why doesn't ptrace work?
Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act on the hint in the caller's context, not the
callee's, because the callee is usually limited by cpuset/cgroups or
even in a frozen state, so it can't act by itself quickly enough, which
causes more thrashing/kills. It also doesn't work if the target process is
already ptraced (e.g., by strace, a debugger, or minidump) because a process
can have at most one ptracer.
[1] https://developer.android.com/topic/performance/memory"
[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224
[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/
[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18 07:14:59 +08:00
|
|
|
440 common process_madvise sys_process_madvise
|