pkeys: Add details of system call use to Documentation/

This spells out all of the pkey-related system calls that we have
and provides some example code fragments to demonstrate how we
expect them to be used.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: mgorman@techsingularity.net
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163020.59350E33@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This commit is contained in:
Dave Hansen 2016-07-29 09:30:20 -07:00 committed by Thomas Gleixner
parent a60f7b69d9
commit c74fe39408
1 changed files with 62 additions and 0 deletions

View File

@ -18,6 +18,68 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on permissions are enforced on data access only and have no effect on
instruction fetches. instruction fetches.
=========================== Syscalls ===========================
There are 2 system calls which directly interact with pkeys:
int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
int pkey_free(int pkey);
int pkey_mprotect(unsigned long start, size_t len,
unsigned long prot, int pkey);
Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key. In this example WRPKRU is wrapped by a C function
called pkey_set().
int real_prot = PROT_READ|PROT_WRITE;
pkey = pkey_alloc(0, PKEY_DENY_WRITE);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
... application runs here
Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:
pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
*ptr = foo; // assign something
pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again
Now when it frees the memory, it will also free the pkey since it
is no longer in use:
munmap(ptr, PAGE_SIZE);
pkey_free(pkey);
=========================== Behavior ===========================
The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this:
mprotect(ptr, size, PROT_NONE);
something(ptr);
you can expect the same effects with protection keys when doing this:
pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
something(ptr);
That should be true whether something() is a direct access to 'ptr'
like:
*ptr = foo;
or when the kernel does the access on the application's behalf like
with a read():
read(fd, ptr, 1);
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
=========================== Config Option =========================== =========================== Config Option ===========================
This config option adds approximately 1.5kb of text. and 50 bytes of This config option adds approximately 1.5kb of text. and 50 bytes of