Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
=====================================================================

This Kernel feature allows you to invoke almost every program (for
restrictions see below) by simply typing its name in the shell. This
includes, for example, compiled Java(TM), Python or Emacs programs.

To achieve this you must tell binfmt_misc which interpreter has to be invoked
with which binary. Binfmt_misc recognises the binary type by matching some bytes
at the beginning of the file with a magic byte sequence (masking out specified
bits) you have supplied. Binfmt_misc can also recognise a filename extension
such as '.com' or '.exe'.

First you must mount binfmt_misc:
        mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc

To actually register a new binary type, you have to set up a string looking like
:name:type:offset:magic:mask:interpreter:flags (where you can choose the ':'
delimiter to suit your needs) and echo it to /proc/sys/fs/binfmt_misc/register.

Here is what the fields mean:
 - 'name' is an identifier string. A new /proc file will be created with this
   name below /proc/sys/fs/binfmt_misc. The name cannot contain slashes '/',
   since it is used as a file name.
 - 'type' is the type of recognition. Give 'M' for magic and 'E' for extension.
 - 'offset' is the offset of the magic/mask in the file, counted in bytes. This
   defaults to 0 if you omit it (i.e. you write ':name:type::magic...'). Ignored
   when using filename extension matching.
 - 'magic' is the byte sequence binfmt_misc matches against. The magic string
   may contain hex-encoded characters like \x0a or \xA4. Note that you must
   escape any NUL bytes; parsing halts at the first one. In a shell environment
   you might have to write \\x0a to prevent the shell from eating your \.
   If you chose filename extension matching, this is the extension to be
   recognised (without the '.', the \x0a specials are not allowed). Extension
   matching is case sensitive, and slashes '/' are not allowed!
 - 'mask' is an (optional, defaults to all 0xff) mask. You can mask out some
   bits from matching by supplying a string like magic and as long as magic.
   The mask is ANDed with the byte sequence of the file. Note that you must
   escape any NUL bytes; parsing halts at the first one. Ignored when using
   filename extension matching.
 - 'interpreter' is the program that should be invoked with the binary as first
   argument (specify the full path).
 - 'flags' is an optional field that controls several aspects of the invocation
   of the interpreter. It is a string of capital letters, each of which controls
   a certain aspect. The following flags are supported:
      'P' - preserve-argv[0]. Legacy behavior of binfmt_misc is to overwrite
            the original argv[0] with the full path to the binary. When this
            flag is included, binfmt_misc will add an argument to the argument
            vector for this purpose, thus preserving the original argv[0].
            For example, if your interp is set to /bin/foo and you run `blah`
            (which is in /usr/local/bin), then the kernel will execute /bin/foo
            with argv[] set to ["/bin/foo", "/usr/local/bin/blah", "blah"]. The
            interp has to be aware of this so it can execute /usr/local/bin/blah
            with argv[] set to ["blah"].
      'O' - open-binary. Legacy behavior of binfmt_misc is to pass the full path
            of the binary to the interpreter as an argument. When this flag is
            included, binfmt_misc will open the file for reading and pass its
            descriptor as an argument, instead of the full path, thus allowing
            the interpreter to execute non-readable binaries. This feature
            should be used with care - the interpreter has to be trusted not to
            emit the contents of the non-readable binary.
      'C' - credentials. Currently, the behavior of binfmt_misc is to calculate
            the credentials and security token of the new process according to
            the interpreter. When this flag is included, these attributes are
            calculated according to the binary. It also implies the 'O' flag.
            This feature should be used with care as the interpreter
            will run with root permissions when a setuid binary owned by root
            is run with binfmt_misc.
      'F' - fix binary. The usual behaviour of binfmt_misc is to spawn the
            binary lazily when the misc format file is invoked. However,
            this doesn't work very well in the face of mount namespaces and
            changeroots, so the 'F' mode opens the binary as soon as the
            emulation is installed and uses the opened image to spawn the
            emulator, meaning it is always available once installed,
            regardless of how the environment changes.

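As a rough illustration of the format described above, the following shell
sketch assembles a registration string from its seven fields. The function name
binfmt_entry is made up for this example and is not part of binfmt_misc; it
only prints the string and never touches /proc.

```shell
# Hypothetical helper (not part of binfmt_misc): print a registration
# string built from the seven fields, using ':' as the delimiter.
# Arguments: name type offset magic mask interpreter flags
binfmt_entry() {
	# %s copies each argument verbatim, so \x.. escapes stay untouched
	# and are left for the kernel to decode.
	printf ':%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Empty offset, mask and flags fields are passed as empty strings:
binfmt_entry DOSWin M '' MZ '' /usr/local/bin/wine ''
```

To register the type, the printed string would be redirected to
/proc/sys/fs/binfmt_misc/register as root.
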
There are some restrictions:
 - the whole register string may not exceed 1920 characters
 - the magic must reside in the first 128 bytes of the file, i.e.
   offset+size(magic) has to be less than 128
 - the interpreter string may not exceed 127 characters

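The three limits above can be checked before writing to the register file. The
sketch below is only an illustration: binfmt_check is a made-up name, and it
assumes the magic consists solely of literal characters and \xHH escapes.

```shell
# Hypothetical pre-flight check (not part of binfmt_misc).
# $1 = full register string, $2 = offset (may be empty), $3 = magic,
# $4 = interpreter path
binfmt_check() {
	entry=$1 offset=${2:-0} magic=$3 interp=$4
	# Collapse every \xHH escape to one character to get the decoded length.
	decoded=$(printf '%s' "$magic" | sed 's/\\x[0-9A-Fa-f][0-9A-Fa-f]/X/g')
	[ "${#entry}" -le 1920 ] || { echo "register string too long"; return 1; }
	[ $((offset + ${#decoded})) -lt 128 ] || { echo "magic beyond first 128 bytes"; return 1; }
	[ "${#interp}" -le 127 ] || { echo "interpreter path too long"; return 1; }
	echo "ok"
}
```

The kernel performs its own validation on registration; a check like this only
gives an earlier and more readable error.
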
To use binfmt_misc you have to mount it first. You can mount it with the
"mount -t binfmt_misc none /proc/sys/fs/binfmt_misc" command, or you can add
a line "none /proc/sys/fs/binfmt_misc binfmt_misc defaults 0 0" to your
/etc/fstab so that it is mounted automatically at boot.

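A boot script may want to mount binfmt_misc only if it is not mounted yet. The
following is a minimal sketch under the assumption that the mount table is
available in the usual /proc/mounts layout (device, mount point, filesystem
type, ...); binfmt_is_mounted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a binfmt_misc filesystem appears in the
# given mount table (defaults to /proc/mounts).
binfmt_is_mounted() {
	# The third column of each line is the filesystem type.
	awk '$3 == "binfmt_misc" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Mount only when needed (requires root):
# binfmt_is_mounted || mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
```
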
You may want to add the binary formats in one of your /etc/rc scripts during
boot-up. Read the manual of your init program to figure out how to do this
right.

Think about the order in which you add entries! Entries added later are matched
first!


A few examples (assuming you are in /proc/sys/fs/binfmt_misc):

- enable support for em86 (like binfmt_em86, for Alpha AXP only):
  echo ':i386:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register
  echo ':i486:M::\x7fELF\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x06:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff:/bin/em86:' > register

- enable support for packed DOS applications (pre-configured dosemu hdimages):
  echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register

- enable support for Windows executables using wine:
  echo ':DOSWin:M::MZ::/usr/local/bin/wine:' > register

For Java support, see Documentation/java.txt.


You can enable/disable binfmt_misc or one binary type by echoing 0 (to disable)
or 1 (to enable) to /proc/sys/fs/binfmt_misc/status or /proc/.../the_name.
Catting the file tells you the current status of binfmt_misc or of the entry.

You can remove one entry or all entries by echoing -1 to /proc/.../the_name
or /proc/sys/fs/binfmt_misc/status.

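The status operations above can be wrapped in a few small shell functions. This
is only a sketch: the function names and the BINFMT_DIR variable are
assumptions introduced here so that the helpers can be tried against a scratch
directory instead of the real mount point.

```shell
# Hypothetical helpers; BINFMT_DIR is an assumed override for testing.
BINFMT_DIR=${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}

# $1 = entry name, or "status" to act on binfmt_misc as a whole
binfmt_enable()  { echo 1  > "$BINFMT_DIR/$1"; }
binfmt_disable() { echo 0  > "$BINFMT_DIR/$1"; }
binfmt_remove()  { echo -1 > "$BINFMT_DIR/$1"; }
binfmt_show()    { cat "$BINFMT_DIR/$1"; }
```

For example, "binfmt_disable DOSWin" would disable the wine entry from the
examples above, and "binfmt_remove status" would remove all entries. Both need
root when pointed at the real mount point.
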
HINTS:
======

If you want to pass special arguments to your interpreter, you can
write a wrapper script for it. See Documentation/java.txt for an
example.

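A wrapper of this kind might look roughly like the sketch below. Everything
specific in it is a placeholder: /usr/local/bin/myinterp and --some-option
stand in for a real interpreter and its arguments, and run_wrapped is a name
chosen for this example. It assumes the legacy invocation, where the kernel
passes the full path of the binary as the first argument.

```shell
#!/bin/sh
# Hypothetical wrapper: INTERP and --some-option are placeholders, not a
# real program or flag.
INTERP=${INTERP:-/usr/local/bin/myinterp}

run_wrapped() {
	binary=$1	# full path of the binary, supplied by the kernel
	shift
	# Insert the special argument, then hand over the binary and the
	# remaining command-line arguments unchanged.
	"$INTERP" --some-option "$binary" "$@"
}

# In an installed wrapper the last line would be:
# run_wrapped "$@"
```

The wrapper itself, rather than the real interpreter, would then be given in
the 'interpreter' field of the register string.
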
Your interpreter should NOT look in the PATH for the filename; the kernel
passes it the full filename (or the file descriptor) to use. Using $PATH can
cause unexpected behaviour and can be a security hazard.


Richard Günther <rguenth@tat.physik.uni-tuebingen.de>