Commit Graph

968 Commits

Author SHA1 Message Date
Aleksa Sarai 972c176ae4
tests: fix all the things
This fixes all of the tests that were broken as part of the console
rewrite. This includes fixing the integration tests that used TTY
handling inside libcontainer, as well as the bats integration tests that
needed to be rewritten to use recvtty (as they rely on detached
containers that are running).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:37 +11:00
Aleksa Sarai bda3055055
*: update busybox test rootfs
Switch to the actual source of the official Docker library of images, so
that we have a proper source for the test filesystem. In addition,
update to the latest released version (1.25.0 [2016-06-23]) so that we
can use more up-to-date applets in our tests (such as stat(3)).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 7df64f8886
runc: implement --console-socket
This allows for higher-level orchestrators to be able to have access to
the master pty file descriptor without keeping the runC process running.
This is key to having (detach && createTTY) with a _real_ pty created
inside the container, which is then sent to a higher level orchestrator
over an AF_UNIX socket.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Mrunal Patel f1324a9fc1
Don't label the console as it already has the right label
[@cyphar: removed mountLabel argument from .mount().]

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai c0c8edb9e8
console: don't chown(2) the slave PTY
Since the gid=X and mode=Y flags can be set inside config.json as mount
options, don't override them with our own defaults. This avoids
/dev/pts/* not being owned by tty in a regular container, as well as all
of the issues with us implementing grantpt(3) manually. This is the
least opinionated approach to take.

This patch is part of the console rewrite patchset.

Reported-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 244c9fc426
*: console rewrite
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.

In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.

We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 4776b4326a
libcontainer: refactor syncT handling
To make the code cleaner, and more clear, refactor the syncT handling
used when creating the `runc init` process. In addition, document the
state changes so that people actually understand what is going on.

Rather than only using syncT for the standard initProcess, use it for
both initProcess and setnsProcess. This removes some special cases, as
well as allowing for the use of syncT with setnsProcess.

Also remove a bunch of the boilerplate around syncT handling.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
Aleksa Sarai 2055115566
cmsg: add cmsg {send,recv}fd wrappers
This adds C wrappers for sendmsg and recvmsg, specifically used for
passing around file descriptors in Go. The wrappers (sendfd, recvfd)
expect to be called in a context where it makes sense (where the other
side is carrying out the corresponding action).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
yupeng 145d23e084 error strings should not be capitalized or end with punctuation
Signed-off-by: yupeng <yu.peng36@zte.com.cn>
2016-12-01 11:57:16 +08:00
Wang Long 1b401664d1 tiny refactor
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-11-30 20:53:37 +08:00
allencloud f596858395 fix typos
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-11-30 13:31:36 +08:00
Mrunal Patel 4c013a1524 Merge pull request #1194 from hqhq/fix_cpu_exclusive
Fix cpuset issue with cpuset.cpu_exclusive
2016-11-29 09:49:34 -08:00
Daniel, Dao Quang Minh f156f73c2a Merge pull request #1154 from hqhq/sync_child
Sync with grandchild
2016-11-23 09:10:00 -08:00
Qiang Huang 14d58e1e48 Fix leftover cgroup directory issue
In the cases that we got failure on a subsystem's Apply,
we'll get some subsystems' cgroup directories leftover.

On Docker's point of view, start a container failed, use
`docker rm` to remove the container, but some cgroup files
are leftover.

Sometimes we don't want to clean everyting up when something
went wrong, because we need these inter situation
information to debug what's going on, but cgroup directories
are not useful information we want to keep.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-22 08:02:43 +08:00
Qiang Huang aee46862ec Fix cpuset issue with cpuset.cpu_exclusive
This PR fix issue in this scenario:

```
in terminal 1:
~# cd /sys/fs/cgroup/cpuset
~# mkdir test
~# cd test
~# cat cpuset.cpus
0-3
~# echo 1 > cpuset.cpu_exclusive (make sure you don't have other cgroups under root)

in terminal 2:
~# echo $$ > /sys/fs/cgroup/cpuset/test/tasks
// set resources.cpu.cpus="0-2" in config.json
~# runc run test1

back to terminal 1:
~# cd test1
~# cat cpuset.cpus
0-2
~# echo 1 > cpuset.cpu_exclusive

in terminal 3:
~# echo $$ > /sys/fs/cgroup/test/tasks
// set resources.cpu.cpus="3" in config.json
~# runc run test2
container_linux.go:247: starting container process caused "process_linux.go:258:
applying cgroup configuration for process caused \"failed to write 0-3\\n to
cpuset.cpus: write /sys/fs/cgroup/cpuset/test2/cpuset.cpus: invalid argument\""
```

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-18 15:28:40 +08:00
Qiang Huang 16a2e8ba6e Sync with grandchild
Without this, it's possible that father process exit with
0 before grandchild exit with error.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-17 08:59:37 +08:00
rajasec 43287af982 Fixing error message in nsexec
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-11-10 17:06:50 +05:30
Mrunal Patel 51371867a0 Merge pull request #1180 from crosbymichael/kill-all
Add --all flag to kill
2016-11-09 12:21:22 -07:00
Michael Crosby e58671e530 Add --all flag to kill
This allows a user to send a signal to all the processes in the
container within a single atomic action to avoid new processes being
forked off before the signal can be sent.

This is basically taking functionality that we already use being
`delete` and exposing it ok the `kill` command by adding a flag.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-08 09:35:02 -08:00
Mrunal Patel 8779fa57eb Merge pull request #1168 from hqhq/fix_nsexec_comments
More fix to nsexec.c's comments
2016-11-07 16:20:42 -07:00
Michael Crosby 5f24c9a61a Merge pull request #1146 from cyphar/io-set-termios-onlcr
libcontainer: io: stop screwing with \n in console output
2016-11-03 09:49:50 -07:00
Mrunal Patel d7481c10f4 Merge pull request #1172 from crosbymichael/ambient-tag
Move ambient capabilties behind build tag
2016-11-02 20:16:26 -07:00
Qiang Huang 84a4218ece More fix to nsexec.c's comments
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-03 10:15:01 +08:00
Aleksa Sarai 49ed0a10e4
merge branch 'pr-1117'
LGTMs: @hqhq @cyphar
Closes: #1117
2016-11-03 05:03:26 +11:00
Michael Crosby 603c151e6c Move ambient capabilties behind build tag
This moves the ambient capability support behind an `ambient` build tag
so that it is only compiled upon request.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-02 10:59:59 -07:00
Crazykev 34d7c5c099 fix error message
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
2016-11-02 16:34:08 +08:00
Aleksa Sarai fd7ab60a70
libcontainer: make tests to make sure we don't mess with \r
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Aleksa Sarai eea28f480d
libcontainer: io: stop screwing with \n in console output
The default terminal setting for a new pty on Linux (unix98) has +ONLCR,
resulting in '\n' writes by a container process to be converted to
'\r\n' reads by the managing process. This is quite unexpected, and
causes multiple issues with things like bats testing. To fix it, make
the terminal sane after opening it by setting -ONLCR.

This patch might need to be rewritten after the console rewrite patchset
is merged.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Mrunal Patel bc462c96bf Merge pull request #1165 from cyphar/nsenter-fix-comments
nsenter: fix up comments
2016-10-31 10:39:34 -07:00
Daniel, Dao Quang Minh 509b1db98c Merge pull request #1160 from hqhq/fix_typos
Fix all typos found by misspell
2016-10-31 17:28:44 +00:00
Michael Crosby 8b9b444820 Merge pull request #1157 from rajasec/readme-containerstate
Updating container state and status API in README
2016-10-31 10:26:21 -07:00
Michael Crosby 4c7b8d6c59 Merge pull request #1159 from hqhq/unify_rootfs_validation
Unify rootfs validation
2016-10-31 10:22:01 -07:00
Aleksa Sarai 9b15bf17a0
nsenter: fix up comments
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 00:21:09 +11:00
rajasec 16ad3855e7 Correction in util error messages
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-29 19:50:56 +05:30
Qiang Huang b15668b36d Fix all typos found by misspell
I use the same tool (https://github.com/client9/misspell)
as Daniel used a few days ago, don't why he missed these
typos at that time.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 14:14:42 +08:00
Qiang Huang 81d6088c8f Unify rootfs validation
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 10:31:44 +08:00
rajasec 1535e67592 Updating container state and status API in README
Signed-off-by: rajasec <rajasec79@gmail.com>

Updating container state and status API in README

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-27 15:29:34 +05:30
Qiang Huang e7abf30cb8 Merge pull request #1150 from WeiZhang555/forbid-duplicated-namespace
Detect and forbid duplicated namespace in spec
2016-10-27 10:23:16 +08:00
Qiang Huang f520eab891 Remove unnecessary cloneflag validation
config.cloneflag is not mandatory, when using `runc exec`,
config.cloneflag can be empty, and even then it won't be
`-1` but `0`.

So this validation is totally wrong and unneeded.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 09:34:20 +08:00
Andrei Vagin 040fb7311c checkpoint: handle config.Devices and config.MaskPaths
In user namespaces devices are bind-mounted from the host, so
we need to add them as external mounts for CRIU.

Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2016-10-26 23:50:54 +03:00
Mrunal Patel 4599e7074e Merge pull request #1148 from rhvgoyal/parent-mount-private
Make parent mount private before bind mounting rootfs
2016-10-26 17:30:37 +00:00
Zhang Wei a0f7977f0f Detect and forbid duplicated namespace in spec
When spec file contains duplicated namespaces, e.g.

specs: specs.Spec{
        Linux: &specs.Linux{
            Namespaces: []specs.Namespace{
                {
                    Type: "pid",
                },
                {
                    Type: "pid",
                    Path: "/proc/1/ns/pid",
                },
            },
        },
    }

runc should report malformed spec instead of using latest one by
default, because this spec could be quite confusing.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 00:44:36 +08:00
Michael Crosby 6328410520 Merge pull request #1149 from cyphar/fix-sysctl-validation
validator: unbreak sysctl net.* validation
2016-10-26 09:06:41 -07:00
Aleksa Sarai 1ab3c035d2
validator: actually test success
Previously we only tested failures, which causes us to miss issues where
setting sysctls would *always* fail.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 23:07:57 +11:00
Aleksa Sarai 2a94c3651b
validator: unbreak sysctl net.* validation
When changing this validation, the code actually allowing the validation
to pass was removed. This meant that any net.* sysctl would always fail
to validate.

Fixes: bc84f83344 ("fix docker/docker#27484")
Reported-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 22:58:51 +11:00
Qiang Huang 157a96a428 Merge pull request #977 from cyphar/nsenter-userns-ordering
nsenter: guarantee correct user namespace ordering
2016-10-26 16:45:15 +08:00
Vivek Goyal 6c147f8649 Make parent mount private before bind mounting rootfs
This reverts part of the commit eb0a144b5e

That commit introduced two issues.

- We need to make parent mount of rootfs private before bind mounting
  rootfs. Otherwise bind mounting root can propagate in other mount
  namespaces. (If parent mount is shared).

- It broke test TestRootfsPropagationSharedMount() on Fedora.

  On fedora /tmp is a mount point with "shared" propagation. I think
  you should be able to reproduce it on other distributions as well
  as long as you mount tmpfs on /tmp and make it "shared" propagation.

  Reason for failure is that pivot_root() fails. And it fails because
  kernel does following check.

  IS_MNT_SHARED(new_mnt->mnt_parent)

  Say /tmp/foo is new rootfs, we have bind mounted rootfs, so new_mnt
  is /tmp/foo, and new_mnt->mnt_parent is /tmp which is "shared" on
  fedora and above check fails.

So this change broke few things, it is a good idea to revert part of it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2016-10-25 11:15:11 -04:00
Qiang Huang 4ec570d060 Merge pull request #1138 from gaocegege/fix-config-validator
docker/docker#27484-check if sysctls are used in host network mode.
2016-10-25 11:08:51 +08:00
Aleksa Sarai c7ed2244f4
merge branch 'pr-1125'
LGTMs: @hqhq @mrunalp
Closes #1125
2016-10-25 10:05:28 +11:00
Ce Gao 41c35810f2 add test cases about host ns
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:31:15 +08:00
Ce Gao bc84f83344 fix docker/docker#27484
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:22:52 +08:00
Alexander Morozov 1ab9d5e6f4 Merge pull request #845 from mrunalp/cp_tmpfs
Add support for copying up directories into tmpfs when a tmpfs is mounted over them
2016-10-21 13:47:16 -07:00
Mrunal Patel c4198ad9af Merge pull request #1134 from WeiZhang555/tiny-refactor
Some refactor and cleanup
2016-10-20 15:08:40 -07:00
Yong Tang a83f5bac28 Fix issue in `GetProcessStartTime`
This fix tries to address the issue raised in docker:
https://github.com/docker/docker/issues/27540

The issue was that `GetProcessStartTime` use space `"  "`
to split the `/proc/[pid]/stat` and take the `22`th value.

However, the `2`th value is inside `(` and `)`, and could
contain space. The following are two examples:
```
ubuntu@ubuntu:~/runc$ cat /proc/90286/stat
90286 (bash) S 90271 90286 90286 34818 90286 4194560 1412 1130576 4 0 2 1 2334 438 20 0 1 0 3093098 20733952 823 18446744073709551615 1 1 0 0 0 0 0 3670020 1266777851 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ubuntu@ubuntu:~/runc$ cat /proc/89653/stat
89653 (gunicorn: maste) S 89630 89653 89653 0 -1 4194560 29689 28896 0 3 146 32 76 19 20 0 1 0 2971844 52965376 3920 18446744073709551615 1 1 0 0 0 0 0 16781312 137447943 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
```

This fix fixes this issue by removing the prefix before `)`,
then finding the `20`th value (instead of `22`th value).

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-10-20 11:34:21 -07:00
Zhang Wei c179b0ffc7 Some refactor and cleanup
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-20 17:58:51 +08:00
Aleksa Sarai f8e6b5af5e
rootfs: make pivot_root not use a temporary directory
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers which came up with this idea
first.

This is the first step of many to allowing runC to work with a
completely read-only rootfs.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-20 12:55:58 +11:00
Derek Carr d223e2adae Ignore error when starting transient unit that already exists
Signed-off-by: Derek Carr <decarr@redhat.com>
2016-10-19 14:55:52 -04:00
Aleksa Sarai e3cd191acc
nsenter: un-split clone(cloneflags) for RHEL
Without this patch applied, RHEL's SELinux policies cause container
creation to not really work. Unfortunately this might be an issue for
rootless containers (opencontainers/runc#774) but we'll cross that
bridge when we come to it.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-18 18:26:27 +11:00
Michael Crosby fcc40b7a63 Remove panic from init
Print the error message to stderr if we are unable to return it back via
the pipe to the parent process.  Also, don't panic here as it is most
likely a system or user error and not a programmer error.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-17 15:54:51 -07:00
Mrunal Patel 4161f2a63b Merge pull request #1115 from rajasec/filemode-panic
Fixing runc panic for missing file mode
2016-10-17 15:01:49 -07:00
Dan Walsh 6932807107 Add support for r/o mount labels
We need support for read/only mounts in SELinux to allow a bunch of
containers to share the same read/only image.  In order to do this
we need a new label which allows container processes to read/execute
all files but not write them.

Existing mount label is either shared write or private write.  This
label is shared read/execute.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-10-17 16:56:42 -04:00
rajasec 034cba6af0 Fixing runc panic for missing file mode
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixing runc panic for missing file mode

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-16 20:39:44 +05:30
rajasec 4b263c9594 Fixing runc panic during hugetlb pages
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixing runc panic during hugetlb pages

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-15 19:47:33 +05:30
Dan Walsh 491cadac92 DupSecOpt needs to match InitLabels
At some point InitLabels was changed to look for SecuritOptions
separated by a ":" rather then an "=", but DupSecOpt was never
changed to match this default.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-10-13 16:10:29 -04:00
Daniel, Dao Quang Minh d186a7552b Merge pull request #1111 from keloyang/rpid-limit-check
tiny fix, add a null check for specs.Resources.Pids.Limit
2016-10-13 18:04:49 +01:00
Shukui Yang affc105264 tiny fix, add a null check for specs.Resources.Pids.Limit
Signed-off-by: Shukui Yang <yangshukui@huawei.com>
2016-10-13 15:55:30 +08:00
Daniel Dao 1b876b0bf2 fix typos with misspell
pipe the source through https://github.com/client9/misspell. typos be gone!

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2016-10-11 23:22:48 +00:00
Daniel, Dao Quang Minh 8d505cb9dc Merge pull request #1107 from datawolf/fix-a-typo
just fix a typo
2016-10-12 00:15:51 +01:00
Wang Long 5eaa9ed5cd just fix a typo
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-10-11 08:38:15 +00:00
Xianglin Gao 9df4847a23 tiny fix
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-10-11 16:32:56 +08:00
Michael Crosby 11222ee1f1 Don't enable kernel mem if not set
Don't enable the kmem limit if it is not specified in the config.

Fixes #1083

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-07 10:02:19 -07:00
Aleksa Sarai b1eb19b4f3
merge branch 'pr-1084'
LGTMs: @mrunalp @cyphar

Closes #1084
2016-10-07 19:10:14 +11:00
Mrunal Patel c4e7f01c4b Add an integration test for tmpfs copy up
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-04 11:26:37 -07:00
Mrunal Patel c7406f7075 Support copyup mount extension for tmpfs mounts
If copyup is specified for a tmpfs mount, then the contents of the
underlying directory are copied into the tmpfs mounted over it.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-04 11:26:30 -07:00
Aleksa Sarai 2cd9c31b99
nsenter: guarantee correct user namespace ordering
Depending on your SELinux setup, the order in which you join namespaces
can be important. In general, user namespaces should *always* be joined
and unshared first because then the other namespaces are correctly
pinned and you have the right priviliges within them. This also is very
useful for rootless containers, as well as older kernels that had
essentially broken unshare(2) and clone(2) implementations.

This also includes huge refactorings in how we spawn processes for
complicated reasons that I don't want to get into because it will make
me spiral into a cloud of rage. The reasoning is in the giant comment in
clone_parent. Have fun.

In addition, because we now create multiple children with CLONE_PARENT,
we cannot wait for them to SIGCHLD us in the case of a death. Thus, we
have to resort to having a child kindly send us their exit code before
they die. Hopefully this all works okay, but at this point there's not
much more than we can do.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-04 16:17:55 +11:00
Aleksa Sarai ed053a740c
nsenter: specify namespace type in setns()
This avoids us from running into cases where libcontainer thinks that a
particular namespace file is a different type, and makes it a fatal
error rather than causing broken functionality.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-04 16:17:55 +11:00
Mrunal Patel 7b1bcb3762 Merge pull request #1090 from crosbymichael/bind-root
Remove check for binding to /
2016-09-30 14:42:30 -07:00
Mrunal Patel 4356468f49 Parse the new extension flags
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-30 09:48:03 -07:00
Mrunal Patel f5103d311e config: Add new Extensions flag to support custom mount options in runc
Also, defines a EXT_COPYUP flag for supporting tmpfs copyup operation.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-30 09:46:46 -07:00
Michael Crosby 70b16a5ab9 Remove check for binding to /
In order to mount root filesystems inside the container's mount
namespace as part of the spec we need to have the ability to do a bind
mount to / as the destination.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-29 15:26:09 -07:00
Qiang Huang 3597b7b743 Merge pull request #1087 from williammartin/master
Fix typo when container does not exist
2016-09-29 09:19:45 +08:00
Mrunal Patel b3833a00e6 Merge pull request #1086 from justincormack/ambient
Set ambient capabilities where supported
2016-09-28 10:00:00 -07:00
Michael Crosby 3d777789a2 Merge pull request #1081 from ggaaooppeenngg/gaopeng/replace-range-map
Refactor enum map range to slice range
2016-09-28 09:50:38 -07:00
William Martin 152169ed34 Fix typo when container does not exist
Signed-off-by: William Martin <wmartin@pivotal.io>
2016-09-28 11:00:50 +00:00
Justin Cormack 4e179bddca Set ambient capabilities where supported
Since Linux 4.3 ambient capabilities are available. If set these allow unprivileged child
processes to inherit capabilities, while at present there is no means to set capabilities
on non root processes, other than via filesystem capabilities which are not usually
supported in image formats.

With ambient capabilities non root processes can be given capabilities as well, and so
the main reason to use root in containers goes away, and capabilities work as expected.

The code falls back to the existing behaviour if ambient capabilities are not supported.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-09-28 09:13:56 +01:00
Peng Gao c5393da813 Refactor enum map range to slice range
grep -r "range map" showw 3 parts use map to
range enum types, use slice instead can get
better performance and less memory usage.

Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
2016-09-28 15:36:29 +08:00
derekwaynecarr 1a75f815d5 systemd cgroup driver supports slice management
Signed-off-by: derekwaynecarr <decarr@redhat.com>
2016-09-27 16:01:37 -04:00
Mrunal Patel 1359131f4a Merge pull request #1080 from hqhq/fix_user_test
Fix TestGetAdditionalGroups on i686
2016-09-27 10:18:27 -07:00
Qiang Huang dc0a4cf488 Fix TestGetAdditionalGroups on i686
Fixes: #941

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-27 18:25:53 +08:00
Daniel, Dao Quang Minh cce5713940 Merge pull request #1077 from rajasec/readme-container-usage
Updating libcontainer README for container run
2016-09-26 23:52:06 +01:00
rajasec c1d967f055 Updating libcontainer README for container run
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-09-25 23:02:55 +05:30
Akihiro Suda 53179559a1 MaskPaths: support directory
For example, the /sys/firmware directory should be masked because it can contain some sensitive files:
  - /sys/firmware/acpi/tables/{SLIC,MSDM}: Windows license information:
  - /sys/firmware/ibft/target0/chap-secret: iSCSI CHAP secret

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2016-09-23 16:14:41 +00:00
Qiang Huang e83ccf62aa Merge pull request #1063 from datawolf/test-error-code
[unittest] add extra ErrorCode in TestErrorCode testcase
2016-09-23 11:55:44 +08:00
Mrunal Patel 5653ced544 Merge pull request #1059 from datawolf/use-WriteCgrougProc
cgroup: using WriteCgroupProc to write the specified pid into the cgroup's cgroup.procs file
2016-09-22 11:31:35 -07:00
Mrunal Patel bb792edd31 Merge pull request #1058 from datawolf/update-pause-comment
update the comment for container.Pause() method on linux
2016-09-22 11:31:07 -07:00
Michael Crosby 20c7c3bb37 Merge pull request #1049 from mrunalp/getcgroups_all
Add flag to allow getting all mounts for cgroups subsystems
2016-09-22 11:15:39 -07:00
Wang Long 132f5ee7d4 [unittest] add extra ErrorCode in TestErrorCode testcase
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-09-22 20:15:54 +08:00
Yuanhong Peng 6ed0652ee0 Fix typo
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
2016-09-21 20:13:32 +08:00
Wang Long ce9951834c cgroup: using WriteCgroupProc to write the specified pid into the cgroup's cgroup.procs file
cgroupData.join method using `WriteCgroupProc` to place the pid into
the proc file, it can avoid attach any pid to the cgroup if -1 is
specified as a pid.

so, replace `writeFile` with `WriteCgroupProc` like `cpuset.go`'s
ApplyDir method.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-09-21 10:57:03 +00:00
Wang Long 59a241f647 update the comment for container.Pause() method on linux
if a container state is running or created, the container.Pause()
method can set the state to pausing, and then paused.

this patch update the comment, so it can be consistent with the code.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-09-20 10:49:04 +08:00
Qiang Huang 38e0df9ec6 Merge pull request #1046 from rhatdan/relabel
Fix error messages to give information of relabeling failed
2016-09-18 11:18:07 +08:00
Michael Crosby 8b4850b8cd Merge pull request #1045 from hqhq/recursive_generic_error
Allow recrusive generic error
2016-09-16 10:36:57 -07:00
Mrunal Patel f557996401 Add flag to allow getting all mounts for cgroups subsystems
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-15 15:19:27 -04:00
Dan Walsh d37c5be9ff Fix error messages to give information of relabeling failed
Currently if a user does a command like

docker: Error response from daemon: operation not supported.

With this fix they should see a much more informative error message.

 docker run -ti -v /proc:/proc:Z fedora sh
docker: Error response from daemon: SELinux Relabeling of /proc is not allowed: operation not supported.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-09-15 04:38:16 -04:00
Qiang Huang b2e811183b Allow recrusive generic error
Error sent from child process is already genericError, if
we don't allow recrusive generic error, we won't get any
cause infomation from parent process.

Before, we got:
WARN[0000] exit status 1
ERRO[0000] operation not permitted

After, we got:
WARN[0000] exit status 1
ERRO[0000] container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"operation not permitted\""

it's not pretty but useful for detecting root causes.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-14 15:55:46 +08:00
Wang Long fd92846686 move m.GetPaths out of the loop
only call m.GetPaths once is ok. os move it out of the loop.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-09-13 12:19:48 +00:00
Qiang Huang 5be3ce2817 Merge pull request #1036 from athomason/1035-update-runtime-spec
Update runtime-spec to current upstream
2016-09-13 16:10:10 +08:00
Michael Crosby 9a072b611e Merge pull request #1013 from hqhq/fix_ps_issue
Fix runc ps issue
2016-09-12 14:03:21 -07:00
Mrunal Patel 124187bea3 Merge pull request #1028 from YummyPeng/fix-typo
Fix typo.
2016-09-12 10:00:41 -07:00
Michael Crosby ad400bb093 Change netclassid json tag
This allows older state files to be loaded without the unmarshal error
of the string to int conversion.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-12 09:31:58 -07:00
Qiang Huang b5b6989e9a Fix runc pause and runc update
Fixes: #1034
Fixes: #1031

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-12 16:02:56 +08:00
Qiang Huang da7bac1c90 Fix runc ps issue
After #1009, we don't always set `cgroup.Paths`, so
`getCgroupPath()` will return wrong cgroup path because
it'll take current process's cgroup as the parent, which
would be wrong when we try to find the cgroup path in
`runc ps` and `runc kill`.

Fix it by using `m.GetPath()` to get the true cgroup
paths.

Reported-by: Yang Shukui <yangshukui@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-12 15:41:16 +08:00
Adam Thomason 83cbdbd64c Add checks for nil spec.Linux
Signed-off-by: Adam Thomason <ad@mthomason.net>
2016-09-11 16:31:34 -07:00
Yuanhong Peng a71a301a28 Fix typo.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
2016-09-09 16:18:54 +08:00
Daniel, Dao Quang Minh da202fe232 Merge pull request #1019 from keloyang/remote-by
remove redundant by in annotation(nsexec.c)
2016-09-07 22:01:19 +01:00
Zhang Wei 7303a9a720 Tiny refactor: remove unused local variables
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-09-06 23:41:40 +08:00
Shukui Yang e15af9ffbb remove redundant by in annotation(nsexec.c)
Signed-off-by: Shukui Yang <yangshukui@huawei.com>
2016-09-05 10:53:19 +08:00
Qiang Huang aa2dd02f5a Fix null point reference panic
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-01 08:34:22 +08:00
Qiang Huang 220e5098a8 Fix default cgroup path
Alternative of #895 , part of #892

The intension of current behavior if to create cgroup in
parent cgroup of current process, but we did this in a
wrong way, we used devices cgroup path of current process
as the default parent path for all subsystems, this is
wrong because we don't always have the same cgroup path
for all subsystems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-08-30 14:12:15 +08:00
rajasec 714550f87c Error handling when container not exists
Signed-off-by: rajasec <rajasec79@gmail.com>

Error handling when container not exists

Signed-off-by: rajasec <rajasec79@gmail.com>

Error handling when container not exists

Signed-off-by: rajasec <rajasec79@gmail.com>

Error handling when container not exists

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-08-26 00:00:54 +05:30
Qiang Huang 1e319efa36 Merge pull request #815 from rajasec/basecont-comments
Updated the libcontainer interface comments
2016-08-26 09:43:50 +08:00
Michael Crosby 46d9535096 Merge pull request #934 from macrosheep/fix-initargs
Fix and refactor init args
2016-08-24 10:06:01 -07:00
Mrunal Patel 4d34c30196 Merge pull request #988 from chlunde/i386-32-bit-uid
Support 32 bit UID on i386
2016-08-24 09:55:41 -07:00
Aleksa Sarai e43f740ed7
Merge branch 'pr-987'
Closes #987 [Test: Make TestCaptureTestFunc pass in localunittest]
2016-08-24 18:37:06 +10:00
Michael Crosby b4ffe2974d Merge pull request #995 from estesp/starttime-for-criu-container
Restored-from-checkpoint containers should have a start time
2016-08-23 15:07:14 -07:00
Alexander Morozov 0c6733d669 Merge pull request #970 from hqhq/fix_race_cgroup_paths
Fix race condition when using cgroups.Paths
2016-08-23 10:47:00 -07:00
rajasec 1ea17d73fe Updated the libcontainer interface comments
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-08-23 19:14:27 +05:30
xiekeyang 206fea7f50 remove unused code
Signed-off-by: xiekeyang <xiekeyang@huawei.com>
2016-08-22 17:16:45 +08:00
Phil Estes 85f4d20b44
Restored-from-checkpoint containers should have a start time
Set the start time similar to a brand new container.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2016-08-21 18:15:18 -04:00
xiekeyang 2fcbb5a494 move util function
Signed-off-by: xiekeyang <xiekeyang@huawei.com>
2016-08-19 16:08:06 +08:00
Mrunal Patel 0bd675a56c Fix format specifier for size_t
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-08-17 11:40:08 -07:00
Mrunal Patel aee3f6ff5a Merge pull request #950 from cyphar/cleanup-nsenter
nsenter: major cleanups
2016-08-16 16:00:22 -07:00
Aleksa Sarai 4e72ffc237
nsenter: simplify netlink parsing
This just moves everything to one function so we don't have to pass a
bunch of things to functions when there's no real benefit. It also makes
the API nicer.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-08-17 08:21:48 +10:00
Carl Henrik Lunde 0a45903563 Support 32 bit UID on i386
The original SETUID takes a 16 bit UID.  Linux 2.4 introduced  a new
syscall, SETUID32, with support for 32 bit UIDs.  The setgid wrapper
already uses SETGID32.

Signed-off-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
2016-08-16 22:47:38 +02:00
Zhao Lei bb067f55aa Test: Make TestCaptureTestFunc pass in localunittest
TestCaptureTestFunc failed in localunittest:
 # make localunittest
 === RUN   TestCaptureTestFunc
 --- FAIL: TestCaptureTestFunc (0.00s)
         capture_test.go:26: expected package "github.com/opencontainers/runc/libcontainer/stacktrace" but received "_/root/runc/libcontainer/stacktrace"
 #

Reason: the path for stacktrace is a fixed string which
only valid for container environment.
And we can switch to relative path to make both in-container
and out-of-container test works.

After patch:
 # make localunittest
 === RUN   TestCaptureTestFunc
 --- PASS: TestCaptureTestFunc (0.00s)
 #

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-08-16 18:37:01 +08:00
Serge Hallyn 52a8873f62 checkMountDesktionation: add swaps and uptime to /proc whitelist
Signed-off-by: Serge Hallyn <serge@hallyn.com>
2016-08-14 18:32:39 -05:00
Aleksa Sarai faa3281ce8
nsenter: major cleanup
Removed a lot of clutter, improved the style of the code, removed
unnecessary complexity. In addition, made errors unique by making bail()
exit with a unique error code. Most of this code comes from the current
state of the rootless containers branch.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-08-13 03:18:04 +10:00
Michael Crosby ae7a92e352 Merge pull request #983 from justincormack/no-dev-fuse
Do not create /dev/fuse by default
2016-08-12 09:35:08 -07:00
Michael Crosby 7d8f322fdd Merge pull request #860 from bgray/806-set_cgroup_cpu_rt_before_joining
Set the cpu cgroup RT sched params before joining.
2016-08-12 09:24:15 -07:00
Justin Cormack 834e53144b Do not create /dev/fuse by default
This device is not required by the OCI spec.

The rationale for this was linked to https://github.com/docker/docker/issues/2393

So a non functional /dev/fuse was created, and actual fuse use still is
required to add the device explicitly. However even old versions of the JVM
on Ubuntu 12.04 no longer require the fuse package, and this is all not
needed.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-08-12 13:00:24 +01:00
Aleksa Sarai 0f76457138 Merge pull request #980 from LK4D4/safer_hook_run
libcontainer/configs: make hooks run safer
2016-08-09 22:22:04 +10:00
Alexander Morozov 7679c80be5 libcontainer/configs: make hooks run safer
It's possible that `cmd.Process` is still nil when we reach timeout.
Start creates `Process` field synchronously, and there is no way to such
race.

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-08-08 10:16:35 -07:00
Alexander Morozov 946d3b7c9d Merge pull request #979 from hmeng-19/fix_chdir_err
Fix the err info of chdir(cwd) failure
2016-08-08 09:57:53 -07:00
Haiyan Meng def07036a0 Fix the err info of chdir(cwd) failure
Signed-off-by: Haiyan Meng <haiyanalady@gmail.com>
2016-08-08 12:26:59 -04:00
Haiyan Meng f40fbcd595 Fix the err info of mount failure
Signed-off-by: Haiyan Meng <haiyanalady@gmail.com>
2016-08-08 11:58:28 -04:00
Qiang Huang 6ecb469b2b Fix race condition when using cgroups.Paths
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-08-02 15:43:04 +08:00
Qiang Huang 50f0a2b1e1 Merge pull request #962 from dubstack/fix_kmem_limits
Remove kmem Initialization check while setting memory configuration
2016-08-02 10:04:18 +08:00
Qiang Huang 777ac05e5e Cleanup GetLongBit
Follow up: #962

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-08-02 09:04:30 +08:00
Mrunal Patel 56fc0ac9ce Merge pull request #966 from sjenning/fix-initscope-cgroup-path
fix init.scope in cgroup paths
2016-08-01 14:29:47 -07:00
Buddha Prakash fcd966f501 Remove kmem Initialization check
Signed-off-by: Buddha Prakash <buddhap@google.com>
2016-08-01 09:47:34 -07:00
Seth Jennings 4b44b98596 fix init.scope in cgroup paths
Signed-off-by: Seth Jennings <sjenning@redhat.com>
2016-08-01 11:14:29 -05:00
Qiang Huang 1a81e9ab1f Merge pull request #958 from dubstack/skip-devices
Skip updates on parent Devices cgroup
2016-07-29 10:31:49 +08:00
Buddha Prakash d4c67195c6 Add test
Signed-off-by: Buddha Prakash <buddhap@google.com>
2016-07-28 17:14:51 -07:00
Mrunal Patel 21124f6274 Merge pull request #963 from guilhermebr/master
libcontainer: rename keyctl package to keys
2016-07-26 07:34:57 -07:00
Qiang Huang 8033a83975 Merge pull request #964 from zhaoleidd/test_fix
UNITTEST: Bypass userns test on platform without userns support
2016-07-26 11:30:17 +08:00
Guilherme Rezende 1cdaa709f1
libcontainer: rename keyctl package to keys
This avoid the goimports tool from remove the libcontainer/keys import line due the package name is diferent from folder name

Signed-off-by: Guilherme Rezende <guilhermebr@gmail.com>
2016-07-25 20:59:26 -03:00
Buddha Prakash ef4ff6a8ad Skip updates on parent Devices cgroup
Signed-off-by: Buddha Prakash <buddhap@google.com>
2016-07-25 10:30:46 -07:00
Zhao Lei bac8b4f0b4 UNITTEST: Bypass userns test on platform without userns support
We should bypass userns test instead of show fail in platform
without userns support.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-07-25 15:35:04 +08:00
Daniel, Dao Quang Minh f0e17e9a46 Merge pull request #961 from hqhq/revert_935
Revert "Use update time to detect if kmem limits have been set"
2016-07-21 14:51:21 +01:00
Daniel, Dao Quang Minh ff88baa42f Merge pull request #611 from mrunalp/fix_set
Fix cgroup Set when Paths are specified
2016-07-21 14:00:22 +01:00
Qiang Huang 15c93ee9e0 Revert "Use update time to detect if kmem limits have been set"
Revert: #935
Fixes: #946

I can reproduce #946 on some machines, the problem is on
some machines, it could be very fast that modify time
of `memory.kmem.limit_in_bytes` could be the same as
before it's modified.

And now we'll call `SetKernelMemory` twice on container
creation which cause the second time failure.

Revert this before we find a better solution.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-07-21 19:14:38 +08:00
Mrunal Patel 0ae6018eb9 Merge pull request #956 from dubstack/skip-pid
Allow cgroup creation without attaching a pid
2016-07-20 16:40:13 -07:00
Buddha Prakash ebe85bf180 Allow cgroup creation without attaching a pid
Signed-off-by: Buddha Prakash <buddhap@google.com>
2016-07-20 13:49:48 -07:00
Zhao Lei f2c4c4ad35 integration_testing: Fix a output typo
s/destory/destroy for error message output.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-07-20 11:17:13 +08:00
Haiyan Meng 6d14dd67b2 Fix nsenter/README.md
Signed-off-by: Haiyan Meng <haiyanalady@gmail.com>
2016-07-19 12:11:33 -04:00
Mrunal Patel 4dedd09396 Merge pull request #937 from hushan/net_cls-classid
fix setting net_cls classid
2016-07-18 17:18:23 -04:00
Mrunal Patel a0dccbd174 Merge pull request #947 from hencrice/patch-1
Fixed typo in build constraint.
2016-07-18 12:47:37 -04:00
Aleksa Sarai aa029491be
configs: fix json tags for CpuRt* options
Previously we used the same JSON tag name for the regular and realtime
versions of the CpuRt* fields, which causes issues when you want to use
two different values for the fields.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-18 17:02:30 +10:00
Qiang Huang 1b49d9b4db Merge pull request #936 from macrosheep/set-criupath-helper
libcontainer: Add a helper func to set CriuPath
2016-07-18 09:37:47 +08:00
Yen-Lin Chen a318a2ae1b Fixed typo in build constraint.
Signed-off-by: Yenlin Chen <hencrice@gmail.com>
2016-07-15 19:24:22 -07:00
Qiang Huang 41b12c095b Merge pull request #913 from cloudfoundry-incubator/addgroupsnocompatible
Let the user explicitly specify `additionalGids` on `runc exec`
2016-07-15 10:12:31 +08:00
Mrunal Patel ec01ae5f10 Merge pull request #942 from ggaaooppeenngg/fix-typo
Fix typo
2016-07-14 11:18:06 -04:00
Peng Gao 765df7eed0 Fix typo
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
2016-07-13 23:32:38 +08:00
Hushan Jia bb42f80a86 fix setting net_cls classid
Setting classid of net_cls cgroup failed:

ERRO[0000] process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument"
process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument"

The spec has classid as a *uint32, the libcontainer configs should match the type.

Signed-off-by: Hushan Jia <hushan.jia@gmail.com>
2016-07-11 05:00:35 +08:00
Yang Hongyang a59d63c5d3 Fix and refactor init args
1. According to docs of Cmd.Path and Cmd.Args from package "os/exec":
   Path is the path of the command to run. Args holds command line
   arguments, including the command as Args[0]. We have mixed usage
   of args. In InitPath(), InitArgs only take arguments, in InitArgs(),
   InitArgs including the command as Args[0]. This is confusing.
2. InitArgs() already have the ability to configure a LinuxFactory
   with the provided absolute path to the init binary and arguements as
   InitPath() does.
3. exec.Command() will take care of serching executable path.
4. The default "/proc/self/exe" instead of os.Args[0] is passed to
   InitArgs in order to allow relative path for the runC binary.

Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>
2016-07-06 23:21:02 -04:00
Yang Hongyang 9ade2cc5ce libcontainer: Add a helper func to set CriuPath
Added a helper func to set CriuPath for LinuxFactory.

Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>
2016-07-06 22:58:55 -04:00
Vishnu kannan c501cc038a Remove unused GetLongBit() function.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-07-06 15:23:01 -07:00
Vishnu kannan 8dd3d63455 Look at modify time to check if kmem limits are initialized.
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2016-07-06 15:14:25 -07:00
Qiang Huang 14e95b2aa9 Make state detection precise
Fixes: https://github.com/opencontainers/runc/issues/871

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-07-05 08:24:13 +08:00
Ben 14e55d1692 Add unit test for setting the CPU RT sched cgroups values at apply time
Added a unit test to verify that 'cpu.rt_runtime_us' and 'cpu.rt_runtime_us'
cgroup values are set when the cgroup is applied to a process.

Signed-off-by: Ben Gray <ben.r.gray@gmail.com>
2016-07-04 13:11:53 +01:00
ben 950700e73c Set the 'cpu.rt_runtime_us' and 'cpu.rt_runtime_us' values of the cpu cgroup
before trying to move the process into the cgroup.

This is required if runc itself is running in SCHED_RR mode, as it is not
possible to add a process in SCHED_RR mode to a cgroup which hasn't been
assigned any RT bandwidth. And RT bandwidth is not inherited, each new
cgroup starts with 0 b/w.

Signed-off-by: Ben Gray <ben.r.gray@gmail.com>
2016-07-04 13:10:21 +01:00
Aleksa Sarai c29695ad0a
rootfs: don't change directory
There's no point in changing directory here. Syscalls are resolved local
to the linkpath, not to the current directory that the process was in
when creating the symlink. Changing directories just confuses people who
are trying to debug things.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-24 16:44:40 +10:00
Aleksa Sarai 0f1d6772c6
libcontainer: rootfs: use CleanPath when comparing paths
Comparisons with paths aren't really a good idea unless you're
guaranteed that the comparison will work will all paths that resolve to
the same lexical path as the compared path.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-22 01:45:32 +10:00
Petar Petrov f9b72b1b46 Allow additional groups to be overridden in exec
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2016-06-21 10:35:11 +03:00
Alexander Morozov cc29e3dded Merge pull request #912 from crosbymichael/fifo-userns
Fix fifo usage with userns and not root users
2016-06-15 13:00:28 -07:00
Qiang Huang 42dfd60643 Merge pull request #904 from euank/fix-cgroup-parsing-err
cgroups: Fix issue if cgroup path contains :
2016-06-14 14:19:20 +08:00
Michael Crosby 5ce88a95f6 Fix fifo usage with userns
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-13 20:20:48 -07:00
Mrunal Patel f5b6ff23b8 Merge pull request #881 from rajasec/update-status
Update for stopped container
2016-06-13 16:05:25 -07:00
Alexander Morozov 85873d917e Merge pull request #886 from crosbymichael/start-pipe
Use fifo for create / start instead of signal handling
2016-06-13 12:36:38 -07:00
Michael Crosby 3aacff695d Use fifo for create/start
This removes the use of a signal handler and SIGCONT to signal the init
process to exec the users process.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-13 11:26:53 -07:00
Aleksa Sarai 0636bdd45b Merge pull request #874 from crosbymichael/keyring
Add option to disable new session keys
2016-06-12 21:44:45 +10:00
rajasec 146218ab92 Removing unused variable for cgroup subsystem
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-12 12:35:49 +05:30
Euan Kemp 394610a396 cgroups: Parse correctly if cgroup path contains :
Prior to this change a cgroup with a `:` character in it's path was not
parsed correctly (as occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).

This fixes that issue and adds a test.

Signed-off-by: Euan Kemp <euank@coreos.com>
2016-06-10 23:09:03 -07:00
root 56abe735f2 bug fix, LeafWeight nil err
Signed-off-by: root <yangshukui@huawei.com>
2016-06-10 18:11:20 -07:00
Christian Brauner a1f8e0f184 fail if path to devices subsystem is missing
The presence of the "devices" subsystem is a necessary condition for a
(privileged) container.

Signed-off-by: Christian Brauner <cbrauner@suse.com>
2016-06-08 16:44:15 +02:00
rajasec 12869604ca Update for stopped container
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-04 22:08:08 +05:30
Mrunal Patel c4e0d94efa Merge pull request #873 from joe2far/patch-1
Fixed typo in docstring
2016-06-03 12:15:29 -07:00
Mrunal Patel c6f09f95f2 Merge pull request #868 from rajasec/libcontainer-readme
Updating README with set interface
2016-06-03 12:02:41 -07:00
Michael Crosby 8c9db3a7a5 Add option to disable new session keys
This adds an `--no-new-keyring` flag to run and create so that a new
session keyring is not created for the container and the calling
processes keyring is inherited.

Fixes #818

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-03 11:53:07 -07:00
Michael Crosby c5060ff303 Merge pull request #827 from crosbymichael/create-start
Implement create and start
2016-06-03 10:38:03 -07:00
Joe Farrell f423296b02 Fixed typo in docstring
Signed-off-by: joe2far <joe2farrell@gmail.com>
2016-06-03 18:17:53 +01:00
Mrunal Patel 3211c9f721 Merge pull request #867 from rajasec/selinux-process
Removing the nil check for process label
2016-06-03 07:58:10 -07:00
Daniel, Dao Quang Minh d6189a05cf Merge pull request #869 from crosbymichael/anno
Add annotations to list and state output
2016-06-03 11:12:23 +01:00
Michael Crosby 5abffd3100 Add annotations to list and state output
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-02 12:44:43 -07:00
Michael Crosby 1d61abea46 Allow delete of created container
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-02 12:26:12 -07:00
Michael Crosby 6c485e6902 Merge pull request #864 from michael-holzheu/seccomp_add_ppc_and_s390x
seccomp: Add ppc and s390x to seccomp/config.go
2016-06-01 14:34:08 -07:00
rajasec 33f0ee9c95 Updating README with set interface
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-01 20:55:23 +05:30
rajasec 9742b02856 Removing the nil check for process label
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-01 20:29:44 +05:30
Daniel, Dao Quang Minh d5ecf5c67c systemd cgroup: check for Delegate property
Delegate is only available in systemd >218, applying it for older systemd will
result in an error. Therefore we should check for it when testing systemd
properties.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2016-06-01 14:32:24 +00:00
Aleksa Sarai 9dcacfb835 Merge pull request #852 from hqhq/fix_libcontainer_readme
README: Destroy container before fatal
2016-06-01 08:10:05 +10:00
Michael Crosby 6eba9b8ffb Fix SystemError and env lookup
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:10:47 -07:00
Michael Crosby efcd73fb5b Fix signal handling for unit tests
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:10:47 -07:00
Michael Crosby 3fc929f350 Only create a buffered channel of one
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Michael Crosby 30f1006b33 Fix libcontainer states
Move initialized to created and destoryed to stopped.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Michael Crosby 3fe7d7f31e Add create and start command for container lifecycle
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Michael Holzheu bae23b67f8 seccomp: Add ppc and s390x to seccomp/config.go
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
2016-05-31 08:56:07 -04:00
Qiang Huang 6fa490c664 Remove use_hierarchy check when set kernel memory
Kernel memory cannot be set in these circumstances (before kernel 4.6):
1. kernel memory is not initialized, and there are tasks in cgroup
2. kernel memory is not initialized, and use_hierarchy is enabled,
   and there are sub-cgroups

While we don't need to cover case 2 because when we set kernel
memory in runC, it's either:
- in Apply phase when we create the container, and in this case,
  set kernel memory would definitely be valid;
- or in update operation, and in this case, there would be tasks
  in cgroup, we only need to check if kernel memory is initialized
  or not.

Even if we want to check use_hierarchy, we need to check sub-cgroups
as well, but for here, we can just leave it aside.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-28 15:22:58 +08:00
Qiang Huang 468428fe3d README: Destroy container before fatal
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-28 14:41:06 +08:00
Andrew Vagin c161e65ac6 cr: don't fill veth devices if netns is in EmptyNs
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
2016-05-28 01:19:54 +03:00
Alexander Morozov d57898610b Merge pull request #675 from pankit/master
Allow + in container ID
2016-05-25 10:35:08 -07:00
Aleksa Sarai 1a913c7b89 *: correctly chown() consoles
In user namespaces, we need to make sure we don't chown() the console to
unmapped users. This means we need to get both the UID and GID of the
root user in the container when changing the owner.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-05-22 22:37:13 +10:00
Zhao Lei a0096535a6 Fix some spelling typo in manual and source
infomation -> information
transfered -> transferred

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-05-20 15:04:40 +08:00
Bhanu Valasa 32c2d48a6f libcontainer: Fix Running Comment
Signed-off-by: Bhanu Valasa <valasabk@yahoo.com>
2016-05-19 16:30:29 -04:00
rajasec e33c057114 Updating description in SPEC
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-05-17 22:57:43 +05:30
Aleksa Sarai fdc9fb841e Merge pull request #825 from hqhq/hq_fix_isrunning
Add comments for error cases in status functions
2016-05-17 05:04:25 +00:00
Mrunal Patel b53e466d0c Merge pull request #824 from ggaaooppeenngg/update-nsenter-readme
Update nsenter README
2016-05-16 17:26:32 -07:00
Michael Crosby dd389fd665 Merge pull request #823 from mlaventure/alpine-getlongbit
Fix GetLongBit() returns value when _SC_LONG_BIT is not available
2016-05-16 17:15:52 -07:00
Qiang Huang b6e23f8166 Add comments for error cases in status functions
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-16 18:24:07 +08:00
Peng Gao b7219cc2b3 Update nsenter README
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
2016-05-14 22:38:43 +08:00
Antonio Murdaca 9d14efec4c libcontainer: nsenter: nsexec.c: fix warnings
Fix the following warnings when building runc with gcc 6+:

Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:
In function ‘nsexec’:
Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:322:6:
warning: ‘__s’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
      pr_perror("Failed to open %s", ns);
Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:273:30:
note: ‘__s’ was declared here
 static struct nsenter_config process_nl_attributes(int pipenum, char
*data, int data_size)
                              ^~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-05-14 11:19:44 +02:00
Kenfe-Mickael Laventure 10a3c26c9a Fix GetLongBit() returns value when _SC_LONG_BIT is not available
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-05-13 09:37:58 -07:00
Aleksa Sarai e991f041a1 Revert "Need to make sure labels applied to /dev"
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-05-11 23:28:01 +10:00
Mrunal Patel 4a8f0b4db4 Fix cgroup Set when Paths are specified
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-05-09 16:06:03 -07:00
Kenfe-Mickael Laventure 27814ee120 Allow updating kmem.limit_in_bytes if initialized at cgroup creation
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-05-06 08:05:15 -07:00
rajasec cb04f48486 Updating error condition in applying apparmor profile
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-05-04 19:10:55 +05:30
Dan Walsh 77f312c51c Need to make sure labels applied to /dev
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-05-02 08:17:49 -04:00
Michael Crosby e87c59e2e4 Merge pull request #793 from bboreham/label-sep
Use '=' instead of ':' separator on labels
2016-04-29 15:19:28 -07:00
Jim Berlage c5b0caf76d Correct outdated URL
`libcontainer/cgroups/utils.go` uses an incorrect path to the
documentation for cgroups.  This updates the comment to use the correct
URL.  Fixes #794.

Signed-off-by: Jim Berlage <james.berlage@gmail.com>
2016-04-29 10:44:27 -05:00
Bryan Boreham 4a87beb661 Use '=' instead of ':' separator on labels, which is now deprecated by Docker
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2016-04-29 13:01:44 +01:00
Justin Cormack e18de63108 If possible, apply seccomp rules immediately before exec
See https://github.com/docker/docker/issues/22252

Previously we would apply seccomp rules before applying
capabilities, because it requires CAP_SYS_ADMIN. This
however means that a seccomp profile needs to allow
operations such as setcap() and setuid() which you
might reasonably want to disallow.

If prctl(PR_SET_NO_NEW_PRIVS) has been applied however
setting a seccomp filter is an unprivileged operation.
Therefore if this has been set, apply the seccomp
filter as late as possible, after capabilities have
been dropped and the uid set.

Note a small number of syscalls will take place
after the filter is applied, such as `futex`,
`stat` and `execve`, so these still need to be allowed
in addition to any the program itself needs.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-04-27 20:06:14 +01:00
Mrunal Patel 091ed0b043 Merge pull request #777 from cyphar/fix-null-pointer-deref
libcontainer: specconv: fix nil dereference in resource setup
2016-04-24 19:09:30 -07:00
Aleksa Sarai a939c7ecd9 libcontainer: specconv: fix nil dereference in resource setup
This caused issues if someone omitted or set "resources": null, in the
runC config. The panic follows.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x545b53]

goroutine 1 [running]:
panic(0x7aed40, 0xc820014260)
        /usr/lib64/go/src/runtime/panic.go:464 +0x3e6
github.com/opencontainers/runc/libcontainer/specconv.CreateLibcontainerConfig(0xc8200b0e30, 0x836480, 0x0, 0x0)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/specconv/spec_linux.go:222 +0xe83
main.createContainer(0xc82007eb40, 0x7ffd8024e439, 0x4, 0xc82008e780, 0x0, 0x0, 0x0, 0x0)
        /home/cyphar/src/runc/utils_linux.go:174 +0x105
main.startContainer(0xc82007eb40, 0xc82008e780, 0x0, 0x0, 0x0)
        /home/cyphar/src/runc/start.go:114 +0x189
main.glob.func11(0xc82007eb40)
        /home/cyphar/src/runc/start.go:78 +0x13e
github.com/codegangsta/cli.Command.Run(0x829a58, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x87ada0, 0x1a, 0x8dff80, ...)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/codegangsta/cli/command.go:137 +0x1081
github.com/codegangsta/cli.(*App).Run(0xc82007e900, 0xc82000a050, 0x5, 0x5, 0x0, 0x0)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/codegangsta/cli/app.go:176 +0xffa
main.main()
        /home/cyphar/src/runc/main.go:123 +0xc8e

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-04-25 11:52:22 +10:00
Aleksa Sarai 399175c227 Merge pull request #679 from rajasec/selinux-errorcheck
Adding selinux check during container start
2016-04-24 16:24:26 +00:00
Alexander Morozov ae0fc15b1e Merge pull request #608 from inatatsu/reduce-parsing-mountinfo
Eliminate redundant parsing of mountinfo
2016-04-23 22:30:54 -07:00
Mrunal Patel e25811108b Bump up spec and add support for mount label
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-22 15:31:39 -07:00
Tatsushi Inagaki eb0a144b5e Rootfs: reduce redundant parsing of mountinfo
Postpone parsing mountinfo until pivot_root() actually failed

Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
2016-04-22 09:41:28 +09:00
Tatsushi Inagaki 78e1a4fc2e Selinux: reduce redundant parsing of mountinfo
Avoid parsing the whole lines of mountinfo after the mountpoint
is found.

Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
2016-04-22 09:41:28 +09:00
Tatsushi Inagaki 2a1a6cdf44 Cgroup: reduce redundant parsing of mountinfo
Avoid parsing the whole lines of mountinfo after all mountpoints
of the target subsytems are found, or when the target subsystem
is not enabled.

Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
2016-04-22 09:41:28 +09:00
rajasec 733ff99f6d Updating kcore in validator test
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-04-21 15:29:19 +05:30
Michael Crosby 7dd87976ed Merge pull request #758 from rajasec/container-pause-comment
Update the comment for container pause
2016-04-19 16:16:41 -07:00