    pidns: add reboot_pid_ns() to handle the reboot syscall · cf3f8921
    Daniel Lezcano authored
    
    
    In the case of a child pid namespace, rebooting the system does not really
    make sense.  When the pid namespace is used in conjunction with the other
    namespaces in order to create a Linux container, the reboot syscall leads
    to some problems.
    
    A container can reboot the host.  That can be fixed by dropping the
    CAP_SYS_BOOT capability, but then we are unable to correctly
    poweroff/halt/reboot a container, and the container stays stuck at
    shutdown time with the container's init process waiting indefinitely.
    
    After several attempts, no solution from userspace was found to reliably
    handle the shutdown from a container.
    
    This patch proposes to make the init process of the child pid namespace
    exit with a signal status set to: SIGINT if the child pid namespace
    called "halt/poweroff" and SIGHUP if the child pid namespace called
    "reboot".  When the reboot syscall is called and we are not in the initial
    pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
    "RESTART", and "RESTART2".  Otherwise we return EINVAL.
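
    The mapping described above can be modelled in userspace as follows.
    This is only an illustrative sketch: pidns_reboot_signal() is a
    hypothetical helper name, not the function added by this patch, and it
    only returns the signal (or -EINVAL) instead of killing anything.

```c
#include <errno.h>
#include <signal.h>
#include <linux/reboot.h>

/*
 * Userspace model of the behaviour described in the changelog
 * (hypothetical helper, not the kernel implementation):
 * which signal the child pid namespace's init is reported to die
 * with for a given reboot command, or -EINVAL for any other command.
 */
static int pidns_reboot_signal(unsigned int cmd)
{
        switch (cmd) {
        case LINUX_REBOOT_CMD_RESTART:
        case LINUX_REBOOT_CMD_RESTART2:
                return SIGHUP;          /* "reboot" */
        case LINUX_REBOOT_CMD_HALT:
        case LINUX_REBOOT_CMD_POWER_OFF:
                return SIGINT;          /* "halt/poweroff" */
        default:
                return -EINVAL;         /* e.g. the CAD commands */
        }
}
```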
    
    Returning EINVAL is also an easy way to check if this feature is supported
    by the kernel when invoking another 'reboot' option like CAD.
    
    This way the parent process of the child pid namespace knows whether it
    rebooted or not and can take the right decision.
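
    On the parent side, the decision could be sketched like this.  The
    enum and the classify_container_exit() name are assumptions made for
    illustration; a real container manager may also want to rule out a
    SIGHUP or SIGINT that was sent to the init process from outside, which
    this sketch cannot distinguish from a reboot or halt request.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical outcome codes for a container manager. */
enum container_exit {
        CONTAINER_EXITED,       /* init exited normally */
        CONTAINER_REBOOT,       /* init terminated by SIGHUP */
        CONTAINER_HALT,         /* init terminated by SIGINT */
        CONTAINER_KILLED,       /* init terminated by another signal */
};

/*
 * Classify the status returned by wait()/waitpid() for the init
 * process of the child pid namespace.
 */
static enum container_exit classify_container_exit(int status)
{
        if (!WIFSIGNALED(status))
                return CONTAINER_EXITED;

        switch (WTERMSIG(status)) {
        case SIGHUP:
                return CONTAINER_REBOOT;
        case SIGINT:
                return CONTAINER_HALT;
        default:
                return CONTAINER_KILLED;
        }
}
```

    A manager would typically restart the container on CONTAINER_REBOOT
    and tear it down on CONTAINER_HALT.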
    
    Test case:
    ==========
    
    #define _GNU_SOURCE     /* for clone() and CLONE_NEWPID in <sched.h> */
    #include <alloca.h>
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <signal.h>
    #include <sys/reboot.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    
    #include <linux/reboot.h>
    
    static int do_reboot(void *arg)
    {
            int *cmd = arg;

            /* Only reached past reboot() if the call fails (e.g. EINVAL). */
            if (reboot(*cmd))
                    printf("failed to reboot(%d): %m\n", *cmd);

            return 0;
    }
    
    int test_reboot(int cmd, int sig)
    {
            long stack_size = 4096;
            void *stack = alloca(stack_size) + stack_size;
            int status;
            pid_t ret;
    
            ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
            if (ret < 0) {
                    printf("failed to clone: %m\n");
                    return -1;
            }
    
            if (wait(&status) < 0) {
                    printf("unexpected wait error: %m\n");
                    return -1;
            }
    
            if (!WIFSIGNALED(status)) {
                    printf("child process exited but was not signaled\n");
                    return -1;
            }
    
            if (WTERMSIG(status) != sig) {
                    printf("signal termination is not the one expected\n");
                    return -1;
            }
    
            return 0;
    }
    
    int main(int argc, char *argv[])
    {
            int status;
    
            status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
            if (status < 0)
                    return 1;
            printf("reboot(LINUX_REBOOT_CMD_RESTART) succeeded\n");
    
            status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
            if (status < 0)
                    return 1;
            printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeeded\n");
    
            status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
            if (status < 0)
                    return 1;
            printf("reboot(LINUX_REBOOT_CMD_HALT) succeeded\n");
    
            status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
            if (status < 0)
                    return 1;
            printf("reboot(LINUX_REBOOT_CMD_POWER_OFF) succeeded\n");
    
            status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
            if (status >= 0) {
                    printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
                    return 1;
            }
            printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");
    
            return 0;
    }
    
    [akpm@linux-foundation.org: tweak and add comments]
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
    Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
    Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>