Skip to content
  • Eric W. Biederman's avatar
    fs: Add user namespace member to struct super_block · 6e4eab57
    Eric W. Biederman authored
    
    
    Start marking filesystems with a user namespace owner, s_user_ns.  In
    this change this is only used for permission checks of who may mount a
    filesystem.  Ultimately s_user_ns will be used for translating ids and
    checking capabilities for filesystems mounted from user namespaces.
    
    The default policy for setting s_user_ns is implemented in sget(),
    which arranges for s_user_ns to be set to current_user_ns() and to
    ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
    user_ns.
    
    The guts of sget are split out into another function sget_userns().
    The function sget_userns calls alloc_super with the specified user
    namespace or it verifies the existing superblock that was found
    has the expected user namespace, and fails with EBUSY when it is not.
    This failing prevents users with the wrong privileges mounting a
    filesystem.
    
    The reason for the split of sget_userns from sget is that in some
    cases such as mount_ns and kernfs_mount_ns a different policy for
    permission checking of mounts and setting s_user_ns is necessary, and
    the existence of sget_userns() allows those policies to be
    implemented.
    
    The helper mount_ns is expected to be used for filesystems such as
    proc and mqueuefs which present per namespace information.  The
    function mount_ns is modified to call sget_userns instead of sget to
    ensure the user namespace owner of the namespace whose information is
    presented by the filesystem is used on the superblock.
    
    For sysfs and cgroup the appropriate permission checks are already in
    place, and kernfs_mount_ns is modified to call sget_userns so that
    the init_user_ns is the only user namespace used.
    
    For the cgroup filesystem cgroup namespace mounts are bind mounts of a
    subset of the full cgroup filesystem and as such s_user_ns must be the
    same for all of them as there is only a single superblock.
    
    Mounts of sysfs that vary based on the network namespace could in principle
    change s_user_ns but it keeps the analysis and implementation of kernfs
    simpler if that is not supported, and at present there appear to be no
    benefits from supporting a different s_user_ns on any sysfs mount.
    
    Getting the details of setting s_user_ns correct has been
    a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
    in sget_userns.  Thanks to Seth Forshee who has kept the work alive.
    
    Thanks-to: Seth Forshee <seth.forshee@canonical.com>
    Thanks-to: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
    Acked-by: default avatarSeth Forshee <seth.forshee@canonical.com>
    Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
    6e4eab57