1. 12 Jul, 2017 1 commit
  2. 11 Oct, 2016 1 commit
  3. 03 Oct, 2016 1 commit
  4. 16 Sep, 2016 2 commits
    • Miklos Szeredi's avatar
      vfs: do get_write_access() on upper layer of overlayfs · 4d0c5ba2
      Miklos Szeredi authored
      The problem with writecount is: we want consistent handling of it for
      underlying filesystems as well as overlayfs.  Making sure i_writecount is
      correct on all layers is difficult.  Instead this patch makes sure that
      when write access is acquired, it's always done on the underlying writable
      layer (called the upper layer).  We must also make sure to look at the
      writecount on this layer when checking for conflicting leases.
      Open for write already updates the upper layer's writecount.  Leaving only
      For truncate copy up must happen before get_write_access() so that the
      writecount is updated on the upper layer.  Problem with this is if
      something fails after that, then copy-up was done needlessly.  E.g. if
      break_lease() was interrupted.  Probably not a big deal in practice.
      Another interesting case is if there's a denywrite on a lower file that is
      then opened for write or truncated.  With this patch these will succeed,
      which is somewhat counterintuitive.  But I think it's still acceptable,
      considering that the copy-up does actually create a different file, so the
      old, denywrite mapping won't be touched.
      On non-overlayfs d_real() is an identity function and d_real_inode() is
      equivalent to d_inode() so this patch doesn't change behavior in that case.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Acked-by: default avatarJeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
    • Miklos Szeredi's avatar
      locks: fix file locking on overlayfs · c568d683
      Miklos Szeredi authored
      This patch allows flock, posix locks, ofd locks and leases to work
      correctly on overlayfs.
      Instead of using the underlying inode for storing lock context use the
      overlay inode.  This allows locks to be persistent across copy-up.
      This is done by introducing locks_inode() helper and using it instead of
      file_inode() to get the inode in locking code.  For non-overlayfs the two
      are equivalent, except for an extra pointer dereference in locks_inode().
      Since lock operations are in "struct file_operations" we must also make
      sure not to call underlying filesystem's lock operations.  Introcude a
      super block flag MS_NOREMOTELOCK to this effect.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Acked-by: default avatarJeff Layton <jlayton@poochiereds.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
  5. 30 Jun, 2016 1 commit
    • Miklos Szeredi's avatar
      vfs: merge .d_select_inode() into .d_real() · 2d902671
      Miklos Szeredi authored
      The two methods essentially do the same: find the real dentry/inode
      belonging to an overlay dentry.  The difference is in the usage:
      vfs_open() uses ->d_select_inode() and expects the function to perform
      copy-up if necessary based on the open flags argument.
      file_dentry() uses ->d_real() passing in the overlay dentry as well as the
      underlying inode.
      vfs_rename() uses ->d_select_inode() but passes zero flags.  ->d_real()
      with a zero inode would have worked just as well here.
      This patch merges the functionality of ->d_select_inode() into ->d_real()
      by adding an 'open_flags' argument to the latter.
      [Al Viro] Make the signature of d_real() match that of ->d_real() again.
      And constify the inode argument, while we are at it.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
  6. 11 May, 2016 1 commit
  7. 02 May, 2016 1 commit
  8. 30 Mar, 2016 1 commit
  9. 28 Mar, 2016 3 commits
  10. 22 Mar, 2016 1 commit
    • Jann Horn's avatar
      fs/coredump: prevent fsuid=0 dumps into user-controlled directories · 378c6520
      Jann Horn authored
      This commit fixes the following security hole affecting systems where
      all of the following conditions are fulfilled:
       - The fs.suid_dumpable sysctl is set to 2.
       - The kernel.core_pattern sysctl's value starts with "/". (Systems
         where kernel.core_pattern starts with "|/" are not affected.)
       - Unprivileged user namespace creation is permitted. (This is
         true on Linux >=3.8, but some distributions disallow it by
         default using a distro patch.)
      Under these conditions, if a program executes under secure exec rules,
      causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
      namespace, changes its root directory and crashes, the coredump will be
      written using fsuid=0 and a path derived from kernel.core_pattern - but
      this path is interpreted relative to the root directory of the process,
      allowing the attacker to control where a coredump will be written with
      root privileges.
      To fix the security issue, always interpret core_pattern for dumps that
      are written under SUID_DUMP_ROOT relative to the root directory of init.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  11. 22 Jan, 2016 1 commit
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  12. 04 Jan, 2016 1 commit
  13. 10 Jul, 2015 1 commit
    • Eric W. Biederman's avatar
      vfs: Commit to never having exectuables on proc and sysfs. · 90f8572b
      Eric W. Biederman authored
      Today proc and sysfs do not contain any executable files.  Several
      applications today mount proc or sysfs without noexec and nosuid and
      then depend on there being no exectuables files on proc or sysfs.
      Having any executable files show on proc or sysfs would cause
      a user space visible regression, and most likely security problems.
      Therefore commit to never allowing executables on proc and sysfs by
      adding a new flag to mark them as filesystems without executables and
      enforce that flag.
      Test the flag where MNT_NOEXEC is tested today, so that the only user
      visible effect will be that exectuables will be treated as if the
      execute bit is cleared.
      The filesystems proc and sysfs do not currently incoporate any
      executable files so this does not result in any user visible effects.
      This makes it unnecessary to vet changes to proc and sysfs tightly for
      adding exectuable files or changes to chattr that would modify
      existing files, as no matter what the individual file say they will
      not be treated as exectuable files by the vfs.
      Not having to vet changes to closely is important as without this we
      are only one proc_create call (or another goof up in the
      implementation of notify_change) from having problematic executables
      on proc.  Those mistakes are all too easy to make and would create
      a situation where there are security issues or the assumptions of
      some program having to be broken (and cause userspace regressions).
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
  14. 23 Jun, 2015 2 commits
    • Jan Kara's avatar
      fs: Call security_ops->inode_killpriv on truncate · 45f147a1
      Jan Kara authored
      Comment in include/linux/security.h says that ->inode_killpriv() should
      be called when setuid bit is being removed and that similar security
      labels (in fact this applies only to file capabilities) should be
      removed at this time as well. However we don't call ->inode_killpriv()
      when we remove suid bit on truncate.
      We fix the problem by calling ->inode_need_killpriv() and subsequently
      ->inode_killpriv() on truncate the same way as we do it on file write.
      After this patch there's only one user of should_remove_suid() - ocfs2 -
      and indeed it's buggy because it doesn't call ->inode_killpriv() on
      write. However fixing it is difficult because of special locking
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Miklos Szeredi's avatar
      vfs: add file_path() helper · 9bf39ab2
      Miklos Szeredi authored
      	d_path(&file->f_path, ...);
      	file_path(file, ...);
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  15. 19 Jun, 2015 1 commit
    • David Howells's avatar
      overlayfs: Make f_path always point to the overlay and f_inode to the underlay · 4bacc9c9
      David Howells authored
      Make file->f_path always point to the overlay dentry so that the path in
      /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
      overlay as well as the underlay (path-based LSMs probably don't need it).
      Using my union testsuite to set things up, before the patch I see:
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	Device: 23h/35d Inode: 13381       Links: 1
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	Device: 23h/35d Inode: 13381       Links: 1
      After the patch:
      	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
      	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
      	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
      	[root@andromeda union-testsuite]# stat /mnt/a/foo107
      	Device: 23h/35d Inode: 40346       Links: 1
      	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
      	Device: 23h/35d Inode: 40346       Links: 1
      Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
      pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
      (which is correct).
      The inode accessed, however, is the lower layer.  The union layer is on device
      25h/37d and the upper layer on 24h/36d.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  16. 11 May, 2015 1 commit
  17. 12 Apr, 2015 3 commits
    • Al Viro's avatar
      ->aio_read and ->aio_write removed · 84363182
      Al Viro authored
      no remaining users
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Andrew Elble's avatar
      NFS: fix BUG() crash in notify_change() with patch to chown_common() · c1b8940b
      Andrew Elble authored
      We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
      occurs during an rsync into a filesystem that is exported via NFS.
      1.) fs/attr.c:notify_change() modifies the caller's version of attr.
      2.) 6de0ec00 ("VFS: make notify_change pass ATTR_KILL_S*ID to
          setattr operations") introduced a BUG() restriction such that "no
          function will ever call notify_change() with both ATTR_MODE and
          ATTR_KILL_S*ID set". Under some circumstances though, it will have
          assisted in setting the caller's version of attr to this very
      3.) 27ac0ffe ("locks: break delegations on any attribute
          modification") introduced code to handle breaking
          delegations. This can result in notify_change() being re-called. attr
          _must_ be explicitly reset to avoid triggering the BUG() established
          in #2.
      4.) The path that that triggers this is via fs/open.c:chmod_common().
          The combination of attr flags set here and in the first call to
          notify_change() along with a later failed break_deleg_wait()
          results in notify_change() being called again via retry_deleg
          without resetting attr.
      Solution is to move retry_deleg in chmod_common() a bit further up to
      ensure attr is completely reset.
      There are other places where this seemingly could occur, such as
      fs/utimes.c:utimes_common(), but the attr flags are not initially
      set in such a way to trigger this.
      Fixes: 27ac0ffe ("locks: break delegations on any attribute modification")
      Reported-by: default avatarEric Meddaugh <etmsys@rit.edu>
      Tested-by: default avatarEric Meddaugh <etmsys@rit.edu>
      Signed-off-by: default avatarAndrew Elble <aweits@rit.edu>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      drop bogus check in file_open_root() · e5b811e3
      Al Viro authored
      For one thing, LOOKUP_DIRECTORY will be dealt with in do_last().
      For another, name can be an empty string, but not NULL - no callers
      pass that and it would oops immediately if they would.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  18. 25 Mar, 2015 1 commit
  19. 17 Feb, 2015 1 commit
    • Matthew Wilcox's avatar
      vfs: remove get_xip_mem · e748dcd0
      Matthew Wilcox authored
      All callers of get_xip_mem() are now gone.  Remove checks for it,
      initialisers of it, documentation of it and the only implementation of it.
       Also remove mm/filemap_xip.c as it is now empty.  Also remove
      documentation of the long-gone get_xip_page().
      Signed-off-by: default avatarMatthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  20. 23 Jan, 2015 1 commit
    • Paul Moore's avatar
      fs: create proper filename objects using getname_kernel() · 51689104
      Paul Moore authored
      There are several areas in the kernel that create temporary filename
      objects using the following pattern:
      	int func(const char *name)
      		struct filename *file = { .name = name };
      		return 0;
      ... which for the most part works okay, but it causes havoc within the
      audit subsystem as the filename object does not persist beyond the
      lifetime of the function.  This patch converts all of these temporary
      filename objects into proper filename objects using getname_kernel()
      and putname() which ensure that the filename object persists until the
      audit subsystem is finished with it.
      Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
      for helping resolve a difficult kernel panic on boot related to a
      use-after-free problem in kern_path_create(); the thread can be seen
      at the link below:
       * https://lkml.org/lkml/2015/1/20/710
      This patch includes code that was either based on, or directly written
      by Al in the above thread.
      CC: viro@zeniv.linux.org.uk
      CC: linux@roeck-us.net
      CC: sd@queasysnail.net
      CC: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  21. 13 Dec, 2014 1 commit
    • Heinrich Schuchardt's avatar
      fallocate: create FAN_MODIFY and IN_MODIFY events · 820c12d5
      Heinrich Schuchardt authored
      The fanotify and the inotify API can be used to monitor changes of the
      file system.  System call fallocate() modifies files.  Hence it should
      trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY)
      events.  The most interesting case is FALLOC_FL_COLLAPSE_RANGE because
      this value allows to create arbitrary file content from random data.
      This patch adds the missing call to fsnotify_modify().
      The FAN_MODIFY and IN_MODIFY event will be created when fallocate()
      succeeds.  It will even be created if the file length remains unchanged,
      e.g.  when calling fanotify with flag FALLOC_FL_KEEP_SIZE.
      This logic was primarily chosen to keep the coding simple.
      It resembles the logic of the write() system call.
      When we call write() we always create a FAN_MODIFY event, even in the case
      of overwriting with identical data.
      Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was
      actually changed.
      Furthermore even if if the filesize remains unchanged, fallocate() may
      influence whether a subsequent write() will succeed and hence the
      fallocate() call may be considered a modification.
      The fallocate(2) man page teaches: After a successful call, subsequent
      writes into the range specified by offset and len are guaranteed not to
      fail because of lack of disk space.
      So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in
      different outcomes of a subsequent write depending on the values of offset
      and len.
      Signed-off-by: Heinrich Schuchardt's avatarHeinrich Schuchardt <xypron.glpk@gmx.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  22. 19 Nov, 2014 1 commit
    • Al Viro's avatar
      new helper: audit_file() · 9f45f5bf
      Al Viro authored
      ... for situations when we don't have any candidate in pathnames - basically,
      in descriptor-based syscalls.
      [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  23. 07 Nov, 2014 1 commit
  24. 23 Oct, 2014 1 commit
  25. 01 Aug, 2014 1 commit
  26. 06 May, 2014 2 commits
    • Al Viro's avatar
      new methods: ->read_iter() and ->write_iter() · 293bc982
      Al Viro authored
      Beginning to introduce those.  Just the callers for now, and it's
      clumsier than it'll eventually become; once we finish converting
      aio_read and aio_write instances, the things will get nicer.
      For now, these guys are in parallel to ->aio_read() and ->aio_write();
      they take iocb and iov_iter, with everything in iov_iter already
      validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
      in iov_iter.
      Main concerns in that series are stack footprint and ability to
      split the damn thing cleanly.
      [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      replace checking for ->read/->aio_read presence with check in ->f_mode · 7f7f25e8
      Al Viro authored
      Since we are about to introduce new methods (read_iter/write_iter), the
      tests in a bunch of places would have to grow inconveniently.  Check
      once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
      and FMODE_CAN_WRITE resp.  It might end up being a temporary measure -
      once everything switches from ->aio_{read,write} to ->{read,write}_iter
      it might make sense to return to open-coded checks.  We'll see...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  27. 12 Apr, 2014 3 commits
  28. 02 Apr, 2014 4 commits