  1. 29 Feb, 2012 1 commit
  2. 04 Jan, 2012 1 commit
  3. 26 Jul, 2011 1 commit
  4. 23 May, 2011 1 commit
  5. 17 Dec, 2010 1 commit
  6. 28 Nov, 2010 2 commits
    • Export 'get_pipe_info()' to other users · c66fb347
      Linus Torvalds authored
      And in particular, use it in 'pipe_fcntl()'.
      The other pipe functions do not need to use the 'careful' version, since
      they are only ever called for things that are already known to be pipes.
      The normal read/write/ioctl functions are called through the file
      operations structures, so if a file isn't a pipe, they'd never get
      called.  But pipe_fcntl() is special, and called directly from the
      generic fcntl code, and needs to use the same careful function that the
      splice code is using.
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
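      The "careful" check described above can be illustrated from userspace.
      The following is only a sketch of the idea, not the kernel code: it
      assumes that "is this a pipe?" can be answered by looking at the file
      mode, and the names fd_is_pipe(), pipe_only_op() and demo_pipe_check()
      are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd refers to a pipe/FIFO, 0 if it does not, -1 if fstat() fails. */
static int fd_is_pipe(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return S_ISFIFO(st.st_mode) ? 1 : 0;
}

/*
 * Hypothetical pipe-only entry point that can be reached with any kind of
 * descriptor, so it has to check before touching pipe state.
 */
static int pipe_only_op(int fd)
{
    if (fd_is_pipe(fd) != 1)
        return -EBADF;
    return 0; /* pipe-specific work would go here */
}

/* Exercises the check on a real pipe and on /dev/null; returns 0 on success. */
static int demo_pipe_check(void)
{
    int fds[2];
    int devnull;
    int ok;

    if (pipe(fds) != 0)
        return -1;
    devnull = open("/dev/null", O_RDONLY);
    if (devnull < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    ok = pipe_only_op(fds[0]) == 0 && pipe_only_op(devnull) == -EBADF;
    close(devnull);
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : 1;
}
```

      Code that is only ever reached through a pipe's own file operations
      could skip the check; a generic entry point like pipe_only_op() cannot.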
    • Rename 'pipe_info()' to 'get_pipe_info()' · 71993e62
      Linus Torvalds authored
      .. and change it to take the 'file' pointer instead of an inode, since
      that's what all users want anyway.
      The renaming is preparatory to exporting it to other users.  The old
      'pipe_info()' name was too generic and is already used elsewhere, so
      before making the function public we need to use a more specific name.
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 07 Aug, 2010 2 commits
  8. 30 Jun, 2010 2 commits
  9. 25 May, 2010 1 commit
  10. 21 May, 2010 1 commit
  11. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      The script does the following.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
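      The header-selection rule from the script description (gfp.h if only
      gfp facilities are used, slab.h if slab facilities are used) can be
      sketched as follows.  This is not the conversion script itself: the
      token lists and the name needed_header_for() are assumptions made for
      illustration, and plain substring matching is far cruder than what a
      real conversion would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum needed_header { NEED_NONE, NEED_GFP_H, NEED_SLAB_H };

/* Returns 1 if any token from the NULL-terminated list occurs in src. */
static int contains_any(const char *src, const char *const *tokens)
{
    size_t i;

    for (i = 0; tokens[i] != NULL; i++)
        if (strstr(src, tokens[i]) != NULL)
            return 1;
    return 0;
}

/*
 * Decide which single header a source text needs.  slab.h wins over gfp.h
 * because slab users get gfp.h through slab.h.  Both token lists are
 * illustrative assumptions, not the lists used by the real script.
 */
static enum needed_header needed_header_for(const char *src)
{
    static const char *const slab_tokens[] = {
        "kmalloc(", "kzalloc(", "kfree(", "kmem_cache_", NULL
    };
    static const char *const gfp_tokens[] = {
        "GFP_KERNEL", "GFP_ATOMIC", "alloc_page", "__get_free_page", NULL
    };

    assert(src != NULL);
    if (contains_any(src, slab_tokens))
        return NEED_SLAB_H;
    if (contains_any(src, gfp_tokens))
        return NEED_GFP_H;
    return NEED_NONE;
}
```

      A file that calls kmalloc() with GFP_KERNEL maps to slab.h only, which
      matches the "only the necessary includes" rule described above.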
  12. 04 Nov, 2009 1 commit
  13. 14 Sep, 2009 1 commit
    • vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode · 148f948b
      Jan Kara authored
      Introduce a new function for generic inode syncing (vfs_fsync_range) and
      use it from the fsync() path. Also introduce a new helper for syncing
      after a sync write (generic_write_sync) using the generic function.
      Use these new helpers for syncing from generic VFS functions. This makes
      O_SYNC writes to block devices acquire i_mutex for syncing. If we really
      care about this, we can make block_fsync() drop the i_mutex and reacquire
      it before it returns.
      CC: Evgeniy Polyakov <zbr@ioremap.net>
      CC: ocfs2-devel@oss.oracle.com
      CC: Joel Becker <joel.becker@oracle.com>
      CC: Felix Blyakher <felixb@sgi.com>
      CC: xfs@oss.sgi.com
      CC: Anton Altaparmakov <aia21@cantab.net>
      CC: linux-ntfs-dev@lists.sourceforge.net
      CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      CC: linux-ext4@vger.kernel.org
      CC: tytso@mit.edu
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 11 Sep, 2009 1 commit
  15. 19 May, 2009 1 commit
    • splice: fix kmaps in default_file_splice_write() · b2858d7d
      Miklos Szeredi authored
      Unfortunately multiple kmap() within a single thread are deadlockable,
      so writing out multiple buffers with writev() isn't possible.
      Change the implementation so that it does a separate write() for each
      buffer.  This actually simplifies the code a lot since the
      splice_from_pipe() helper can be used.
      This limitation is caused by HIGHMEM pages, and so only affects a
      subset of architectures and configurations.  In the future it may be
      worth implementing default_file_splice_write() in a more efficient way
      on configs that allow it.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
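      The shape of that change, one write() per buffer instead of a single
      writev() over all of them, can be sketched in userspace.  This is an
      analogue only: there is no kmap() in userspace, and write_each_buffer()
      and demo_write_each() are hypothetical names for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write each buffer with its own write() call, finishing one buffer before
 * starting the next.  Returns the total bytes written, or -1 if nothing
 * could be written at all.
 */
static ssize_t write_each_buffer(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;
    int i;

    assert(iov != NULL && iovcnt >= 0);
    for (i = 0; i < iovcnt; i++) {
        const char *p = iov[i].iov_base;
        size_t left = iov[i].iov_len;

        while (left > 0) {
            ssize_t n = write(fd, p, left);

            if (n < 0) {
                if (errno == EINTR)
                    continue; /* interrupted before writing: retry */
                return total > 0 ? total : -1;
            }
            p += n;
            left -= (size_t)n;
            total += n;
        }
    }
    return total;
}

/* Round-trips two small buffers through a pipe; returns 0 on success. */
static int demo_write_each(void)
{
    char a[] = "hello";
    char b[] = "world";
    struct iovec iov[2] = { { a, 5 }, { b, 5 } };
    char buf[16];
    int fds[2];
    ssize_t n;
    int ok;

    if (pipe(fds) != 0)
        return -1;
    ok = write_each_buffer(fds[1], iov, 2) == 10;
    n = read(fds[0], buf, sizeof(buf));
    ok = ok && n == 10 && memcmp(buf, "helloworld", 10) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : 1;
}
```

      The cost is one system call per buffer, which is the efficiency
      trade-off the commit message acknowledges.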
  16. 14 May, 2009 1 commit
  17. 13 May, 2009 1 commit
  18. 11 May, 2009 3 commits
    • splice: implement default splice_write method · 0b0a47f5
      Miklos Szeredi authored
      If f_op->splice_write() is not implemented, fall back to a plain write.
      Use vfs_writev() to write from the pipe buffers.
      This will allow splice on all filesystems and file types.  This
      includes "direct_io" files in fuse which bypass the page cache.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • splice: implement default splice_read method · 6818173b
      Miklos Szeredi authored
      If f_op->splice_read() is not implemented, fall back to a plain read.
      Use vfs_readv() to read into previously allocated pages.
      This will allow splice and functions using splice, such as the loop
      device, to work on all filesystems.  This includes "direct_io" files
      in fuse which bypass the page cache.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
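      The fallback described in the two commits above is a "use the default
      when the method pointer is NULL" dispatch.  The following toy model
      shows that pattern only; every type and function in it (toy_file,
      toy_file_ops, toy_default_splice_read(), toy_do_splice_to()) is a
      hypothetical stand-in, and the "plain read" is simulated by copying
      from an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;
typedef long (*toy_splice_read_fn)(struct toy_file *f, char *dst, size_t len);

/* Toy operations table: the method may be left NULL by a "filesystem". */
struct toy_file_ops {
    toy_splice_read_fn splice_read;
};

/* Toy file backed by an in-memory buffer. */
struct toy_file {
    const struct toy_file_ops *f_op;
    const char *data;
    size_t size;
    size_t pos;
};

/* Default method: a plain read from the backing buffer at the current position. */
static long toy_default_splice_read(struct toy_file *f, char *dst, size_t len)
{
    size_t avail = f->size - f->pos;
    size_t n = len < avail ? len : avail;

    memcpy(dst, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Dispatch: use the file's own method if present, else the default. */
static long toy_do_splice_to(struct toy_file *f, char *dst, size_t len)
{
    toy_splice_read_fn fn;

    assert(f != NULL && f->f_op != NULL);
    fn = f->f_op->splice_read;
    if (fn == NULL)
        fn = toy_default_splice_read;
    return fn(f, dst, len);
}
```

      With this shape, callers never need to know whether a given file type
      supplied its own method.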
    • splice: implement pipe to pipe splicing · 7c77f0b3
      Miklos Szeredi authored
      Allow splice(2) to work when both the input and the output are pipes.
      Based on the implementation of the tee(2) syscall, but instead of
      duplicating the buffer references move the buffers from the input pipe
      to the output pipe.
      Moving the whole buffer only succeeds if the full length of the buffer
      is spliced.  Otherwise duplicate the buffer, just like tee(2), set the
      length of the output buffer and advance the offset on the input
      buffer.
      Since splice is operating on two pipes, special care needs to be taken
      with locking to prevent an ABBA deadlock.  Again this is done
      similarly to the tee(2) syscall, first preparing the input and output
      pipes so there's data to consume and space for that data, and then
      doing the move operation while holding both locks.
      If other processes are doing I/O on the same pipes parallel to the
      splice, then by the time both inodes are locked there might be no
      buffers left to move, or no space to move them to.  In this case retry
      the whole operation, including the preparation phase.  This could lead
      to starvation, but I'm not sure if that's serious enough to worry
      about.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
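      From userspace, pipe-to-pipe splicing looks like an ordinary splice(2)
      call whose two descriptors are both pipe ends.  A minimal sketch,
      assuming a Linux system with this feature and glibc's splice()
      wrapper; move_between_pipes() and demo_pipe_to_pipe() are hypothetical
      helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Move up to len bytes from the read end of one pipe to the write end of
 * another without copying through a userspace buffer.  Offsets must be NULL
 * for pipes.  Returns what splice() returns.
 */
static ssize_t move_between_pipes(int in_rd, int out_wr, size_t len)
{
    return splice(in_rd, NULL, out_wr, NULL, len, 0);
}

/* Sends "hello" through pipe A into pipe B; returns 0 on success. */
static int demo_pipe_to_pipe(void)
{
    int a[2], b[2];
    char buf[8];
    int ok = 0;

    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    if (write(a[1], "hello", 5) == 5 &&
        move_between_pipes(a[0], b[1], 5) == 5 &&
        read(b[0], buf, sizeof(buf)) == 5 &&
        memcmp(buf, "hello", 5) == 0)
        ok = 1;
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok ? 0 : 1;
}
```

      The demo requests exactly the number of bytes already queued, so the
      call does not need to wait for more input.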
  19. 17 Apr, 2009 1 commit
  20. 15 Apr, 2009 6 commits
  21. 07 Apr, 2009 1 commit
    • splice: fix deadlock in splicing to file · 7bfac9ec
      Miklos Szeredi authored
      There's a possible deadlock in generic_file_splice_write(),
      splice_from_pipe() and ocfs2_file_splice_write():
       - task A calls generic_file_splice_write()
       - this calls inode_double_lock(), which locks i_mutex on both
         pipe->inode and target inode
       - ordering depends on inode pointers, so it can happen that
         pipe->inode is locked first
       - __splice_from_pipe() needs more data, calls pipe_wait()
       - this releases lock on pipe->inode, goes to interruptible sleep
       - task B calls generic_file_splice_write(), similarly to the first
       - this locks pipe->inode, then tries to lock inode, but that is
         already held by task A
       - task A is interrupted, it tries to lock pipe->inode, but fails, as
         it is already held by task B
       - ABBA deadlock
      Fix this by explicitly ordering locks: the outer lock must be on
      target inode and the inner lock (which is later unlocked and relocked)
      must be on pipe->inode.  This is OK, pipe inodes and target inodes
      form two nonoverlapping sets, generic_file_splice_write() and friends
      are not called with a target which is a pipe.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
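      The fixed lock order (outer lock on the target inode, inner lock on
      the pipe inode, with only the inner lock dropped while waiting) can be
      modelled with ordinary mutexes.  This is a toy sketch using pthreads
      rather than kernel mutexes; struct toy_inode and all function names
      are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Toy inode: a mutex plus a flag saying whether it belongs to a pipe. */
struct toy_inode {
    pthread_mutex_t i_mutex;
    int is_pipe;
};

/*
 * Take both locks in the fixed order: target first (outer), pipe second
 * (inner).  Refuses with -1 if the roles are wrong, since the ordering only
 * works when pipe inodes and target inodes are disjoint sets.
 */
static int splice_write_lock(struct toy_inode *target, struct toy_inode *pipe_inode)
{
    if (target->is_pipe || !pipe_inode->is_pipe)
        return -1;
    pthread_mutex_lock(&target->i_mutex);
    pthread_mutex_lock(&pipe_inode->i_mutex);
    return 0;
}

/* Waiting for data drops and retakes only the inner lock. */
static void toy_pipe_wait(struct toy_inode *pipe_inode)
{
    pthread_mutex_unlock(&pipe_inode->i_mutex);
    /* a real implementation would sleep here until data arrives */
    pthread_mutex_lock(&pipe_inode->i_mutex);
}

/* Release in reverse order of acquisition. */
static void splice_write_unlock(struct toy_inode *target, struct toy_inode *pipe_inode)
{
    pthread_mutex_unlock(&pipe_inode->i_mutex);
    pthread_mutex_unlock(&target->i_mutex);
}

/* Single-threaded walk through the sequence; returns 0 on success. */
static int demo_lock_order(void)
{
    struct toy_inode target = { PTHREAD_MUTEX_INITIALIZER, 0 };
    struct toy_inode pipe_inode = { PTHREAD_MUTEX_INITIALIZER, 1 };

    if (splice_write_lock(&target, &pipe_inode) != 0)
        return 1;
    toy_pipe_wait(&pipe_inode);
    splice_write_unlock(&target, &pipe_inode);

    /* A pipe as the target is rejected before any lock is taken. */
    if (splice_write_lock(&pipe_inode, &pipe_inode) == 0)
        return 2;
    return 0;
}
```

      Because the waiter keeps the outer lock and every task takes the locks
      in the same order, the interleaving listed in the commit message can
      no longer produce a cycle.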
  22. 03 Apr, 2009 1 commit
  23. 14 Jan, 2009 1 commit
  24. 08 Jan, 2009 1 commit
    • memcg: synchronized LRU · 08e552c6
      KAMEZAWA Hiroyuki authored
      A big patch for changing memcg's LRU semantics.
        - Each page_cgroup is linked into its mem_cgroup's own LRU (per zone).
        - The LRU of page_cgroup is not synchronous with the global LRU.
        - page and page_cgroup are one-to-one and statically allocated.
        - To find which LRU a page_cgroup is on, you have to check
          pc->mem_cgroup, as in
          - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);
        - SwapCache is handled.
      And, when we handle the LRU list of page_cgroup, we do the following.
      	pc = lookup_page_cgroup(page);
      	lock_page_cgroup(pc); .....................(1)
      	mz = page_cgroup_zoneinfo(pc);
      	.....add to LRU
      But (1) is a spin_lock, and we have to worry about deadlock with
      zone->lru_lock, so trylock() is currently used at (1). Without (1), we
      can't trust that "mz" is correct.
      This is an attempt to remove this dirty nesting of locks.
      This patch changes mz->lru_lock to be zone->lru_lock.
      Then, the above sequence will be written as
      	spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      	mem_cgroup_add/remove/etc_lru() {
      		pc = lookup_page_cgroup(page);
      		mz = page_cgroup_zoneinfo(pc);
      		if (PageCgroupUsed(pc)) {
      			....add to LRU
      		}
      	}
      	spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
      This is much simpler.
      (*) We're safe even if we don't take lock_page_cgroup(pc), because:
          1. When can pc->mem_cgroup be modified?
             - at charge.
             - at account_move().
          2. at charge
             the PCG_USED bit is not set before pc->mem_cgroup is fixed.
          3. at account_move()
             the page is isolated and not on LRU.
        - easy for maintenance.
        - memcg can make use of laziness of pagevec.
        - we don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
        - LRU status of memcg will be synchronized with the global LRU's.
        - the number of locks is reduced.
        - account_move() is greatly simplified.
        - may increase cost of LRU rotation.
          (no impact if memcg is not configured.)
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 30 Oct, 2008 1 commit
  26. 09 Oct, 2008 1 commit
    • Don't allow splice() to files opened with O_APPEND · efc968d4
      Linus Torvalds authored
      This is debatable, but while we're debating it, let's disallow the
      combination of splice and an O_APPEND destination.
      It's not entirely clear what the semantics of O_APPEND should be, and
      POSIX apparently expects pwrite() to ignore O_APPEND, for example.  So
      we could make up any semantics we want, including the old ones.
      But Miklos convinced me that we should at least give it some thought,
      and that accepting writes at arbitrary offsets is wrong at least for
      IS_APPEND() files (which always have O_APPEND set, even if the reverse
      isn't true: you can obviously have O_APPEND set on a regular file).
      So disallow O_APPEND entirely for now.  I doubt anybody cares, and this
      way we have one less gray area to worry about.
      Reported-and-argued-for-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
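      The resulting behaviour can be observed from userspace.  The sketch
      below assumes a Linux kernel that carries this check and a writable
      /tmp; splice_to_file() and demo_splice_append() are hypothetical
      helper names, and the temporary file is created only for the
      demonstration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Splice up to len bytes from a pipe into a file; returns bytes moved or -errno. */
static int splice_to_file(int pipe_rd, int file_fd, size_t len)
{
    ssize_t n = splice(pipe_rd, NULL, file_fd, NULL, len, 0);

    return n < 0 ? -errno : (int)n;
}

/*
 * Queue some data in a pipe, then try to splice it into a file whose
 * descriptor has O_APPEND set.  Returns the result of splice_to_file(),
 * or 1 if the setup itself failed.
 */
static int demo_splice_append(void)
{
    char path[] = "/tmp/splice_appendXXXXXX";
    int fds[2];
    int fd;
    int flags;
    int ret = 1;

    if (pipe(fds) != 0)
        return 1;
    fd = mkstemp(path);
    if (fd >= 0) {
        flags = fcntl(fd, F_GETFL);
        if (flags >= 0 &&
            fcntl(fd, F_SETFL, flags | O_APPEND) == 0 &&
            write(fds[1], "data", 4) == 4)
            ret = splice_to_file(fds[0], fd, 4);
        close(fd);
        unlink(path);
    }
    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

      On such a kernel the call is expected to be refused with EINVAL, so a
      program that needs append semantics has to fall back to an ordinary
      write().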
  27. 05 Aug, 2008 1 commit
  28. 27 Jul, 2008 1 commit
  29. 26 Jul, 2008 1 commit
    • splice: use get_user_pages_fast · bc40d73c
      Nick Piggin authored
      Use get_user_pages_fast in splice.  This reverts some mmap_sem batching
      there, however the biggest problem with mmap_sem tends to be hold times
      blocking out other threads rather than cacheline bouncing.  Further: on
      architectures that implement get_user_pages_fast without locks, mmap_sem
      can be avoided completely anyway.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 04 Jul, 2008 1 commit