1. 27 Mar, 2006 6 commits
  2. 26 Mar, 2006 7 commits
  3. 25 Mar, 2006 14 commits
  4. 24 Mar, 2006 13 commits
    • The comment describing how MS_ASYNC works in msync.c is confusing · 16538c40
      Amos Waterland authored
      because of a typo.  This patch just changes "my" to "by", which I
      believe was the original intent.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
    • [PATCH] strndup_user() · 96840aa0
      Davi Arnaut authored
      This patch series creates a strndup_user() function to ease copying C strings
      from userspace.  It also avoids common pitfalls like userspace modifying the
      final \0 after the strlen_user() check.
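
      For reference, a minimal sketch of such a helper, assuming the usual
      2.6-era primitives (illustrative, not a copy of the patch):

      	char *strndup_user(const char __user *s, long n)
      	{
      		/* strnlen_user() counts the terminating '\0'; 0 means fault */
      		long len = strnlen_user(s, n);
      		char *p;

      		if (!len)
      			return ERR_PTR(-EFAULT);
      		if (len > n)
      			return ERR_PTR(-EINVAL);
      		p = kmalloc(len, GFP_KERNEL);
      		if (!p)
      			return ERR_PTR(-ENOMEM);
      		if (copy_from_user(p, s, len)) {
      			kfree(p);
      			return ERR_PTR(-EFAULT);
      		}
      		/* re-terminate: userspace may have overwritten the '\0' */
      		p[len - 1] = '\0';
      		return p;
      	}
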
      Signed-off-by: Davi Arnaut <davi.arnaut@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(): use do_fsync() · 8f2e9f15
      Andrew Morton authored
      No need to duplicate all that code.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync: fix return value · 676758bd
      Andrew Morton authored
      msync() does a strange thing.  Essentially:
      
      	vma = find_vma();
      	for ( ; ; ) {
      		if (!vma)
      			return -ENOMEM;
      		...
      		vma = vma->vm_next;
      	}
      
      so an msync() request which starts within or before a valid VMA and which ends
      within or beyond the final VMA will incorrectly return -ENOMEM.
      
      Fix.
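
      One way to shape the corrected loop (a sketch, not the literal patch):
      report success once the requested range has been covered, and reserve
      -ENOMEM for a genuine hole in the mapping:

      	vma = find_vma(mm, start);
      	for ( ; ; ) {
      		if (!vma || start < vma->vm_start)
      			return -ENOMEM;		/* hole in the range */
      		...				/* sync this vma's part */
      		start = vma->vm_end;
      		if (start >= end)
      			return 0;		/* whole range covered */
      		vma = vma->vm_next;
      	}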
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(MS_SYNC): don't hold mmap_sem while syncing · 707c21c8
      Andrew Morton authored
      It seems bad to hold mmap_sem while performing synchronous disk I/O.  Alter
      the msync(MS_SYNC) code so that the lock is released while we sync the file.
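
      Roughly, the MS_SYNC path becomes the following (a sketch; the real patch
      also has to cope with the vma list changing while the lock is dropped):

      	get_file(file);			/* pin the file across the unlock */
      	up_read(&current->mm->mmap_sem);
      	error = do_fsync(file, 0);	/* the synchronous disk I/O */
      	fput(file);
      	down_read(&current->mm->mmap_sem);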
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msync(): perform dirty page levelling · 9c50823e
      Andrew Morton authored
      It seems sensible to perform dirty page throttling in msync: as the application
      dirties pages we can kick off pdflush early, or even force the msync() caller
      to perform writeout, or even throttle the msync() caller.
      
      The main effect of this is to start disk writeback earlier if we've just
      discovered that a large amount of pagecache has been dirtied.  (Otherwise
      it wouldn't happen for up to five seconds, until pdflush next wakes up.)
      
      It also causes the page-dirtying process to get penalised for dirtying
      those pages, rather than whacking someone else with the problem.
      
      We should do this for munmap() and possibly even exit(), too.
      
      We drop the mmap_sem while performing the dirty page balancing.  It doesn't
      seem right to hold mmap_sem for that long.
      
      Note that this patch only affects MS_ASYNC.  MS_SYNC will be syncing all the
      dirty pages anyway.
      
      We note that msync(MS_SYNC) does a full-file-sync inside mmap_sem, and always
      has.  We can fix that up...
      
      The patch also tightens up the mmap_sem coverage in sys_msync(): no point in
      taking it while we perform the incoming arg checking.
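
      Conceptually, the MS_ASYNC path gains a step like this after dirtying a
      vma's pages (a sketch; the _nr variant comes from a companion patch in
      this series, and the real code must revalidate after retaking the lock):

      	up_read(&current->mm->mmap_sem);	/* don't throttle under mmap_sem */
      	balance_dirty_pages_ratelimited_nr(mapping, nr_pages_dirtied);
      	down_read(&current->mm->mmap_sem);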
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] set_page_dirty() return value fixes · 4741c9fd
      Andrew Morton authored
      We need set_page_dirty() to return true if it actually transitioned the page
      from a clean to dirty state.  This wasn't right in a couple of places.  Do a
      kernel-wide audit, fix things up.
      
      This leaves open the possibility of returning a negative errno from
      set_page_dirty() sometime in the future.  But we don't do that at present.
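
      In miniature, the contract callers can now rely on (illustrative):

      	/* returns nonzero only if the page went from clean to dirty */
      	if (set_page_dirty(page))
      		nr_pages_dirtied++;	/* feeds dirty-page accounting/throttling */
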
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] balance_dirty_pages_ratelimited: take nr_pages arg · fa5a734e
      Andrew Morton authored
      Modify balance_dirty_pages_ratelimited() so that it can take a
      number-of-pages-which-I-just-dirtied argument.  For msync().
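
      After the change the interface looks roughly like this (treat the exact
      declarations as a sketch):

      	void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
      						unsigned long nr_pages_dirtied);

      	/* existing callers keep the old one-page-at-a-time behaviour */
      	static inline void
      	balance_dirty_pages_ratelimited(struct address_space *mapping)
      	{
      		balance_dirty_pages_ratelimited_nr(mapping, 1);
      	}
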
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fadvise(): write commands · ebcf28e1
      Andrew Morton authored
      Add two new Linux-specific fadvise() extensions:
      
      LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file
      offsets `offset' and `offset+len'.  Any pages which are currently under
      writeout are skipped, whether or not they are dirty.
      
      LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file
      offsets `offset' and `offset+len'.
      
      By combining these two operations the application may do several things:
      
      LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty
      pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all
      of the currently dirty pages at the disk, wait until they have been written.
      
      It should be noted that none of these operations write out the file's
      metadata.  So unless the application is strictly performing overwrites of
      already-instantiated disk blocks, there are no guarantees here that the data
      will be available after a crash.
      
      To complete this suite of operations I guess we should have a "sync file
      metadata only" operation.  This gives applications access to all the building
      blocks needed for all sorts of sync operations.  But sync-metadata doesn't fit
      well with the fadvise() interface.  Probably it should be a new syscall:
      sys_fmetadatasync().
      
      The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64().
      It is made to represent the last affected byte in the file (ie: it is
      inclusive).  Generally, all these byterange and pagerange functions are
      inclusive so we can easily represent EOF with -1.
      
      As Ulrich notes, these two functions are somewhat abusive of the fadvise()
      concept, which appears to be "set the future policy for this fd".
      
      But these commands are a perfect fit with the fadvise() implementation, and
      several of the existing fadvise() commands are synchronous and don't affect
      future policy either.  I think we can live with the slight incongruity.
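
      For example, the push-everything-and-wait sequence above would look like
      this from userspace (a sketch: the constants won't be in libc headers, so
      the raw fadvise64 syscall is assumed):

      	/* wait for in-flight writeout, start writeout of everything dirty,
      	   then wait for that writeout to complete: */
      	syscall(__NR_fadvise64, fd, offset, len, LINUX_FADV_WRITE_WAIT);
      	syscall(__NR_fadvise64, fd, offset, len, LINUX_FADV_ASYNC_WRITE);
      	syscall(__NR_fadvise64, fd, offset, len, LINUX_FADV_WRITE_WAIT);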
      
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] filemap_fdatawrite_range() api: clarify -end parameter · 469eb4d0
      Andrew Morton authored
      I had trouble working out whether filemap_fdatawrite_range()'s `end'
      parameter describes the last-byte-to-be-written or the last-plus-one.
      Clarify that in comments.
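
      The clarified comment amounts to something like this (wording assumed):

      	/**
      	 * @start:	offset in bytes where the range starts
      	 * @end:	offset in bytes where the range ends (inclusive)
      	 */
      	int filemap_fdatawrite_range(struct address_space *mapping,
      					loff_t start, loff_t end);
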
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: memory_spread_slab drop useless PF_SPREAD_PAGE check · b2455396
      Paul Jackson authored
      The hook in the slab cache allocation path to handle cpuset memory
      spreading for tasks in cpusets with 'memory_spread_slab' enabled has a
      modest performance bug.  The hook calls into the memory spreading handler
      alternate_node_alloc() if either of 'memory_spread_slab' or
      'memory_spread_page' is enabled, even though the handler does nothing
      (albeit harmlessly) for the page case.
      
      Fix: drop PF_SPREAD_PAGE from the set of flag bits that are used to
      trigger a call to alternate_node_alloc().
      
      The page case is handled by separate hooks -- see the calls conditioned on
      cpuset_do_page_mem_spread() in mm/filemap.c.
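
      The change boils down to narrowing the trigger mask (a before/after
      sketch; the surrounding code is assumed):

      	/* before: the page-spread bit needlessly forced the slow path */
      	if (current->flags & (PF_SPREAD_PAGE | PF_SPREAD_SLAB))
      		...
      	/* after: only the slab-spread bit matters here */
      	if (current->flags & PF_SPREAD_SLAB)
      		...
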
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread slab cache optimizations · c61afb18
      Paul Jackson authored
      The hooks in the slab cache allocator code path for support of NUMA
      mempolicies and cpuset memory spreading are in an important code path.  Many
      systems will use neither feature.
      
      This patch optimizes those hooks down to a single check of some bits in the
      current task's task_struct flags.  For non-NUMA systems, this hook and
      related code is already ifdef'd out.
      
      The optimization is done by using another task flag, set if the task is using
      a non-default NUMA mempolicy.  Taking this flag bit along with the
      PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
      memory spreading' patch set, one can check for the combination of any of these
      special case memory placement mechanisms with a single test of the current
      task's task_struct flags.
      
      This patch also tightens up the code, to save a few bytes of kernel text
      space, and moves some of it out of line.  Due to the nested inlines called
      from multiple places, we were ending up with three copies of this code, which
      once we get off the main code path (for local node allocation) seems a bit
      wasteful of instruction memory.
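
      The resulting fast-path test, schematically (PF_MEMPOLICY as the name of
      the new flag is an assumption consistent with the description):

      	if (unlikely(current->flags &
      		     (PF_SPREAD_PAGE | PF_SPREAD_SLAB | PF_MEMPOLICY))) {
      		objp = alternate_node_alloc(cachep, flags);	/* out of line */
      		if (objp)
      			return objp;
      	}
      	/* fall through to the local-node fast path */
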
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset memory spread slab cache implementation · 101a5001
      Paul Jackson authored
      Provide the slab cache infrastructure to support cpuset memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading.
      
      This patch provides a slab cache SLAB_MEM_SPREAD flag.  If set in the
      kmem_cache_create() call defining a slab cache, then any task marked with the
      process state flag PF_SPREAD_SLAB will spread memory page allocations for
      that cache over all the allowed nodes, instead of preferring the local
      (faulting) node.
      
      On systems not configured with CONFIG_NUMA, this results in no change to the
      page allocation code path for slab caches.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current task's cpuset, this adds a call to
      a cpuset routine and a failing bit test of the process state flag
      PF_SPREAD_SLAB.
      
      For tasks so marked, a second inline test is done for the slab cache flag
      SLAB_MEM_SPREAD, and if that is set and if the allocation is not
      in_interrupt(), this adds a call to a cpuset routine that computes which of
      the task's mems_allowed nodes should be preferred for this allocation.
      
      ==> This patch adds another hook into the performance critical
          code path for allocating objects from the slab cache, in the
          ____cache_alloc() chunk.  The next patch optimizes this
          hook, reducing the impact of the combined mempolicy plus memory
          spreading hooks on this critical code path to a single check
          against the task's task_struct flags word.
      
      This patch provides the generic slab flags and logic needed to apply memory
      spreading to a particular slab.
      
      A subsequent patch will mark a few specific slab caches for this placement
      policy.
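
      Marking a cache for spreading would then look like this (a sketch using
      the 2.6-era six-argument kmem_cache_create(); the cache shown is
      hypothetical):

      	cachep = kmem_cache_create("example_cache", sizeof(struct example),
      				   0, SLAB_MEM_SPREAD,
      				   NULL, NULL);	/* no ctor/dtor */
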
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>