    mm/slab: lockless decision to grow cache · 801faf0d
    Joonsoo Kim authored
    
    
    To check precisely whether free objects exist, we need to grab a
    lock.  But accuracy isn't that important here: the race window is
    small, and if too many free objects accumulate, the cache reaper
    will reap them.  So this patch makes the check for free object
    existence lockless, which reduces lock contention in
    allocation-heavy workloads.
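
    As a rough illustration of the pattern (a minimal user-space sketch
    under assumptions, not the kernel code; the names below are
    hypothetical): read the free object count without the lock, and only
    take the lock on the path that actually manipulates the lists.  A
    stale read is harmless -- at worst the cache grows unnecessarily and
    the reaper trims the excess later.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <pthread.h>

      struct cache_node {
              pthread_mutex_t list_lock;
              _Atomic unsigned long free_objects; /* objects on the node's lists */
      };

      /* Hypothetical slow paths, stubbed out for the sketch. */
      static bool transfer_free_objects(struct cache_node *n)
      {
              if (atomic_load(&n->free_objects) == 0)
                      return false;           /* nothing to transfer */
              atomic_fetch_sub(&n->free_objects, 1);
              return true;
      }

      static void grow_cache(struct cache_node *n)
      {
              atomic_fetch_add(&n->free_objects, 8);  /* a new slab arrived */
      }

      static void refill(struct cache_node *n)
      {
              /*
               * Lockless check: the value may be stale, but both failure
               * modes are benign -- we either grow slightly too early or
               * fall through to the locked re-check below.
               */
              if (atomic_load_explicit(&n->free_objects,
                                       memory_order_relaxed) == 0) {
                      grow_cache(n);          /* skip the lock entirely */
                      return;
              }

              pthread_mutex_lock(&n->list_lock);
              bool refilled = transfer_free_objects(n);  /* authoritative */
              pthread_mutex_unlock(&n->list_lock);

              if (!refilled)
                      grow_cache(n);  /* lost the race: node was drained */
      }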
    
    Note that until now, n->shared could be freed while being accessed,
    if a concurrent write to slabinfo replaced the shared array.  With a
    trick in this patch, however, we can access it safely within an
    interrupt-disabled section.
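
    A sketch of one way such a guarantee can be provided (an assumption;
    the mechanism isn't spelled out above): an interrupt-disabled region
    is also an RCU-sched read-side critical section, so an updater that
    waits for an RCU-sched grace period before freeing the old array
    knows that no interrupt-disabled reader can still be using it.

      /* Reader side, in the allocation path with IRQs disabled: */
      local_irq_save(flags);
      shared = READ_ONCE(n->shared);
      /* Safe to dereference without n->list_lock inside this region. */
      must_grow = !n->free_objects && (!shared || !shared->avail);
      local_irq_restore(flags);

      /* Updater side, e.g. when a slabinfo write resizes the array: */
      spin_lock_irq(&n->list_lock);
      old = n->shared;
      n->shared = new_shared;
      spin_unlock_irq(&n->list_lock);

      synchronize_sched();    /* wait out all IRQ-disabled readers;
                                 merged into synchronize_rcu() in
                                 newer kernels */
      kfree(old);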
    
    Below are the results of the concurrent allocation/free test in the
    slab allocation benchmark Christoph wrote a long time ago; the
    output has been simplified.  The numbers are cycle counts for
    alloc/free respectively, so lower is better.
    
      * Before
      Kmalloc N*alloc N*free(32): Average=248/966
      Kmalloc N*alloc N*free(64): Average=261/949
      Kmalloc N*alloc N*free(128): Average=314/1016
      Kmalloc N*alloc N*free(256): Average=741/1061
      Kmalloc N*alloc N*free(512): Average=1246/1152
      Kmalloc N*alloc N*free(1024): Average=2437/1259
      Kmalloc N*alloc N*free(2048): Average=4980/1800
      Kmalloc N*alloc N*free(4096): Average=9000/2078
    
      * After
      Kmalloc N*alloc N*free(32): Average=344/792
      Kmalloc N*alloc N*free(64): Average=347/882
      Kmalloc N*alloc N*free(128): Average=390/959
      Kmalloc N*alloc N*free(256): Average=393/1067
      Kmalloc N*alloc N*free(512): Average=683/1229
      Kmalloc N*alloc N*free(1024): Average=1295/1325
      Kmalloc N*alloc N*free(2048): Average=2513/1664
      Kmalloc N*alloc N*free(4096): Average=4742/2172
    
    The results show that allocation performance decreases for object
    sizes up to 128, which may be due to the extra checks in
    cache_alloc_refill().  But considering the improvement in free
    performance, the net result is about the same.  The results for the
    other size classes look very promising: roughly a 50% performance
    improvement.
    
    Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Jesper Dangaard Brouer <brouer@redhat.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>