• Mel Gorman's avatar
    mm, page_alloc: inline the fast path of the zonelist iterator · 682a3385
    Mel Gorman authored
    The page allocator iterates through a zonelist for zones that match the
    addressing limitations and nodemask of the caller but many allocations
    will not be restricted.  Despite this, there is always functional call
    overhead which builds up.
    
    This patch inlines the optimistic basic case and only calls the iterator
    function for the complex case.  A hindrance was the fact that
    cpuset_current_mems_allowed is used in the fastpath as the allowed
    nodemask even though all nodes are allowed on most systems.  The patch
    handles this by only considering cpuset_current_mems_allowed if a cpuset
    exists.  As well as being faster in the fast-path, this removes some
    junk in the slowpath.
    
    The performance difference on a page allocator microbenchmark is;
    
                                                 4.6.0-rc2                  4.6.0-rc2
                                          statinline-v1r20              optiter-v1r20
      Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
      Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
      Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
      Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
      Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
      Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
      Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
      Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
      Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
      Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
      Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
      Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
      Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
      Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
      Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)
    
    perf indicated that next_zones_zonelist disappeared in the profile and
    __next_zones_zonelist did not appear.  This is expected as the
    micro-benchmark would hit the inlined fast-path every time.
    Signed-off-by: 's avatarMel Gorman <mgorman@techsingularity.net>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
    682a3385