Skip to content
  • Michal Hocko's avatar
    mm, page_alloc: remove stop_machine from build_all_zonelists · 11cd8638
    Michal Hocko authored
    build_all_zonelists has been (ab)using stop_machine to make sure that
    zonelists do not change while somebody is looking at them.  This is is
    just a gross hack because a) it complicates the context from which we
    can call build_all_zonelists (see 3f906ba2 ("mm/memory-hotplug:
    switch locking to a percpu rwsem")) and b) is is not really necessary
    especially after "mm, page_alloc: simplify zonelist initialization" and
    c) it doesn't really provide the protection it claims (see below).
    
    Updates of the zonelists happen very seldom, basically only when a zone
    becomes populated during memory online or when it loses all the memory
    during offline.  A racing iteration over zonelists could either miss a
    zone or try to work on one zone twice.  Both of these are something we
    can live with occasionally because there will always be at least one
    zone visible so we are not likely to fail allocation too easily for
    example.
    
    Please note that the original stop_machine approach doesn't really
    provide a better exclusion because the iteration might be interrupted
    half way (unless the whole iteration is preempt disabled which is not
    the case in most cases) so the some zones could still be seen twice or a
    zone missed.
    
    I have run the pathological online/offline of the single memblock in the
    movable zone while stressing the same small node with some memory
    pressure.
    
    Node 1, zone      DMA
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 943, 943, 943)
    Node 1, zone    DMA32
      pages free     227310
            min      8294
            low      10367
            high     12440
            spanned  262112
            present  262112
            managed  241436
            protection: (0, 0, 0, 0)
    Node 1, zone   Normal
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 0, 0, 1024)
    Node 1, zone  Movable
      pages free     32722
            min      85
            low      117
            high     149
            spanned  32768
            present  32768
            managed  32768
            protection: (0, 0, 0, 0)
    
    root@test1:/sys/devices/system/node/node1# while true
    do
    	echo offline > memory34/state
    	echo online_movable > memory34/state
    done
    
    root@test1:/mnt/data/test/linux-3.7-rc5# numactl --preferred=1 make -j4
    
    and it survived without any unexpected behavior.  While this is not
    really a great testing coverage it should exercise the allocation path
    quite a lot.
    
    Link: http://lkml.kernel.org/r/20170721143915.14161-8-mhocko@kernel.org
    
    
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Shaohua Li <shaohua.li@intel.com>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    11cd8638