    mm/vmscan.c: prevent useless kswapd loops · 584810d3
    Shakeel Butt authored
    commit dffcac2cb88e4ec5906235d64a83d802580b119e upstream.
    
    In production we have noticed hard lockups on large machines running
    large jobs due to kswapd hoarding the LRU lock within isolate_lru_pages()
    when sc->reclaim_idx is 0, which is a small zone.  The LRU was a couple
    hundred GiB and the condition (page_zonenum(page) > sc->reclaim_idx) in
    isolate_lru_pages() was basically skipping GiBs of pages while holding
    the LRU spinlock with interrupts disabled.
    
    On further inspection, it seems like there are two issues:
    
    (1) If kswapd could not sleep on return from balance_pgdat() (i.e.
        the node is still unbalanced), the classzone_idx is unintentionally
        set to 0 and the whole reclaim cycle of kswapd will try to reclaim
        only the lowest and smallest zone while traversing the whole memory.
    
    (2) Fundamentally isolate_lru_pages() is really bad when the
        allocation has woken kswapd for a smaller zone on a very large machine
        running very large jobs.  It can hoard the LRU spinlock while skipping
        over 100s of GiBs of pages.
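
The skipping behaviour described in (2) can be sketched with a small userspace model. This is not the kernel's isolate_lru_pages(); `model_page`, `scan_result`, and `model_isolate` are hypothetical names introduced for illustration. The model also assumes that skipped pages do not count toward the scan budget, which is what would let the walk run across most of the list before any eligible page is found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a page on an LRU list. */
struct model_page {
    int zone_idx;               /* index of the zone that owns the page */
};

struct scan_result {
    size_t taken;               /* pages eligible for reclaim */
    size_t skipped;             /* pages walked past: zone above reclaim_idx */
};

/*
 * Walk the list until nr_to_take eligible pages are found or the list ends.
 * Pages whose zone index is above reclaim_idx are skipped. In this model
 * (an assumption) a skipped page does not use up any of the budget, so a
 * low reclaim_idx on a list dominated by higher zones walks almost all of it.
 * In the kernel, the equivalent walk happens with the LRU lock held.
 */
struct scan_result model_isolate(const struct model_page *lru, size_t n,
                                 int reclaim_idx, size_t nr_to_take)
{
    struct scan_result r = {0, 0};

    assert(lru != NULL || n == 0);

    for (size_t i = 0; i < n && r.taken < nr_to_take; i++) {
        if (lru[i].zone_idx > reclaim_idx) {
            r.skipped++;        /* same check as page_zonenum(page) > reclaim_idx */
            continue;
        }
        r.taken++;
    }
    return r;
}
```

With reclaim_idx set to 0, every page from a higher zone is skipped before one eligible page is taken, so the amount of work grows with the size of the list rather than with the size of the target zone.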
    
    This patch only fixes (1).  (2) needs a more fundamental solution.  To
    fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is
    invalid, use the classzone_idx of the previous kswapd loop; otherwise,
    use the one the waker has requested.
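
The selection rule for (1) can be sketched as a small userspace model. This is an illustration under stated assumptions and not the patch itself: the `model_*` names are hypothetical, and using the zone count as the "invalid" marker is an assumption of this sketch, since the text above only says the field can be invalid.

```c
#include <assert.h>

/* Zone indices for this model only; the real set depends on the kernel config. */
enum model_zone {
    MODEL_ZONE_DMA,
    MODEL_ZONE_NORMAL,
    MODEL_ZONE_MOVABLE,
    MODEL_NR_ZONES              /* assumed marker for "no request pending" */
};

/* Hypothetical stand-in for the per-node field a waker writes to. */
struct model_pgdat {
    enum model_zone kswapd_classzone_idx;
};

/*
 * If no waker has posted a request, keep the index used by the previous
 * kswapd loop instead of falling back to 0; otherwise use the waker's index.
 */
enum model_zone model_pick_classzone(const struct model_pgdat *pgdat,
                                     enum model_zone prev_idx)
{
    assert(prev_idx < MODEL_NR_ZONES);

    if (pgdat->kswapd_classzone_idx == MODEL_NR_ZONES)
        return prev_idx;
    return pgdat->kswapd_classzone_idx;
}
```

Passing the previous loop's index as the fallback is the point of the rule: a node that is still unbalanced keeps reclaiming for the zone it was already working on, rather than dropping to the lowest zone.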
    
    Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com
    Fixes: e716f2eb ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx")
    Signed-off-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Roman Gushchin <guro@fb.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Name
kasan
Kconfig
Kconfig.debug
Makefile
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup.c
highmem.c
hmm.c
huge_memory.c
hugetlb.c
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
khugepaged.c
kmemleak-test.c
kmemleak.c
ksm.c
list_lru.c
maccess.c
madvise.c
memblock.c
memcontrol.c
memory-failure.c
memory.c
memory_hotplug.c
mempolicy.c
mempool.c
memtest.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c
slab.c
slab.h
slab_common.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap.c
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c
truncate.c
usercopy.c
userfaultfd.c
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c
vmstat.c
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c
zswap.c