    mm, vmscan: get rid of throttle_vm_writeout · bf484383
    Michal Hocko authored
    throttle_vm_writeout() was introduced back in 2005 to fix OOMs caused by
    excessive pageout activity during reclaim.  Too many pages could be
    put under writeback, so the LRUs would fill up with unreclaimable pages
    until the IO completed, and in turn the OOM killer could be invoked.
    
    There have been some important changes introduced since then in the
    reclaim path though.  Writers are throttled by balance_dirty_pages when
    initiating the buffered IO and later during the memory pressure, the
    direct reclaim is throttled by wait_iff_congested if the node is
    considered congested by dirty pages on LRUs and the underlying bdi is
    congested by the queued IO.  The kswapd is throttled as well if it
    encounters pages marked for immediate reclaim or under writeback, which
    signals that there are too many pages under writeback already.
    Finally should_reclaim_retry does congestion_wait if the reclaim cannot
    make any progress and there are too many dirty/writeback pages.
    
    Another important aspect is that we do not issue any IO from the direct
    reclaim context anymore.  Under a heavy parallel load this could queue a
    lot of IO which would be very scattered and thus inefficient, which would
    just make the problem worse.
    
    These mechanisms should throttle and keep the amount of IO in a
    steady state even under heavy IO and memory pressure, so yet another
    throttling point doesn't really seem helpful.  On the contrary, Mikulas
    Patocka has reported that swap backed by dm-crypt doesn't work properly
    because the swapout IO cannot make sufficient progress as the writeout
    path depends on dm_crypt worker which has to allocate memory to perform
    the encryption.  To guarantee forward progress it relies on
    the mempool allocator.  mempool_alloc(), however, prefers to use the
    underlying (usually page) allocator before it grabs objects from the
    pool.  Such an allocation can dive into memory reclaim and
    consequently into throttle_vm_writeout.  If there are too many dirty
    pages or pages under writeback, it will get throttled even though it is
    in fact a flusher that clears pending pages.
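
    The ordering problem can be illustrated with a small userspace toy
    model.  This is only a sketch and not kernel code: ToyVM, ToyMempool,
    the counters, the sleep cap and the "a throttled flusher completes no
    IO" rule are all assumptions made up for illustration.  It shows that
    a worker which tries the underlying allocator first sleeps in the
    throttle even though a reserved object was available all along.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyVM:
    """Toy stand-in for global writeback state; all numbers are made up."""
    nr_writeback: int = 120   # pages under writeback (hypothetical)
    dirty_thresh: int = 100   # dirty threshold (hypothetical)
    sleeps: int = 0           # how many times a caller had to wait

    def throttle_vm_writeout(self, io_done_per_sleep: int,
                             max_sleeps: int = 5) -> None:
        # Wait while too many pages are under writeback.  The cap on
        # sleeps exists only so that the toy terminates.
        waited = 0
        while self.nr_writeback > self.dirty_thresh and waited < max_sleeps:
            waited += 1
            self.sleeps += 1                 # models congestion_wait()
            self.nr_writeback -= io_done_per_sleep

    def page_alloc(self, is_flusher: bool) -> Optional[str]:
        # Assumption: while the flusher itself sleeps, no IO completes.
        self.throttle_vm_writeout(io_done_per_sleep=0 if is_flusher else 10)
        return None   # pretend memory is exhausted, so the reserve is needed


@dataclass
class ToyMempool:
    vm: ToyVM
    reserve: List[str] = field(default_factory=lambda: ["bio0", "bio1"])

    def alloc(self, is_flusher: bool) -> Optional[str]:
        # Underlying allocator first, reserved objects only afterwards.
        obj = self.vm.page_alloc(is_flusher)
        if obj is not None:
            return obj
        return self.reserve.pop() if self.reserve else None


def demo():
    vm = ToyVM()
    pool = ToyMempool(vm)
    obj = pool.alloc(is_flusher=True)
    return obj, vm.sleeps, vm.nr_writeback
```

    In this toy, demo() hands out a reserved object only after the worker
    has used up every allowed sleep, and nr_writeback has not moved at all
    in the meantime.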
    
      kworker/u4:0    D ffff88003df7f438 10488     6      2	0x00000000
      Workqueue: kcryptd kcryptd_crypt [dm_crypt]
      Call Trace:
        schedule+0x3c/0x90
        schedule_timeout+0x1d8/0x360
        io_schedule_timeout+0xa4/0x110
        congestion_wait+0x86/0x1f0
        throttle_vm_writeout+0x44/0xd0
        shrink_zone_memcg+0x613/0x720
        shrink_zone+0xe0/0x300
        do_try_to_free_pages+0x1ad/0x450
        try_to_free_pages+0xef/0x300
        __alloc_pages_nodemask+0x879/0x1210
        alloc_pages_current+0xa1/0x1f0
        new_slab+0x2d7/0x6a0
        ___slab_alloc+0x3fb/0x5c0
        __slab_alloc+0x51/0x90
        kmem_cache_alloc+0x27b/0x310
        mempool_alloc_slab+0x1d/0x30
        mempool_alloc+0x91/0x230
        bio_alloc_bioset+0xbd/0x260
        kcryptd_crypt+0x114/0x3b0 [dm_crypt]
    
    Let's just drop throttle_vm_writeout altogether.  It is not very
    helpful anymore.
    
    I have tried to test a potential writeback IO runaway similar to the one
    described in the original patch which introduced it [1].  The setup was a
    small virtual machine (512MB RAM, 4 CPUs, 2G of swap space and a disk
    image on a rather slow NFS in sync mode on the host) with 8 parallel
    writers, each writing 1G worth of data.  As soon as the pagecache fills
    up and direct reclaim kicks in, I start an anon memory consumer in a loop
    (allocating 300M and exiting after populating it) in the background to
    make the memory pressure even stronger as well as to disrupt the steady
    state for the IO.  The direct reclaim is throttled because of the
    congestion, and kswapd hits congestion_wait due to nr_immediate, but
    throttle_vm_writeout never triggers the sleep throughout the test.
    Dirty+writeback are close to nr_dirty_threshold with some fluctuations
    caused by the anon consumer.
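
    For reference, an anon memory consumer of the kind described above
    could look roughly like the sketch below.  This is not the program
    used in the test; populate_anon is a hypothetical helper which assumes
    a Unix system and uses a private anonymous mapping that is populated
    by writing one byte to every page.

```python
import mmap


def populate_anon(size_bytes: int) -> int:
    """Map private anonymous memory, touch every page, return pages touched."""
    touched = 0
    # fileno -1 requests an anonymous mapping (Unix-only flags).
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    with mmap.mmap(-1, size_bytes, flags=flags) as region:
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1    # the write faults the page in
            touched += 1
    return touched                # the mapping is released on return
```

    Running populate_anon(300 * 1024 * 1024) repeatedly from a shell loop,
    one process per iteration, would approximate the "allocate 300M,
    populate it and exit" pattern.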
    
    [1] https://www2.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm3/broken-out/vm-pageout-throttling.patch
    Link: http://lkml.kernel.org/r/1471171473-21418-1-git-send-email-mhocko@kernel.org
    
    
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: NeilBrown <neilb@suse.com>
    Cc: Ondrej Kozina <okozina@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>