Commit db73ee0d authored by Michal Hocko's avatar Michal Hocko Committed by Linus Torvalds

mm, vmscan: do not loop on too_many_isolated for ever

Tetsuo Handa has reported[1][2][3] that direct reclaimers might get
stuck in too_many_isolated loop basically for ever because the last few
pages on the LRU lists are isolated by the kswapd which is stuck on fs
locks when doing the pageout or slab reclaim.  This in turn means that
there is nobody to actually trigger the oom killer and the system is
basically unusable.

too_many_isolated has been introduced by commit 35cd7815 ("vmscan:
throttle direct reclaim when too many pages are isolated already") to
prevent from pre-mature oom killer invocations because back then no
reclaim progress could indeed trigger the OOM killer too early.

But since the oom detection rework in commit 0a0337e0 ("mm, oom:
rework oom detection") the allocation/reclaim retry loop considers all
the reclaimable pages and throttles the allocation at that layer so we
can loosen the direct reclaim throttling.

Make shrink_inactive_list loop over too_many_isolated bounded and
returns immediately when the situation hasn't resolved after the first
sleep.

Replace congestion_wait by a simple schedule_timeout_interruptible
because we are not really waiting on the IO congestion in this path.

Please note that this patch can theoretically cause the OOM killer to
trigger earlier while there are many pages isolated for the reclaim
which makes progress only very slowly.  This would be obvious from the
oom report as the number of isolated pages are printed there.  If we
ever hit this should_reclaim_retry should consider those numbers in the
evaluation in one way or another.

[1] http://lkml.kernel.org/r/201602092349.ACG81273.OSVtMJQHLOFOFF@I-love.SAKURA.ne.jp
[2] http://lkml.kernel.org/r/201702212335.DJB30777.JOFMHSFtVLQOOF@I-love.SAKURA.ne.jp
[3] http://lkml.kernel.org/r/201706300914.CEH95859.FMQOLVFHJFtOOS@I-love.SAKURA.ne.jp

[mhocko@suse.com: switch to uninterruptible sleep]
  Link: http://lkml.kernel.org/r/20170724065048.GB25221@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170710074842.23175-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: default avatarMel Gorman <mgorman@suse.de>
Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
Acked-by: default avatarRik van Riel <riel@redhat.com>
Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 77ff4657
......@@ -1743,9 +1743,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
int file = is_file_lru(lru);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
bool stalled = false;
while (unlikely(too_many_isolated(pgdat, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
if (stalled)
return 0;
/* wait a bit for the reclaimer. */
msleep(100);
stalled = true;
/* We are about to die and free our memory. Return now. */
if (fatal_signal_pending(current))
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment