Skip to content
  • Tetsuo Handa's avatar
    mm: don't warn about allocations which stall for too long · fbb78e97
    Tetsuo Handa authored
    commit 400e2249 upstream.
    
    Commit 63f53dea ("mm: warn about allocations which stall for too
    long") was a great step for reducing possibility of silent hang up
    problem caused by memory allocation stalls.  But this commit reverts it,
    for it is possible to trigger OOM lockup and/or soft lockups when many
    threads concurrently called warn_alloc() (in order to warn about memory
    allocation stalls) due to current implementation of printk(), and it is
    difficult to obtain useful information due to limitation of synchronous
    warning approach.
    
    Current printk() implementation flushes all pending logs using the
    context of a thread which called console_unlock().  printk() should be
    able to flush all pending logs eventually unless somebody continues
    appending to printk() buffer.
    
    Since warn_alloc() started appending to printk() buffer while waiting
    for oom_kill_process() to make forward progress when oom_kill_process()
    is processing pending logs, it became possible for warn_alloc() to force
    oom_kill_process() loop inside printk().  As a result, warn_alloc()
    significantly increased possibility of preventing oom_kill_process()
    from making forward progress.
    
    ---------- Pseudo code start ----------
    Before warn_alloc() was introduced:
    
      retry:
        if (mutex_trylock(&oom_lock)) {
          while (atomic_read(&printk_pending_logs) > 0) {
            atomic_dec(&printk_pending_logs);
            print_one_log();
          }
          // Send SIGKILL here.
          mutex_unlock(&oom_lock)
        }
        goto retry;
    
    After warn_alloc() was introduced:
    
      retry:
        if (mutex_trylock(&oom_lock)) {
          while (atomic_read(&printk_pending_logs) > 0) {
            atomic_dec(&printk_pending_logs);
            print_one_log();
          }
          // Send SIGKILL here.
          mutex_unlock(&oom_lock)
        } else if (waited_for_10seconds()) {
          atomic_inc(&printk_pending_logs);
        }
        goto retry;
    ---------- Pseudo code end ----------
    
    Although waited_for_10seconds() becomes true once per 10 seconds,
    unbounded number of threads can call waited_for_10seconds() at the same
    time.  Also, since threads doing waited_for_10seconds() keep doing
    almost busy loop, the thread doing print_one_log() can use little CPU
    resource.  Therefore, this situation can be simplified like
    
    ---------- Pseudo code start ----------
      retry:
        if (mutex_trylock(&oom_lock)) {
          while (atomic_read(&printk_pending_logs) > 0) {
            atomic_dec(&printk_pending_logs);
            print_one_log();
          }
          // Send SIGKILL here.
          mutex_unlock(&oom_lock)
        } else {
          atomic_inc(&printk_pending_logs);
        }
        goto retry;
    ---------- Pseudo code end ----------
    
    when printk() is called faster than print_one_log() can process a log.
    
    One of possible mitigation would be to introduce a new lock in order to
    make sure that no other series of printk() (either oom_kill_process() or
    warn_alloc()) can append to printk() buffer when one series of printk()
    (either oom_kill_process() or warn_alloc()) is already in progress.
    
    Such serialization will also help obtaining kernel messages in readable
    form.
    
    ---------- Pseudo code start ----------
      retry:
        if (mutex_trylock(&oom_lock)) {
          mutex_lock(&oom_printk_lock);
          while (atomic_read(&printk_pending_logs) > 0) {
            atomic_dec(&printk_pending_logs);
            print_one_log();
          }
          // Send SIGKILL here.
          mutex_unlock(&oom_printk_lock);
          mutex_unlock(&oom_lock)
        } else {
          if (mutex_trylock(&oom_printk_lock)) {
            atomic_inc(&printk_pending_logs);
            mutex_unlock(&oom_printk_lock);
          }
        }
        goto retry;
    ---------- Pseudo code end ----------
    
    But this commit does not go that direction, for we don't want to
    introduce a new lock dependency, and we unlikely be able to obtain
    useful information even if we serialized oom_kill_process() and
    warn_alloc().
    
    Synchronous approach is prone to unexpected results (e.g.  too late [1],
    too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
    helped with providing information other than "something is going wrong".
    I want to consider asynchronous approach which can obtain information
    during stalls with possibly relevant threads (e.g.  the owner of
    oom_lock and kswapd-like threads) and serve as a trigger for actions
    (e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
    of stalling KVM guest for diagnostic purpose).
    
    This commit temporarily loses ability to report e.g.  OOM lockup due to
    unable to invoke the OOM killer due to !__GFP_FS allocation request.
    But asynchronous approach will be able to detect such situation and emit
    warning.  Thus, let's remove warn_alloc().
    
    [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
    [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
    [3] commit db73ee0d ("mm, vmscan: do not loop on too_many_isolated for ever"))
    
    Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    
    
    Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Reported-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
    Reported-by: default avataryuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
    Reported-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    
    [Resolved backport conflict due to missing 82251963, a8e99259, 9e80c719 and
     9a67f648
    
     in 4.9 -- all of which modified this hunk being removed.]
    Signed-off-by: default avatarAmit Shah <amit@kernel.org>
    
    Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
    fbb78e97