Skip to content
  • David Rientjes's avatar
    mm, vmpressure: pass-through notification support · b6bb9811
    David Rientjes authored
    By default, vmpressure events are not pass-through, i.e.  they propagate
    up through the memcg hierarchy until an event notifier is found for any
    threshold level.
    
    This presents a difficulty when a thread waiting on a read(2) for a
    vmpressure event cannot distinguish between local memory pressure and
    memory pressure in a descendant memcg, especially when that thread may
    not control the memcg hierarchy.
    
    Consider a user-controlled child memcg with a smaller limit than a
    top-level memcg controlled by the "Activity Manager" specified in
    Documentation/cgroup-v1/memory.txt.  It may register for memory pressure
    notification for descendant memcgs to make a policy decision: oom kill a
    low priority job, increase the limit, decrease other limits, etc.  If it
    registers for memory pressure notification on the top-level memcg, it
    currently cannot distinguish between memory pressure in its own memcg or
    a descendant memcg, which is user-controlled.
    
    Conversely, if a user registers for memory pressure notification on
    their own descendant memcg, the Activity Manager does not receive any
    pressure notification for that child memcg hierarchy.  Vmpressure events
    are not received for ancestor memcgs if the memcg experiencing pressure
    have notifiers registered, perhaps outside the knowledge of the thread
    waiting on read(2) at the top level.
    
    Both of these are consequences of vmpressure notification not being
    pass-through.
    
    This implements a pass-through behavior for vmpressure events.  When
    writing to control.event_control, vmpressure event handlers may
    optionally specify a mode.  There are two new modes:
    
     - "hierarchy": always propagate memory pressure events up the hierarchy
       regardless if descendant memcgs have their own notifiers registered,
       and
    
     - "local": only receive notifications when the memcg for which the
       event is registered experiences memory pressure.
    
    Of course, processes may register for one notification of "low,local",
    for example, and another for "low".
    
    If no mode is specified, the current behavior is maintained for
    backwards compatibility.
    
    See the change to Documentation/cgroup-v1/memory.txt for full
    specification.
    
    [dan.carpenter@oracle.com: free the same pointer we allocated]
      Link: http://lkml.kernel.org/r/20170613191820.GA20003@elgon.mountain
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705311421320.8946@chino.kir.corp.google.com
    
    
    Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
    Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Anton Vorontsov <anton@enomsg.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    b6bb9811