Skip to content
  • KAMEZAWA Hiroyuki's avatar
    memcg: use new logic for page stat accounting · 89c06bd5
    KAMEZAWA Hiroyuki authored
    
    
    Now, page-stat-per-memcg is recorded into per page_cgroup flag by
    duplicating page's status into the flag.  The reason is that memcg has a
    feature to move a page from a group to another group and we have race
    between "move" and "page stat accounting",
    
    Under current logic, assume CPU-A and CPU-B.  CPU-A does "move" and CPU-B
    does "page stat accounting".
    
    When CPU-A goes 1st,
    
                CPU-A                           CPU-B
                                        update "struct page" info.
        move_lock_mem_cgroup(memcg)
        see pc->flags
        copy page stat to new group
        overwrite pc->mem_cgroup.
        move_unlock_mem_cgroup(memcg)
                                        move_lock_mem_cgroup(mem)
                                        set pc->flags
                                        update page stat accounting
                                        move_unlock_mem_cgroup(mem)
    
    stat accounting is guarded by move_lock_mem_cgroup() and "move" logic
    (CPU-A) doesn't see changes in "struct page" information.
    
    But it's costly to have the same information both in 'struct page' and
    'struct page_cgroup'.  And, there is a potential problem.
    
    For example, assume we have PG_dirty accounting in memcg.
    PG_..is a flag for struct page.
    PCG_ is a flag for struct page_cgroup.
    (This is just an example. The same problem can be found in any
     kind of page stat accounting.)
    
    	  CPU-A                               CPU-B
          TestSet PG_dirty
          (delay)                        TestClear PG_dirty
                                         if (TestClear(PCG_dirty))
                                              memcg->nr_dirty--
          if (TestSet(PCG_dirty))
              memcg->nr_dirty++
    
    Here, memcg->nr_dirty = +1, this is wrong.  This race was reported by Greg
    Thelen <gthelen@google.com>.  Now, only FILE_MAPPED is supported but
    fortunately, it's serialized by page table lock and this is not real bug,
    _now_,
    
    If this potential problem is caused by having duplicated information in
    struct page and struct page_cgroup, we may be able to fix this by using
    original 'struct page' information.  But we'll have a problem in "move
    account"
    
    Assume we use only PG_dirty.
    
             CPU-A                   CPU-B
        TestSet PG_dirty
        (delay)                    move_lock_mem_cgroup()
                                   if (PageDirty(page))
                                          new_memcg->nr_dirty++
                                   pc->mem_cgroup = new_memcg;
                                   move_unlock_mem_cgroup()
        move_lock_mem_cgroup()
        memcg = pc->mem_cgroup
        new_memcg->nr_dirty++
    
    accounting information may be double-counted.  This was original reason to
    have PCG_xxx flags but it seems PCG_xxx has another problem.
    
    I think we need a bigger lock as
    
         move_lock_mem_cgroup(page)
         TestSetPageDirty(page)
         update page stats (without any checks)
         move_unlock_mem_cgroup(page)
    
    This fixes both of problems and we don't have to duplicate page flag into
    page_cgroup.  Please note: move_lock_mem_cgroup() is held only when there
    are possibility of "account move" under the system.  So, in most path,
    status update will go without atomic locks.
    
    This patch introduces mem_cgroup_begin_update_page_stat() and
    mem_cgroup_end_update_page_stat() both should be called at modifying
    'struct page' information if memcg takes care of it.  as
    
         mem_cgroup_begin_update_page_stat()
         modify page information
         mem_cgroup_update_page_stat()
         => never check any 'struct page' info, just update counters.
         mem_cgroup_end_update_page_stat().
    
    This patch is slow because we need to call begin_update_page_stat()/
    end_update_page_stat() regardless of accounted will be changed or not.  A
    following patch adds an easy optimization and reduces the cost.
    
    [akpm@linux-foundation.org: s/lock/locked/]
    [hughd@google.com: fix deadlock by avoiding stat lock when anon]
    Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Greg Thelen <gthelen@google.com>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    89c06bd5