Skip to content
  • David Hildenbrand's avatar
    mm: migrate: don't rely on __PageMovable() of newpage after unlocking it · 57d81386
    David Hildenbrand authored
    commit e0a352fabce61f730341d119fbedf71ffdb8663f upstream.
    
    We had a race in the old balloon compaction code before b1123ea6
    ("mm: balloon: use general non-lru movable page feature") refactored it
    that became visible after backporting 195a8c43 ("virtio-balloon:
    deflate via a page list") without the refactoring.
    
    The bug existed from commit d6d86c0a ("mm/balloon_compaction:
    redesign ballooned pages management") till b1123ea6 ("mm: balloon:
    use general non-lru movable page feature").  d6d86c0a
    ("mm/balloon_compaction: redesign ballooned pages management") was
    backported to 3.12, so the broken kernels are stable kernels [3.12 -
    4.7].
    
    There was a subtle race between dropping the page lock of the newpage in
    __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
    
    Just after dropping this page lock, virtio-balloon could go ahead and
    deflate the newpage, effectively dequeueing it and clearing PageBalloon,
    in turn making __is_movable_balloon_page(newpage) fail.
    
    This resulted in dropping the reference of the newpage via
    putback_lru_page(newpage) instead of put_page(newpage), leading to
    page->lru getting modified and a !LRU page ending up in the LRU lists.
    With 195a8c43 ("virtio-balloon: deflate via a page list")
    backported, one would suddenly get corrupted lists in
    release_pages_balloon():
    
    - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
    - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
    
    Nowadays this race is no longer possible, but it is hidden behind very
    ugly handling of __ClearPageMovable() and __PageMovable().
    
    __ClearPageMovable() will not make __PageMovable() fail, only
    PageMovable().  So the new check (__PageMovable(newpage)) will still
    hold even after newpage was dequeued by virtio-balloon.
    
    If anybody would ever change that special handling, the BUG would be
    introduced again.  So instead, make it explicit and use the information
    of the original isolated page before migration.
    
    This patch can be backported fairly easy to stable kernels (in contrast
    to the refactoring).
    
    Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
    Fixes: d6d86c0a
    
     ("mm/balloon_compaction: redesign ballooned pages management")
    Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Reported-by: default avatarVratislav Bendel <vbendel@redhat.com>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Acked-by: default avatarRafael Aquini <aquini@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Dominik Brodowski <linux@dominikbrodowski.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Vratislav Bendel <vbendel@redhat.com>
    Cc: Rafael Aquini <aquini@redhat.com>
    Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: <stable@vger.kernel.org>	[3.12 - 4.7]
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    57d81386