• Toshi Kani's avatar
    mm/vmalloc: add interfaces to free unmapped page table · 9c7f7bdb
    Toshi Kani authored
    commit b6bdb7517c3d3f41f20e5c2948d6bc3f8897394e upstream.
    
    On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
    create pud/pmd mappings.  A kernel panic was observed on arm64 systems
    with Cortex-A75 in the following steps as described by Hanjun Guo.
    
     1. ioremap a 4K size, valid page table will build,
     2. iounmap it, pte0 will set to 0;
     3. ioremap the same address with 2M size, pgd/pmd is unchanged,
        then set the a new value for pmd;
     4. pte0 is leaked;
     5. CPU may meet exception because the old pmd is still in TLB,
        which will lead to kernel panic.
    
    This panic is not reproducible on x86.  INVLPG, called from iounmap,
    purges all levels of entries associated with purged address on x86.  x86
    still has memory leak.
    
    The patch changes the ioremap path to free unmapped page table(s) since
    doing so in the unmap path has the following issues:
    
     - The iounmap() path is shared with vunmap(). Since vmap() only
       supports pte mappings, making vunmap() to free a pte page is an
       overhead for regular vmap users as they do not need a pte page freed
       up.
    
     - Checking if all entries in a pte page are cleared in the unmap path
       is racy, and serializing this check is expensive.
    
     - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
       Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
       purge.
    
    Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
    clear a given pud/pmd entry and free up a page for the lower level
    entries.
    
    This patch implements their stub functions on x86 and arm64, which work
    as workaround.
    
    [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
    Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
    Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
    Reported-by: default avatarLei Li <lious.lilei@hisilicon.com>
    Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Hanjun Guo <guohanjun@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Chintan Pandya <cpandya@codeaurora.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    9c7f7bdb
pgtable.c 15.7 KB