1. 28 Mar, 2018 1 commit
    • Toshi Kani's avatar
      mm/vmalloc: add interfaces to free unmapped page table · 9c7f7bdb
      Toshi Kani authored
      commit b6bdb7517c3d3f41f20e5c2948d6bc3f8897394e upstream.
      
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
      
       1. ioremap a 4K size, valid page table will build,
       2. iounmap it, pte0 will set to 0;
       3. ioremap the same address with 2M size, pgd/pmd is unchanged,
          then set the a new value for pmd;
       4. pte0 is leaked;
       5. CPU may meet exception because the old pmd is still in TLB,
          which will lead to kernel panic.
      
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with purged address on x86.  x86
      still has memory leak.
      
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
      
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() to free a pte page is an
         overhead for regular vmap users as they do not need a pte page freed
         up.
      
       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.
      
       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
         purge.
      
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
      
      This patch implements their stub functions on x86 and arm64, which work
      as workaround.
      
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: default avatarLei Li <lious.lilei@hisilicon.com>
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c7f7bdb
  2. 14 Apr, 2015 3 commits
    • Toshi Kani's avatar
      x86, mm: support huge KVA mappings on x86 · 6b637835
      Toshi Kani authored
      Implement huge KVA mapping interfaces on x86.
      
      On x86, MTRRs can override PAT memory types with a 4KB granularity.  When
      using a huge page, MTRRs can override the memory type of the huge page,
      which may lead a performance penalty.  The processor can also behave in an
      undefined manner if a huge page is mapped to a memory range that MTRRs
      have mapped with multiple different memory types.  Therefore, the mapping
      code falls back to use a smaller page size toward 4KB when a mapping range
      is covered by non-WB type of MTRRs.  The WB type of MTRRs has no affect on
      the PAT memory types.
      
      pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see if a
      given range is covered by MTRRs.  MTRR_TYPE_WRBACK indicates that the
      range is either covered by WB or not covered and the MTRR default value is
      set to WB.  0xFF indicates that MTRRs are disabled.
      
      HAVE_ARCH_HUGE_VMAP is selected when X86_64 or X86_32 with X86_PAE is set.
       X86_32 without X86_PAE is not supported since such config can unlikey be
      benefited from this feature, and there was an issue found in testing.
      
      [fengguang.wu@intel.com: ioremap_pud_capable can be static]
      Signed-off-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6b637835
    • Toshi Kani's avatar
      mm: change ioremap to set up huge I/O mappings · e61ce6ad
      Toshi Kani authored
      ioremap_pud_range() and ioremap_pmd_range() are changed to create huge I/O
      mappings when their capability is enabled, and a request meets required
      conditions -- both virtual & physical addresses are aligned by their huge
      page size, and a requested range fufills their huge page size.  When
      pud_set_huge() or pmd_set_huge() returns zero, i.e.  no-operation is
      performed, the code simply falls back to the next level.
      
      The changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on
      the architecture.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e61ce6ad
    • Toshi Kani's avatar
      lib/ioremap.c: add huge I/O map capability interfaces · 0ddab1d2
      Toshi Kani authored
      Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
      I/O mappings with pud/pmd are enabled on the kernel.
      
      ioremap_huge_init() calls arch_ioremap_pud_supported() and
      arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.
      
      A new kernel option "nohugeiomap" is also added, so that user can disable
      the huge I/O map capabilities when necessary.
      Signed-off-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0ddab1d2
  3. 07 Mar, 2012 1 commit
  4. 12 Jan, 2011 1 commit
    • Huang Ying's avatar
      ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support · 81e88fdc
      Huang Ying authored
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      This patch adds POLL/IRQ/NMI notification types support.
      
      Because the memory area used to transfer hardware error information
      from BIOS to Linux can be determined only in NMI, IRQ or timer
      handler, but general ioremap can not be used in atomic context, so a
      special version of atomic ioremap is implemented for that.
      
      Known issue:
      
      - Error information can not be printed for recoverable errors notified
        via NMI, because printk is not NMI-safe. Will fix this via delay
        printing to IRQ context via irq_work or make printk NMI-safe.
      
      v2:
      
      - adjust printk format per comments.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      81e88fdc
  5. 09 Jul, 2010 1 commit
    • Kenji Kaneshige's avatar
      x86, ioremap: Fix incorrect physical address handling in PAE mode · ffa71f33
      Kenji Kaneshige authored
      Current x86 ioremap() doesn't handle physical address higher than
      32-bit properly in X86_32 PAE mode. When physical address higher than
      32-bit is passed to ioremap(), higher 32-bits in physical address is
      cleared wrongly. Due to this bug, ioremap() can map wrong address to
      linear address space.
      
      In my case, 64-bit MMIO region was assigned to a PCI device (ioat
      device) on my system. Because of the ioremap()'s bug, wrong physical
      address (instead of MMIO region) was mapped to linear address space.
      Because of this, loading ioatdma driver caused unexpected behavior
      (kernel panic, kernel hangup, ...).
      Signed-off-by: default avatarKenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      ffa71f33
  6. 17 Oct, 2007 1 commit
  7. 21 May, 2007 1 commit
    • Alexey Dobriyan's avatar
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan authored
      First thing mm.h does is including sched.h solely for can_do_mlock() inline
      function which has "current" dereference inside. By dealing with can_do_mlock()
      mm.h can be detached from sched.h which is good. See below, why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
  8. 13 Dec, 2006 1 commit
  9. 01 Oct, 2006 2 commits
    • Haavard Skinnemoen's avatar
      [PATCH] Generic ioremap_page_range: flush_cache_vmap · db71daab
      Haavard Skinnemoen authored
      The existing implementation of ioremap_page_range(), which was taken
      from i386, does this:
      
      	flush_cache_all();
      	/* modify page tables */
      	flush_tlb_all();
      
      I think this is a bit defensive, so this patch changes the generic
      implementation to do:
      
      	/* modify page tables */
      	flush_cache_vmap(start, end);
      
      instead, which is similar to what vmalloc() does. This should still
      be correct because we never modify existing PTEs. According to
      James Bottomley:
      
      The problem the flush_tlb_all() is trying to solve is to avoid stale tlb
      entries in the ioremap area.  We're just being conservative by flushing
      on both map and unmap.  Technically what vmalloc/vfree does (only flush
      the tlb on unmap) is just fine because it means that the only tlb
      entries in the remap area must belong to in-use mappings.
      Signed-off-by: default avatarHaavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <linux-m32r@ml.linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      db71daab
    • Haavard Skinnemoen's avatar
      [PATCH] Generic ioremap_page_range: implementation · 74588d8b
      Haavard Skinnemoen authored
      This patch adds a generic implementation of ioremap_page_range() in
      lib/ioremap.c based on the i386 implementation. It differs from the
      i386 version in the following ways:
      
        * The PTE flags are passed as a pgprot_t argument and must be
          determined up front by the arch-specific code. No additional
          PTE flags are added.
        * Uses set_pte_at() instead of set_pte()
      
      [bunk@stusta.de: warning fix]
      ]dhowells@redhat.com: nommu build fix]
      Signed-off-by: default avatarHaavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <linux-m32r@ml.linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      74588d8b