1. 24 Feb, 2011 1 commit
    • bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c · 09325873
      Yinghai Lu authored
      mm/bootmem.c contained code paths for both bootmem and no bootmem
      configurations.  They implement about the same set of APIs in
      different ways and as a result bootmem.c contains massive amount of
      #ifdef CONFIG_NO_BOOTMEM.
      Separate out CONFIG_NO_BOOTMEM code into mm/nobootmem.c.  As the
      common part is relatively small, duplicate them in nobootmem.c instead
      of creating a common file or ifdef'ing in bootmem.c.
      The following are duplicated.
      * {min|max}_low_pfn, max_pfn, saved_max_pfn
      * free_bootmem_late()
      * ___alloc_bootmem()
      * __alloc_bootmem_low()
      The following are applicable only to nobootmem and moved verbatim.
      * __free_pages_memory()
      * free_all_memory_core_early()
      The following are not applicable to nobootmem and omitted in
      * reserve_bootmem_node()
      * reserve_bootmem()
      The rest split function bodies according to CONFIG_NO_BOOTMEM.
      Makefile is updated so that only either bootmem.c or nobootmem.c is
      built according to CONFIG_NO_BOOTMEM.
      This patch doesn't introduce any behavior change.
      -tj: Rewrote commit description.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 27 Aug, 2010 3 commits
    • x86, memblock: Replace e820_/_early string with memblock_ · a9ce6bc1
      Yinghai Lu authored
      1. Include linux/memblock.h directly, so that e820.h references can be reduced later.
      2. This patch was mostly generated by sed scripts.
      -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Use memblock to replace early_res · 72d7c3b3
      Yinghai Lu authored
      1. replace find_e820_area with memblock_find_in_range
      2. replace reserve_early with memblock_x86_reserve_range
      3. replace free_early with memblock_x86_free_range.
      4. NO_BOOTMEM will switch to use memblock too.
      5. Use _e820 and _early wrappers in this patch; a following patch will
         replace them all.
      6. Because memblock_x86_free_range() supports partial frees, some special-case
         handling can be removed.
      7. memblock_find_in_range() must be called after memblock_x86_fill(), so
         some calls in setup.c::setup_arch() are moved later
         -- corruption_check and mptable_update
      -v2: Move reserve_brk() earlier, before fill_memblock_area(), to avoid overlap
          between brk and memblock_find_in_range(). The overlap could happen when the
          E820 table has more than 128 RAM entries, because memblock_x86_fill() could
          then use memblock_find_in_range() to find a new place for the
          memblock.memory.region array.
          We also don't need to use extend_brk() after fill_memblock_area(),
          so move reserve_brk() before fill_memblock_area().
      -v3: Move find_smp_config earlier, so that memblock_find_in_range() does not
          pick a wrong place if the BIOS doesn't put the mptable in the right place.
      -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are already
          in memblock.reserved.
          Use __NOT_KEEP_MEMBLOCK to make sure memblock related code can be freed later.
      -v5: The generic __memblock_find_in_range() goes from high to low, and on 32bit
          the active_region does include high pages, so the limit needs to be
          replaced with memblock.default_alloc_limit, aka get_max_mapped().
      -v6: Use current_limit instead
      -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
      -v8: Set memblock_can_resize early to handle EFI with more RAM entries
      -v9: update after kmemleak changes in mainline
      Suggested-by: David S. Miller <davem@davemloft.net>
      Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • bootmem, x86: Add weak version of reserve_bootmem_generic · f88eff74
      Yinghai Lu authored
      It will be used by the memblock_x86_to_bootmem conversion.
      It is a wrapper for reserve_bootmem(); x86 64bit uses a special version.
      Also clean up that version for x86_64. We don't need to take care of the NUMA
      path for that; bootmem can handle it now.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  3. 20 Jul, 2010 1 commit
    • x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used · b8ab9f82
      Yinghai Lu authored
      Borislav Petkov reported that his 32bit NUMA system has a problem:
      [    0.000000] Reserving total of 4c00 pages for numa KVA remap
      [    0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
      [    0.000000] max_pfn = 238000
      [    0.000000] 8202MB HIGHMEM available.
      [    0.000000] 885MB LOWMEM available.
      [    0.000000]   mapped low ram: 0 - 375fe000
      [    0.000000]   low ram: 0 - 375fe000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
      [    0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
      [    0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
      [    0.000000] BUG: unable to handle kernel paging request at 40000000
      [    0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
      [    0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
      [    0.000000] Call Trace:
      [    0.000000]  [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
      [    0.000000]  [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
      [    0.000000]  [<c2c9149e>] ? sparse_init+0x1dc/0x499
      [    0.000000]  [<c2c79118>] ? paging_init+0x168/0x1df
      [    0.000000]  [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb
      It looks like the allocation ends up at too high an address for bootmem.
      Try to cap the limit with get_max_mapped().
      Reported-by: Borislav Petkov <borislav.petkov@amd.com>
      Tested-by: Conny Seidel <conny.seidel@amd.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@kernel.org>		[2.6.34.x]
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 01 Apr, 2010 2 commits
    • bootmem, x86: Fix 32bit numa system without RAM on node 0 · aa235fc7
      Yinghai Lu authored
      When 32bit NUMA is used, free_all_bootmem() still only goes over
      node id 0.
      If node 0 doesn't have RAM installed, the lowest populated node
      becomes low RAM.
      This patch fixes the BOOTMEM path by iterating over the bdata_list.
      -v3: add more comments, and fix bootmem path too.
      -v4: separate from one big patch
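      The difference between releasing only node 0 and iterating over the
      whole node list can be sketched with a toy model. This is only an
      illustration under stated assumptions: struct toy_bdata and the
      toy_* functions are hypothetical names, not the kernel's bdata_list
      or free_all_bootmem() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-node bootmem descriptor. */
struct toy_bdata {
    unsigned long free_pages;   /* pages this node could release */
    struct toy_bdata *next;     /* next node on the list, NULL at the end */
};

/* Old behaviour in this model: only the first node (node id 0) is visited. */
static unsigned long toy_free_node0_only(const struct toy_bdata *list)
{
    return list ? list->free_pages : 0;
}

/* Fixed behaviour in this model: every node on the list is visited. */
static unsigned long toy_free_all_nodes(const struct toy_bdata *list)
{
    unsigned long total = 0;

    for (; list; list = list->next)
        total += list->free_pages;
    return total;
}

/* Scenario from the commit message: node 0 has no RAM, node 1 has all of it. */
static unsigned long toy_bdata_demo(int walk_all)
{
    struct toy_bdata node1 = { 512, NULL };
    struct toy_bdata node0 = { 0, &node1 };

    assert(node0.next == &node1);
    return walk_all ? toy_free_all_nodes(&node0)
                    : toy_free_node0_only(&node0);
}
```

      In this model, toy_bdata_demo(0) releases nothing because node 0 is
      empty, while toy_bdata_demo(1) releases the pages that live on node 1.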
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB416D7.6090203@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • nobootmem, x86: Fix 32bit numa system without RAM on node 0 · 33799858
      Yinghai Lu authored
      On one system without RAM on node0, got following boot dump with a 32
      bit NUMA kernel:
      early_node_map[4] active PFN ranges
          1: 0x00000010 -> 0x00000099
          1: 0x00000100 -> 0x0007da00
          1: 0x0007e800 -> 0x0007ffa0
          1: 0x0007ffae -> 0x0007ffb0
      Subtract (29 early reservations)
        #000 [0000001000 - 0000002000]
        #001 [0000089000 - 000008f000]
        #002 [0000091000 - 0000093500]
        #027 [007cbfef40 - 007e800000]
        #028 [007e9ca000 - 007ff95000]
      (0 free memory ranges)
      Initializing HighMem for node 0 (00000000:00000000)
      Initializing HighMem for node 1 (00000000:00000000)
      Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
      Checking if this processor honours the WP bit even in supervisor mode...Ok.
      swapper: page allocation failure. order:0, mode:0x0
      Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
      Call Trace:
       [<4087a5dc>] ? printk+0xf/0x11
       [<40286728>] __alloc_pages_nodemask+0x417/0x487
       [<402a9ce1>] new_slab+0xe2/0x1fe
       [<402aa5b2>] kmem_cache_open+0x185/0x358
       [<402abbc0>] T.954+0x1c/0x60
       [<40d52a29>] kmem_cache_init+0x24/0x113
       [<40d39738>] start_kernel+0x166/0x2e4
       [<40d3940e>] ? unknown_bootoption+0x0/0x18e
       [<40d390ce>] i386_start_kernel+0xce/0xd5
      Node 1 DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      Node 1 Normal per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      active_anon:0 inactive_anon:0 isolated_anon:0
       active_file:0 inactive_file:0 isolated_file:0
       unevictable:0 dirty:0 writeback:0 unstable:0
       free:0 slab_reclaimable:0 slab_unreclaimable:0
       mapped:0 shmem:0 pagetables:0 bounce:0
      When 32bit NUMA is used, free_all_bootmem() still only goes over
      node id 0.
      If node 0 doesn't have RAM installed, we need to go with node 1,
      because early_node_map still uses 1 for all ranges, and RAM from node 1
      becomes low RAM.
      Use MAX_NUMNODES like 64-bit NUMA does.
      Note: The BOOTMEM path has the same problem.
            This bug existed before we had NO_BOOTMEM support.
      -v3: add more comments, and fix bootmem path too.
      -v4: separate bootmem path fix
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB41689.9090502@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  5. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      The script does the following.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  6. 24 Mar, 2010 1 commit
    • x86: Remove excessive early_res debug output · c26f91a3
      Jiri Kosina authored
      Commit 08677214 ("x86: Make 64 bit use early_res instead
      of bootmem before slab") introduced early_res replacement for
      bootmem, but left code in __free_pages_memory() which dumps all
      the ranges that are being freed, without any additional
      information, causing some noise in dmesg during bootup.
      Just remove printing of the ranges, that doesn't provide
      anything useful anyway.
      While at it, remove other commented-out KERN_DEBUG messages in
      the NO_BOOTMEM code as well.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <alpine.LNX.2.00.1003220931360.18642@pobox.suse.cz>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 12 Feb, 2010 1 commit
  8. 15 Dec, 2009 1 commit
  9. 10 Nov, 2009 1 commit
    • bootmem: Add free_bootmem_late() · 9f993ac3
      FUJITA Tomonori authored
      Add a new function for freeing bootmem after the bootmem
      allocator has been released and the unreserved pages given to
      the page allocator.
      This allows us to reserve bootmem and then release it if we
      later discover it was not needed.
      ( This new API will be used by the swiotlb code to recover
        a significant amount of RAM (64MB). )
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: chrisw@sous-sol.org
      Cc: dwmw2@infradead.org
      Cc: joerg.roedel@amd.com
      Cc: muli@il.ibm.com
      Cc: hannes@cmpxchg.org
      Cc: tj@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1257849980-22640-7-git-send-email-fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 27 Aug, 2009 1 commit
  11. 08 Jul, 2009 1 commit
  12. 19 Jun, 2009 1 commit
  13. 11 Jun, 2009 2 commits
  14. 01 Mar, 2009 1 commit
    • bootmem, x86: further fixes for arch-specific bootmem wrapping · d0c4f570
      Tejun Heo authored
      Impact: fix new breakages introduced by previous fix
      Commit c1329375 tried to clean up
      bootmem arch wrapper but it wasn't quite correct.  Before the commit,
      the following were broken.
      * Low level interface functions prefixed with __ ignored arch
      * reserve_bootmem(...) can't be mapped into
        reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
        not preference here.  The region specified MUST fall into the
        specified region; otherwise, it will panic.
      After the commit,
      * If allocation fails for the arch preferred node, it should fallback
        to whatever is available.  Instead, it simply failed allocation.
      There are too many internal details to allow generic wrapping and
      still keep things simple for archs.  Plus, all that arch wants is a
      way to prefer a certain node over another.
      This patch drops the generic wrapping around alloc_bootmem_core() and
      adds alloc_bootmem_core() instead.  If necessary, arch can define
      bootmem_arch_preferred_node() macro or function which takes all
      allocation information and returns the preferred node.  bootmem
      generic code will always try the preferred node first and then
      fall back to other nodes as usual.
      Breakages noted and changes reviewed by Johannes Weiner.
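      The "preferred node first, then fall back" policy described above can
      be sketched as follows. This is a minimal illustration, not the bootmem
      implementation: struct toy_node, toy_take() and toy_alloc_preferring()
      are hypothetical names, and each node is reduced to a free-byte counter.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_NODES 3

/* Hypothetical node model: only tracks how many bytes are still free. */
struct toy_node {
    unsigned long free_bytes;
};

/* Take 'size' bytes from a node; returns 1 on success, 0 if it cannot fit. */
static int toy_take(struct toy_node *n, unsigned long size)
{
    if (n->free_bytes < size)
        return 0;
    n->free_bytes -= size;
    return 1;
}

/*
 * Try the arch-preferred node first; if that fails, fall back to whatever
 * other node can satisfy the request. Returns the node id used, or -1.
 */
static int toy_alloc_preferring(struct toy_node nodes[TOY_NR_NODES],
                                int preferred, unsigned long size)
{
    int nid;

    assert(preferred >= 0 && preferred < TOY_NR_NODES);
    if (toy_take(&nodes[preferred], size))
        return preferred;
    for (nid = 0; nid < TOY_NR_NODES; nid++)
        if (nid != preferred && toy_take(&nodes[nid], size))
            return nid;
    return -1;
}
```

      The point of the sketch is that a failure on the preferred node is not
      a failure of the allocation; only exhausting every node is.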
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
  15. 24 Feb, 2009 1 commit
    • bootmem: clean up arch-specific bootmem wrapping · c1329375
      Tejun Heo authored
      Impact: cleaner and consistent bootmem wrapping
      By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
      arch-specific wrappers for bootmem allocation.  However, this is done
      a bit strangely in that only the high level convenience macros can be
      changed while lower level, but still exported, interface functions
      can't be wrapped.  This not only is messy but also leads to a strange
      situation where alloc_bootmem() does what the arch wants it to do but
      the equivalent __alloc_bootmem() call doesn't although they should be
      able to be used interchangeably.
      This patch updates bootmem such that archs can override / wrap the
      backend function - alloc_bootmem_core() instead of the highlevel
      interface functions to allow simpler and consistent wrapping.  Also,
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
  16. 06 Jan, 2009 1 commit
  17. 16 Oct, 2008 1 commit
  18. 20 Aug, 2008 1 commit
  19. 15 Aug, 2008 1 commit
    • bootmem allocator: alloc_bootmem_core(): page-align the end offset · 627240aa
      Mikulas Patocka authored
      This is the minimal sequence that jams the allocator:
      void *p, *q, *r;
      p = alloc_bootmem(PAGE_SIZE);
      q = alloc_bootmem(64);
      free_bootmem(p, PAGE_SIZE);
      p = alloc_bootmem(PAGE_SIZE);
      r = alloc_bootmem(64);
      After this sequence (assuming that the allocator was empty or page-aligned
      before), pointer "q" will be equal to pointer "r".
      What's happening inside the allocator:
      p = alloc_bootmem(PAGE_SIZE);
      in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
      q = alloc_bootmem(64);
      in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
      free_bootmem(p, PAGE_SIZE);
      in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
      p = alloc_bootmem(PAGE_SIZE);
      in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
      r = alloc_bootmem(64);
      and now:
      it finds bit "2" as the place to allocate (sidx)
      it hits the condition
      if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
      	start_off = ALIGN(bdata->last_end_off, align);
      - you can see that the condition is true, so it assigns start_off =
      ALIGN(bdata->last_end_off, align); (that is PAGE_SIZE) and allocates
      over the already allocated block.
      With the patch it tries to continue at the end of previous allocation only
      if the previous allocation ended in the middle of the page.
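      The sequence above can be replayed on a small user-space model. This is
      a simplified sketch, not the kernel's alloc_bootmem_core(): all toy_*
      and TOY_* names are hypothetical, the bitmap is a bool array of 8 pages,
      align is assumed to be a power of two no larger than a page, the goal,
      limit and fallback search are left out, and the fix is modelled as
      "reuse the tail only if last_end_off is not page-aligned".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SIZE    (1UL << TOY_PAGE_SHIFT)
#define TOY_NPAGES       8UL
#define TOY_FAIL         (~0UL)
#define TOY_ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))
#define TOY_PFN_DOWN(x)  ((x) >> TOY_PAGE_SHIFT)
#define TOY_PFN_UP(x)    (((x) + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT)

struct toy_bootmem {
    bool used[TOY_NPAGES];        /* one flag per page, like the bitmap */
    unsigned long last_end_off;   /* end offset of the previous allocation */
    unsigned long hint_idx;       /* page index where the search starts */
    bool page_align_fix;          /* true: model the patched condition */
};

/* Returns the byte offset of the allocation, or TOY_FAIL. */
static unsigned long toy_bootmem_alloc(struct toy_bootmem *b,
                                       unsigned long size, unsigned long align)
{
    unsigned long sidx = b->hint_idx;

    for (;;) {
        unsigned long eidx, i, start_off, end_off;
        bool tail_usable, busy = false;

        while (sidx < TOY_NPAGES && b->used[sidx])
            sidx++;                       /* next free page */
        eidx = sidx + TOY_PFN_UP(size);
        if (sidx >= TOY_NPAGES || eidx > TOY_NPAGES)
            return TOY_FAIL;
        for (i = sidx; i < eidx; i++)
            if (b->used[i]) {             /* range not free, search on */
                sidx = i + 1;
                busy = true;
                break;
            }
        if (busy)
            continue;

        /* Unpatched: any non-zero last_end_off. Patched: mid-page only. */
        tail_usable = b->page_align_fix
            ? (b->last_end_off & (TOY_PAGE_SIZE - 1)) != 0
            : b->last_end_off != 0;
        if (tail_usable && TOY_PFN_DOWN(b->last_end_off) + 1 == sidx)
            start_off = TOY_ALIGN(b->last_end_off, align);
        else
            start_off = sidx << TOY_PAGE_SHIFT;

        end_off = start_off + size;
        b->last_end_off = end_off;
        b->hint_idx = TOY_PFN_UP(end_off);

        /* Mark pages; a merged first page is already marked, so skip it. */
        i = TOY_PFN_DOWN(start_off);
        if (i < sidx)
            i++;
        for (; i < TOY_PFN_UP(end_off); i++)
            b->used[i] = true;
        return start_off;
    }
}

/* Frees only the pages fully covered by [off, off + size). */
static void toy_bootmem_free(struct toy_bootmem *b,
                             unsigned long off, unsigned long size)
{
    unsigned long sidx = TOY_PFN_UP(off), eidx = TOY_PFN_DOWN(off + size), i;

    assert(eidx <= TOY_NPAGES);
    for (i = sidx; i < eidx; i++)
        b->used[i] = false;
    if (b->hint_idx > sidx)
        b->hint_idx = sidx;
}

/* Replays the five calls from the commit message; true if q == r. */
static bool toy_sequence_overlaps(bool fix)
{
    struct toy_bootmem b;
    unsigned long p, q, r;

    memset(&b, 0, sizeof(b));
    b.page_align_fix = fix;
    p = toy_bootmem_alloc(&b, TOY_PAGE_SIZE, 64);
    q = toy_bootmem_alloc(&b, 64, 64);
    toy_bootmem_free(&b, p, TOY_PAGE_SIZE);
    p = toy_bootmem_alloc(&b, TOY_PAGE_SIZE, 64);
    r = toy_bootmem_alloc(&b, 64, 64);
    (void)p;
    return q == r;
}
```

      In this model, toy_sequence_overlaps(false) reports that "q" and "r"
      coincide, and toy_sequence_overlaps(true) reports distinct offsets.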
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Johannes Weiner <hannes@saeurebad.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 24 Jul, 2008 17 commits