1. 28 Jul, 2010 7 commits
  2. 27 Jul, 2010 1 commit
  3. 22 Jul, 2010 5 commits
  4. 19 Jul, 2010 1 commit
  5. 05 Jul, 2010 1 commit
  6. 01 Jul, 2010 1 commit
    • Peter Zijlstra's avatar
      sched: Cure nr_iowait_cpu() users · 8c215bd3
      Peter Zijlstra authored
      Commit 0224cf4c (sched: Intoduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu() which took the iowait value of
      the current cpu.
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  7. 30 Jun, 2010 1 commit
    • Michal Hocko's avatar
      futex: futex_find_get_task remove credentails check · 7a0ea09a
      Michal Hocko authored
      futex_find_get_task is currently used (through lookup_pi_state) from two
      contexts, futex_requeue and futex_lock_pi_atomic.  None of the paths
      looks it needs the credentials check, though.  Different (e)uids
      shouldn't matter at all because the only thing that is important for
      shared futex is the accessibility of the shared memory.
      The credentail check results in glibc assert failure or process hang (if
      glibc is compiled without assert support) for shared robust pthread
      mutex with priority inheritance if a process tries to lock already held
      lock owned by a process with a different euid:
      pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
      The problem is that futex_lock_pi_atomic which is called when we try to
      lock already held lock checks the current holder (tid is stored in the
      futex value) to get the PI state.  It uses lookup_pi_state which in turn
      gets task struct from futex_find_get_task.  ESRCH is returned either
      when the task is not found or if credentials check fails.
      futex_lock_pi_atomic simply returns if it gets ESRCH.  glibc code,
      however, doesn't expect that robust lock returns with ESRCH because it
      should get either success or owner died.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDarren Hart <dvhltc@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  8. 29 Jun, 2010 1 commit
  9. 25 Jun, 2010 1 commit
  10. 23 Jun, 2010 2 commits
    • Peter Zijlstra's avatar
      sched: silence PROVE_RCU in sched_fork() · 86951599
      Peter Zijlstra authored
      Because cgroup_fork() is ran before sched_fork() [ from copy_process() ]
      and the child's pid is not yet visible the child is pinned to its
      cgroup. Therefore we can silence this warning.
      A nicer solution would be moving cgroup_fork() to right after
      dup_task_struct() and exclude PF_STARTING from task_subsys_state().
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
    • Daniel J Blueman's avatar
      rcu: apply RCU protection to wake_affine() · f3b577de
      Daniel J Blueman authored
      The task_group() function returns a pointer that must be protected
      by either RCU, the ->alloc_lock, or the cgroup lock (see the
      rcu_dereference_check() in task_subsys_state(), which is invoked by
      task_group()).  The wake_affine() function currently does none of these,
      which means that a concurrent update would be within its rights to free
      the structure returned by task_group().  Because wake_affine() uses this
      structure only to compute load-balancing heuristics, there is no reason
      to acquire either of the two locks.
      Therefore, this commit introduces an RCU read-side critical section that
      starts before the first call to task_group() and ends after the last use
      of the "tg" pointer returned from task_group().  Thanks to Li Zefan for
      pointing out the need to extend the RCU read-side critical section from
      that proposed by the original patch.
      Signed-off-by: default avatarDaniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
  11. 18 Jun, 2010 1 commit
    • Alex,Shi's avatar
      sched: Fix over-scheduling bug · 3c93717c
      Alex,Shi authored
      Commit e7097159 ("sched: Optimize unused cgroup configuration") introduced
      an imbalanced scheduling bug.
      If we do not use CGROUP, function update_h_load won't update h_load. When the
      system has a large number of tasks far more than logical CPU number, the
      incorrect cfs_rq[cpu]->h_load value will cause load_balance() to pull too
      many tasks to the local CPU from the busiest CPU. So the busiest CPU keeps
      going in a round robin. That will hurt performance.
      The issue was found originally by a scientific calculation workload that
      developed by Yanmin. With that commit, the workload performance drops
      about 40%.
       CPU  before    after
       00   : 2       : 7
       01   : 1       : 7
       02   : 11      : 6
       03   : 12      : 7
       04   : 6       : 6
       05   : 11      : 7
       06   : 10      : 6
       07   : 12      : 7
       08   : 11      : 6
       09   : 12      : 6
       10   : 1       : 6
       11   : 1       : 6
       12   : 6       : 6
       13   : 2       : 6
       14   : 2       : 6
       15   : 1       : 6
      Reviewed-by: default avatarYanmin zhang <yanmin.zhang@intel.com>
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1276754893.9452.5442.camel@debian>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  12. 17 Jun, 2010 1 commit
  13. 11 Jun, 2010 1 commit
    • Steven Rostedt's avatar
      perf/tracing: Fix regression of perf losing kprobe events · a8fb2608
      Steven Rostedt authored
      With the addition of the code to shrink the kernel tracepoint
      infrastructure, we lost kprobes being traced by perf. The reason
      is that I tested if the "tp_event->class->perf_probe" existed before
      enabling it. This prevents "ftrace only" events (like the function
      trace events) from being enabled by perf.
      Unfortunately, kprobe events do not use perf_probe. This causes
      kprobes to be missed by perf. To fix this, we add the test to
      see if "tp_event->class->reg" exists as well as perf_probe.
      Normal trace events have only "perf_probe" but no "reg" function,
      and kprobes and syscalls have the "reg" but no "perf_probe".
      The ftrace unique events do not have either, so this is a valid
      test. If a kprobe or syscall is not to be probed by perf, the
      "reg" function is called anyway, and will return a failure and
      prevent perf from probing it.
      Reported-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  14. 10 Jun, 2010 1 commit
  15. 09 Jun, 2010 1 commit
    • Thomas Gleixner's avatar
      genirq: Deal with desc->set_type() changing desc->chip · 46732475
      Thomas Gleixner authored
      The set_type() function can change the chip implementation when the
      trigger mode changes. That might result in using an non-initialized
      irq chip when called from __setup_irq() or when called via
      set_irq_type() on an already enabled irq. 
      The set_irq_type() function should not be called on an enabled irq,
      but because we forgot to put a check into it, we have a bunch of users
      which grew the habit of doing that and it never blew up as the
      function is serialized via desc->lock against all users of desc->chip
      and they never hit the non-initialized irq chip issue.
      The easy fix for the __setup_irq() issue would be to move the
      irq_chip_set_defaults(desc->chip) call after the trigger setting to
      make sure that a chip change is covered.
      But as we have already users, which do the type setting after
      request_irq(), the safe fix for now is to call irq_chip_set_defaults()
      from __irq_set_trigger() when desc->set_type() changed the irq chip.
      It needs a deeper analysis whether we should refuse to change the chip
      on an already enabled irq, but that'd be a large scale change to fix
      all the existing users. So that's neither stable nor 2.6.35 material.
      Reported-by: default avatarEsben Haabendal <eha@doredevelopment.dk>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev <linuxppc-dev@ozlabs.org>
      Cc: stable@kernel.org
  16. 08 Jun, 2010 2 commits
    • Peter Zijlstra's avatar
      sched: Fix PROVE_RCU vs cpu_cgroup · dc61b1d6
      Peter Zijlstra authored
      PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
      typically holds rq->lock around the css rcu derefs but the generic
      cgroup code doesn't (and can't) know about that lock.
      Provide means to add extra checks to the css dereference and use that
      in the scheduler to annotate its users.
      The addition of rq->lock to these checks is correct because the
      cgroup_subsys::attach() method takes the rq->lock for each task it
      moves, therefore by holding that lock, we ensure the task is pinned to
      the current cgroup and the RCU derefence is valid.
      That leaves one genuine race in __sched_setscheduler() where we used
      task_group() without holding any of the required locks and thus raced
      with the cgroup code. Solve this by moving the check under the
      appropriate lock.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    • Peter Zijlstra's avatar
      perf: Fix signed comparison in perf_adjust_period() · f6ab91ad
      Peter Zijlstra authored
      Frederic reported that frequency driven swevents didn't work properly
      and even caused a division-by-zero error.
      It turns out there are two bugs, the division-by-zero comes from a
      failure to deal with that in perf_calculate_period().
      The other was more interesting and turned out to be a wrong comparison
      in perf_adjust_period(). The comparison was between an s64 and u64 and
      got implicitly converted to an unsigned comparison. The problem is
      that period_left is typically < 0, so it ended up being always true.
      Cure this by making the local period variables s64.
      Reported-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  17. 05 Jun, 2010 8 commits
    • Rusty Russell's avatar
      module: fix bne2 "gave up waiting for init of module libcrc32c" · 9bea7f23
      Rusty Russell authored
      Problem: it's hard to avoid an init routine stumbling over a
      request_module these days.  And it's not clear it's always a bad idea:
      for example, a module like kvm with dynamic dependencies on kvm-intel
      or kvm-amd would be neater if it could simply request_module the right
      In this particular case, it's libcrc32c:
      If another module is waiting inside resolve_symbol() for libcrc32c to
      finish initializing (ie. bne2 depends on libcrc32c) then it does so
      holding the module lock, and our request_module() can't make progress
      until that is released.
      Waiting inside resolve_symbol() without the lock isn't all that hard:
      we just need to pass the -EBUSY up the call chain so we can sleep
      where we don't hold the lock.  Error reporting is a bit trickier: we
      need to copy the name of the unfinished module before releasing the
      Other notes:
      1) This also fixes a theoretical issue where a weak dependency would allow
         symbol version mismatches to be ignored.
      2) We rename use_module to ref_module to make life easier for the only
         external user (the out-of-tree ksplice patches).
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tim Abbot <tabbott@ksplice.com>
      Tested-by: default avatarBrandon Philips <bphilips@suse.de>
    • Rusty Russell's avatar
      module: verify_export_symbols under the lock · be593f4c
      Rusty Russell authored
      It disabled preempt so it was "safe", but nothing stops another module
      slipping in before this module is added to the global list now we don't
      hold the lock the whole time.
      So we check this just after we check for duplicate modules, and just
      before we put the module in the global list.
      (find_symbol finds symbols in coming and going modules, too).
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
    • Linus Torvalds's avatar
      module: move find_module check to end · 3bafeb62
      Linus Torvalds authored
      I think Rusty may have made the lock a bit _too_ finegrained there, and
      didn't add it to some places that needed it. It looks, for example, like
      PATCH 1/2 actually drops the lock in places where it's needed
      ("find_module()" is documented to need it, but now load_module() didn't
      hold it at all when it did the find_module()).
      Rather than adding a new "module_loading" list, I think we should be able
      to just use the existing "modules" list, and just fix up the locking a
      In fact, maybe we could just move the "look up existing module" a bit
      later - optimistically assuming that the module doesn't exist, and then
      just undoing the work if it turns out that we were wrong, just before
      adding ourselves to the list.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
    • Rusty Russell's avatar
      module: make locking more fine-grained. · 75676500
      Rusty Russell authored
      Kay Sievers <kay.sievers@vrfy.org> reports that we still have some
      contention over module loading which is slowing boot.
      Linus also disliked a previous "drop lock and regrab" patch to fix the
      bne2 "gave up waiting for init of module libcrc32c" message.
      This is more ambitious: we only grab the lock where we need it.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Brandon Philips <brandon@ifup.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
    • Rusty Russell's avatar
      module: Make module sysfs functions private. · 6407ebb2
      Rusty Russell authored
      These were placed in the header in ef665c1a to get the various
      SYSFS/MODULE config combintations to compile.
      That may have been necessary then, but it's not now.  These functions
      are all local to module.c.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
    • Rusty Russell's avatar
      module: move sysfs exposure to end of load_module · 80a3d1bb
      Rusty Russell authored
      This means a little extra work, but is more logical: we don't put
      anything in sysfs until we're about to put the module into the
      global list an parse its parameters.
      This also gives us a logical place to put duplicate module detection
      in the next patch.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
    • Rusty Russell's avatar
      module: fix kdb's illicit use of struct module_use. · c8e21ced
      Rusty Russell authored
      Linus changed the structure, and luckily this didn't compile any more.
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Martin Hicks <mort@sgi.com>
    • Linus Torvalds's avatar
      module: Make the 'usage' lists be two-way · 2c02dfe7
      Linus Torvalds authored
      When adding a module that depends on another one, we used to create a
      one-way list of "modules_which_use_me", so that module unloading could
      see who needs a module.
      It's actually quite simple to make that list go both ways: so that we
      not only can see "who uses me", but also see a list of modules that are
      "used by me".
      In fact, we always wanted that list in "module_unload_free()": when we
      unload a module, we want to also release all the other modules that are
      used by that module.  But because we didn't have that list, we used to
      first iterate over all modules, and then iterate over each "used by me"
      list of that module.
      By making the list two-way, we simplify module_unload_free(), and it
      allows for some trivial fixes later too.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned & rebased)
  18. 04 Jun, 2010 3 commits
  19. 03 Jun, 2010 1 commit
    • Peter Zijlstra's avatar
      perf: Fix crash in swevents · c6df8d5a
      Peter Zijlstra authored
      Frederic reported that because swevents handling doesn't disable IRQs
      anymore, we can get a recursion of perf_adjust_period(), once from
      overflow handling and once from the tick.
      If both call ->disable, we get a double hlist_del_rcu() and trigger
      a LIST_POISON2 dereference.
      Since we don't actually need to stop/start a swevent to re-programm
      the hardware (lack of hardware to program), simply nop out these
      callbacks for the swevent pmu.
      Reported-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1275557609.27810.35218.camel@twins>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>