1. 14 Jul, 2017 1 commit
    • Wanpeng Li's avatar
      sched/cputime: Don't use smp_processor_id() in preemptible context · 0e4097c3
      Wanpeng Li authored
      Recent kernels trigger this warning:
      
       BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181
       caller is debug_smp_processor_id+0x17/0x19
       CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb95 #1
       Call Trace:
        dump_stack+0x82/0xb8
        check_preemption_disabled()
        debug_smp_processor_id()
        vtime_delta()
        task_cputime()
        thread_group_cputime()
        thread_group_cputime_adjusted()
        wait_consider_task()
        do_wait()
        SYSC_wait4()
        do_syscall_64()
        entry_SYSCALL64_slow_path()
      
      As Frederic pointed out:
      
      | Although those sched_clock_cpu() things seem to only matter when the
      | sched_clock() is unstable. And that stability is a condition for nohz_full
      | to work anyway. So probably sched_clock() alone would be enough.
      
      This patch fixes it by replacing sched_clock_cpu() with sched_clock() to
      avoid calling smp_processor_id() in a preemptible context.
      Reported-by: default avatarXiaolong Ye <xiaolong.ye@intel.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com
      [ Prettified the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0e4097c3
  2. 05 Jul, 2017 5 commits
  3. 04 Jul, 2017 1 commit
    • Ingo Molnar's avatar
      Revert "sched/cputime: Refactor the cputime_adjust() code" · 3b9c08ae
      Ingo Molnar authored
      This reverts commit 72298e5c.
      
      As Peter explains:
      
      > Argh, no... That code was perfectly fine. The new code otoh is
      > convoluted.
      >
      > The old code had the following form:
      >
      >         if (exception1)
      >           deal with exception1
      >
      >         if (execption2)
      >           deal with exception2
      >
      >         do normal stuff
      >
      > Which is as simple and straight forward as it gets.
      >
      > The new code otoh reads like:
      >
      >         if (!exception1) {
      >                 if (exception2)
      >                   deal with exception 2
      >                 else
      >                   do normal stuff
      >         }
      
      So restore the old form.
      
      Also fix the comment describing the logic, as it was confusing.
      Requested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Cc: Frans Klaver <fransklaver@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3b9c08ae
  4. 30 Jun, 2017 1 commit
    • Gustavo A. R. Silva's avatar
      sched/cputime: Refactor the cputime_adjust() code · 72298e5c
      Gustavo A. R. Silva authored
      Address a Coverity false positive, which is caused by overly
      convoluted code:
      
      Value assigned to variable 'utime' at line 619:utime = rtime;
      is overwritten at line 642:utime = rtime - stime; before it
      can be used. This makes such variable assignment useless.
      
      Remove this variable assignment and refactor the code related.
      
      Addresses-Coverity-ID: 1371643
      Signed-off-by: default avatarGustavo A. R. Silva <garsilva@embeddedor.com>
      Cc: Frans Klaver <fransklaver@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20170629184128.GA5271@embeddedgusSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      72298e5c
  5. 27 Apr, 2017 1 commit
  6. 02 Mar, 2017 2 commits
  7. 01 Feb, 2017 11 commits
  8. 14 Jan, 2017 3 commits
  9. 15 Nov, 2016 3 commits
  10. 30 Sep, 2016 4 commits
  11. 18 Aug, 2016 3 commits
    • Stanislaw Gruszka's avatar
      sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
      Stanislaw Gruszka authored
      Commit:
      
        d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
      
      started accounting thread group tasks pending runtime in thread_group_cputime().
      
      Another commit:
      
        6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
      
      updated scheduler runtime statistics (call update_curr()) when reading task pending
      runtime. Those changes cause bad performance of SYS_times() and
      SYS_clock_gettimes(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
      larger systems with many CPUs.
      
      While we would like to have cpuclock monotonicity kept i.e. have
      problems fixed by above commits stay fixed, we also would like to have
      good performance.
      
      However when we notice that change from commit d670ec13 is not
      longer needed to solve problem addressed by that commit, because of
      change from the second commit 6e998916, we can get room for
      optimization. Since we update task while reading it's pending runtime
      in task_sched_runtime(), clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will
      see updated values and on testcase from d670ec13 process cpuclock
      will not be smaller than thread cpuclock.
      
      I tested the patch on testcases from commits d670ec13,
      6e998916 and some other cpuclock/cputimers testcases and
      did not found cpuclock monotonicity problems or other malfunction.
      
      This patch has the drawback that we will not provide thread group cputime
      up-to-date to the last moment. For example when arming cputime timer,
      we will arm it with possibly a bit outdated values and that timer will
      trigger earlier compared to behaviour without the patch. However that
      was the behaviour before d670ec13 commit (kernel v3.1) so it's
      unlikely to affect applications.
      
      Patch improves related syscall performance, as measured by Giovanni's
      benchmarks described in commit:
      
        6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
      
      The benchmark results are:
      
      SYS_clock_gettime():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
        5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
        8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
        12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
        21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
        30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
        48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
        79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
        110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
        128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
      
      SYS_times():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
        5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
        8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
        12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
        21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
        30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
        48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
        79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
        110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
        128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
      Reported-and-tested-by: default avatarGiovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a1eb1411
    • Wanpeng Li's avatar
      sched/cputime: Resync steal time when guest & host lose sync · 03cbc732
      Wanpeng Li authored
      Commit:
      
        57430218 ("sched/cputime: Count actually elapsed irq & softirq time")
      
      ... fixed a bug but also triggered a regression:
      
      On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four
      CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs
      on host one by one until there is only one left, then observe CPU utilization
      via 'top' in the guest, it shows:
      
        100% st for cpu0(housekeeping)
         75% st for other CPUs (nohz full mode)
      
      However, w/o this commit it shows the correct 75% for all four CPUs.
      
      When a guest is interrupted for a longer amount of time, missed clock ticks
      are not redelivered later. Because of that, we should not limit the amount
      of steal time accounted to the amount of time that the calling functions
      think have passed.
      
      However, the interval returned by account_other_time() is NOT rounded down
      to the nearest jiffy, while the base interval in get_vtime_delta() it is
      subtracted from is, so the max cputime limit is required to avoid underflow.
      
      This patch fixes the regression by limiting the account_other_time() from
      get_vtime_delta() to avoid underflow, and lets the other three call sites
      (in account_other_time() and steal_account_process_time()) account however
      much steal time the host told us elapsed.
      Suggested-by: default avatarRik van Riel <riel@redhat.com>
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
      [ Improved the changelog. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      03cbc732
    • Peter Zijlstra's avatar
      sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression · 173be9a1
      Peter Zijlstra authored
      Mike reports:
      
       Roughly 10% of the time, ltp testcase getrusage04 fails:
       getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
       getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
       getrusage04    0  TINFO  :  utime:           0us; stime:         179us
       getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
       getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
      
      And tracked it down to the case where the task simply doesn't get
      _any_ [us]time ticks.
      
      Update the code to assume all rtime is utime when we lack information,
      thus ensuring a task that elides the tick gets time accounted.
      Reported-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: 9d7fb042 ("sched/cputime: Guarantee stime + utime == rtime")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      173be9a1
  12. 11 Aug, 2016 2 commits
  13. 14 Jul, 2016 3 commits