Skip to content
  • Stanislaw Gruszka's avatar
    sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
    Stanislaw Gruszka authored
    Commit:
    
      d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
    
    started accounting thread group tasks pending runtime in thread_group_cputime().
    
    Another commit:
    
      6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
    
    updated scheduler runtime statistics (call update_curr()) when reading task pending
    runtime. Those changes cause bad performance of SYS_times() and
    SYS_clock_gettimes(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
    larger systems with many CPUs.
    
    While we would like to have cpuclock monotonicity kept i.e. have
    problems fixed by above commits stay fixed, we also would like to have
    good performance.
    
    However when we notice that change from commit d670ec13 is not
    longer needed to solve problem addressed by that commit, because of
    change from the second commit 6e998916, we can get room for
    optimization. Since we update task while reading it's pending runtime
    in task_sched_runtime(), clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will
    see updated values and on testcase from d670ec13 process cpuclock
    will not be smaller than thread cpuclock.
    
    I tested the patch on testcases from commits d670ec13,
    6e998916 and some other cpuclock/cputimers testcases and
    did not found cpuclock monotonicity problems or other malfunction.
    
    This patch has the drawback that we will not provide thread group cputime
    up-to-date to the last moment. For example when arming cputime timer,
    we will arm it with possibly a bit outdated values and that timer will
    trigger earlier compared to behaviour without the patch. However that
    was the behaviour before d670ec13 commit (kernel v3.1) so it's
    unlikely to affect applications.
    
    Patch improves related syscall performance, as measured by Giovanni's
    benchmarks described in commit:
    
      6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
    
    The benchmark results are:
    
    SYS_clock_gettime():
    
      threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                             (pre-6e998916)
      2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
      5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
      8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
      12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
      21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
      30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
      48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
      79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
      110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
      128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
    
    SYS_times():
    
      threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                             (pre-6e998916
    
    )
      2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
      5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
      8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
      12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
      21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
      30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
      48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
      79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
      110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
      128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
    
    Reported-and-tested-by: default avatarGiovanni Gherdovich <ggherdovich@suse.cz>
    Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Galbraith <mgalbraith@suse.de>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Wanpeng Li <wanpeng.li@hotmail.com>
    Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.com
    
    
    Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    a1eb1411