    block, bfq: update blkio stats outside the scheduler lock · 24bfd19b
    Paolo Valente authored
    bfq invokes various blkg_*stats_* functions to update the statistics
    contained in the special files blkio.bfq.* in the blkio controller
    groups, i.e., the I/O accounting related to the proportional-share
    policy provided by bfq. The execution of these functions takes a
    considerable percentage, about 40%, of the total per-request execution
    time of bfq (i.e., of the sum of the execution time of all the bfq
    functions that have to be executed to process an I/O request from its
    creation to its destruction).  This reduces the request-processing
    rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
    the bfq functions that invoke blkg_*stats_* functions cannot be
    executed in parallel with the rest of the code of bfq, because both
    are executed under the same per-device scheduler lock.
    
    To reduce this slowdown, this commit moves, wherever possible, the
    invocation of these functions (more precisely, of the bfq functions
    that invoke blkg_*stats_* functions) outside the critical sections
    protected by the scheduler lock.
    
    With this change, and with all blkio.bfq.* statistics enabled, the
    throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
    i7-4850HQ, in case of 8 threads doing random I/O in parallel on
    null_blk, with the latter configured with 0 latency. We obtained the
    same or higher throughput boosts, up to +30%, with other processors
    (some figures are reported in the documentation). For our tests, we
    used the script [1], with which our results can be easily reproduced.
    
    NOTE. This commit still protects the invocation of blkg_*stats_*
    functions with the request_queue lock, because the group these
    functions are invoked on may otherwise disappear before or while these
    functions are executed.  Fortunately, tests without even this lock
    show, by difference, that the serialization caused by this lock has
    little impact (a throughput reduction of at most ~5%).
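
    The locking pattern described above can be illustrated with a minimal
    userspace sketch. It is not the bfq code: the struct and function
    names (group_stats, sched, dispatch_one, sched_init) are hypothetical,
    and pthread mutexes merely stand in for the scheduler lock and the
    request_queue lock. The scheduling decision is taken under the first
    lock, the data needed for accounting is copied out, and the
    statistics are updated only after that lock is released, under the
    second lock that keeps the group alive.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these names exist in bfq. */
struct group_stats {
    uint64_t dispatched;  /* number of requests accounted so far */
    uint64_t bytes;       /* total bytes accounted so far */
};

struct sched {
    pthread_mutex_t sched_lock;  /* plays the role of the per-device scheduler lock */
    pthread_mutex_t queue_lock;  /* plays the role of the request_queue lock */
    struct group_stats *group;   /* NULL once the group has gone away */
    unsigned int queued;         /* scheduler state, guarded by sched_lock */
};

void sched_init(struct sched *s, struct group_stats *g, unsigned int queued)
{
    assert(s != NULL);
    pthread_mutex_init(&s->sched_lock, NULL);
    pthread_mutex_init(&s->queue_lock, NULL);
    s->group = g;
    s->queued = queued;
}

/*
 * Dispatch one request. Returns false if nothing was queued.
 * Only the scheduling decision runs under sched_lock; the accounting
 * runs afterwards, so it no longer serializes with other scheduler work.
 */
bool dispatch_one(struct sched *s, uint64_t req_bytes)
{
    bool pending = false;

    pthread_mutex_lock(&s->sched_lock);
    if (s->queued > 0) {
        s->queued--;
        pending = true;  /* remember that an update is owed */
    }
    pthread_mutex_unlock(&s->sched_lock);

    if (!pending)
        return false;

    /* Stats update outside sched_lock, but still under queue_lock so
     * that the group cannot disappear while it is being updated. */
    pthread_mutex_lock(&s->queue_lock);
    if (s->group != NULL) {
        s->group->dispatched++;
        s->group->bytes += req_bytes;
    }
    pthread_mutex_unlock(&s->queue_lock);
    return true;
}
```

    In this sketch the time spent in the accounting branch is no longer
    part of the sched_lock critical section, which is the effect this
    commit aims for; the remaining queue_lock section corresponds to the
    residual serialization mentioned in the NOTE.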
    
    [1] https://github.com/Algodev-github/IOSpeed

    Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>