Skip to content
  • Josef Bacik's avatar
    block: introduce blk-iolatency io controller · d7067512
    Josef Bacik authored
    
    
    Current IO controllers for the block layer are less than ideal for our
    use case.  The io.max controller is great at hard limiting, but it is
    not work conserving.  This patch introduces io.latency.  You provide a
    latency target for your group and we monitor the io in short windows to
    make sure we are not exceeding those latency targets.  This makes use of
    the rq-qos infrastructure and works much like the wbt stuff.  There are
    a few differences from wbt
    
     - It's bio based, so the latency covers the whole block layer in addition to
       the actual io.
     - We will throttle all IO types that comes in here if we need to.
     - We use the mean latency over the 100ms window.  This is because writes can
       be particularly fast, which could give us a false sense of the impact of
       other workloads on our protected workload.
     - By default there's no throttling, we set the queue_depth to INT_MAX so that
       we can have as many outstanding bio's as we're allowed to.  Only at
       throttle time do we pay attention to the actual queue depth.
     - We backcharge cgroups for root cg issued IO and induce artificial
       delays in order to deal with cases like metadata only or swap heavy
       workloads.
    
    In testing this has worked out relatively well.  Protected workloads
    will throttle noisy workloads down to 1 io at time if they are doing
    normal IO on their own, or induce up to a 1 second delay per syscall if
    they are doing a lot of root issued IO (metadata/swap IO).
    
    Our testing has revolved mostly around our production web servers where
    we have hhvm (the web server application) in a protected group and
    everything else in another group.  We see slightly higher requests per
    second (RPS) on the test tier vs the control tier, and much more stable
    RPS across all machines in the test tier vs the control tier.
    
    Another test we run is a slow memory allocator in the unprotected group.
    Before this would eventually push us into swap and cause the whole box
    to die and not recover at all.  With these patches we see slight RPS
    drops (usually 10-15%) before the memory consumer is properly killed and
    things recover within seconds.
    
    Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
    Acked-by: default avatarTejun Heo <tj@kernel.org>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    d7067512