    blk-iolatency: fix STS_AGAIN handling · 178d1337
    Dennis Zhou authored
    [ Upstream commit c9b3007f ]
    
    The iolatency controller is based on rq_qos. It increments on
    rq_qos_throttle() and decrements on either rq_qos_cleanup() or
    rq_qos_done_bio(). a3fb01ba fixes the double accounting issue where
    blk_mq_make_request() may call both rq_qos_cleanup() and
    rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
    double decrement.
    
    The above works upstream as the only way we can get STS_AGAIN is from
    blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
    problem as bio_endio() skipping only happens on reserved tag allocation
    failures which can only be caused by driver bugs and already triggers
    WARN.
    
    However, the fix creates a not so great dependency on how STS_AGAIN can
    be propagated. Internally, we (Facebook) carry a patch that kills read
    ahead if a cgroup is io congested or a fatal signal is pending. This
    combined with chained bios propagating their bi_status to the parent if it
    is not already set can cause the parent bio to not clean up properly
    even though it was successful. This consequently leaks the inflight
    counter and can hang all IOs under that blkg.
    
    To nip the adverse interaction early, this removes the rq_qos_cleanup()
    callback in iolatency in favor of cleaning up always on the
    rq_qos_done_bio() path.
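
The leak described above can be sketched with a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `model_blkg`, `model_bio`, the `tracked` flag, and every `model_*` function are hypothetical stand-ins for the inflight counter, the bio, and the rq_qos_throttle() / rq_qos_done_bio() hooks. The old scheme skips the decrement when the status is STS_AGAIN, assuming rq_qos_cleanup() already ran; the new scheme always decrements on the done path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values standing in for a bio's bi_status. */
enum model_status { MODEL_STS_OK = 0, MODEL_STS_AGAIN = 1 };

/* Hypothetical per-cgroup state: just the inflight counter. */
struct model_blkg { int inflight; };

/* Hypothetical bio: a status plus a flag saying it was accounted. */
struct model_bio { enum model_status status; bool tracked; };

/* Stand-in for the throttle hook: account the bio as inflight. */
static void model_throttle(struct model_blkg *g, struct model_bio *b)
{
    g->inflight++;
    b->tracked = true;
}

/* Old scheme: skip the decrement on STS_AGAIN, trusting that the
 * cleanup hook already dropped the count for this bio. */
static void model_done_bio_old(struct model_blkg *g, struct model_bio *b)
{
    if (!b->tracked || b->status == MODEL_STS_AGAIN)
        return;
    g->inflight--;
    b->tracked = false;
}

/* New scheme: no cleanup hook, so always decrement for a tracked bio. */
static void model_done_bio_new(struct model_blkg *g, struct model_bio *b)
{
    if (!b->tracked)
        return;
    g->inflight--;
    b->tracked = false;
}

/* A parent bio is throttled and submitted successfully, then a chained
 * child propagates STS_AGAIN into it before completion. Returns the
 * inflight count left behind after the done path runs. */
static int model_leak_after_propagated_again(bool use_new_scheme)
{
    struct model_blkg g = { 0 };
    struct model_bio parent = { MODEL_STS_OK, false };

    model_throttle(&g, &parent);
    assert(g.inflight == 1);

    parent.status = MODEL_STS_AGAIN; /* propagated from a chained child */

    if (use_new_scheme)
        model_done_bio_new(&g, &parent);
    else
        model_done_bio_old(&g, &parent);

    return g.inflight;
}
```

In this model the old scheme leaves one inflight count behind for the parent bio, while the new scheme returns the counter to zero regardless of how STS_AGAIN reached the bio.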
    
    Fixes: a3fb01ba ("blk-iolatency: only account submitted bios")
    Debugged-by: Tejun Heo <tj@kernel.org>
    Debugged-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Dennis Zhou <dennis@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>