    blk-throttle: add throtl_qnode for dispatch fairness · c5cc2070
    Tejun Heo authored
    
    
    With flat hierarchy, there's only a single level of dispatching
    happening and fairness beyond that point is the responsibility of the
    rest of the block layer and driver, which usually works out okay;
    however, with the planned hierarchy support,
    service_queue->bio_lists[] can be filled up by bios from a single
    source.  While the limits would still be honored, it'd be very easy to
    starve IOs from siblings or children.
    
    To avoid such starvation, this patch implements throtl_qnode and
    converts service_queue->bio_lists[] to lists of per-source qnodes
    which in turn contain the bios.  For example, when a bio is
    dispatched from a child group, the bio doesn't get queued on
    ->bio_lists[] directly but it first gets queued on the group's qnode
    which in turn gets queued on service_queue->queued[].  When
    dispatching for the upper level, the ->queued[] list is consumed in
    round-robin order so that the dispatch window is consumed fairly by
    all IO sources.
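
    The round-robin consumption described above can be sketched in
    userspace roughly as follows.  This is a simplified model, not the
    kernel code: struct qnode, struct queued_list, qnode_add_bio() and
    pop_queued() are hypothetical stand-ins, bios are plain ints, and
    the FIFO is a fixed array with no wraparound.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BIOS 8

/* Hypothetical stand-in for a per-source qnode: a small FIFO of bio ids. */
struct qnode {
	int bios[MAX_BIOS];
	size_t head, tail;	/* FIFO indices into bios[], no wraparound */
	struct qnode *next;	/* link on the parent's queued list */
	int active;		/* nonzero while linked on a queued list */
};

/* Hypothetical stand-in for one direction of service_queue->queued[]. */
struct queued_list {
	struct qnode *first, *last;
};

static void list_add_tail(struct queued_list *q, struct qnode *qn)
{
	qn->next = NULL;
	if (q->last)
		q->last->next = qn;
	else
		q->first = qn;
	q->last = qn;
}

/* Queue a bio on its source's qnode; link the qnode if it was idle. */
static void qnode_add_bio(struct queued_list *q, struct qnode *qn, int bio)
{
	assert(qn->tail < MAX_BIOS);
	qn->bios[qn->tail++] = bio;
	if (!qn->active) {
		qn->active = 1;
		list_add_tail(q, qn);
	}
}

/*
 * Take one bio from the first qnode, then move that qnode to the tail
 * if it still holds bios, so every source gets a turn.  Returns -1
 * when nothing is queued.
 */
static int pop_queued(struct queued_list *q)
{
	struct qnode *qn = q->first;
	int bio;

	if (!qn)
		return -1;

	bio = qn->bios[qn->head++];

	/* unlink from the front */
	q->first = qn->next;
	if (!q->first)
		q->last = NULL;

	if (qn->head < qn->tail)
		list_add_tail(q, qn);	/* still has bios: rotate */
	else
		qn->active = 0;		/* drained: deactivate */

	return bio;
}
```

    In this model, a source that queued three bios and a sibling that
    queued one are served alternately instead of the sibling waiting
    behind all three.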
    
    There are two ways a bio can come to a throtl_grp - directly queued to
    the group or dispatched from a child.  For the former,
    throtl_grp->qnode_on_self[rw] is used.  For the latter, the child's
    ->qnode_on_parent[rw] is used.
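
    As a rough illustration of that choice, the sketch below picks the
    qnode from the bio's origin and direction.  Only the field names
    qnode_on_self and qnode_on_parent come from this patch; struct grp,
    the DIR_* constants and pick_qnode() are hypothetical, and the
    struct layout is assumed for the example.

```c
#include <stddef.h>

enum { DIR_READ = 0, DIR_WRITE = 1 };	/* hypothetical direction indices */

struct qnode {
	int nr_bios;	/* placeholder payload */
};

/* Simplified stand-in for throtl_grp. */
struct grp {
	struct grp *parent;
	struct qnode qnode_on_self[2];		/* bios queued directly here */
	struct qnode qnode_on_parent[2];	/* bios dispatched to parent */
};

/*
 * Pick the qnode that carries a bio from @src into @dst's queue.
 * A bio issued directly to @dst uses @dst's own qnode_on_self[rw];
 * a bio dispatched up from a child uses the child's qnode_on_parent[rw].
 */
static struct qnode *pick_qnode(struct grp *dst, struct grp *src, int rw)
{
	if (src == dst)
		return &dst->qnode_on_self[rw];
	return &src->qnode_on_parent[rw];
}
```

    Indexing both arrays by direction keeps READs and WRITEs from the
    same source on separate qnodes.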
    
    Note that this means that the child which is contributing a bio to its
    parent should stay pinned until all its bios are dispatched to its
    grand-parent.  This patch moves blkg refcnting from bio add/remove
    spots to qnode activation/deactivation so that the blkg containing an
    active qnode is always pinned.  As a child pins its parent, this is
    sufficient for keeping the relevant sub-tree pinned while bios are in
    flight.
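
    The pinning rule can be modelled as taking one reference when a
    qnode goes from empty to non-empty and dropping it when the qnode
    drains, rather than one reference per bio.  The sketch below is a
    userspace approximation under that assumption; struct grp, its
    refcnt field, and the helper names are hypothetical and do not
    reflect the kernel's blkg refcounting API.

```c
#include <assert.h>

/* Simplified stand-in for the blkg that owns a qnode. */
struct grp {
	int refcnt;
};

struct qnode {
	struct grp *owner;	/* group pinned while this qnode is active */
	int nr_bios;		/* bios currently queued on this qnode */
};

/* Adding the first bio activates the qnode and pins its owner once. */
static void qnode_add_bio(struct qnode *qn)
{
	if (qn->nr_bios++ == 0)
		qn->owner->refcnt++;
}

/* Removing the last bio deactivates the qnode and drops the pin. */
static void qnode_pop_bio(struct qnode *qn)
{
	assert(qn->nr_bios > 0);
	if (--qn->nr_bios == 0)
		qn->owner->refcnt--;
}
```

    With this scheme the reference count changes only on activation
    and deactivation, however many bios pass through the qnode while
    it is active.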
    
    The starvation issue was spotted by Vivek Goyal.
    
    v2: The original patch used the same throtl_grp->qnode_on_self/parent
        for reads and writes causing RWs to be queued incorrectly if there
        already are outstanding IOs in the other direction.  They should
        be throtl_grp->qnode_on_self/parent[2] so that READs and WRITEs
        can use different qnodes.  Spotted by Vivek Goyal.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>