1. 15 Nov, 2017 1 commit
  2. 31 Aug, 2017 1 commit
    • Paolo Valente's avatar
      block, bfq: make lookup_next_entity push up vtime on expirations · 80294c3b
      Paolo Valente authored
      To provide a very smooth service, bfq starts to serve a bfq_queue
      only if the queue is 'eligible', i.e., if the same queue would
      have started to be served in the ideal, perfectly fair system that
      bfq simulates internally. This is obtained by associating each
      queue with a virtual start time, and by computing a special system
      virtual time quantity: a queue is eligible only if the system
      virtual time has reached the virtual start time of the
      queue. Finally, bfq guarantees that, when a new queue must be set
      in service, there is always at least one eligible entity for each
      active parent entity in the scheduler. To provide this guarantee,
      the function __bfq_lookup_next_entity pushes up, for each parent
      entity on which it is invoked, the system virtual time to the
      minimum among the virtual start times of the entities in the
      active tree for the parent entity (more precisely, the push up
      occurs if the system virtual time happens to be lower than all
      such virtual start times).
      
      There is however a circumstance in which __bfq_lookup_next_entity
      cannot push up the system virtual time for a parent entity, even
      if the system virtual time is lower than the virtual start times
      of all the child entities in the active tree. It happens if one of
      the child entities is in service. In fact, in such a case, there
      is already an eligible entity, the in-service one, even if it may
      not be not present in the active tree (because in-service entities
      may be removed from the active tree).
      
      Unfortunately, in the last re-design of the
      hierarchical-scheduling engine, the reset of the pointer to the
      in-service entity for a given parent entity--reset to be done as a
      consequence of the expiration of the in-service entity--always
      happens after the function __bfq_lookup_next_entity has been
      invoked. This causes the function to think that there is still an
      entity in service for the parent entity, and then that the system
      virtual time cannot be pushed up, even if actually such a
      no-more-in-service entity has already been properly reinserted
      into the active tree (or in some other tree if no more
      active). Yet, the system virtual time *had* to be pushed up, to be
      ready to correctly choose the next queue to serve. Because of the
      lack of this push up, bfq may wrongly set in service a queue that
      had been speculatively pre-computed as the possible
      next-in-service queue, but that would no more be the one to serve
      after the expiration and the reinsertion into the active trees of
      the previously in-service entities.
      
      This commit addresses this issue by making
      __bfq_lookup_next_entity properly push up the system virtual time
      if an expiration is occurring.
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: 's avatarLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: 's avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      80294c3b
  3. 11 Aug, 2017 1 commit
    • Paolo Valente's avatar
      block,bfq: refactor device-idling logic · d5be3fef
      Paolo Valente authored
      The logic that decides whether to idle the device is scattered across
      three functions. Almost all of the logic is in the function
      bfq_bfqq_may_idle, but (1) part of the decision is made in
      bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
      switch off idling regardless of the output of bfq_bfqq_may_idle. In
      addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
      their decisions as a function of parameters that are used, for similar
      purposes, also in bfq_bfqq_may_idle. This commit addresses these
      issues by moving all the logic into bfq_bfqq_may_idle.
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      d5be3fef
  4. 29 Jul, 2017 2 commits
    • Paolo Valente's avatar
      block, bfq: consider also in_service_entity to state whether an entity is active · 46d556e6
      Paolo Valente authored
      Groups of BFQ queues are represented by generic entities in BFQ. When
      a queue belonging to a parent entity is deactivated, the parent entity
      may need to be deactivated too, in case the deactivated queue was the
      only active queue for the parent entity. This deactivation may need to
      be propagated upwards if the entity belongs, in its turn, to a further
      higher-level entity, and so on. In particular, the upward propagation
      of deactivation stops at the first parent entity that remains active
      even if one of its child entities has been deactivated.
      
      To decide whether the last non-deactivation condition holds for a
      parent entity, BFQ checks whether the field next_in_service is still
      not NULL for the parent entity, after the deactivation of one of its
      child entity. If it is not NULL, then there are certainly other active
      entities in the parent entity, and deactivations can stop.
      
      Unfortunately, this check misses a corner case: if in_service_entity
      is not NULL, then next_in_service may happen to be NULL, although the
      parent entity is evidently active. This happens if: 1) the entity
      pointed by in_service_entity is the only active entity in the parent
      entity, and 2) according to the definition of next_in_service, the
      in_service_entity cannot be considered as next_in_service. See the
      comments on the definition of next_in_service for details on this
      second point.
      
      Hitting the above corner case causes crashes.
      
      To address this issue, this commit:
      1) Extends the above check on only next_in_service to controlling both
      next_in_service and in_service_entity (if any of them is not NULL,
      then no further deactivation is performed)
      2) Improves the (important) comments on how next_in_service is defined
      and updated; in particular it fixes a few rather obscure paragraphs
      Reported-by: 's avatarEric Wheeler <bfq-sched@lists.ewheeler.net>
      Reported-by: 's avatarRick Yiu <rick_yiu@htc.com>
      Reported-by: 's avatarTom X Nguyen <tom81094@gmail.com>
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: 's avatarEric Wheeler <bfq-sched@lists.ewheeler.net>
      Tested-by: 's avatarRick Yiu <rick_yiu@htc.com>
      Tested-by: 's avatarLaurentiu Nicola <lnicola@dend.ro>
      Tested-by: 's avatarTom X Nguyen <tom81094@gmail.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      46d556e6
    • Shaohua Li's avatar
      block: use standard blktrace API to output cgroup info for debug notes · 35fe6d76
      Shaohua Li authored
      Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
      way. Now we have standard blktrace API for this, so convert them to use
      it.
      
      Note, this changes the behavior a little bit. cgroup info isn't output
      by default, we only do this with 'blk_cgroup' option enabled. cgroup
      info isn't output as a string by default too, we only do this with
      'blk_cgname' option enabled. Also cgroup info is output in different
      position of the note string. I think these behavior changes aren't a big
      issue (actually we make trace data shorter which is good), since the
      blktrace note is solely for debugging.
      Signed-off-by: 's avatarShaohua Li <shli@fb.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      35fe6d76
  5. 12 Jul, 2017 1 commit
  6. 03 Jul, 2017 1 commit
    • Paolo Valente's avatar
      block, bfq: don't change ioprio class for a bfq_queue on a service tree · 431b17f9
      Paolo Valente authored
      On each deactivation or re-scheduling (after being served) of a
      bfq_queue, BFQ invokes the function __bfq_entity_update_weight_prio(),
      to perform pending updates of ioprio, weight and ioprio class for the
      bfq_queue. BFQ also invokes this function on I/O-request dispatches,
      to raise or lower weights more quickly when needed, thereby improving
      latency. However, the entity representing the bfq_queue may be on the
      active (sub)tree of a service tree when this happens, and, although
      with a very low probability, the bfq_queue may happen to also have a
      pending change of its ioprio class. If both conditions hold when
      __bfq_entity_update_weight_prio() is invoked, then the entity moves to
      a sort of hybrid state: the new service tree for the entity, as
      returned by bfq_entity_service_tree(), differs from service tree on
      which the entity still is. The functions that handle activations and
      deactivations of entities do not cope with such a hybrid state (and
      would need to become more complex to cope).
      
      This commit addresses this issue by just making
      __bfq_entity_update_weight_prio() not perform also a possible pending
      change of ioprio class, when invoked on an I/O-request dispatch for a
      bfq_queue. Such a change is thus postponed to when
      __bfq_entity_update_weight_prio() is invoked on deactivation or
      re-scheduling of the bfq_queue.
      Reported-by: 's avatarMarco Piazza <mpiazza@gmail.com>
      Reported-by: 's avatarLaurentiu Nicola <lnicola@dend.ro>
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: 's avatarMarco Piazza <mpiazza@gmail.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      431b17f9
  7. 08 Jun, 2017 1 commit
    • Paolo Valente's avatar
      block, bfq: access and cache blkg data only when safe · 8f9bebc3
      Paolo Valente authored
      In blk-cgroup, operations on blkg objects are protected with the
      request_queue lock. This is no more the lock that protects
      I/O-scheduler operations in blk-mq. In fact, the latter are now
      protected with a finer-grained per-scheduler-instance lock. As a
      consequence, although blkg lookups are also rcu-protected, blk-mq I/O
      schedulers may see inconsistent data when they access blkg and
      blkg-related objects. BFQ does access these objects, and does incur
      this problem, in the following case.
      
      The blkg_lookup performed in bfq_get_queue, being protected (only)
      through rcu, may happen to return the address of a copy of the
      original blkg. If this is the case, then the blkg_get performed in
      bfq_get_queue, to pin down the blkg, is useless: it does not prevent
      blk-cgroup code from destroying both the original blkg and all objects
      directly or indirectly referred by the copy of the blkg. BFQ accesses
      these objects, which typically causes a crash for NULL-pointer
      dereference of memory-protection violation.
      
      Some additional protection mechanism should be added to blk-cgroup to
      address this issue. In the meantime, this commit provides a quick
      temporary fix for BFQ: cache (when safe) blkg data that might
      disappear right after a blkg_lookup.
      
      In particular, this commit exploits the following facts to achieve its
      goal without introducing further locks.  Destroy operations on a blkg
      invoke, as a first step, hooks of the scheduler associated with the
      blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
      consequence, for any blkg associated with the request queue an
      instance of BFQ is attached to, we are guaranteed that such a blkg is
      not destroyed, and that all the pointers it contains are consistent,
      while that instance is holding its bfqd->lock. A blkg_lookup performed
      with bfqd->lock held then returns a fully consistent blkg, which
      remains consistent until this lock is held. In more detail, this holds
      even if the returned blkg is a copy of the original one.
      
      Finally, also the object describing a group inside BFQ needs to be
      protected from destruction on the blkg_free of the original blkg
      (which invokes bfq_pd_free). This commit adds private refcounting for
      this object, to let it disappear only after no bfq_queue refers to it
      any longer.
      
      This commit also removes or updates some stale comments on locking
      issues related to blk-cgroup operations.
      Reported-by: 's avatarTomas Konir <tomas.konir@gmail.com>
      Reported-by: 's avatarLee Tibbert <lee.tibbert@gmail.com>
      Reported-by: 's avatarMarco Piazza <mpiazza@gmail.com>
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: 's avatarTomas Konir <tomas.konir@gmail.com>
      Tested-by: 's avatarLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: 's avatarMarco Piazza <mpiazza@gmail.com>
      Signed-off-by: 's avatarJens Axboe <axboe@fb.com>
      8f9bebc3
  8. 20 Apr, 2017 1 commit
    • Jens Axboe's avatar
      bfq: fix compile error if CONFIG_CGROUPS=n · 659b3394
      Jens Axboe authored
      If we don't have CGROUPS enabled, the compile ends in the
      following misery:
      
      In file included from ../block/bfq-iosched.c:105:0:
      ../block/bfq-iosched.h:819:22: error: array type has incomplete element type
       extern struct cftype bfq_blkcg_legacy_files[];
                            ^
      ../block/bfq-iosched.h:820:22: error: array type has incomplete element type
       extern struct cftype bfq_blkg_files[];
                            ^
      
      Move the declarations under the right ifdef.
      Reported-by: 's avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: 's avatarJens Axboe <axboe@fb.com>
      659b3394
  9. 19 Apr, 2017 1 commit
    • Paolo Valente's avatar
      block, bfq: split bfq-iosched.c into multiple source files · ea25da48
      Paolo Valente authored
      The BFQ I/O scheduler features an optimal fair-queuing
      (proportional-share) scheduling algorithm, enriched with several
      mechanisms to boost throughput and reduce latency for interactive and
      real-time applications. This makes BFQ a large and complex piece of
      code. This commit addresses this issue by splitting BFQ into three
      main, independent components, and by moving each component into a
      separate source file:
      1. Main algorithm: handles the interaction with the kernel, and
      decides which requests to dispatch; it uses the following two further
      components to achieve its goals.
      2. Scheduling engine (Hierarchical B-WF2Q+ scheduling algorithm):
      computes the schedule, using weights and budgets provided by the above
      component.
      3. cgroups support: handles group operations (creation, destruction,
      move, ...).
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: 's avatarJens Axboe <axboe@fb.com>
      ea25da48