1. 21 Nov, 2017 1 commit
    • Kees Cook's avatar
      block/laptop_mode: Convert timers to use timer_setup() · bca237a5
      Kees Cook authored
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      bca237a5
  2. 16 Nov, 2017 1 commit
  3. 11 Nov, 2017 7 commits
  4. 04 Nov, 2017 2 commits
  5. 03 Nov, 2017 3 commits
  6. 30 Oct, 2017 1 commit
  7. 10 Oct, 2017 1 commit
  8. 03 Oct, 2017 1 commit
  9. 25 Sep, 2017 1 commit
    • Waiman Long's avatar
      blktrace: Fix potential deadlock between delete & sysfs ops · 5acb3cc2
      Waiman Long authored
      The lockdep code had reported the following unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(s_active#228);
                                     lock(&bdev->bd_mutex/1);
                                     lock(s_active#228);
        lock(&bdev->bd_mutex);
      
       *** DEADLOCK ***
      
      The deadlock may happen when one task (CPU1) is trying to delete a
      partition in a block device and another task (CPU0) is accessing
      tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
      partition.
      
      The s_active isn't an actual lock. It is a reference count (kn->count)
      on the sysfs (kernfs) file. Removal of a sysfs file, however, require
      a wait until all the references are gone. The reference count is
      treated like a rwsem using lockdep instrumentation code.
      
      The fact that a thread is in the sysfs callback method or in the
      ioctl call means there is a reference to the opended sysfs or device
      file. That should prevent the underlying block structure from being
      removed.
      
      Instead of using bd_mutex in the block_device structure, a new
      blk_trace_mutex is now added to the request_queue structure to protect
      access to the blk_trace structure.
      Suggested-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      
      Fix typo in patch subject line, and prune a comment detailing how
      the code used to work.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5acb3cc2
  10. 11 Sep, 2017 1 commit
    • Jens Axboe's avatar
      block: directly insert blk-mq request from blk_insert_cloned_request() · 157f377b
      Jens Axboe authored
      A NULL pointer crash was reported for the case of having the BFQ IO
      scheduler attached to the underlying blk-mq paths of a DM multipath
      device.  The crash occured in blk_mq_sched_insert_request()'s call to
      e->type->ops.mq.insert_requests().
      
      Paolo Valente correctly summarized why the crash occured with:
      "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
      blk_rq_prep_clone) creates a cloned request without invoking
      e->type->ops.mq.prepare_request for the target elevator e.  The cloned
      request is therefore not initialized for the scheduler, but it is
      however inserted into the scheduler by blk_mq_sched_insert_request."
      
      All said, a request-based DM multipath device's IO scheduler should be
      the only one used -- when the original requests are issued to the
      underlying paths as cloned requests they are inserted directly in the
      underlying dispatch queue(s) rather than through an additional elevator.
      
      But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO
      schedulers") switched blk_insert_cloned_request() from using
      blk_mq_insert_request() to blk_mq_sched_insert_request().  Which
      incorrectly added elevator machinery into a call chain that isn't
      supposed to have any.
      
      To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
      that blk_insert_cloned_request() calls to insert the request without
      involving any elevator that may be attached to the cloned request's
      request_queue.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarBart Van Assche <Bart.VanAssche@wdc.com>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      157f377b
  11. 29 Aug, 2017 1 commit
  12. 23 Aug, 2017 1 commit
    • Christoph Hellwig's avatar
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      74d46992
  13. 18 Aug, 2017 1 commit
    • Bart Van Assche's avatar
      block: Relax a check in blk_start_queue() · 4ddd56b0
      Bart Van Assche authored
      Calling blk_start_queue() from interrupt context with the queue
      lock held and without disabling IRQs, as the skd driver does, is
      safe. This patch avoids that loading the skd driver triggers the
      following warning:
      
      WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
      RIP: 0010:blk_start_queue+0x84/0xa0
      Call Trace:
       skd_unquiesce_dev+0x12a/0x1d0 [skd]
       skd_complete_internal+0x1e7/0x5a0 [skd]
       skd_complete_other+0xc2/0xd0 [skd]
       skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
       skd_isr+0x14f/0x180 [skd]
       irq_forced_thread_fn+0x2a/0x70
       irq_thread+0x144/0x1a0
       kthread+0x125/0x140
       ret_from_fork+0x2a/0x40
      
      Fixes: commit a038e253 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4ddd56b0
  14. 09 Aug, 2017 3 commits
  15. 24 Jul, 2017 1 commit
  16. 03 Jul, 2017 1 commit
  17. 27 Jun, 2017 4 commits
  18. 21 Jun, 2017 5 commits
  19. 20 Jun, 2017 1 commit
    • Goldwyn Rodrigues's avatar
      block: return on congested block device · 03a07c92
      Goldwyn Rodrigues authored
      A new bio operation flag REQ_NOWAIT is introduced to identify bio's
      orignating from iocb with IOCB_NOWAIT. This flag indicates
      to return immediately if a request cannot be made instead
      of retrying.
      
      Stacked devices such as md (the ones with make_request_fn hooks)
      currently are not supported because it may block for housekeeping.
      For example, an md can have a part of the device suspended.
      For this reason, only request based devices are supported.
      In the future, this feature will be expanded to stacked devices
      by teaching them how to handle the REQ_NOWAIT flags.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      03a07c92
  20. 18 Jun, 2017 3 commits
    • NeilBrown's avatar
      blk: use non-rescuing bioset for q->bio_split. · 93b27e72
      NeilBrown authored
      A rescuing bioset is only useful if there might be bios from
      that same bioset on the bio_list_on_stack queue at a time
      when bio_alloc_bioset() is called.  This never applies to
      q->bio_split.
      
      Allocations from q->bio_split are only ever made from
      blk_queue_split() which is only ever called early in each of
      various make_request_fn()s.  The original bio (call this A)
      is then passed to generic_make_request() and is placed on
      the bio_list_on_stack queue, and the bio that was allocated
      from q->bio_split (B) is processed.
      
      The processing of this may cause other bios to be passed to
      generic_make_request() or may even cause the bio B itself to
      be passed, possible after some prefix has been split off
      (using some other bioset).
      
      generic_make_request() now guarantees that all of these bios
      (B and dependants) will be fully processed before the tail
      of the original bio A gets handled.  None of these early bios
      can possible trigger an allocation from the original
      q->bio_split as they are either too small to require
      splitting or (more likely) are destined for a different queue.
      
      The next time that the original q->bio_split might be used
      by this thread is when A is processed again, as it might
      still be too big to handle directly.  By this time there
      cannot be any other bios allocated from q->bio_split in the
      generic_make_request() queue.  So no rescuing will ever be
      needed.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      93b27e72
    • NeilBrown's avatar
      blk: make the bioset rescue_workqueue optional. · 47e0fb46
      NeilBrown authored
      This patch converts bioset_create() to not create a workqueue by
      default, so alloctions will never trigger punt_bios_to_rescuer().  It
      also introduces a new flag BIOSET_NEED_RESCUER which tells
      bioset_create() to preserve the old behavior.
      
      All callers of bioset_create() that are inside block device drivers,
      are given the BIOSET_NEED_RESCUER flag.
      
      biosets used by filesystems or other top-level users do not
      need rescuing as the bio can never be queued behind other
      bios.  This includes fs_bio_set, blkdev_dio_pool,
      btrfs_bioset, xfs_ioend_bioset, and one allocated by
      target_core_iblock.c.
      
      biosets used by md/raid do not need rescuing as
      their usage was recently audited and revised to never
      risk deadlock.
      
      It is hoped that most, if not all, of the remaining biosets
      can end up being the non-rescued version.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      47e0fb46
    • NeilBrown's avatar
      blk: replace bioset_create_nobvec() with a flags arg to bioset_create() · 011067b0
      NeilBrown authored
      "flags" arguments are often seen as good API design as they allow
      easy extensibility.
      bioset_create_nobvec() is implemented internally as a variation in
      flags passed to __bioset_create().
      
      To support future extension, make the internal structure part of the
      API.
      i.e. add a 'flags' argument to bioset_create() and discard
      bioset_create_nobvec().
      
      Note that the bio_split allocations in drivers/md/raid* do not need
      the bvec mempool - they should have used bioset_create_nobvec().
      Suggested-by: default avatarChristoph Hellwig <hch@infradead.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@infradead.org>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      011067b0