1. 28 Aug, 2017 2 commits
  2. 23 Aug, 2017 1 commit
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue, and is usually only available when the block device node
      is open.  Other callers need to create one explicitly (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that block drivers generally want the request_queue or sometimes
      the gendisk, so this removes a layer of indirection all over the stack.
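      The change described above can be sketched with toy structures; these are
      illustrative stand-ins, not the real kernel definitions, and the partition
      table layout is a made-up assumption:

      ```c
      /* Toy model of the change: a bio no longer carries a block_device
       * pointer, only the gendisk plus a partition index, which the
       * generic_make_request path uses for partition remapping. */
      #include <assert.h>

      struct gendisk {
          long part_start[4];   /* hypothetical: start sector of each partition */
      };

      struct bio {
          struct gendisk *bi_disk;  /* replaces bi_bdev->bd_disk */
          unsigned int bi_partno;   /* replaces the partition in bi_bdev */
          long bi_sector;           /* sector relative to the partition */
      };

      /* Sketch of the remap step: rewrite bi_sector to a whole-disk offset
       * and clear the partition index, as generic_make_request does. */
      static void remap_to_disk(struct bio *bio)
      {
          if (bio->bi_partno) {
              bio->bi_sector += bio->bi_disk->part_start[bio->bi_partno];
              bio->bi_partno = 0;
          }
      }
      ```

      After the remap the bio addresses the whole disk, so the driver only ever
      needs the gendisk (or its request_queue), never a block_device.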
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 10 Aug, 2017 3 commits
  4. 26 Jul, 2017 1 commit
      nvme: validate admin queue before unquiesce · 7dd1ab16
      Scott Bauer authored
      With a misbehaving controller it's possible we'll never
      enter the live state and create an admin queue. When we
      fail out of reset work it's possible we failed early
      enough that the admin queue was never set up. We tear down
      queues after a failed reset, but we also need to validate
      the admin queue before unquiescing it.
      
      Fixes: 443bd90f ("nvme: host: unquiesce queue in nvme_kill_queues()")
      
      [  189.650995] nvme nvme1: pci function 0000:0b:00.0
      [  317.680055] nvme nvme0: Device not ready; aborting reset
      [  317.680183] nvme nvme0: Removing after probe failure status: -19
      [  317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  317.681397] general protection fault: 0000 [#1] SMP KASAN
      [  317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5
      [  317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016
      [  317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
      [  317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000
      [  317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0
      [  317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282
      [  317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3
      [  317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000
      [  317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000
      [  317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0
      [  317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0
      [  317.684371] FS:  0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000
      [  317.684519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0
      [  317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  317.685018] Call Trace:
      [  317.685084]  nvme_kill_queues+0x4d/0x170 [nvme_core]
      [  317.685185]  nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme]
      [  317.685289]  process_one_work+0x771/0x1170
      [  317.685372]  worker_thread+0xde/0x11e0
      [  317.685452]  ? pci_mmcfg_check_reserved+0x110/0x110
      [  317.685550]  kthread+0x2d3/0x3d0
      [  317.685617]  ? process_one_work+0x1170/0x1170
      [  317.685704]  ? kthread_create_on_node+0xc0/0xc0
      [  317.685785]  ret_from_fork+0x25/0x30
      [  317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3
      [  317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40
      [  317.685908] ---[ end trace a3f8704150b1e8b4 ]---
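      The NULL-pointer dereference in the oops above comes from unquiescing an
      admin queue that was never created. A minimal sketch of the fix's idea,
      with illustrative names rather than the real driver code:

      ```c
      /* Hedged sketch: guard against a NULL admin queue before unquiescing,
       * since a failed reset may never have created it. These are toy
       * structures, not the kernel's. */
      #include <assert.h>
      #include <stddef.h>

      struct request_queue { int quiesced; };
      struct nvme_ctrl { struct request_queue *admin_q; };

      static void unquiesce_admin_queue(struct nvme_ctrl *ctrl)
      {
          /* The NULL check is the point of the patch: without it,
           * unquiescing a never-created queue dereferences NULL. */
          if (ctrl->admin_q)
              ctrl->admin_q->quiesced = 0;
      }
      ```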
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 25 Jul, 2017 1 commit
  6. 20 Jul, 2017 1 commit
  7. 06 Jul, 2017 2 commits
  8. 28 Jun, 2017 4 commits
  9. 27 Jun, 2017 1 commit
      nvme: add support for streams and directives · f5d11840
      Jens Axboe authored
      This adds support for Directives in NVMe, in particular for the Streams
      directive. Support for Directives is a new feature in NVMe 1.3. It
      allows a user to pass in information about where to store the data, so
      that the device can do so most efficiently. If an application is
      managing and writing data with different lifetimes, mixing data with
      different retention requirements onto the same locations on flash can
      cause write amplification to grow. This, in turn, will reduce the
      performance and lifetime of the device.
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 19 Jun, 2017 1 commit
      nvme: host: unquiesce queue in nvme_kill_queues() · 443bd90f
      Ming Lei authored
      When nvme_kill_queues() is run, queues may be in the
      quiesced state, so we forcibly unquiesce them to avoid
      blocking dispatch and to prevent an I/O hang in the
      remove path.
      
      Previously we used blk_mq_start_stopped_hw_queues() as the
      counterpart of blk_mq_quiesce_queue(); now that
      blk_mq_unquiesce_queue() has been introduced, use it explicitly.
      
      Cc: linux-nvme@lists.infradead.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 18 Jun, 2017 1 commit
  12. 16 Jun, 2017 1 commit
  13. 15 Jun, 2017 12 commits
  14. 13 Jun, 2017 1 commit
  15. 09 Jun, 2017 2 commits
      blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Christoph Hellwig authored
      Use the same values for request completion errors as for the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
      a requeue, and all the others are completed as-is.
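      The dispatch-side handling this describes can be sketched as a small
      decision function; the constant values and helper name below are
      illustrative assumptions, not the kernel's exact definitions:

      ```c
      /* Sketch of handling the ->queue_rq return value: BLK_STS_RESOURCE
       * triggers a requeue, every other status completes the request as-is. */
      #include <assert.h>

      typedef unsigned int blk_status_t;
      #define BLK_STS_OK        ((blk_status_t)0)
      #define BLK_STS_RESOURCE  ((blk_status_t)9)   /* illustrative value */
      #define BLK_STS_IOERR     ((blk_status_t)10)  /* illustrative value */

      enum action { COMPLETED, REQUEUED };

      static enum action handle_queue_rq_ret(blk_status_t sts)
      {
          if (sts == BLK_STS_RESOURCE)
              return REQUEUED;   /* put back on the dispatch list, retried later */
          return COMPLETED;      /* completed as-is, whether ok or an error */
      }
      ```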
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      block: introduce new block status code type · 2a842aca
      Christoph Hellwig authored
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly documents their meaning.  Helpers to convert
      to and from the previous special meanings are provided for now, but I
      suspect we want to get rid of them in the long run: drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
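      The sparse trick works roughly as follows; the status values and the
      conversion helper here are simplified assumptions for illustration:

      ```c
      /* Sketch of the __bitwise annotation: under sparse (__CHECKER__),
       * blk_status_t becomes a distinct type so mixing it with plain errnos
       * is flagged; under a normal compiler the annotation expands to nothing. */
      #include <assert.h>

      #ifdef __CHECKER__
      #define __bitwise __attribute__((bitwise))
      #else
      #define __bitwise
      #endif

      typedef unsigned int __bitwise blk_status_t;

      #define BLK_STS_OK     ((blk_status_t)0)
      #define BLK_STS_IOERR  ((blk_status_t)10)  /* illustrative value */

      /* Illustrative conversion helper like the ones the patch provides. */
      static int blk_status_to_errno(blk_status_t status)
      {
          return status == BLK_STS_IOERR ? -5 /* -EIO */ : 0;
      }
      ```

      With a regular compiler this is just an unsigned int; only sparse sees the
      distinct type and catches code that passes an errno where a blk_status_t
      is expected.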
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  16. 07 Jun, 2017 3 commits
      nvme: relax APST default max latency to 100ms · 9947d6a0
      Kai-Heng Feng authored
      Christoph Hellwig suggests we should make APST work out of the box.
      Hence relax the default max latency so these devices are able to enter
      their deepest power state by default.
      
      Here are id-ctrl excerpts from two high latency NVMes:
      
      vid     : 0x14a4
      ssvid   : 0x1b4b
      mn      : CX2-GB1024-Q11 NVMe LITEON 1024GB
      ps    3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
                rwt:3 rwl:3 idle_power:- active_power:-
      ps    4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
                rwt:4 rwl:4 idle_power:- active_power:-
      
      vid     : 0x15b7
      ssvid   : 0x1b4b
      mn      : A400 NVMe SanDisk 512GB
      ps    3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      ps    4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      nvme: only consider exit latency when choosing useful non-op power states · da87591b
      Kai-Heng Feng authored
      When an NVMe device is in a non-operational state, the latency is exlat.
      The latency will be enlat + exlat only when the device tries to return
      to the operational state right after it begins transitioning to a
      non-operational state, which should be a rare case.
      
      Therefore, as Andy Lutomirski suggests, use only exlat when deciding
      which power states to transition to.
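      Combining this commit with the 100ms default above, the eligibility check
      can be sketched as follows; the function name and threshold constant are
      illustrative, and latencies are in microseconds as in the id-ctrl output:

      ```c
      /* Toy check: a non-operational power state is usable when its exit
       * latency alone (not enlat + exlat) fits under the default ceiling. */
      #include <assert.h>

      #define APST_MAX_LATENCY_US 100000  /* relaxed default: 100 ms */

      static int apst_state_usable(unsigned int enlat_us, unsigned int exlat_us)
      {
          (void)enlat_us;  /* only exit latency is considered, per this patch */
          return exlat_us <= APST_MAX_LATENCY_US;
      }
      ```

      Under this rule the SanDisk ps 4 above (enlat 1000000, exlat 100000)
      qualifies, which it would not if enlat + exlat were compared against the
      threshold.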
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      nvme: fix hang in remove path · 82654b6b
      Ming Lei authored
      We need to start the admin queues too in nvme_kill_queues()
      to avoid a hang in the remove path [1].
      
      This patch is very similar to 806f026f ("nvme: use
      blk_mq_start_hw_queues() in nvme_kill_queues()").
      
      [1] hang stack trace
      [<ffffffff813c9716>] blk_execute_rq+0x56/0x80
      [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
      [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
      [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
      [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
      [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
      [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
      [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
      [<ffffffff8156d951>] device_del+0x101/0x320
      [<ffffffff8156db8a>] device_unregister+0x1a/0x60
      [<ffffffff8156dc4c>] device_destroy+0x3c/0x50
      [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
      [<ffffffff815d4858>] nvme_remove+0x78/0x110
      [<ffffffff81452b69>] pci_device_remove+0x39/0xb0
      [<ffffffff81572935>] device_release_driver_internal+0x155/0x210
      [<ffffffff81572a02>] device_release_driver+0x12/0x20
      [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
      [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
      [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
      [<ffffffff810c5ac9>] kthread+0x109/0x140
      [<ffffffff8185800c>] ret_from_fork+0x2c/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      Fixes: c5552fde ("nvme: Enable autonomous power state transitions")
      Reported-by: Rakesh Pandit <rakesh@tuxera.com>
      Tested-by: Rakesh Pandit <rakesh@tuxera.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  17. 26 May, 2017 2 commits
  18. 22 May, 2017 1 commit
      nvme: avoid to use blk_mq_abort_requeue_list() · 986f75c8
      Ming Lei authored
      NVMe may add a request to the requeue list without kicking off the
      requeue work if hw queues are stopped. blk_mq_abort_requeue_list()
      is then called in both nvme_kill_queues() and nvme_ns_remove() to
      deal with these requests.
      
      Unfortunately blk_mq_abort_requeue_list() is absolutely a
      race maker; for example, a request may be requeued during
      the abort. So this patch just calls blk_mq_kick_requeue_list() in
      nvme_kill_queues() to handle the issue, like nvme_start_queues()
      does. Now all requests left on the requeue list while queues were
      stopped will be handled by blk_mq_kick_requeue_list() when the
      queues are restarted, either in nvme_start_queues() or in
      nvme_kill_queues().
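      The difference between aborting and kicking the requeue list can be
      sketched with a toy list; these structures and names are illustrative
      stand-ins, not the blk-mq implementation:

      ```c
      /* Sketch of the replacement: rather than aborting (failing) requeued
       * requests, which races with concurrent requeues, every request left
       * on the list is simply dispatched once queues restart. */
      #include <assert.h>
      #include <stddef.h>

      struct req { struct req *next; int dispatched; };

      struct queue { struct req *requeue_list; };

      /* Like blk_mq_kick_requeue_list(): run the requeue work so every
       * pending request is dispatched rather than completed with an error.
       * Returns the number of requests dispatched. */
      static int kick_requeue_list(struct queue *q)
      {
          int n = 0;
          struct req *r = q->requeue_list;

          while (r) {
              r->dispatched = 1;
              r = r->next;
              n++;
          }
          q->requeue_list = NULL;
          return n;
      }
      ```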
      
      Cc: stable@vger.kernel.org
      Reported-by: Zhang Yi <yizhan@redhat.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>