1. 30 Nov, 2017 4 commits
    • Nicholas Bellinger's avatar
      target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK · fe132502
      Nicholas Bellinger authored
      commit 1c21a480 upstream.
      
      This patch fixes bug where early se_cmd exceptions that occur
      before backend execution can result in use-after-free if/when
      a subsequent ABORT_TASK occurs for the same tag.
      
      Since an early se_cmd exception will have had se_cmd added to
      se_session->sess_cmd_list via target_get_sess_cmd(), it will
      not have CMD_T_COMPLETE set by the usual target_complete_cmd()
      backend completion path.
      
      This causes a subsequent ABORT_TASK + __target_check_io_state()
      to signal ABORT_TASK should proceed.  As core_tmr_abort_task()
      executes, it will bring the outstanding se_cmd->cmd_kref count
      down to zero releasing se_cmd, after se_cmd has already been
      queued with error status into fabric driver response path code.
      
      To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
      set at target_get_sess_cmd() time, and cleared immediately before
      backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
      is set.
      
      Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
      determine when an early exception has occured, and avoid aborting
      this se_cmd since it will have already been queued into fabric
      driver response path code.
      Reported-by: default avatarDonald White <dew@datera.io>
      Cc: Donald White <dew@datera.io>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe132502
    • Nicholas Bellinger's avatar
      target: Fix quiese during transport_write_pending_qf endless loop · 3c68944b
      Nicholas Bellinger authored
      commit 9574a497 upstream.
      
      This patch fixes a potential end-less loop during QUEUE_FULL,
      where cmd->se_tfo->write_pending() callback fails repeatedly
      but __transport_wait_for_tasks() has already been invoked to
      quiese the outstanding se_cmd descriptor.
      
      To address this bug, this patch adds a CMD_T_STOP|CMD_T_ABORTED
      check within transport_write_pending_qf() and invokes the
      existing se_cmd->t_transport_stop_comp to signal quiese
      completion back to __transport_wait_for_tasks().
      
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
      Cc: Potnuri Bharat Teja <bharat@chelsio.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c68944b
    • Nicholas Bellinger's avatar
      target: Fix caw_sem leak in transport_generic_request_failure · 16870b7b
      Nicholas Bellinger authored
      commit fd2f928b upstream.
      
      With the recent addition of transport_check_aborted_status() within
      transport_generic_request_failure() to avoid sending a SCSI status
      exception after CMD_T_ABORTED w/ TAS=1 has occured, it introduced
      a COMPARE_AND_WRITE early failure regression.
      
      Namely when COMPARE_AND_WRITE fails and se_device->caw_sem has
      been taken by sbc_compare_and_write(), if the new check for
      transport_check_aborted_status() returns true and exits,
      cmd->transport_complete_callback() -> compare_and_write_post()
      is skipped never releasing se_device->caw_sem.
      
      This regression was originally introduced by:
      
        commit e3b88ee9
        Author: Bart Van Assche <bart.vanassche@sandisk.com>
        Date:   Tue Feb 14 16:25:45 2017 -0800
      
            target: Fix handling of aborted failed commands
      
      To address this bug, move the transport_check_aborted_status()
      call after transport_complete_task_attr() and
      cmd->transport_complete_callback().
      
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16870b7b
    • Nicholas Bellinger's avatar
      target: Fix QUEUE_FULL + SCSI task attribute handling · 66abe4fc
      Nicholas Bellinger authored
      commit 1c79df1f upstream.
      
      This patch fixes a bug during QUEUE_FULL where transport_complete_qf()
      calls transport_complete_task_attr() after it's already been invoked
      by target_complete_ok_work() or transport_generic_request_failure()
      during initial completion, preceeding QUEUE_FULL.
      
      This will result in se_device->simple_cmds, se_device->dev_cur_ordered_id
      and/or se_device->dev_ordered_sync being updated multiple times for
      a single se_cmd.
      
      To address this bug, clear SCF_TASK_ATTR_SET after the first call
      to transport_complete_task_attr(), and avoid updating SCSI task
      attribute related counters for any subsequent calls.
      
      Also, when a se_cmd is deferred due to ordered tags and executed
      via target_restart_delayed_cmds(), set CMD_T_SENT before execution
      matching what target_execute_cmd() does.
      
      Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
      Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66abe4fc
  2. 10 Aug, 2017 1 commit
    • Nicholas Bellinger's avatar
      target: Fix node_acl demo-mode + uncached dynamic shutdown regression · 6f48655f
      Nicholas Bellinger authored
      This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
      regression, that was introduced by
      
        commit 01d4d673
        Author: Nicholas Bellinger <nab@linux-iscsi.org>
        Date:   Wed Dec 7 12:55:54 2016 -0800
      
      which originally had the proper list_del_init() usage, but was
      dropped during list review as it was thought unnecessary by HCH.
      
      However, list_del_init() usage is required during the special
      generate_node_acls = 1 + cache_dynamic_acls = 0 case when
      transport_free_session() does a list_del(&se_nacl->acl_list),
      followed by target_complete_nacl() doing the same thing.
      
      This was manifesting as a general protection fault as reported
      by Justin:
      
      kernel: general protection fault: 0000 [#1] SMP
      kernel: Modules linked in:
      kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
      kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
      kernel: task: ffff88026939e800 task.stack: ffffc90007884000
      kernel: RIP: 0010:target_put_nacl+0x49/0xb0
      kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
      kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
      kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
      kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
      kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
      kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
      kernel: FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
      kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      kernel: Call Trace:
      kernel:  transport_free_session+0x67/0x140
      kernel:  transport_deregister_session+0x7a/0xc0
      kernel:  iscsit_close_session+0x92/0x210
      kernel:  iscsit_close_connection+0x5f9/0x840
      kernel:  iscsit_take_action_for_connection_exit+0xfe/0x110
      kernel:  iscsi_target_tx_thread+0x140/0x1e0
      kernel:  ? wait_woken+0x90/0x90
      kernel:  kthread+0x124/0x160
      kernel:  ? iscsit_thread_get_cpumask+0x90/0x90
      kernel:  ? kthread_create_on_node+0x40/0x40
      kernel:  ret_from_fork+0x22/0x30
      kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
      89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
      ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
      kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
      kernel: ---[ end trace f12821adbfd46fed ]---
      
      To address this, go ahead and use proper list_del_list() for all
      cases of se_nacl->acl_list deletion.
      Reported-by: default avatarJustin Maggard <jmaggard01@gmail.com>
      Tested-by: default avatarJustin Maggard <jmaggard01@gmail.com>
      Cc: Justin Maggard <jmaggard01@gmail.com>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      6f48655f
  3. 12 Jul, 2017 1 commit
    • Michal Hocko's avatar
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko authored
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
  4. 07 Jul, 2017 9 commits
    • Mike Christie's avatar
      target: fix SAM_STAT_BUSY/TASK_SET_FULL handling · 402242c9
      Mike Christie authored
      If the scsi status was not SAM_STAT_GOOD or there was no transport
      sense, we would ignore the scsi status and do a generic not ready
      LUN communication failure check condition failure.
      
      The problem is that LUN COMM failure is treated as a hard error
      sometimes and will cause apps to get IO errors instead of the OS's SCSI
      layer retrying. For example, the tcmu daemon will return SAM_STAT_QUEUE_FULL
      when memory runs low and can still make progress but wants the initiator to
      reduce the work load. Windows will fail this error directly the app
      instead of retrying.
      
      This patch is based on Nick's "target/iblock: Use -EAGAIN/-ENOMEM to
      propigate SAM BUSY/TASK_SET_FULL" patch here:
      
      http://comments.gmane.org/gmane.linux.scsi.target.devel/11301
      
      but instead of only setting SAM_STAT_GOOD, SAM_STAT_TASK_SET_FULL
      and SAM_STAT_BUSY as success, it sets all non check condition status
      as success so they are passed back to the initiator, so passthrough
      type backends can return all SCSI status codes. Since only passthrough
      uses this, I was not sure if we wanted to add checks for non-passthrough
      and specific codes.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      402242c9
    • Mike Christie's avatar
      target: remove transport_complete · 1a444175
      Mike Christie authored
      transport_complete is no longer used, so drop the code.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      1a444175
    • Mike Christie's avatar
      target: add helper to copy sense to se_cmd buffer · c6d66aba
      Mike Christie authored
      This adds a helper to copy sense from backend module buffer to
      the se_cmd's sense buffer.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      c6d66aba
    • Mike Christie's avatar
      target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE · 9fe36984
      Mike Christie authored
      tcmu needs to pass raw sense to target_complete_cmd, but a a
      transport_complete callout is akward to implement for it.
      
      This moves the check for SCF_TRANSPORT_TASK_SENSE so any backend
      can pass sense.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      9fe36984
    • Jiang Yi's avatar
      target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce · 1d6ef276
      Jiang Yi authored
      This patch addresses a COMPARE_AND_WRITE se_device->caw_sem leak,
      that would be triggered during normal se_cmd shutdown or abort
      via __transport_wait_for_tasks().
      
      This would occur because target_complete_cmd() would catch this
      early and do complete_all(&cmd->t_transport_stop_comp), but since
      target_complete_ok_work() or target_complete_failure_work() are
      never called to invoke se_cmd->transport_complete_callback(),
      the COMPARE_AND_WRITE specific callbacks never release caw_sem.
      
      To address this special case, go ahead and release caw_sem
      directly from target_complete_cmd().
      
      (Remove '&& success' from check, to release caw_sem regardless
       of scsi_status - nab)
      Signed-off-by: default avatarJiang Yi <jiangyilism@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.14+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      1d6ef276
    • Bart Van Assche's avatar
      target: Introduce a function that shows the command state · c00e6220
      Bart Van Assche authored
      Introduce target_show_cmd() and use it where appropriate. If
      transport_wait_for_tasks() takes too long, make it show the
      state of the command it is waiting for.
      
      (Add missing brackets around multi-line conditions - nab)
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: David Disseldorp <ddiss@suse.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      c00e6220
    • Bart Van Assche's avatar
      target: Fix transport_init_se_cmd() · f2b72d6a
      Bart Van Assche authored
      Avoid that aborting a command before it has been submitted onto
      a workqueue triggers the following warning:
      
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 3 PID: 46 Comm: kworker/u8:1 Not tainted 4.12.0-rc2-dbg+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Workqueue: tmr-iblock target_tmr_work [target_core_mod]
      Call Trace:
       dump_stack+0x86/0xcf
       register_lock_class+0xe8/0x570
       __lock_acquire+0xa1/0x11d0
       lock_acquire+0x59/0x80
       flush_work+0x42/0x2b0
       __cancel_work_timer+0x10c/0x180
       cancel_work_sync+0xb/0x10
       core_tmr_lun_reset+0x352/0x740 [target_core_mod]
       target_tmr_work+0xd6/0x130 [target_core_mod]
       process_one_work+0x1ca/0x3f0
       worker_thread+0x49/0x3b0
       kthread+0x109/0x140
       ret_from_fork+0x31/0x40
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: David Disseldorp <ddiss@suse.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      f2b72d6a
    • Nicholas Bellinger's avatar
      target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK · 5465e7d3
      Nicholas Bellinger authored
      This patch introduces support in target_submit_tmr() for locating a
      unpacked_lun from an existing se_cmd->tag during ABORT_TASK.
      
      When TARGET_SCF_LOOKUP_LUN_FROM_TAG is set, target_submit_tmr()
      will do the extra lookup via target_lookup_lun_from_tag() and
      subsequently invoke transport_lookup_tmr_lun() so a proper
      percpu se_lun->lun_ref is taken before workqueue dispatch into
      se_device->tmr_wq happens.
      
      Aside from the extra target_lookup_lun_from_tag(), the existing
      code-path remains unchanged.
      Reviewed-by: default avatarHimanshu Madhani <himanshu.madhani@cavium.com>
      Reviewed-by: default avatarQuinn Tran <quinn.tran@cavium.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      5465e7d3
    • Nicholas Bellinger's avatar
      target: Add support for TMR percpu reference counting · eeb64d23
      Nicholas Bellinger authored
      This patch introduces TMR percpu reference counting using
      se_lun->lun_ref in transport_lookup_tmr_lun(), following
      how existing non TMR per se_lun reference counting works
      within transport_lookup_cmd_lun().
      
      It also adds explicit transport_lun_remove_cmd() calls to
      drop the reference in the three tmr related locations that
      invoke transport_cmd_check_stop_to_fabric();
      
         - target_tmr_work() during normal ->queue_tm_rsp()
         - target_complete_tmr_failure() during error ->queue_tm_rsp()
         - transport_generic_handle_tmr() during early failure
      
      Also, note the exception paths in transport_generic_free_cmd()
      and transport_cmd_finish_abort() already check SCF_SE_LUN_CMD,
      and will invoke transport_lun_remove_cmd() when necessary.
      Reviewed-by: default avatarHimanshu Madhani <himanshu.madhani@cavium.com>
      Reviewed-by: default avatarQuinn Tran <quinn.tran@cavium.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      eeb64d23
  5. 09 Jun, 2017 1 commit
    • Nicholas Bellinger's avatar
      target: Fix kref->refcount underflow in transport_cmd_finish_abort · 73d4e580
      Nicholas Bellinger authored
      This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED
      when a fabric driver drops it's second reference from below the
      target_core_tmr.c based callers of transport_cmd_finish_abort().
      
      Recently with the conversion of kref to refcount_t, this bug was
      manifesting itself as:
      
      [705519.601034] refcount_t: underflow; use-after-free.
      [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs
      [705539.719111] ------------[ cut here ]------------
      [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51
      
      Since the original kref atomic_t based kref_put() didn't check for
      underflow and only invoked the final callback when zero was reached,
      this bug did not manifest in practice since all se_cmd memory is
      using preallocated tags.
      
      To address this, go ahead and propigate the existing return from
      transport_put_cmd() up via transport_cmd_finish_abort(), and
      change transport_cmd_finish_abort() + core_tmr_handle_tas_abort()
      callers to only do their local target_put_sess_cmd() if necessary.
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: stable@vger.kernel.org # 3.14+
      Tested-by: default avatarGary Guo <ghg@datera.io>
      Tested-by: default avatarChu Yuan Lin <cyl@datera.io>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      73d4e580
  6. 16 May, 2017 1 commit
    • Nicholas Bellinger's avatar
      target: Re-add check to reject control WRITEs with overflow data · 4ff83daa
      Nicholas Bellinger authored
      During v4.3 when the overflow/underflow check was relaxed by
      commit c72c5250:
      
        commit c72c5250
        Author: Roland Dreier <roland@purestorage.com>
        Date:   Wed Jul 22 15:08:18 2015 -0700
      
             target: allow underflow/overflow for PR OUT etc. commands
      
      to allow underflow/overflow for Windows compliance + FCP, a
      consequence was to allow control CDBs to process overflow
      data for iscsi-target with immediate data as well.
      
      As per Roland's original change, continue to allow underflow
      cases for control CDBs to make Windows compliance + FCP happy,
      but until overflow for control CDBs is supported tree-wide,
      explicitly reject all control WRITEs with overflow following
      pre v4.3.y logic.
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: <stable@vger.kernel.org> # v4.3+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      4ff83daa
  7. 02 May, 2017 1 commit
  8. 31 Mar, 2017 1 commit
  9. 18 Mar, 2017 1 commit
  10. 27 Feb, 2017 1 commit
    • Nicholas Bellinger's avatar
      target: Fix NULL dereference during LUN lookup + active I/O shutdown · bd4e2d29
      Nicholas Bellinger authored
      When transport_clear_lun_ref() is shutting down a se_lun via
      configfs with new I/O in-flight, it's possible to trigger a
      NULL pointer dereference in transport_lookup_cmd_lun() due
      to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
      checking before incrementing lun->lun_ref.count after
      lun->lun_ref has switched to atomic_t mode.
      
      This results in a NULL pointer dereference as LUN shutdown
      code in core_tpg_remove_lun() continues running after the
      existing ->release() -> core_tpg_lun_ref_release() callback
      completes, and clears the RCU protected se_lun->lun_se_dev
      pointer.
      
      During the OOPs, the state of lun->lun_ref in the process
      which triggered the NULL pointer dereference looks like
      the following on v4.1.y stable code:
      
      struct se_lun {
        lun_link_magic = 4294932337,
        lun_status = TRANSPORT_LUN_STATUS_FREE,
      
        .....
      
        lun_se_dev = 0x0,
        lun_sep = 0x0,
      
        .....
      
        lun_ref = {
          count = {
            counter = 1
          },
          percpu_count_ptr = 3,
          release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
          confirm_switch = 0x0,
          force_atomic = false,
          rcu = {
            next = 0xffff88154fa1a5d0,
            func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
          }
        }
      }
      
      To address this bug, use percpu_ref_tryget_live() to ensure
      once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref
      has switched to atomic_t, all new I/Os will fail to obtain
      a new lun->lun_ref reference.
      
      Also use an explicit percpu_ref_kill_and_confirm() callback
      to block on ->lun_ref_comp to allow the first stage and
      associated RCU grace period to complete, and then block on
      ->lun_ref_shutdown waiting for the final percpu_ref_put()
      to drop the last reference via transport_lun_remove_cmd()
      before continuing with core_tpg_remove_lun() shutdown.
      Reported-by: default avatarRob Millner <rlm@daterainc.com>
      Tested-by: default avatarRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Tested-by: default avatarVaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Tested-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      bd4e2d29
  11. 20 Feb, 2017 1 commit
  12. 09 Feb, 2017 7 commits
  13. 08 Feb, 2017 2 commits
    • Nicholas Bellinger's avatar
      target: Fix multi-session dynamic se_node_acl double free OOPs · 01d4d673
      Nicholas Bellinger authored
      This patch addresses a long-standing bug with multi-session
      (eg: iscsi-target + iser-target) se_node_acl dynamic free
      withini transport_deregister_session().
      
      This bug is caused when a storage endpoint is configured with
      demo-mode (generate_node_acls = 1 + cache_dynamic_acls = 1)
      initiators, and initiator login creates a new dynamic node acl
      and attaches two sessions to it.
      
      After that, demo-mode for the storage instance is disabled via
      configfs (generate_node_acls = 0 + cache_dynamic_acls = 0) and
      the existing dynamic acl is never converted to an explicit ACL.
      
      The end result is dynamic acl resources are released twice when
      the sessions are shutdown in transport_deregister_session().
      
      If the storage instance is not changed to disable demo-mode,
      or the dynamic acl is converted to an explict ACL, or there
      is only a single session associated with the dynamic ACL,
      the bug is not triggered.
      
      To address this big, move the release of dynamic se_node_acl
      memory into target_complete_nacl() so it's only freed once
      when se_node_acl->acl_kref reaches zero.
      
      (Drop unnecessary list_del_init usage - HCH)
      Reported-by: default avatarRob Millner <rlm@daterainc.com>
      Tested-by: default avatarRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      01d4d673
    • Nicholas Bellinger's avatar
      target: Fix early transport_generic_handle_tmr abort scenario · c54eeffb
      Nicholas Bellinger authored
      This patch fixes a bug where incoming task management requests
      can be explicitly aborted during an active LUN_RESET, but who's
      struct work_struct are canceled in-flight before execution.
      
      This occurs when core_tmr_drain_tmr_list() invokes cancel_work_sync()
      for the incoming se_tmr_req->task_cmd->work, resulting in cmd->work
      for target_tmr_work() never getting invoked and the aborted TMR
      waiting indefinately within transport_wait_for_tasks().
      
      To address this case, perform a CMD_T_ABORTED check early in
      transport_generic_handle_tmr(), and invoke the normal path via
      transport_cmd_check_stop_to_fabric() to complete any TMR kthreads
      blocked waiting for CMD_T_STOP in transport_wait_for_tasks().
      
      Also, move the TRANSPORT_ISTATE_PROCESSING assignment earlier
      into transport_generic_handle_tmr() so the existing check in
      core_tmr_drain_tmr_list() avoids attempting abort the incoming
      se_tmr_req->task_cmd->work if it has already been queued into
      se_device->tmr_wq.
      Reported-by: default avatarRob Millner <rlm@daterainc.com>
      Tested-by: default avatarRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      c54eeffb
  14. 10 Jan, 2017 1 commit
  15. 21 Oct, 2016 1 commit
  16. 20 Oct, 2016 3 commits
    • Nicholas Bellinger's avatar
      Revert "target: Fix residual overflow handling in target_complete_cmd_with_length" · 61f36166
      Nicholas Bellinger authored
      This reverts commit c1ccbfe0.
      
      Reverting this patch, as it incorrectly assumes the additional length
      for INQUIRY in target_complete_cmd_with_length() is SCSI allocation
      length, which breaks existing user-space code when SCSI allocation
      length is smaller than additional length.
      
        root@scsi-mq:~# sg_inq --len=4 -vvvv /dev/sdb
        found bsg_major=253
        open /dev/sdb with flags=0x800
            inquiry cdb: 12 00 00 00 04 00
              duration=0 ms
            inquiry: pass-through requested 4 bytes (data-in) but got -28 bytes
            inquiry: pass-through can't get negative bytes, say it got none
            inquiry: got too few bytes (0)
        INQUIRY resid (32) should never exceed requested len=4
            inquiry: failed requesting 4 byte response: Malformed response to
                     SCSI command [resid=32]
      
      AFAICT the original change was not to address a specific host issue,
      so go ahead and revert to original logic for now.
      
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Sumit Rai <sumitrai96@gmail.com>
      Cc: stable@vger.kernel.org # 4.8+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      61f36166
    • Nicholas Bellinger's avatar
      target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE · 449a1378
      Nicholas Bellinger authored
      This patch addresses a bug where EXTENDED_COPY across multiple LUNs
      results in a CHECK_CONDITION when the source + destination are not
      located on the same physical node.
      
      ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
      NOT REACHABLE to be returned when this occurs, in order to signal
      fallback to local copy method.
      
      As described in section 6.3.3 of spc4r22:
      
        "If it is not possible to complete processing of a segment because the
         copy manager is unable to establish communications with a copy target
         device, because the copy target device does not respond to INQUIRY,
         or because the data returned in response to INQUIRY indicates
         an unsupported logical unit, then the EXTENDED COPY command shall be
         terminated with CHECK CONDITION status, with the sense key set to
         COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
         NOT REACHABLE."
      
      Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.
      Reported-by: default avatarNixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: default avatarNixon Vincent <nixon.vincent@calsoftinc.com>
      Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: default avatarDinesh Israni <ddi@datera.io>
      Signed-off-by: default avatarDinesh Israni <ddi@datera.io>
      Cc: Dinesh Israni <ddi@datera.io>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      449a1378
    • Nicholas Bellinger's avatar
      target: Re-add missing SCF_ACK_KREF assignment in v4.1.y · 527268df
      Nicholas Bellinger authored
      This patch fixes a regression in >= v4.1.y code where the original
      SCF_ACK_KREF assignment in target_get_sess_cmd() was dropped upstream
      in commit 054922bb, but the series for addressing TMR ABORT_TASK +
      LUN_RESET with fabric session reinstatement in commit febe562c still
      depends on this code in transport_cmd_finish_abort().
      
      The regression manifests itself as a se_cmd->cmd_kref +1 leak, where
      ABORT_TASK + LUN_RESET can hang indefinately for a specific I_T session
      for drivers using SCF_ACK_KREF, resulting in hung kthreads.
      
      This patch has been verified with v4.1.y code.
      Reported-by: default avatarVaibhav Tandon <vst@datera.io>
      Tested-by: default avatarVaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      527268df
  17. 24 Jul, 2016 1 commit
  18. 20 Jul, 2016 3 commits
    • Nicholas Bellinger's avatar
      target: Fix ordered task CHECK_CONDITION early exception handling · 410c29df
      Nicholas Bellinger authored
      If a Simple command is sent with a failure, target_setup_cmd_from_cdb
      returns with TCM_UNSUPPORTED_SCSI_OPCODE or TCM_INVALID_CDB_FIELD.
      
      So in the cases where target_setup_cmd_from_cdb returns an error, we
      never get far enough to call target_execute_cmd to increment simple_cmds.
      Since simple_cmds isn't incremented, the result of the failure from
      target_setup_cmd_from_cdb causes transport_generic_request_failure to
      decrement simple_cmds, due to call to transport_complete_task_attr.
      
      With this dev->simple_cmds or dev->dev_ordered_sync is now -1, not 0.
      So when a subsequent command with an Ordered Task is sent, it causes
      a hang, since dev->simple_cmds is at -1.
      Tested-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Signed-off-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Tested-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      410c29df
    • Nicholas Bellinger's avatar
      target: Fix ordered task target_setup_cmd_from_cdb exception hang · dff0ca9e
      Nicholas Bellinger authored
      If a command with a Simple task attribute is failed due to a Unit
      Attention, then a subsequent command with an Ordered task attribute
      will hang forever.  The reason for this is that the Unit Attention
      status is checked for in target_setup_cmd_from_cdb, before the call
      to target_execute_cmd, which calls target_handle_task_attr, which
      in turn increments dev->simple_cmds.
      
      However, transport_generic_request_failure still calls
      transport_complete_task_attr, which will decrement dev->simple_cmds.
      In this case, simple_cmds is now -1.  So when a command with the
      Ordered task attribute is sent, target_handle_task_attr sees that
      dev->simple_cmds is not 0, so it decides it can't execute the
      command until all the (nonexistent) Simple commands have completed.
      Reported-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Tested-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Reported-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Tested-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      dff0ca9e
    • Nicholas Bellinger's avatar
      target: Fix race between iscsi-target connection shutdown + ABORT_TASK · 064cdd2d
      Nicholas Bellinger authored
      This patch fixes a race in iscsit_release_commands_from_conn() ->
      iscsit_free_cmd() -> transport_generic_free_cmd() + wait_for_tasks=1,
      where CMD_T_FABRIC_STOP could end up being set after the final
      kref_put() is called from core_tmr_abort_task() context.
      
      This results in transport_generic_free_cmd() blocking indefinately
      on se_cmd->cmd_wait_comp, because the target_release_cmd_kref()
      check for CMD_T_FABRIC_STOP returns false.
      
      To address this bug, make iscsit_release_commands_from_conn()
      do list_splice and set CMD_T_FABRIC_STOP early while holding
      iscsi_conn->cmd_lock.  Also make iscsit_aborted_task() only
      remove iscsi_cmd_t if CMD_T_FABRIC_STOP has not already been
      set.
      
      Finally in target_release_cmd_kref(), only honor fabric_stop
      if CMD_T_ABORTED has been set.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.14+
      Tested-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      064cdd2d