    IB/hfi1: Fix destroy_qp hang after a link down · 6edd85a7
    Michael J. Ruhl authored
    commit b4a4957d3d1c328b733fce783b7264996f866ad2 upstream.
    rvt_destroy_qp() cannot complete until all in-process packets have
    been released from the underlying hardware.  If a link down event
    occurs, an application can hang with a kernel stack similar to:
    cat /proc/<app PID>/stack
     quiesce_qp+0x178/0x250 [hfi1]
     rvt_reset_qp+0x23d/0x400 [rdmavt]
     rvt_destroy_qp+0x69/0x210 [rdmavt]
     ib_destroy_qp+0xba/0x1c0 [ib_core]
     nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
     nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
     nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
     nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
    quiesce_qp() waits until all outstanding packets have been freed.
    This wait should be momentary.  During a link down event, the cleanup
    handling does not ensure that all packets caught by the link down are
    flushed properly.
    This is caused by the fact that the freeze path and the link down
    event are handled the same way.  This is not correct.  The freeze path
    waits until the HFI is unfrozen and then restarts PIO.  A link down
    is not a freeze event.  The link down path cannot restart the PIO
    until link is restored.  If the PIO path is restarted before the link
    comes up, the application (QP) using the PIO path will hang (until
    link is restored).
    Fix by separating the link down path from the freeze path and using
    the link down path for link down events.
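    The difference between the two paths can be sketched in userspace C.
    This is an illustrative model only, not the driver code: struct
    pio_ctx, try_send(), handle_freeze(), handle_link_down(),
    handle_link_up() and quiesce_would_complete() are hypothetical names,
    and flushing in the freeze path is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PIO send context; not the hfi1 structures. */
struct pio_ctx {
	bool enabled;   /* context accepts new packets */
	int in_flight;  /* packets handed to hardware, not yet released */
};

/* Queue one packet; refused while the context is disabled. */
static bool try_send(struct pio_ctx *c)
{
	if (!c->enabled)
		return false;
	c->in_flight++;
	return true;
}

/* Release every packet caught by the event so that waiters can finish. */
static void flush_in_flight(struct pio_ctx *c)
{
	assert(c->in_flight >= 0);
	c->in_flight = 0;
}

/* Freeze path (modelled): PIO is restarted once the device is unfrozen. */
static void handle_freeze(struct pio_ctx *c)
{
	c->enabled = false;
	flush_in_flight(c);
	/* ... the real path waits here until the device is unfrozen ... */
	c->enabled = true;
}

/* Link down path (modelled): flush, but leave PIO disabled. */
static void handle_link_down(struct pio_ctx *c)
{
	c->enabled = false;
	flush_in_flight(c);
}

/* PIO is restarted only when the link comes back. */
static void handle_link_up(struct pio_ctx *c)
{
	c->enabled = true;
}

/* Stand-in for the quiesce wait: completes only with nothing in flight. */
static bool quiesce_would_complete(const struct pio_ctx *c)
{
	return c->in_flight == 0;
}
```

    In this model a packet queued before handle_link_down() is released
    by the flush, so the quiesce check passes, and try_send() keeps
    failing until handle_link_up() runs.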
    Close a race condition in sc_disable() by acquiring both the progress
    and release locks.
    Close a race condition in sc_stop() by moving the setting of the flag
    bits under the alloc lock.
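    The two locking changes can be illustrated with a minimal userspace
    sketch.  It uses pthread mutexes in place of kernel spinlocks, and
    struct send_ctx, the CTX_* flag bits and the ctx_*() helpers are
    hypothetical stand-ins, not the hfi1 definitions.  The lock order
    (alloc, then progress, then release) is an assumption of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define CTX_ENABLED 0x1u  /* hypothetical flag bits */
#define CTX_HALTED  0x2u

struct send_ctx {
	pthread_mutex_t alloc_lock;     /* guards flags */
	pthread_mutex_t progress_lock;  /* guards progress reporting */
	pthread_mutex_t release_lock;   /* guards pending */
	unsigned int flags;
	int pending;
};

static void ctx_init(struct send_ctx *sc)
{
	pthread_mutex_init(&sc->alloc_lock, NULL);
	pthread_mutex_init(&sc->progress_lock, NULL);
	pthread_mutex_init(&sc->release_lock, NULL);
	sc->flags = CTX_ENABLED;
	sc->pending = 0;
}

/* Allocation checks the flags under alloc_lock, so it cannot observe a
 * half-updated state from ctx_stop() or ctx_disable(). */
static bool ctx_alloc(struct send_ctx *sc)
{
	bool ok;

	pthread_mutex_lock(&sc->alloc_lock);
	ok = (sc->flags & CTX_ENABLED) && !(sc->flags & CTX_HALTED);
	if (ok) {
		pthread_mutex_lock(&sc->release_lock);
		sc->pending++;
		pthread_mutex_unlock(&sc->release_lock);
	}
	pthread_mutex_unlock(&sc->alloc_lock);
	return ok;
}

/* Stop: the flag bits change only while alloc_lock is held. */
static void ctx_stop(struct send_ctx *sc, unsigned int flag)
{
	pthread_mutex_lock(&sc->alloc_lock);
	sc->flags |= flag;
	sc->flags &= ~CTX_ENABLED;
	pthread_mutex_unlock(&sc->alloc_lock);
}

/* Disable: hold both progress and release locks while dropping the
 * pending work, so neither path can run concurrently with the flush. */
static void ctx_disable(struct send_ctx *sc)
{
	pthread_mutex_lock(&sc->alloc_lock);
	sc->flags &= ~CTX_ENABLED;
	pthread_mutex_unlock(&sc->alloc_lock);

	pthread_mutex_lock(&sc->progress_lock);
	pthread_mutex_lock(&sc->release_lock);
	sc->pending = 0;
	pthread_mutex_unlock(&sc->release_lock);
	pthread_mutex_unlock(&sc->progress_lock);
}
```

    Every path takes the locks in the same order, which keeps the sketch
    free of lock-order inversions.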
    Cc: <stable@vger.kernel.org> # 4.9.x+
    Fixes: 77241056 ("IB/hfi1: add driver files")
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
