    ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and... · 27bf6305
    Tariq Saeed authored
    
    ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
    
    Orabug: 18639535
    
    In a two-node cluster, both nodes hold a lock at PR level and both want
    to convert it to EX at the same time.  Master node 1 has sent a BAST and
    then closes the connection due to idle timeout.  Node 0 receives the
    BAST and sends an unlock request with the cancel flag, but gets error
    -ENOTCONN.  The problem is that this error is ignored in
    dlm_send_remote_unlock_request() on the **incorrect** assumption that
    the master is dead.  See the NOTE in the comment there for why it
    returns DLM_NORMAL.  Upon getting DLM_NORMAL, node 0 proceeds to send
    the convert (without the cancel flag), which fails with -ENOTCONN.  It
    waits 5 seconds and resends.
    
    This time it gets DLM_IVLOCKID from the master because the lock is not
    found on the grant queue: it had already been moved to the converting
    queue in response to the earlier PR->EX convert request.  No way out.
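
The lookup failure described above can be modelled with a minimal sketch. All names here (sketch_lock, sketch_handle_convert, REPLY_IVLOCKID and so on) are hypothetical and are not the kernel's dlm structures; the only assumption taken from the text is that the master looks for the lock on the grant queue when a convert request arrives.

```c
#include <stddef.h>

/* Hypothetical queue and reply names for this sketch only. */
enum sketch_queue { Q_GRANTED, Q_CONVERTING };
enum sketch_reply { REPLY_OK, REPLY_IVLOCKID };

struct sketch_lock {
    int node;                 /* node that owns this lock */
    enum sketch_queue queue;  /* queue the lock currently sits on */
};

/*
 * Model of the master handling a convert request from 'node':
 * the lock is only looked up on the grant queue.  If it is found,
 * it moves to the converting queue; otherwise the request is
 * rejected as referring to an invalid lock.
 */
static enum sketch_reply sketch_handle_convert(struct sketch_lock *locks,
                                               size_t n, int node)
{
    for (size_t i = 0; i < n; i++) {
        if (locks[i].node == node && locks[i].queue == Q_GRANTED) {
            locks[i].queue = Q_CONVERTING;
            return REPLY_OK;
        }
    }
    return REPLY_IVLOCKID;
}
```

In this model the first convert from node 0 succeeds and moves its lock to the converting queue. A second convert from node 0, sent without a successful cancel in between, finds nothing on the grant queue and is rejected.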
    
    Node 1 (master)				Node 0
    ==============				======
    
      lock mode PR				PR
    
      convert PR -> EX
      mv grant -> convert and queue BAST
      ...
                         <-------- convert PR -> EX
      convert queue looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
      ...
                            BAST (want PR -> NL)
                         ------------------>
      ...
      idle timeout, conn closed
                                    ...
                                    In response to BAST,
                                    sends unlock with cancel convert flag
                                    gets -ENOTCONN. Ignores and
                                    sends remote convert request
                                    gets -ENOTCONN, waits 5 sec, retries
      ...
      reconnects
                       <----------------- convert req goes through on next try
      does not find lock on grant queue
                       status DLM_IVLOCKID
                       ------------------>
      ...
    
    No way out.  The fix is to keep retrying the unlock with the cancel
    flag until it succeeds or the master dies.
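
The retry policy can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the patch itself: sketch_link, sketch_send_cancel, sketch_cancel_convert and the status names are hypothetical stand-ins, and real code would sleep between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; not the kernel's enum dlm_status. */
enum sketch_status { SKETCH_NORMAL, SKETCH_MASTER_DEAD };

/* Hypothetical model of the connection to the lock master. */
struct sketch_link {
    int  down_for;      /* sends that will still fail with -ENOTCONN */
    bool master_alive;  /* false once the master has really died */
    int  sends;         /* number of cancel requests attempted */
};

/* Stand-in for sending "unlock with cancel flag" over the network. */
static int sketch_send_cancel(struct sketch_link *link)
{
    link->sends++;
    if (link->down_for > 0) {
        link->down_for--;
        return -ENOTCONN;
    }
    return 0;
}

/*
 * Retry the cancel until it is delivered or the master is known to
 * be dead.  A closed connection alone is not treated as proof that
 * the master died.
 */
static enum sketch_status sketch_cancel_convert(struct sketch_link *link)
{
    assert(link != NULL);
    for (;;) {
        if (!link->master_alive)
            return SKETCH_MASTER_DEAD;
        if (sketch_send_cancel(link) == 0)
            return SKETCH_NORMAL;
        /* a real implementation would wait here before retrying */
    }
}
```

With a link that fails twice and then recovers, the sketch sends three cancel requests and returns SKETCH_NORMAL. With a dead master it returns SKETCH_MASTER_DEAD without sending anything.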
    
    Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>