Skip to content
  • Ilya Dryomov's avatar
    libceph: reschedule a tick in finish_hunting() · 6044c692
    Ilya Dryomov authored
    commit 7b4c443d139f1d2b5570da475f7a9cbcef86740c upstream.
    
    If we go without an established session for a while, backoff delay will
    climb to 30 seconds.  The keepalive timeout is also 30 seconds, so it's
    pretty easily hit after a prolonged hunting for a monitor: we don't get
    a chance to send out a keepalive in time, which means we never get back
    a keepalive ack in time, cutting an established session and attempting
    to connect to a different monitor every 30 seconds:
    
      [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
      [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
      [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
      [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
      [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
      [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
      [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
      [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
    
    The regular keepalive interval is 10 seconds.  After ->hunting is
    cleared in finish_hunting(), call __schedule_delayed() to ensure we
    send out a keepalive after 10 seconds.
    
    Cc: stable@vger.kernel.org # 4.7+
    Link: http://tracker.ceph.com/issues/23537
    
    
    Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
    Reviewed-by: default avatarJason Dillaman <dillaman@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    6044c692