Skip to content
  • NeilBrown's avatar
    sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. · 743c69e7
    NeilBrown authored
    
    
    The networking layer does not reliably report the distinction between
    a non-block write failing because:
     1/ the queue is too full already and
     2/ a memory allocation attempt failed.
    
    The distinction is important because in the first case it is
    appropriate to retry as soon as the socket reports that it is
    writable, and in the second case a small delay is required as the
    socket will most likely report as writable but kmalloc could still
    fail.
    
    sk_stream_wait_memory() exhibits this distinction nicely, setting
    'vm_wait' if a small wait is needed.  However in the non-blocking case
    it always returns -EAGAIN no matter the cause of the failure.  This
    -EAGAIN call get all the way to sunrpc.
    
    The sunrpc layer expects EAGAIN to indicate the first cause, and
    ENOBUFS to indicate the second.  Various documentation suggests that
    this is not unreasonable, but does not guarantee the desired error
    codes.
    
    The result of getting -EAGAIN when -ENOBUFS is expected is that the
    send is tried again in a tight loop and soft lockups are reported.
    
    so: add tests after calls to xs_sendpages() to translate -EAGAIN into
    -ENOBUFS if the socket is writable.  This cannot happen inside
    xs_sendpages() as the test for "is socket writable" is different
    between TCP and UDP.
    
    With this change, the tight loop retrying xs_sendpages() becomes a
    loop which only retries every 250ms, and so will not trigger a
    soft-lockup warning.
    
    It is possible that the write did fail because the queue was too full
    and by the time xs_sendpages() completed, the queue was writable
    again.  In this case an extra 250ms delay is inserted that isn't
    really needed.  This circumstance suggests a degree of congestion so a
    delay is not necessarily a bad thing, and it can only cause a single
    250ms delay, not a series of them.
    
    Signed-off-by: default avatarNeilBrown <neilb@suse.com>
    Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
    743c69e7