    af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET · fe1ca8b8
    Neil Horman authored
    [ Upstream commit 89ed5b51 ]
    
    When an application is run that:
    a) Sets its scheduler to be SCHED_FIFO
    and
    b) Opens a memory mapped AF_PACKET socket, and sends frames with the
    MSG_DONTWAIT flag cleared, it's possible for the application to hang
    forever in the kernel.  This occurs because when waiting, the code in
    tpacket_snd calls schedule, which under normal circumstances allows
    other tasks to run, including ksoftirqd, which in some cases is
    responsible for freeing the transmitted skb (which in AF_PACKET calls a
    destructor that flips the status bit of the transmitted frame back to
    available, allowing the transmitting task to complete).
    
    However, when the calling application is SCHED_FIFO, its priority is
    such that the schedule call immediately places the task back on the cpu,
    preventing ksoftirqd from freeing the skb, which in turn prevents the
    transmitting task from detecting that the transmission is complete.
    
    We can fix this by converting the schedule call to a completion
    mechanism.  By using a completion queue, we force the calling task, when
    it detects there are no more frames to send, to schedule itself off the
    cpu until such time as the last transmitted skb is freed, allowing
    forward progress to be made.
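
The starvation and the fix described above can be illustrated with a toy,
single-CPU model in user space. This is only a sketch under stated
assumptions: every name below (sketch_wait_mode, sketch_ticks_until_tx_done)
is hypothetical, nothing here is kernel code, and the "scheduler" is reduced
to the rule that the highest-priority runnable task always gets the next tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how the high-priority sender waits for its frame. */
enum sketch_wait_mode {
	SKETCH_YIELD,		/* models calling schedule() while staying runnable */
	SKETCH_COMPLETION	/* models sleeping until the skb destructor wakes us */
};

/*
 * Toy single-CPU model (an assumption, not the kernel scheduler).
 * Returns the tick at which the sender observes its frame as freed,
 * or -1 if that never happens within max_ticks.
 */
static int sketch_ticks_until_tx_done(enum sketch_wait_mode mode, int max_ticks)
{
	bool skb_freed = false;		/* set by the low-priority "ksoftirqd" */
	bool sender_runnable = true;	/* the SCHED_FIFO sender */

	assert(max_ticks >= 0);

	for (int tick = 0; tick < max_ticks; tick++) {
		if (sender_runnable) {
			/* A runnable high-priority task always wins the CPU. */
			if (skb_freed)
				return tick;
			if (mode == SKETCH_COMPLETION)
				sender_runnable = false; /* sleep off the CPU */
			/* SKETCH_YIELD: still runnable, so it is picked again. */
		} else {
			/* Only now can the low-priority task run and free the skb. */
			skb_freed = true;
			sender_runnable = true;	/* the completion wakes the sender */
		}
	}
	return -1;
}
```

In this model the yielding sender never sees the frame freed, while the
sender that sleeps on a completion makes progress as soon as the
low-priority task has had one tick.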
    
    Tested by myself and the reporter, with good results
    
    Change Notes:
    
    V1->V2:
    	Enhance the sleep logic to support being interruptible and
    to honor SK_SNDTIMEO (Willem de Bruijn)
    
    V2->V3:
    	Rearrange the point at which we wait for the completion queue, to
    avoid needing to check for ph/skb being null at the end of the loop.
    Also move the complete call to the skb destructor to avoid needing to
    modify __packet_set_status.  Also gate calling complete on
    packet_read_pending returning zero to avoid multiple calls to complete.
    (Willem de Bruijn)
    
    	Move timeo computation within loop, to re-fetch the socket
    timeout since we also use the timeo variable to record the return code
    from the wait_for_completion call (Neil Horman)
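
The gating of complete on the pending count mentioned in the V2->V3 notes
can be sketched in user space as follows. This is an illustration only: the
struct and function names (sketch_ring, sketch_destruct_frame, and so on)
are hypothetical stand-ins, and a plain integer replaces the real pending
counter and completion.

```c
#include <assert.h>

/* Hypothetical stand-in for a TX ring's bookkeeping. */
struct sketch_ring {
	int pending;		/* frames queued but not yet freed */
	int completions;	/* how many times "complete" was issued */
};

/* Models queuing one frame for transmission. */
static void sketch_queue_frame(struct sketch_ring *r)
{
	r->pending++;
}

/* Models the skb destructor: signal only when the last frame is freed. */
static void sketch_destruct_frame(struct sketch_ring *r)
{
	assert(r->pending > 0);
	r->pending--;
	if (r->pending == 0)
		r->completions++;
}

/* Queue nframes, free them all, and report how many completions fired. */
static int sketch_completions_after_burst(int nframes)
{
	struct sketch_ring r = { .pending = 0, .completions = 0 };

	for (int i = 0; i < nframes; i++)
		sketch_queue_frame(&r);
	for (int i = 0; i < nframes; i++)
		sketch_destruct_frame(&r);
	return r.completions;
}
```

Under these assumptions a burst of any size produces a single completion,
which is the property the gating is meant to provide.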
    
    V3->V4:
    	Willem has requested that the control flow be restored to the
    previous state.  Doing so lets us eliminate the need for the
    po->wait_on_complete flag variable, and lets us get rid of the
    packet_next_frame function, but introduces another complexity.
    Specifically, because we use the packet pending count, if an
    application calls sendmsg multiple times with MSG_DONTWAIT set, each
    set of transmitted frames, when complete, will cause
    tpacket_destruct_skb to issue a complete call, for which there will
    never be a wait_for_completion call.  This imbalance will cause any
    future call to wait_for_completion here to return early, when the frames
    it sent may not have completed.  To correct this, we need to re-init
    the completion queue on every call to tpacket_snd before we enter the
    loop so as to ensure we wait properly for the frames we send in this
    iteration.
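
The imbalance described in the V3->V4 notes, and the effect of
re-initializing before the loop, can be sketched with a counter-only model.
This is a hedged illustration: the names below are hypothetical, no real
sleeping or socket I/O takes place, and the counter only mimics the idea
that unconsumed complete calls are banked until a waiter consumes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter-only stand-in for a completion. */
struct sketch_completion {
	unsigned int done;	/* complete calls not yet consumed by a waiter */
};

static void sketch_reinit(struct sketch_completion *c)
{
	c->done = 0;
}

static void sketch_complete(struct sketch_completion *c)
{
	c->done++;
}

/* True if a wait would return at once because a completion is banked. */
static bool sketch_wait_returns_immediately(struct sketch_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/*
 * Models a blocking send that follows some number of finished
 * MSG_DONTWAIT sends. Returns true if the blocking wait would be
 * satisfied by a stale completion before its own frames are freed.
 */
static bool sketch_blocking_send_returns_early(int earlier_nonblocking_sends,
					       bool reinit_before_loop)
{
	struct sketch_completion c = { .done = 0 };

	assert(earlier_nonblocking_sends >= 0);

	/* Each earlier send signalled completion with nobody waiting. */
	for (int i = 0; i < earlier_nonblocking_sends; i++)
		sketch_complete(&c);

	if (reinit_before_loop)
		sketch_reinit(&c);

	/* The new frames are queued but not yet freed; now we wait. */
	return sketch_wait_returns_immediately(&c);
}
```

Without the re-init, any earlier nonblocking send leaves a banked
completion that lets the later wait return early; with it, the wait only
ends when this call's own frames signal completion.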
    
    	Change the timeout and interrupted gotos to out_put rather than
    out_status so that we don't try to free a nonexistent skb
    	Clean up some extra newlines (Willem de Bruijn)
    
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Reported-by: Matteo Croce <mcroce@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>