• Linus Torvalds's avatar
    Mark HI and TASKLET softirq synchronous · 3eb86ff3
    Linus Torvalds authored
    commit 3c53776e29f81719efcf8f7a6e30cdf753bee94d upstream.
    
    Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
    its job"), and ever since we've had small nagging issues with it.  For
    example, we've had:
    
      1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
      8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
      217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
    
    all of which worked around some of the effects of that commit.
    
    The DVB people have also complained that the commit causes excessive USB
    URB latencies, which seems to be due to the USB code using tasklets to
    schedule USB traffic.  This seems to be an issue mainly when already
    living on the edge, but waiting for ksoftirqd to handle it really does
    seem to cause excessive latencies.
    
    Now Hanna Hawa reports that this issue isn't just limited to USB URB and
    DVB, but also causes timeout problems for the Marvell SoC team:
    
     "I'm facing kernel panic issue while running raid 5 on sata disks
      connected to Macchiatobin (Marvell community board with Armada-8040
      SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
      and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
      uses a tasklet to clean the done descriptors from the queue"
    
    The latency problem causes a panic:
    
      mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
      Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
    
    We've discussed simply just reverting the original commit entirely, and
    also much more involved solutions (with per-softirq threads etc).  This
    patch is intentionally stupid and fairly limited, because the issue
    still remains, and the other solutions either got sidetracked or had
    other issues.
    
    We should probably also consider the timer softirqs to be synchronous
    and not be delayed to ksoftirqd (since they were the issue with the
    earlier watchdog problems), but that should be done as a separate patch.
    This does only the tasklet cases.
    Reported-and-tested-by: 's avatarHanna Hawa <hannah@marvell.com>
    Reported-and-tested-by: 's avatarJosef Griebichler <griebichler.josef@gmx.at>
    Reported-by: 's avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    3eb86ff3