    workqueue: implement lockup detector · 82607adc
    Tejun Heo authored
    Workqueue stalls can happen from a variety of usage bugs, such as a
    missing WQ_MEM_RECLAIM flag or a concurrency-managed work item
    staying RUNNING indefinitely.  These stalls can be extremely difficult
    to hunt down because the usual warning mechanisms can't detect
    workqueue stalls and the internal state is pretty opaque.
    
    To alleviate the situation, this patch implements a workqueue lockup
    detector.  It periodically monitors all worker_pools and, if any pool
    fails to make forward progress for longer than the threshold
    duration, triggers a warning and dumps workqueue state as follows.
    
     BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
     Showing busy workqueues and worker pools:
     workqueue events: flags=0x0
       pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
         pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
     workqueue events_power_efficient: flags=0x80
       pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
         pending: check_lifetime, neigh_periodic_work
     workqueue cgroup_pidlist_destroy: flags=0x0
       pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
         pending: cgroup_pidlist_destroy_work_fn
     ...
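
    The check described above can be illustrated with a small user-space
    sketch.  This is not the kernel code: struct pool_model and
    check_pools() are hypothetical names, timestamps are plain seconds
    rather than jiffies, and the sketch assumes that an idle pool is
    skipped and that a threshold of 0 disables the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a worker pool; not the kernel's struct worker_pool. */
struct pool_model {
	int cpu;                     /* CPU the pool is bound to */
	bool has_pending;            /* true if work items are queued */
	unsigned long last_progress; /* time (seconds) of last forward progress */
};

/*
 * Scan all pools once and report those that have pending work but have
 * not made progress for longer than thresh seconds.  Returns the number
 * of stalled pools.  Assumes now >= last_progress for every pool.
 */
static size_t check_pools(const struct pool_model *pools, size_t n,
			  unsigned long now, unsigned long thresh)
{
	size_t stalled = 0;
	size_t i;

	assert(pools != NULL || n == 0);

	if (thresh == 0)	/* assumption: 0 turns the detector off */
		return 0;

	for (i = 0; i < n; i++) {
		unsigned long stuck;

		if (!pools[i].has_pending)	/* nothing queued, cannot stall */
			continue;

		stuck = now - pools[i].last_progress;
		if (stuck > thresh) {
			printf("BUG: workqueue lockup - pool cpus=%d stuck for %lus!\n",
			       pools[i].cpu, stuck);
			stalled++;
		}
	}
	return stalled;
}
```

    A periodic timer would call check_pools() with the current time, and
    a non-zero return value would be followed by the state dump.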
    
    The detection mechanism is controlled through the kernel parameter
    workqueue.watchdog_thresh and can be updated at runtime through the
    sysfs module parameter file.
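
    As an illustration, the current threshold can be read back from user
    space with a few lines of C.  read_thresh() is a hypothetical helper,
    and the path in the usage note follows the usual
    /sys/module/<module>/parameters/<name> layout; treat it as an
    assumption and verify it on the running kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read an unsigned integer (the threshold in seconds) from a parameter
 * file.  Returns 0 on success and -1 if the file cannot be opened or
 * does not start with a number.
 */
static int read_thresh(const char *path, unsigned long *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%lu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

    For example, read_thresh("/sys/module/workqueue/parameters/watchdog_thresh", &t)
    stores the value in t when the file exists and is readable.  The same
    value can be given at boot as workqueue.watchdog_thresh=<seconds>.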
    
    v2: Decoupled from softlockup control knobs.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Don Zickus <dzickus@redhat.com>
    Cc: Ulrich Obergfell <uobergfe@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Chris Mason <clm@fb.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>