    Commit 9c388a5e, authored by Thomas Gleixner
    watchdog/hardlockup/perf: Revert a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
    
    Guenter reported a crash in the watchdog/perf code, which is caused by
    cleanup() and enable() running concurrently. The reason for this is:
    
    The watchdog functions are serialized via the watchdog_mutex and cpu
    hotplug locking, but the perf based watchdog is enabled in the context
    of the unpark callback of the smpboot thread. Unparking is not
    synchronous with respect to that locking: it just wakes the thread up
    and returns, so there is no guarantee when the thread actually runs.
    
    If it starts running _before_ the cleanup has happened then it will
    create an event and overwrite the dead event pointer. The new event is
    then cleaned up because the event is marked dead.
    
        lock(watchdog_mutex);
        lockup_detector_reconfigure();
            cpus_read_lock();
            stop();
               park()
            update();
            start();
               unpark()
            cpus_read_unlock();         thread runs()
                                          overwrite dead event ptr
            cleanup();
              free new event, which is active inside perf....
        unlock(watchdog_mutex);
    
    The park side is safe as that actually waits for the thread to reach
    parked state.
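
The asymmetry between park and unpark can be modeled in userspace. The following is a minimal pthread sketch, not kernel code: struct worker and every function name in it are hypothetical, and the "unpark callback" is reduced to a counter. It only illustrates that park() returns after the thread has acknowledged the parked state, while unpark() returns right after the wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a parkable thread; all names are hypothetical. */
struct worker {
    pthread_t tid;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool should_park;   /* request from the controller */
    bool parked;        /* acknowledged by the worker */
    bool should_stop;
    int unpark_runs;    /* times the "unpark callback" has executed */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    while (!w->should_stop) {
        if (w->should_park && !w->parked) {
            w->parked = true;            /* reached parked state */
            pthread_cond_broadcast(&w->cond);
        } else if (!w->should_park && w->parked) {
            w->parked = false;
            w->unpark_runs++;            /* stands in for enable() */
            pthread_cond_broadcast(&w->cond);
        }
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void worker_start(struct worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->should_park = w->parked = w->should_stop = false;
    w->unpark_runs = 0;
    int rc = pthread_create(&w->tid, NULL, worker_fn, w);
    assert(rc == 0);
    (void)rc;
}

/* Synchronous: returns only after the worker has acknowledged. */
static void park(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->should_park = true;
    pthread_cond_broadcast(&w->cond);
    while (!w->parked)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

/* Asynchronous: wakes the worker and leaves without waiting for it. */
static void unpark(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->should_park = false;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Helper for callers that do need to wait for the unpark callback. */
static void wait_until_running(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    while (w->parked)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

static void worker_stop(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->tid, NULL);
}
```

In this model, the value of unpark_runs right after unpark() returns is unspecified: the worker may or may not have run yet. That window is the one the diagram above exploits.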
    
    Commit a33d4484 removed the protection against this kind of scenario
    under the stupid assumption that the hotplug serialization and the
    watchdog_mutex cover everything. 
    
    Bring it back.
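
As an illustration of the kind of protection meant here, the following userspace sketch replays the problematic order (disable, early enable, cleanup) against a toy per-CPU state. It is a model under stated assumptions, not the restored kernel code: struct event, struct cpu_state, the live_ev/dead_ev slots and all function names are hypothetical. It compares a single shared pointer with a separate slot for the dead event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a perf event; not the kernel's struct perf_event. */
struct event {
    bool active;   /* still in use on the "perf" side */
    bool freed;    /* released by cleanup() */
};

/* One CPU's state. dead_ev is the hypothetical separate slot. */
struct cpu_state {
    struct event *live_ev;
    struct event *dead_ev;
    bool marked_dead;
};

/* Runs in the thread being parked, so it has finished before cleanup(). */
static void disable(struct cpu_state *c, bool use_dead_slot)
{
    c->live_ev->active = false;
    if (use_dead_slot) {
        c->dead_ev = c->live_ev;   /* hand the event over to cleanup() */
        c->live_ev = NULL;
    }
    c->marked_dead = true;
}

/* Runs in the unparked thread, possibly before cleanup(). */
static void enable(struct cpu_state *c, struct event *fresh)
{
    assert(fresh != NULL);
    fresh->active = true;
    c->live_ev = fresh;   /* single-slot case: overwrites the dead pointer */
}

static void cleanup(struct cpu_state *c, bool use_dead_slot)
{
    struct event *victim;

    if (!c->marked_dead)
        return;
    victim = use_dead_slot ? c->dead_ev : c->live_ev;
    if (victim)
        victim->freed = true;
    if (use_dead_slot)
        c->dead_ev = NULL;
    else
        c->live_ev = NULL;
    c->marked_dead = false;
}

/*
 * Replays disable -> early enable -> cleanup.
 * Returns true if cleanup() released an event that is still active.
 */
static bool replay_freed_active_event(bool use_dead_slot)
{
    struct event old_ev = { .active = true, .freed = false };
    struct event new_ev = { .active = false, .freed = false };
    struct cpu_state cpu = {
        .live_ev = &old_ev, .dead_ev = NULL, .marked_dead = false,
    };

    disable(&cpu, use_dead_slot);
    enable(&cpu, &new_ev);       /* thread ran before cleanup() */
    cleanup(&cpu, use_dead_slot);

    return new_ev.freed && new_ev.active;
}
```

With a single slot, replay_freed_active_event(false) reports that the freshly created, active event was released. With the separate slot, replay_freed_active_event(true) releases only the old event, regardless of when the unparked thread runs.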
    
    Reverts: a33d4484 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
    Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Don Zickus <dzickus@redhat.com>
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
    
    