Skip to content
  • Ville Syrjälä's avatar
    ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock · ef2b11c0
    Ville Syrjälä authored
    commit ff165679 upstream.
    
    acpi_remove_pm_notifier() ends up calling flush_workqueue() while
    holding acpi_pm_notifier_lock, and that same lock is taken by
    by the work via acpi_pm_notify_handler(). This can deadlock.
    
    To fix the problem let's split the single lock into two: one to
    protect the dev->wakeup between the work vs. add/remove, and
    another one to handle notifier installation vs. removal.
    
    After commit a1d14934 "workqueue/lockdep: 'Fix' flush_work()
    annotation" I was able to kill the machine (Intel Braswell)
    very easily with 'powertop --auto-tune', runtime suspending i915,
    and trying to wake it up via the USB keyboard. The cases when
    it didn't die are presumably explained by lockdep getting disabled
    by something else (cpu hotplug locking issues usually).
    
    Fortunately I still got a lockdep report over netconsole
    (trickling in very slowly), even though the machine was
    otherwise practically dead:
    
    [  112.179806] ======================================================
    [  114.670858] WARNING: possible circular locking dependency detected
    [  117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934 #119 Not tainted
    [  119.658101] ------------------------------------------------------
    [  121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
    [  121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
    [  121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up
    [  121.313485] usb 1-6: USB disconnect, device number 3
    [  121.313501] usb 1-6.2: USB disconnect, device number 4
    [  134.747383] kworker/0:2/47 is trying to acquire lock:
    [  137.220790]  (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80
    [  139.721524]
    [  139.721524] but task is already holding lock:
    [  144.672922]  ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
    [  147.184450]
    [  147.184450] which lock already depends on the new lock.
    [  147.184450]
    [  154.604711]
    [  154.604711] the existing dependency chain (in reverse order) is:
    [  159.447888]
    [  159.447888] -> #2 ((&dpc->work)){+.+.}:
    [  164.183486]        __lock_acquire+0x1255/0x13f0
    [  166.504313]        lock_acquire+0xb5/0x210
    [  168.778973]        process_one_work+0x1b9/0x720
    [  171.030316]        worker_thread+0x4c/0x440
    [  173.257184]        kthread+0x154/0x190
    [  175.456143]        ret_from_fork+0x27/0x40
    [  177.624348]
    [  177.624348] -> #1 ("kacpi_notify"){+.+.}:
    [  181.850351]        __lock_acquire+0x1255/0x13f0
    [  183.941695]        lock_acquire+0xb5/0x210
    [  186.046115]        flush_workqueue+0xdd/0x510
    [  190.408153]        acpi_os_wait_events_complete+0x31/0x40
    [  192.625303]        acpi_remove_notify_handler+0x133/0x188
    [  194.820829]        acpi_remove_pm_notifier+0x56/0x90
    [  196.989068]        acpi_dev_pm_detach+0x5f/0xa0
    [  199.145866]        dev_pm_domain_detach+0x27/0x30
    [  201.285614]        i2c_device_probe+0x100/0x210
    [  203.411118]        driver_probe_device+0x23e/0x310
    [  205.522425]        __driver_attach+0xa3/0xb0
    [  207.634268]        bus_for_each_dev+0x69/0xa0
    [  209.714797]        driver_attach+0x1e/0x20
    [  211.778258]        bus_add_driver+0x1bc/0x230
    [  213.837162]        driver_register+0x60/0xe0
    [  215.868162]        i2c_register_driver+0x42/0x70
    [  217.869551]        0xffffffffa0172017
    [  219.863009]        do_one_initcall+0x45/0x170
    [  221.843863]        do_init_module+0x5f/0x204
    [  223.817915]        load_module+0x225b/0x29b0
    [  225.757234]        SyS_finit_module+0xc6/0xd0
    [  227.661851]        do_syscall_64+0x5c/0x120
    [  229.536819]        return_from_SYSCALL_64+0x0/0x7a
    [  231.392444]
    [  231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}:
    [  235.124914]        check_prev_add+0x44e/0x8a0
    [  237.024795]        __lock_acquire+0x1255/0x13f0
    [  238.937351]        lock_acquire+0xb5/0x210
    [  240.840799]        __mutex_lock+0x75/0x940
    [  242.709517]        mutex_lock_nested+0x1c/0x20
    [  244.551478]        acpi_pm_notify_handler+0x2f/0x80
    [  246.382052]        acpi_ev_notify_dispatch+0x44/0x5c
    [  248.194412]        acpi_os_execute_deferred+0x14/0x30
    [  250.003925]        process_one_work+0x1ec/0x720
    [  251.803191]        worker_thread+0x4c/0x440
    [  253.605307]        kthread+0x154/0x190
    [  255.387498]        ret_from_fork+0x27/0x40
    [  257.153175]
    [  257.153175] other info that might help us debug this:
    [  257.153175]
    [  262.324392] Chain exists of:
    [  262.324392]   acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work)
    [  262.324392]
    [  267.391997]  Possible unsafe locking scenario:
    [  267.391997]
    [  270.758262]        CPU0                    CPU1
    [  272.431713]        ----                    ----
    [  274.060756]   lock((&dpc->work));
    [  275.646532]                                lock("kacpi_notify");
    [  277.260772]                                lock((&dpc->work));
    [  278.839146]   lock(acpi_pm_notifier_lock);
    [  280.391902]
    [  280.391902]  *** DEADLOCK ***
    [  280.391902]
    [  284.986385] 2 locks held by kworker/0:2/47:
    [  286.524895]  #0:  ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
    [  288.112927]  #1:  ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720
    [  289.727725]
    
    Fixes: c072530f
    
     (ACPI / PM: Revork the handling of ACPI device wakeup notifications)
    Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    ef2b11c0