    PCI/PM: Restore the status of PCI devices across hibernation · e60514bd
    Chen Yu authored
    
    
    We currently see many "No irq handler" errors during hibernation, which
    eventually cause the system to hang:
    
      ata4.00: qc timeout (cmd 0xec)
      ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
      ata4.00: revalidation failed (errno=-5)
      ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      do_IRQ: 31.151 No irq handler for vector
    
    According to the logs above, an interrupt was triggered and dispatched to
    CPU31 with vector number 151, but there is no handler for it.  The IRQ is
    therefore never acked, which causes an IRQ flood that kills the system.
    More specifically, 31.151 is an interrupt from the AHCI host controller.
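
As a small illustration of how the "31.151" pair in that log line splits into a CPU number and a vector number, here is a minimal sketch. The helper name parse_no_irq_handler() is hypothetical and is not part of the kernel; it only checks the "do_IRQ:" prefix and the two numbers, not the rest of the line.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): extract the CPU and the
 * vector from a line shaped like "do_IRQ: <cpu>.<vector> ...".
 * Returns 1 if both numbers were read, 0 otherwise.
 */
static int parse_no_irq_handler(const char *line, int *cpu, int *vector)
{
    assert(line && cpu && vector);

    /* sscanf() returns the number of converted fields; we need both. */
    return sscanf(line, " do_IRQ: %d.%d", cpu, vector) == 2;
}
```

Fed the last log line above, the helper reports CPU 31 and vector 151; the ata4 lines do not match the prefix and are rejected.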
    
    Investigation shows that the issue is triggered because the thaw_noirq()
    function does not restore the MSI/MSI-X settings across hibernation.
    
    The scenario is illustrated below:
    
      1. Before hibernation, IRQ 34 is assigned to the AHCI device and is
         bound to CPU31.

      2. Hibernation starts and the AHCI device is put into a low-power state.

      3. All the nonboot CPUs are taken offline, so IRQ 34 has to be migrated
         to the last CPU still online, CPU0.
    
      4. After the snapshot has been created, all the nonboot CPUs are brought
         up again; IRQ 34 remains bound to CPU0.
    
      5. AHCI devices are put into D0.
    
      6. The snapshot is written to the disk.
    
    The issue is triggered in step 6.  The AHCI interrupt should be delivered
    to CPU0, however it is delivered to the original CPU31 instead, which
    causes the "No irq handler" issue.
    
    Ying Huang pointed out that in step 3 a write to the register might not
    take effect because the PCI devices have already been suspended.
    
    In step 3, the IRQ 34 affinity should be changed from CPU31 to CPU0, but
    in fact it is not.  In __pci_write_msi_msg(), if the device is already in
    a low-power state, the MSI message is not written to the hardware but only
    cached.  During the device restore process after a normal suspend/resume,
    pci_restore_msi_state() writes the cached MSI message back to the hardware.
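
The cache-then-restore behaviour described above can be modelled in userspace as follows. This is a toy sketch, not kernel code: the toy_* types and functions are hypothetical stand-ins, and the MSI message is reduced to a target CPU and a vector instead of the real address/data registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an MSI message; the real one holds address and data. */
struct toy_msi_msg {
    int dest_cpu;
    int vector;
};

/* Toy stand-in for a PCI device with one MSI entry. */
struct toy_dev {
    bool in_d0;                 /* true while the device is in D0 */
    struct toy_msi_msg cached;  /* software copy, always kept current */
    struct toy_msi_msg hw;      /* what the device registers hold */
};

/* Models the deferred write: hardware is touched only in D0. */
static void toy_write_msi_msg(struct toy_dev *dev, struct toy_msi_msg msg)
{
    assert(dev);
    if (dev->in_d0)
        dev->hw = msg;      /* registers are reachable, program them */
    dev->cached = msg;      /* the cache is updated in either case */
}

/* Models the restore step: push the cached message to the hardware. */
static void toy_restore_msi_state(struct toy_dev *dev)
{
    assert(dev && dev->in_d0);
    dev->hw = dev->cached;
}
```

In this model, a write issued while in_d0 is false changes only the cached copy; the hardware copy keeps the old target until toy_restore_msi_state() runs after the device is back in D0.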
    
    But this is not the case for hibernation.  pci_restore_msi_state() is not
    currently called in pci_pm_thaw_noirq(), although pci_save_state() has
    saved the necessary PCI cached information in pci_pm_freeze_noirq().
    
    Restore the PCI state of the device when it is thawed during hibernation.
    Otherwise the state might be lost across hibernation (for example,
    settings for MSI, MSI-X, ATS, ACS, IOV, etc.), which might cause problems
    during hibernation.
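
To make the effect of the change concrete, the six steps of the scenario can be replayed in a toy userspace model, with and without a restore at thaw time. Everything here is a hypothetical simplification (the toy_* names, and the reduction of the MSI state to a single target CPU); it is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy AHCI device: only the interrupt's target CPU is modelled. */
struct toy_ahci {
    bool in_d0;      /* true while the device is in D0 */
    int cached_cpu;  /* target CPU the kernel believes is programmed */
    int hw_cpu;      /* target CPU actually programmed in the device */
};

/* Affinity change: the hardware is updated only if the device is in D0. */
static void toy_set_affinity(struct toy_ahci *d, int cpu)
{
    assert(d);
    d->cached_cpu = cpu;
    if (d->in_d0)
        d->hw_cpu = cpu;
}

/*
 * Replays steps 1-6 and returns the CPU that receives the AHCI
 * interrupt while the snapshot is being written.
 */
static int toy_hibernate_target_cpu(bool restore_on_thaw)
{
    struct toy_ahci d = { .in_d0 = true, .cached_cpu = -1, .hw_cpu = -1 };

    toy_set_affinity(&d, 31);   /* step 1: IRQ bound to CPU31 */
    d.in_d0 = false;            /* step 2: device enters low power */
    toy_set_affinity(&d, 0);    /* step 3: migration to CPU0, cached only */
    /* step 4: nonboot CPUs come back; the binding is not changed */
    d.in_d0 = true;             /* step 5: device back in D0 */
    if (restore_on_thaw)
        d.hw_cpu = d.cached_cpu; /* write the cached state back */
    return d.hw_cpu;            /* step 6: where the interrupt lands */
}
```

Under these assumptions, toy_hibernate_target_cpu(false) yields 31, matching the stray interrupt in the log, while toy_hibernate_target_cpu(true) yields 0, where the handler is expected.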
    
    Suggested-by: Ying Huang <ying.huang@intel.com>
    Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Chen Yu <yu.c.chen@intel.com>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Cc: stable@vger.kernel.org
    Cc: Len Brown <len.brown@intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Rui Zhang <rui.zhang@intel.com>
    Cc: Ying Huang <ying.huang@intel.com>