    nfit: do an ARS scrub on hitting a latent media error · 6839a6d9
    Vishal Verma authored
    
    
    When a latent (unknown to 'badblocks') error is encountered, it will
    trigger a machine check exception. On a system with machine check
    recovery, this will only SIGBUS the process(es) which had the bad page
    mapped (as opposed to a kernel panic on platforms without machine
    check recovery features). In the former case, we want to trigger a full
    rescan of that nvdimm bus. This will allow any additional, new errors
    to be captured in the block devices' badblocks lists, and offending
    operations on them can be trapped early, avoiding machine checks.
    
    This is done by registering a callback function with the
    x86_mce_decoder_chain and calling the new ars_rescan functionality with
    the address in the mce notification.
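
The callback registration described above can be illustrated with a small user-space sketch. It is not the kernel code from this commit: `struct mce_event`, `struct bus_range`, `struct chain`, `chain_register`, `chain_notify` and `rescan_on_mce` are hypothetical stand-ins for the x86_mce_decoder_chain notifier and the ars_rescan entry point, and the address range check is an assumption about how a callback might decide that an error belongs to a given bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an MCE record (the kernel's record has many more fields). */
struct mce_event {
    uint64_t addr;     /* physical address reported with the machine check */
    int addr_valid;    /* nonzero when addr is meaningful */
};

/* Hypothetical stand-in for one nvdimm bus covering a physical address range. */
struct bus_range {
    uint64_t start;
    uint64_t len;
    unsigned rescans;  /* how many rescans have been requested */
};

typedef int (*mce_callback)(const struct mce_event *ev, void *ctx);

#define MAX_CALLBACKS 8

/* Minimal notifier chain: a fixed-size list of callbacks with their contexts. */
struct chain {
    mce_callback fn[MAX_CALLBACKS];
    void *ctx[MAX_CALLBACKS];
    size_t n;
};

/* Append a callback; returns 0 on success, -1 when the chain is full. */
static int chain_register(struct chain *c, mce_callback fn, void *ctx)
{
    assert(c != NULL && fn != NULL);
    if (c->n == MAX_CALLBACKS)
        return -1;
    c->fn[c->n] = fn;
    c->ctx[c->n] = ctx;
    c->n++;
    return 0;
}

/* Deliver one event to every registered callback, in registration order. */
static void chain_notify(struct chain *c, const struct mce_event *ev)
{
    for (size_t i = 0; i < c->n; i++)
        c->fn[i](ev, c->ctx[i]);
}

/*
 * Callback: request a rescan only when the reported address is valid and
 * falls inside this bus's range (assumed policy). Returns 1 when a rescan
 * was requested and 0 otherwise.
 */
static int rescan_on_mce(const struct mce_event *ev, void *ctx)
{
    struct bus_range *bus = ctx;

    if (!ev->addr_valid)
        return 0;
    if (ev->addr < bus->start || ev->addr - bus->start >= bus->len)
        return 0;
    bus->rescans++;    /* a real handler would start the ARS rescan here */
    return 1;
}
```

In this sketch the rescan is only counted. A real handler would most likely defer the scrub to a context where it is safe to run, since a machine check notification is not a good place for long-running work.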
    
    Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>