Skip to content
  • BingJing Chang's avatar
    md/raid5: fix data corruption of replacements after originals dropped · 52ba7136
    BingJing Chang authored
    [ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ]
    
    During raid5 replacement, the stripes can be marked with R5_NeedReplace
    flag. Data can be read from being-replaced devices and written to
    replacing spares without reading all other devices. (It's 'replace'
    mode. s.replacing = 1) If a being-replaced device is dropped, the
    replacement progress will be interrupted and resumed with pure recovery
    mode. However, existing stripes before being interrupted cannot read
    from the dropped device anymore. It prints lots of WARN_ON messages.
    And it results in data corruption because existing stripes write
    problematic data into its replacement device and update the progress.
    
    \# Erase disks (1MB + 2GB)
    dd if=/dev/zero of=/dev/sda bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
    mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
    \# Ensure array stores non-zero data
    dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
    \# Start replacement
    mdadm /dev/md0 -a /dev/sdd
    mdadm /dev/md0 --replace /dev/sda
    
    Then, Hot-plug out /dev/sda during recovery, and wait for recovery done.
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.
    
    Soon after you hot-plug out /dev/sda, you will see many WARN_ON
    messages. The replacement recovery will be interrupted shortly. After
    the recovery finishes, it will result in data corruption.
    
    Actually, it's just an unhandled case of replacement. In commit
    <f94c0b66> (md/raid5: fix interaction of 'replace' and 'recovery'.),
    if a NeedReplace device is not UPTODATE then that is an error, the
    commit just simply print WARN_ON but also mark these corrupted stripes
    with R5_WantReplace. (it means it's ready for writes.)
    
    To fix this case, we can leverage 'sync and replace' mode mentioned in
    commit <9a3e1101
    
    > (md/raid5: detect and handle replacements during
    recovery.). We can add logics to detect and use 'sync and replace' mode
    for these stripes.
    
    Reported-by: default avatarAlex Chen <alexchen@synology.com>
    Reviewed-by: default avatarAlex Wu <alexwu@synology.com>
    Reviewed-by: default avatarChung-Chiang Cheng <cccheng@synology.com>
    Signed-off-by: default avatarBingJing Chang <bingjingc@synology.com>
    Signed-off-by: default avatarShaohua Li <shli@fb.com>
    Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    52ba7136