1. 05 Oct, 2017 1 commit
    • Jonathan Brassow's avatar
      dm raid: fix incorrect status output at the end of a "recover" process · 41dcf197
      Jonathan Brassow authored
      There are three important fields that indicate the overall health and
      status of an array: dev_health, sync_ratio, and sync_action.  They tell
      us the condition of the devices in the array, and the degree to which
      the array is synchronized.
      This commit fixes a condition that is reported incorrectly.  When a member
      of the array is being rebuilt or a new device is added, the "recover"
      process is used to synchronize it with the rest of the array.  When the
      process is complete, but the sync thread hasn't yet been reaped, it is
      possible for the state of MD to be:
       curr_resync_completed = <max dev size> (but not MaxSector)
       and all rdevs to be In_sync.
      This causes the 'array_in_sync' output parameter that is passed to
      rs_get_progress() to be computed incorrectly and reported as 'false' --
      or not in-sync.  This in turn causes the dev_health status characters to
      be reported as all 'a', rather than the proper 'A'.
      This can cause erroneous output for several seconds at a time when tools
      will want to be checking the condition due to events that are raised at
      the end of a sync process.  Fix this by properly calculating the
      'array_in_sync' return parameter in rs_get_progress().
      Also, remove an unnecessary intermediate 'recovery_cp' variable in
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  2. 25 Jul, 2017 1 commit
  3. 19 Jun, 2017 1 commit
    • Damien Le Moal's avatar
      dm zoned: drive-managed zoned block device target · 3b1a94c8
      Damien Le Moal authored
      The dm-zoned device mapper target provides transparent write access
      to zoned block devices (ZBC and ZAC compliant block devices).
      dm-zoned hides to the device user (a file system or an application
      doing raw block device accesses) any constraint imposed on write
      requests by the device, equivalent to a drive-managed zoned block
      device model.
      Write requests are processed using a combination of on-disk buffering
      using the device conventional zones and direct in-place processing for
      requests aligned to a zone sequential write pointer position.
      A background reclaim process implemented using dm_kcopyd_copy ensures
      that conventional zones are always available for executing unaligned
      write requests. The reclaim process overhead is minimized by managing
      buffer zones in a least-recently-written order and first targeting the
      oldest buffer zones. Doing so, blocks under regular write access (such
      as metadata blocks of a file system) remain stored in conventional
      zones, resulting in no apparent overhead.
      dm-zoned implementation focus on simplicity and on minimizing overhead
      (CPU, memory and storage overhead). For a 14TB host-managed disk with
      256 MB zones, dm-zoned memory usage per disk instance is at most about
      3 MB and as little as 5 zones will be used internally for storing metadata
      and performing buffer zone reclaim operations. This is achieved using
      zone level indirection rather than a full block indirection system for
      managing block movement between zones.
      dm-zoned primary target is host-managed zoned block devices but it can
      also be used with host-aware device models to mitigate potential
      device-side performance degradation due to excessive random writing.
      Zoned block devices can be formatted and checked for use with the dm-zoned
      target using the dmzadm utility available at:
      https://github.com/hgst/dm-zoned-toolsSigned-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      [Mike Snitzer partly refactored Damien's original work to cleanup the code]
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  4. 24 Apr, 2017 2 commits
    • Mikulas Patocka's avatar
      dm integrity: support larger block sizes · 9d609f85
      Mikulas Patocka authored
      The DM integrity block size can now be 512, 1k, 2k or 4k.  Using larger
      blocks reduces metadata handling overhead.  The block size can be
      configured at table load time using the "block_size:<value>" option;
      where <value> is expressed in bytes (defult is still 512 bytes).
      It is safe to use larger block sizes with DM integrity, because the
      DM integrity journal makes sure that the whole block is updated
      atomically even if the underlying device doesn't support atomic writes
      of that size (e.g. 4k block ontop of a 512b device).
      Depends-on: 2859323e ("block: fix blk_integrity_register to use template's interval_exp if not 0")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Mikulas Patocka's avatar
      dm integrity: various small changes and cleanups · 56b67a4f
      Mikulas Patocka authored
      Some coding style changes.
      Fix a bug that the array test_tag has insufficient size if the digest
      size of internal has is bigger than the tag size.
      The function __fls is undefined for zero argument, this patch fixes
      undefined behavior if the user sets zero interleave_sectors.
      Fix the limit of optional arguments to 8.
      Don't allocate crypt_data on the stack to avoid a BUG with debug kernel.
      Rename all optional argument names to have underscores rather than
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  5. 27 Mar, 2017 2 commits
    • Heinz Mauelshagen's avatar
      dm raid: add raid4/5/6 journal write-back support via journal_mode option · 6e53636f
      Heinz Mauelshagen authored
      Commit 63c32ed4 ("dm raid: add raid4/5/6 journaling support") added
      journal support to close the raid4/5/6 "write hole" -- in terms of
      writethrough caching.
      Introduce a "journal_mode" feature and use the new
      r5c_journal_mode_set() API to add support for switching the journal
      device's cache mode between write-through (the current default) and
      NOTE: If the journal device is not layered on resilent storage and it
      fails, write-through mode will cause the "write hole" to reoccur.  But
      if the journal fails while in write-back mode it will cause data loss
      for any dirty cache entries unless resilent storage is used for the
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Heinz Mauelshagen's avatar
      dm raid: fix table line argument order in status · 4464e36e
      Heinz Mauelshagen authored
      Commit 3a1c1ef2 ("dm raid: enhance status interface and fixup
      takeover/raid0") added new table line arguments and introduced an
      ordering flaw.  The sequence of the raid10_copies and raid10_format
      raid parameters got reversed which causes lvm2 userspace to fail by
      falsely assuming a changed table line.
      Sequence those 2 parameters as before so that old lvm2 can function
      properly with new kernels by adjusting the table line output as
      documented in Documentation/device-mapper/dm-raid.txt.
      Also, add missing version 1.10.1 highlight to the documention.
      Fixes: 3a1c1ef2 ("dm raid: enhance status interface and fixup takeover/raid0")
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  6. 24 Mar, 2017 5 commits
    • Mikulas Patocka's avatar
      dm integrity: add recovery mode · c2bcb2b7
      Mikulas Patocka authored
      In recovery mode, we don't:
      - replay the journal
      - check checksums
      - allow writes to the device
      This mode can be used as a last resort for data recovery.  The
      motivation for recovery mode is that when there is a single error in the
      journal, the user should not lose access to the whole device.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Milan Broz's avatar
      dm crypt: optionally support larger encryption sector size · 8f0009a2
      Milan Broz authored
      Add  optional "sector_size"  parameter that specifies encryption sector
      size (atomic unit of block device encryption).
      Parameter can be in range 512 - 4096 bytes and must be power of two.
      For compatibility reasons, the maximal IO must fit into the page limit,
      so the limit is set to the minimal page size possible (4096 bytes).
      NOTE: this device cannot yet be handled by cryptsetup if this parameter
      is set.
      IV for the sector is calculated from the 512 bytes sector offset unless
      the iv_large_sectors option is used.
      Test script using dmsetup:
        DEV_SIZE=$(blockdev --getsz $DEV)
        # dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
        # dmsetup table --showkeys test_crypt
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Milan Broz's avatar
      dm crypt: introduce new format of cipher with "capi:" prefix · 33d2f09f
      Milan Broz authored
      For the new authenticated encryption we have to support generic composed
      modes (combination of encryption algorithm and authenticator) because
      this is how the kernel crypto API accesses such algorithms.
      To simplify the interface, we accept an algorithm directly in crypto API
      format.  The new format is recognised by the "capi:" prefix.  The
      dmcrypt internal IV specification is the same as for the old format.
      The crypto API cipher specifications format is:
           capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
           capi:xts(aes)-plain64      (equivalent to old aes-xts-plain64)
      Examples of authenticated modes:
      Authenticated modes can only be configured using the new cipher format.
      Note that this format allows user to specify arbitrary combinations that
      can be insecure. (Policy decision is done in cryptsetup userspace.)
      Authenticated encryption algorithms can be of two types, either native
      modes (like GCM) that performs both encryption and authentication
      internally, or composed modes where user can compose AEAD with separate
      specification of encryption algorithm and authenticator.
      For composed mode with HMAC (length-preserving encryption mode like an
      XTS and HMAC as an authenticator) we have to calculate HMAC digest size
      (the separate authentication key is the same size as the HMAC digest).
      Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get
      HMAC algorithm and retrieve digest size from it.
      Also, for HMAC composed mode we need to parse the crypto API string to
      get the cipher mode nested in the specification.  For native AEAD mode
      (like GCM), we can use crypto_tfm_alg_name() API to get the cipher
      Because the HMAC composed mode is not processed the same as the native
      AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and
      "hmac" specification for the table integrity argument is removed.
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Milan Broz's avatar
      dm crypt: add cryptographic data integrity protection (authenticated encryption) · ef43aa38
      Milan Broz authored
      Allow the use of per-sector metadata, provided by the dm-integrity
      module, for integrity protection and persistently stored per-sector
      Initialization Vector (IV).  The underlying device must support the
      "DM-DIF-EXT-TAG" dm-integrity profile.
      The per-bio integrity metadata is allocated by dm-crypt for every bio.
      Example of low-level mapping table for various types of use:
       # Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
       dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
       dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
       11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
       00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
       0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"
       # AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
       # GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space)
       dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
       dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
       11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
       0 /dev/mapper/x 0 1 integrity:28:aead"
       # Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
       dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
       dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
       11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
       0 /dev/mapper/x 0 1 integrity:16:none"
       # Random IV with XTS + HMAC integrity protection
       dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
       dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
       11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
       00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
       0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"
      Both AEAD and HMAC protection authenticates not only data but also
      sector metadata.
      HMAC protection is implemented through autenc wrapper (so it is
      processed the same way as an authenticated mode).
      In HMAC mode there are two keys (concatenated in dm-crypt mapping
      table).  First is the encryption key and the second is the key for
      authentication (HMAC).  (It is userspace decision if these keys are
      independent or somehow derived.)
      The sector request for AEAD/HMAC authenticated encryption looks like this:
       |----- AAD -------|------ DATA -------|-- AUTH TAG --|
       | (authenticated) | (auth+encryption) |              |
       | sector_LE |  IV |  sector in/out    |  tag in/out  |
      For writes, the integrity fields are calculated during AEAD encryption
      of every sector and stored in bio integrity fields and sent to
      underlying dm-integrity target for storage.
      For reads, the integrity metadata is verified during AEAD decryption of
      every sector (they are filled in by dm-integrity, but the integrity
      fields are pre-allocated in dm-crypt).
      There is also an experimental support in cryptsetup utility for more
      friendly configuration (part of LUKS2 format).
      Because the integrity fields are not valid on initial creation, the
      device must be "formatted".  This can be done by direct-io writes to the
      device (e.g. dd in direct-io mode).  For now, there is available trivial
      tool to do this, see: https://github.com/mbroz/dm_int_toolsSigned-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarOndrej Mosnacek <omosnacek@gmail.com>
      Signed-off-by: default avatarVashek Matyas <matyas@fi.muni.cz>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Mikulas Patocka's avatar
      dm: add integrity target · 7eada909
      Mikulas Patocka authored
      The dm-integrity target emulates a block device that has additional
      per-sector tags that can be used for storing integrity information.
      A general problem with storing integrity tags with every sector is that
      writing the sector and the integrity tag must be atomic - i.e. in case of
      crash, either both sector and integrity tag or none of them is written.
      To guarantee write atomicity the dm-integrity target uses a journal. It
      writes sector data and integrity tags into a journal, commits the journal
      and then copies the data and integrity tags to their respective location.
      The dm-integrity target can be used with the dm-crypt target - in this
      situation the dm-crypt target creates the integrity data and passes them
      to the dm-integrity target via bio_integrity_payload attached to the bio.
      In this mode, the dm-crypt and dm-integrity targets provide authenticated
      disk encryption - if the attacker modifies the encrypted device, an I/O
      error is returned instead of random data.
      The dm-integrity target can also be used as a standalone target, in this
      mode it calculates and verifies the integrity tag internally. In this
      mode, the dm-integrity target can be used to detect silent data
      corruption on the disk or in the I/O path.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  7. 19 Mar, 2017 1 commit
  8. 28 Feb, 2017 1 commit
  9. 16 Feb, 2017 1 commit
  10. 25 Jan, 2017 2 commits
    • Heinz Mauelshagen's avatar
      dm raid: add raid4/5/6 journaling support · 63c32ed4
      Heinz Mauelshagen authored
      Add md raid4/5/6 journaling support (upstream commit bac624f3 started
      the implementation) which closes the write hole (i.e. non-atomic updates
      to stripes) using a dedicated journal device.
      raid4/5/6 stripes hold N data payloads per stripe plus one parity raid4/5
      or two raid6 P/Q syndrome payloads in an in-memory stripe cache.
      Parity or P/Q syndromes used to recover any data payloads in case of a disk
      failure are calculated from the N data payloads and need to be updated on the
      different component devices of the raid device.  Those are non-atomic,
      persistent updates.  Hence a crash can cause failure to update all stripe
      payloads persistently and thus cause data loss during stripe recovery.
      This problem gets addressed by writing whole stripe cache entries (together with
      journal metadata) to a persistent journal entry on a dedicated journal device.
      Only if that journal entry is written successfully, the stripe cache entry is
      updated on the component devices of the raid device (i.e. writethrough type).
      In case of a crash, the entry can be recovered from the journal and be written
      again thus ensuring consistent stripe payload suitable to data recovery.
      Future dependencies:
      once writeback caching being worked on to compensate for the throughput
      implictions involved with writethrough overhead is supported with journaling
      in upstream, an additional patch based on this one will support it in dm-raid.
      Journal resilience related remarks:
      because stripes are recovered from the journal in case of a crash, the
      journal device better be resilient.  Resilience becomes mandatory with
      future writeback support, because loosing the working set in the log
      means data loss as oposed to writethrough, were the loss of the
      journal device 'only' reintroduces the write hole.
      Fix comment on data offsets in parse_dev_params() and initialize
      new_data_offset as well.
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Heinz Mauelshagen's avatar
      dm raid: fix transient device failure processing · c63ede3b
      Heinz Mauelshagen authored
      This fix addresses the following 3 failure scenarios:
      1) If a (transiently) inaccessible metadata device is being passed into the
      constructor (e.g. a device tuple '254:4 254:5'), it is processed as if
      '- -' was given.  This erroneously results in a status table line containing
      '- -', which mistakenly differs from what has been passed in.  As a result,
      userspace libdevmapper puts the device tuple seperate from the RAID device
      thus not processing the dependencies properly.
      2) False health status char 'A' instead of 'D' is emitted on the status
      status info line for the meta/data device tuple in this metadata device
      failure case.
      3) If the metadata device is accessible when passed into the constructor
      but the data device (partially) isn't, that leg may be set faulty by the
      raid personality on access to the (partially) unavailable leg.  Restore
      tried in a second raid device resume on such failed leg (status char 'D')
      fails after the (partial) leg returned.
      Fixes for aforementioned failure scenarios:
      - don't release passed in devices in the constructor thus allowing the
        status table line to e.g. contain '254:4 254:5' rather than '- -'
      - emit device status char 'D' rather than 'A' for the device tuple
        with the failed metadata device on the status info line
      - when attempting to restore faulty devices in a second resume, allow the
        device hot remove function to succeed by setting the device to not in-sync
      In case userspace intentionally passes '- -' into the constructor to avoid that
      device tuple (e.g. to split off a raid1 leg temporarily for later re-addition),
      the status table line will correctly show '- -' and the status info line will
      provide a '-' device health character for the non-defined device tuple.
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  11. 14 Dec, 2016 1 commit
    • Michael Witten's avatar
      Documentation/device-mapper: s/getsize/getsz/ · 95f21c5c
      Michael Witten authored
      According to `man blockdev':
               Print device size (32-bit!) in sectors.
               Deprecated in favor of the --getsz option.
               Get size in 512-byte sectors.
      Hence, occurrences of `--getsize' should be replaced with `--getsz',
      which this commit has achieved as follows:
        $ cd "$repo"
        $ git grep -l -e --getsz
        $ cd Documentation/device-mapper
        $ sed -i s/getsize/getsz/g *
      Signed-off-by: default avatarMichael Witten <mfwitten@gmail.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
  12. 08 Dec, 2016 2 commits
  13. 21 Nov, 2016 1 commit
  14. 21 Oct, 2016 1 commit
  15. 17 Oct, 2016 1 commit
  16. 07 Aug, 2016 1 commit
    • Jens Axboe's avatar
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      No intended functional changes in this commit.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
  17. 14 Jun, 2016 1 commit
  18. 07 Jun, 2016 1 commit
  19. 05 May, 2016 2 commits
  20. 10 Mar, 2016 1 commit
  21. 10 Dec, 2015 2 commits
    • Sami Tolvanen's avatar
      dm verity: add ignore_zero_blocks feature · 0cc37c2d
      Sami Tolvanen authored
      If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
      matching a zero hash without validating the content.
      Signed-off-by: default avatarSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    • Sami Tolvanen's avatar
      dm verity: add support for forward error correction · a739ff3f
      Sami Tolvanen authored
      Add support for correcting corrupted blocks using Reed-Solomon.
      This code uses RS(255, N) interleaved across data and hash
      blocks. Each error-correcting block covers N bytes evenly
      distributed across the combined total data, so that each byte is a
      maximum distance away from the others. This makes it possible to
      recover from several consecutive corrupted blocks with relatively
      small space overhead.
      In addition, using verity hashes to locate erasures nearly doubles
      the effectiveness of error correction. Being able to detect
      corrupted blocks also improves performance, because only corrupted
      blocks need to corrected.
      For a 2 GiB partition, RS(255, 253) (two parity bytes for each
      253-byte block) can correct up to 16 MiB of consecutive corrupted
      blocks if erasures can be located, and 8 MiB if they cannot, with
      16 MiB space overhead.
      Signed-off-by: default avatarSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
  22. 31 Oct, 2015 1 commit
  23. 09 Oct, 2015 1 commit
  24. 31 Aug, 2015 1 commit
  25. 18 Aug, 2015 1 commit
  26. 16 Jul, 2015 2 commits
  27. 17 Jun, 2015 3 commits