Skip to content
  • Alexey Kardashevskiy's avatar
    vfio: powerpc/spapr: Register memory and define IOMMU v2 · 2157e7b8
    Alexey Kardashevskiy authored
    
    
    The existing implementation accounts the whole DMA window in
    the locked_vm counter. This is going to be worse with multiple
    containers and huge DMA windows. Also, real-time accounting would requite
    additional tracking of accounted pages due to the page size difference -
    IOMMU uses 4K pages and system uses 4K or 64K pages.
    
    Another issue is that actual pages pinning/unpinning happens on every
    DMA map/unmap request. This does not affect the performance much now as
    we spend way too much time now on switching context between
    guest/userspace/host but this will start to matter when we add in-kernel
    DMA map/unmap acceleration.
    
    This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
    New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
    2 new ioctls to register/unregister DMA memory -
    VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
    which receive user space address and size of a memory region which
    needs to be pinned/unpinned and counted in locked_vm.
    New IOMMU splits physical pages pinning and TCE table update
    into 2 different operations. It requires:
    1) guest pages to be registered first
    2) consequent map/unmap requests to work only with pre-registered memory.
    For the default single window case this means that the entire guest
    (instead of 2GB) needs to be pinned before using VFIO.
    When a huge DMA window is added, no additional pinning will be
    required, otherwise it would be guest RAM + 2GB.
    
    The new memory registration ioctls are not supported by
    VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
    will require memory to be preregistered in order to work.
    
    The accounting is done per the user process.
    
    This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
    can do with v1 or v2 IOMMUs.
    
    In order to support memory pre-registration, we need a way to track
    the use of every registered memory region and only allow unregistration
    if a region is not in use anymore. So we need a way to tell from what
    region the just cleared TCE was from.
    
    This adds a userspace view of the TCE table into iommu_table struct.
    It contains userspace address, one per TCE entry. The table is only
    allocated when the ownership over an IOMMU group is taken which means
    it is only used from outside of the powernv code (such as VFIO).
    
    As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
    DDW API), this creates a default DMA window for IODA2 for consistency.
    
    Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
    [aw: for the vfio related changes]
    Acked-by: default avatarAlex Williamson <alex.williamson@redhat.com>
    Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    2157e7b8