Skip to content
  • Andrea Arcangeli's avatar
    userfaultfd: change the read API to return a uffd_msg · a9b85f94
    Andrea Arcangeli authored
    
    
    I had requests to return the full address (not the page aligned one) to
    userland.
    
    It's not entirely clear how the page offset could be relevant because
    userfaults aren't like SIGBUS that can sigjump to a different place and it
    actually skip resolving the fault depending on a page offset.  There's
    currently no real way to skip the fault especially because after a
    UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the
    kernel without having to return to userland first (not even self modifying
    code replacing the .text that touched the faulting address would prevent
    the fault to be repeated).  Userland cannot skip repeating the fault even
    more so if the fault was triggered by a KVM secondary page fault or any
    get_user_pages or any copy-user inside some syscall which will return to
    kernel code.  The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading
    to a SIGBUS being raised because the userfault can't wait if it cannot
    release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for
    that).
    
    Still returning userland a proper structure during the read() on the uffd,
    can allow to use the current UFFD_API for the future non-cooperative
    extensions too and it looks cleaner as well.  Once we get additional
    fields there's no point to return the fault address page aligned anymore
    to reuse the bits below PAGE_SHIFT.
    
    The only downside is that the read() syscall will read 32bytes instead of
    8bytes but that's not going to be measurable overhead.
    
    The total number of new events that can be extended or of new future bits
    for already shipped events, is limited to 64 by the features field of the
    uffdio_api structure.  If more will be needed a bump of UFFD_API will be
    required.
    
    [akpm@linux-foundation.org: use __packed]
    Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
    Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
    Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
    Cc: zhang.zhanghailiang@huawei.com
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andres Lagar-Cavilla <andreslc@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Peter Feiner <pfeiner@google.com>
    Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    a9b85f94