    x86/entry/64: Create a per-CPU SYSCALL entry trampoline · c631a16e
    Andy Lutomirski authored
    commit 3386bc8a upstream.
    
    Handling SYSCALL is tricky: the SYSCALL handler is entered with every
    single register (except FLAGS), including RSP, live.  It somehow needs
    to set RSP to point to a valid stack, which means it needs to save the
    user RSP somewhere and find its own stack pointer.  The canonical way
    to do this is with SWAPGS, which lets us access percpu data using the
    %gs prefix.
    
    With PAGE_TABLE_ISOLATION-like pagetable switching, this is
    problematic.  Without a scratch register, switching CR3 is impossible, so
    %gs-based percpu memory would need to be mapped in the user pagetables.
    Doing that without information leaks is difficult or impossible.
    
    Instead, use a different sneaky trick.  Map a copy of the first part
    of the SYSCALL asm at a different address for each CPU.  Now RIP
    varies depending on the CPU, so we can use RIP-relative memory access
    to access percpu memory.  By putting the relevant information (one
    scratch slot and the stack address) at a constant offset relative to
    RIP, we can make SYSCALL work without relying on %gs.
    
    A nice thing about this approach is that we can easily switch it on
    and off if we want pagetable switching to be configurable.
    
    The compat variant of SYSCALL doesn't have this problem in the first
    place -- there are plenty of scratch registers, since we don't care
    about preserving r8-r15.  This patch therefore doesn't touch SYSCALL32
    at all.
    
    This patch actually seems to be a small speedup.  With this patch,
    SYSCALL touches an extra cache line and an extra virtual page, but
    the pipeline no longer stalls waiting for SWAPGS.  It seems that, at
    least in a tight loop, the latter outweighs the former.
    
    Thanks to David Laight for an optimization tip.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Borislav Petkov <bpetkov@suse.de>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Laight <David.Laight@aculab.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: Eduardo Valentin <eduval@amazon.com>
    Cc: Greg KH <gregkh@linuxfoundation.org>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.de
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>