1. 09 Dec, 2013 1 commit
  2. 05 Dec, 2013 1 commit
    • Cesar Eduardo Barros's avatar
      crypto: more robust crypto_memneq · fe8c8a12
      Cesar Eduardo Barros authored
      Disabling compiler optimizations can be fragile, since a new
      optimization could be added to -O0 or -Os that breaks the assumptions
      the code is making.
      
      Instead of disabling compiler optimizations, use a dummy inline assembly
      (based on RELOC_HIDE) to block the problematic kinds of optimization,
      while still allowing other optimizations to be applied to the code.
      
      The dummy inline assembly is added after every OR, and has the
      accumulator variable as its input and output. The compiler is forced to
      assume that the dummy inline assembly could both depend on the
      accumulator variable and change the accumulator variable, so it is
      forced to compute the value correctly before the inline assembly, and
      cannot assume anything about its value after the inline assembly.
      
      This change should be enough to make crypto_memneq work correctly (with
      data-independent timing) even if it is inlined at its call sites. That
      can be done later in a followup patch.
      
      Compile-tested on x86_64.
      Signed-off-by: default avatarCesar Eduardo Barros <cesarb@cesarb.eti.br>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      fe8c8a12
  3. 07 Oct, 2013 1 commit
    • James Yonan's avatar
      crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks · 6bf37e5a
      James Yonan authored
      When comparing MAC hashes, AEAD authentication tags, or other hash
      values in the context of authentication or integrity checking, it
      is important not to leak timing information to a potential attacker,
      i.e. when communication happens over a network.
      
      Bytewise memory comparisons (such as memcmp) are usually optimized so
      that they return a nonzero value as soon as a mismatch is found. E.g,
      on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch
      and up to ~850 cyc for a full match (cold). This early-return behavior
      can leak timing information as a side channel, allowing an attacker to
      iteratively guess the correct result.
      
      This patch adds a new method crypto_memneq ("memory not equal to each
      other") to the crypto API that compares memory areas of the same length
      in roughly "constant time" (cache misses could change the timing, but
      since they don't reveal information about the content of the strings
      being compared, they are effectively benign). Iow, best and worst case
      behaviour take the same amount of time to complete (in contrast to
      memcmp).
      
      Note that crypto_memneq (unlike memcmp) can only be used to test for
      equality or inequality, NOT for lexicographical order. This, however,
      is not an issue for its use-cases within the crypto API.
      
      We tried to locate all of the places in the crypto API where memcmp was
      being used for authentication or integrity checking, and convert them
      over to crypto_memneq.
      
      crypto_memneq is declared noinline, placed in its own source file,
      and compiled with optimizations that might increase code size disabled
      ("Os") because a smart compiler (or LTO) might notice that the return
      value is always compared against zero/nonzero, and might then
      reintroduce the same early-return optimization that we are trying to
      avoid.
      
      Using #pragma or __attribute__ optimization annotations of the code
      for disabling optimization was avoided as it seems to be considered
      broken or unmaintained for long time in GCC [1]. Therefore, we work
      around that by specifying the compile flag for memneq.o directly in
      the Makefile. We found that this seems to be most appropriate.
      
      As we use ("Os"), this patch also provides a loop-free "fast-path" for
      frequently used 16 byte digests. Similarly to kernel library string
      functions, leave an option for future even further optimized architecture
      specific assembler implementations.
      
      This was a joint work of James Yonan and Daniel Borkmann. Also thanks
      for feedback from Florian Weimer on this and earlier proposals [2].
      
        [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
        [2] https://lkml.org/lkml/2013/2/10/131Signed-off-by: default avatarJames Yonan <james@openvpn.net>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      6bf37e5a