1. 27 Nov, 2018 1 commit
  2. 26 Sep, 2016 1 commit
  3. 21 Sep, 2016 4 commits
  4. 01 Sep, 2016 1 commit
  5. 29 Aug, 2016 1 commit
    • RAID/s390: add SIMD implementation for raid6 gen/xor · 474fd6e8
      Martin Schwidefsky authored
      Using vector registers is slightly faster:
      
      raid6: vx128x8  gen() 19705 MB/s
      raid6: vx128x8  xor() 11886 MB/s
      raid6: using algorithm vx128x8 gen() 19705 MB/s
      raid6: .... xor() 11886 MB/s, rmw enabled
      
      vs the software algorithms:
      
      raid6: int64x1  gen()  3018 MB/s
      raid6: int64x1  xor()  1429 MB/s
      raid6: int64x2  gen()  4661 MB/s
      raid6: int64x2  xor()  3143 MB/s
      raid6: int64x4  gen()  5392 MB/s
      raid6: int64x4  xor()  3509 MB/s
      raid6: int64x8  gen()  4441 MB/s
      raid6: int64x8  xor()  3207 MB/s
      raid6: using algorithm int64x4 gen() 5392 MB/s
      raid6: .... xor() 3509 MB/s, rmw enabled
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  6. 01 Dec, 2015 1 commit
  7. 31 Aug, 2015 1 commit
    • md/raid6: delta syndrome for ARM NEON · 0e833e69
      Ard Biesheuvel authored
      This implements XOR syndrome calculation using NEON intrinsics.
      As before, the module can be built for ARM and arm64 from the
      same source.
      
      Relative performance on a Cortex-A57 based system:
      
        raid6: int64x1  gen()   905 MB/s
        raid6: int64x1  xor()   881 MB/s
        raid6: int64x2  gen()  1343 MB/s
        raid6: int64x2  xor()  1286 MB/s
        raid6: int64x4  gen()  1896 MB/s
        raid6: int64x4  xor()  1321 MB/s
        raid6: int64x8  gen()  1773 MB/s
        raid6: int64x8  xor()  1165 MB/s
        raid6: neonx1   gen()  1834 MB/s
        raid6: neonx1   xor()  1278 MB/s
        raid6: neonx2   gen()  2528 MB/s
        raid6: neonx2   xor()  1942 MB/s
        raid6: neonx4   gen()  2888 MB/s
        raid6: neonx4   xor()  2334 MB/s
        raid6: neonx8   gen()  2957 MB/s
        raid6: neonx8   xor()  2232 MB/s
        raid6: using algorithm neonx8 gen() 2957 MB/s
        raid6: .... xor() 2232 MB/s, rmw enabled
      
      Cc: Markus Stockhausen <stockhausen@collogia.de>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NeilBrown <neilb@suse.com>
  8. 11 Jun, 2015 1 commit
  9. 19 May, 2015 1 commit
    • x86/fpu: Rename i387.h to fpu/api.h · df6b35f4
      Ingo Molnar authored
      We already have fpu/types.h; move i387.h to fpu/api.h.
      
      The file name has become a misnomer anyway: it offers generic FPU APIs,
      but is not limited to i387 functionality.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 21 Apr, 2015 4 commits
    • md/raid6 algorithms: xor_syndrome() for SSE2 · a582564b
      Markus Stockhausen authored
      The second (and last) optimized XOR syndrome calculation. This version
      supports right and left side optimization. All CPUs with an architecture
      older than Haswell will benefit from it.
      
      It should be noted that SSE2 movntdq kills performance for memory areas
      that are read and written simultaneously in chunks smaller than cache
      line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR
      functions.
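
      Why movdqa instead of movntdq, in intrinsics form: the kernel code is
      written in inline assembler, so the snippet below is only an illustrative
      user-space sketch of the two store flavours being contrasted.

        #include <emmintrin.h>          /* SSE2 intrinsics */

        /* movntdq: non-temporal store, bypasses the cache.  This hurts when
         * the same P/Q cache line is re-read and rewritten in chunks smaller
         * than a cache line during rmw. */
        static inline void store_pq_streaming(__m128i *dst, __m128i v)
        {
                _mm_stream_si128(dst, v);
        }

        /* movdqa: regular aligned store, keeps the line in the cache. */
        static inline void store_pq_cached(__m128i *dst, __m128i v)
        {
                _mm_store_si128(dst, v);
        }
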
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: xor_syndrome() for generic int · 9a5ce91d
      Markus Stockhausen authored
      Start the algorithms with the very basic one. It is left and right
      optimized. That means we can avoid all calculations for unneeded pages
      above the right stop offset. For pages below the left start offset we
      still need the syndrome multiplication but without reading data pages.
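
      A minimal byte-wise sketch of that left/right optimization (illustrative
      only; the real int.uc code works on machine words and is unrolled, and
      the helper names here are made up). 0x1d is the reduction term of the
      RAID-6 GF(2^8) polynomial used when multiplying by the generator:

        #include <stddef.h>
        #include <stdint.h>

        /* GF(2^8) multiply by the generator 0x02 (polynomial 0x11d). */
        static inline uint8_t gf_mul2(uint8_t v)
        {
                return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
        }

        /* dptr[0..disks-3] are data blocks, dptr[disks-2] is P, dptr[disks-1]
         * is Q.  XOR the delta contribution of blocks [start, stop] into P/Q. */
        static void xor_syndrome_sketch(int disks, int start, int stop,
                                        size_t bytes, uint8_t **dptr)
        {
                uint8_t *p = dptr[disks - 2];
                uint8_t *q = dptr[disks - 1];

                for (size_t d = 0; d < bytes; d++) {
                        /* right side: blocks above 'stop' are skipped entirely */
                        uint8_t wq = dptr[stop][d];
                        uint8_t wp = wq;

                        for (int z = stop - 1; z >= start; z--) {
                                wp ^= dptr[z][d];
                                wq = gf_mul2(wq) ^ dptr[z][d];
                        }
                        /* left side: below 'start' only the Q multiplication
                         * remains, no data pages are read */
                        for (int z = start - 1; z >= 0; z--)
                                wq = gf_mul2(wq);

                        p[d] ^= wp;
                        q[d] ^= wq;
                }
        }
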
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: improve test program · 7e92e1d7
      Markus Stockhausen authored
      It is always helpful to have a test tool in place when we implement
      new data-critical algorithms. So add some test routines to the raid6
      checker that verify that the new xor_syndrome() works as expected.
      
      Run through all permutations of start/stop pages per algorithm and
      simulate an xor_syndrome()-assisted rmw run. After each rmw, check that
      the recovery algorithm still confirms that the stripe is fine.
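
      A condensed outline of that loop; fill_with_new_data() and
      stripe_recovers_correctly() are hypothetical placeholders for the test
      plumbing, not the actual lib/raid6/test code:

        /* ptrs[] holds the data pages followed by P and Q; data_disks is the
         * number of data pages (disks - 2).  Placeholder helpers as noted. */
        for (int start = 0; start < data_disks; start++) {
                for (int stop = start; stop < data_disks; stop++) {
                        /* 1) "remove" the old contents of [start, stop] from P/Q */
                        xor_syndrome(disks, start, stop, PAGE_SIZE, (void **)ptrs);

                        /* 2) modify the blocks in the window */
                        for (int z = start; z <= stop; z++)
                                fill_with_new_data(ptrs[z], PAGE_SIZE);

                        /* 3) "reinsert" the new contents into P/Q */
                        xor_syndrome(disks, start, stop, PAGE_SIZE, (void **)ptrs);

                        /* the recovery check must still pass for the stripe */
                        assert(stripe_recovers_correctly(disks, ptrs));
                }
        }
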
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid6 algorithms: delta syndrome functions · fe5cbc6e
      Markus Stockhausen authored
      v3: s-o-b comment, explanation of performance and decision for
      the start/stop implementation
      
      Implementing rmw functionality for RAID6 requires optimized syndrome
      calculation. Up to now we can only generate a complete syndrome. The
      target P/Q pages are always overwritten. With this patch we provide
      a framework for in-place P/Q modification. To start with, those
      functions are simply filled with NULL values.
      
      xor_syndrome() has two additional parameters: start & stop. These
      indicate the first and last page that are changing during an rmw run.
      That makes it possible to avoid several unnecessary loops and speed up
      the calculation. The caller needs to implement the following logic to
      make the functions work (a sketch of the sequence follows the list).
      
      1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
      blocks inside P/Q between (and including) start and stop.
      
      2) modify any block with start <= block <= stop
      
      3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
      source blocks into P/Q between (and including) start and stop.
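
      A sketch of that caller sequence, assuming the new function pointer is
      reachable as raid6_call.xor_syndrome; block_changes(), write_new_data()
      and zero_page are illustrative placeholders:

        /* rmw update of data blocks [start, stop].  ptrs[] follows the usual
         * layout: ptrs[0..disks-3] data, ptrs[disks-2] = P, ptrs[disks-1] = Q. */
        static void rmw_update(int disks, int start, int stop,
                               size_t bytes, void **ptrs)
        {
                /* pages inside the window that will NOT change point at the
                 * zero page for both calls (see the note below), so they add
                 * nothing to the delta */
                for (int z = start; z <= stop; z++)
                        if (!block_changes(z))
                                ptrs[z] = zero_page;

                /* 1) "remove" the old contents of the changing blocks from P/Q */
                raid6_call.xor_syndrome(disks, start, stop, bytes, ptrs);

                /* 2) modify the changing blocks */
                for (int z = start; z <= stop; z++)
                        if (block_changes(z))
                                write_new_data(ptrs[z], bytes);

                /* 3) "reinsert" the new contents into P/Q */
                raid6_call.xor_syndrome(disks, start, stop, bytes, ptrs);
        }
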
      
      Pages between start and stop that won't be changed should be filled
      with a pointer to the kernel zero page. The reasons for not taking NULL
      pages are:
      
      1) The algorithms walk the whole source data line by line, so using the
      zero page avoids additional branches.
      
      2) A NULL page would only save the XOR for the P parity; the Q parity
      still needs its calculation steps. Depending on the algorithm unrolling,
      that might be a difference of only 2 instructions per loop.
      
      The benchmark numbers of the gen_syndrome() functions are displayed in
      the kernel log. Do the same for the xor_syndrome() functions. This
      will help to analyze performance problems and give a rough estimate of
      how well the algorithm works. The choice of the fastest algorithm will
      still depend on the gen_syndrome() performance.
      
      With the start/stop page implementation the speed can vary a lot in real
      life. E.g. a change of page 0 & page 15 on a stripe will be harder to
      compute than the case where page 0 & page 1 are XOR candidates. To avoid
      being too enthusiastic about the expected speeds, we run a worst-case test
      that simulates a change on the upper half of the stripe. So we do:
      
      1) calculation of P/Q for the upper pages
      
      2) continuation of Q for the lower (empty) pages
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 03 Feb, 2015 1 commit
  12. 14 Oct, 2014 1 commit
  13. 27 Aug, 2013 2 commits
    • raid6/test: replace echo -e with printf · c28399b5
      Max Filippov authored
      -e is a non-standard echo option; echo output is
      implementation-dependent when it is used. Replace echo -e with printf,
      as suggested by the POSIX echo manual.
      
      Cc: NeilBrown <neilb@suse.de>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • RAID: add tilegx SIMD implementation of raid6 · ae77cbc1
      Ken Steele authored
      This change adds TILE-Gx SIMD instructions to the software raid
      (md), modeling the Altivec implementation. This is only for Syndrome
      generation; there is more that could be done to improve recovery,
      as in the recent Intel SSE3 recovery implementation.
      
      The code unrolls 8 times; this turns out to be the best on tilegx
      hardware among unroll factors of 1, 2, 4, 8 and 16. The code reads one
      cache line of data from each disk, stores P and Q, then goes to the
      next cache line.
      
      The test code in sys/linux/lib/raid6/test reports a 2008 MB/s data
      read rate for syndrome generation using 18 disks (16 data and 2
      parity). It was 1512 MB/s before this SIMD optimization. This is
      running on 1 core with all the data in cache.
      
      This is based on the paper The Mathematics of RAID-6
      (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).
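
      For context, the syndrome generation that this code vectorizes reduces,
      byte for byte, to the Horner-style loop below (a generic sketch, not the
      tilegx code, which does the same work per cache line in SIMD registers
      with 8-way unrolling):

        #include <stddef.h>
        #include <stdint.h>

        /* dptr[0..disks-3] are data blocks, dptr[disks-2] is P, dptr[disks-1] is Q. */
        static void gen_syndrome_sketch(int disks, size_t bytes, uint8_t **dptr)
        {
                uint8_t *p = dptr[disks - 2];
                uint8_t *q = dptr[disks - 1];
                int z0 = disks - 3;             /* highest data block */

                for (size_t d = 0; d < bytes; d++) {
                        uint8_t wp = dptr[z0][d];
                        uint8_t wq = wp;

                        for (int z = z0 - 1; z >= 0; z--) {
                                wp ^= dptr[z][d];       /* P: plain XOR */
                                /* Q: multiply by the GF(2^8) generator
                                 * (0x1d reduction), then add D_z */
                                wq = (uint8_t)((wq << 1) ^ ((wq & 0x80) ? 0x1d : 0))
                                     ^ dptr[z][d];
                        }
                        p[d] = wp;
                        q[d] = wq;
                }
        }
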
      Signed-off-by: Ken Steele <ken@tilera.com>
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 08 Jul, 2013 1 commit
  15. 13 Dec, 2012 3 commits
  16. 28 May, 2012 1 commit
  17. 22 May, 2012 4 commits
  18. 28 Mar, 2012 2 commits
  19. 31 Oct, 2011 2 commits
  20. 20 Oct, 2011 1 commit
  21. 30 Aug, 2010 1 commit
  22. 11 Aug, 2010 2 commits
  23. 10 Aug, 2010 1 commit
  24. 29 Oct, 2009 1 commit