Skip to content
  • Bob Pearson's avatar
    crc32: optimize loop counter for x86 · 0292c497
    Bob Pearson authored
    
    
    Add two changes that improve the performance of x86 systems
    
    1. replace main loop with incrementing counter this change improves
       the performance of the selftest by about 5-6% on Nehalem CPUs.  The
       apparent reason is that the compiler can use the loop index to perform
       an indexed memory access.  This is reported to make the performance of
       PowerPC CPUs to get worse.
    
    2. replace the rem_len loop with incrementing counter this change
       improves the performance of the selftest, which has more than the usual
       number of occurances, by about 1-2% on x86 CPUs.  In actual work loads
       the length is most often a multiple of 4 bytes and this code does not
       get executed as often if at all.  Again this change is reported to make
       the performance of PowerPC get worse.
    
    [djwong@us.ibm.com: Minor changelog tweaks]
    Signed-off-by: default avatarBob Pearson <rpearson@systemfabricworks.com>
    Signed-off-by: default avatarDarrick J. Wong <djwong@us.ibm.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    0292c497