    x86/asm: Optimize unnecessarily wide TEST instructions · 3e1aa7cb
    Denys Vlasenko authored
    By the nature of the TEST operation, it is often possible to test
    a narrower part of the operand:
    
        "testl $3,  mem"  ->  "testb $3, mem",
        "testq $3, %rcx"  ->  "testb $3, %cl"
    
    This results in shorter instructions, because the TEST instruction
    has no sign-extending byte-immediate forms, unlike other ALU ops.
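
A minimal sketch of the size difference, assuming the standard encodings TEST r/m8, imm8 (F6 /0 ib) and TEST r/m32, imm32 (F7 /0 id) with a register operand. The helper name `encode_test_imm` is hypothetical and exists only for this illustration; it is not part of the patch.

```python
import struct

RCX = 1  # register number of %rcx / %ecx / %cl in the ModRM r/m field


def encode_test_imm(width: int, imm: int, reg: int = RCX) -> bytes:
    """Encode `TEST reg, imm` for a register operand (illustration only).

    Valid for register numbers 0-3, whose byte forms need no REX prefix.
    Assumes a small non-negative immediate.
    """
    modrm = 0xC0 | reg  # mod=11 (register), /0 opcode extension, r/m=reg
    if width == 8:
        return bytes([0xF6, modrm]) + struct.pack("<B", imm)
    if width == 32:
        return bytes([0xF7, modrm]) + struct.pack("<I", imm)
    if width == 64:
        # REX.W prefix; the immediate is still 4 bytes
        return bytes([0x48, 0xF7, modrm]) + struct.pack("<I", imm)
    raise ValueError("unsupported width")


wide = encode_test_imm(64, 3)   # testq $3, %rcx
narrow = encode_test_imm(8, 3)  # testb $3, %cl
print(wide.hex(), len(wide))
print(narrow.hex(), len(narrow))
```

Under these assumptions the 64-bit form takes 7 bytes and the byte form takes 3, because the wide form must carry a full 4-byte immediate.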
    
    Note that this change does not create any LCP (Length-Changing Prefix)
    stalls. Those occur when a 0x66 operand-size prefix is added in order
    to use a 16-bit immediate, which changes such TEST instructions from:
    
      [test_opcode] [modrm] [imm32]
    
    to:
    
      [0x66] [test_opcode] [modrm] [imm16]
    
    where [imm16] has a *different length* now: 2 bytes instead of 4.
    This confuses the decoder and slows down execution.
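
As a rough illustration of the layout change described above, the sketch below encodes the same TEST against %ecx and %cx and measures the immediate. The helper names `encode_test_cx` and `imm_len` are hypothetical; the byte values assume the standard F7 /0 encoding with ModRM 0xC1.

```python
import struct

OPCODE_MODRM = bytes([0xF7, 0xC1])  # TEST r/m, imm with %ecx or %cx as operand


def encode_test_cx(imm: int, use_16bit: bool) -> bytes:
    """Encode `testl $imm, %ecx` or, with the 0x66 prefix, `testw $imm, %cx`."""
    if use_16bit:
        # The 0x66 prefix shrinks the immediate from 4 bytes to 2
        return b"\x66" + OPCODE_MODRM + struct.pack("<H", imm)
    return OPCODE_MODRM + struct.pack("<I", imm)


def imm_len(encoding: bytes) -> int:
    """Number of immediate bytes following [prefix] opcode modrm."""
    prefix_len = 1 if encoding[0] == 0x66 else 0
    return len(encoding) - prefix_len - len(OPCODE_MODRM)


for use_16bit in (False, True):
    enc = encode_test_cx(0x300, use_16bit)
    print(enc.hex(), "immediate bytes:", imm_len(enc))
```

The same opcode and ModRM bytes are followed by a different number of immediate bytes depending on the prefix, which is the length change the decoder has to cope with.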
    
    REX prefixes were carefully designed to almost never hit this case:
    adding a REX prefix does not change the length of the rest of the
    instruction, except for the MOVABS and MOV [addr],RAX instructions.
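
A small sketch contrasting the two cases, assuming the standard encodings TEST r/m, imm32 (F7 /0) and MOV r, imm (B8+r), where REX.W turns the latter into MOVABS with a 64-bit immediate. The function names are hypothetical and chosen only for this example.

```python
import struct

REX_W = b"\x48"


def encode_test_cx_reg(imm: int, rex_w: bool) -> bytes:
    """`testl $imm, %ecx`, or `testq $imm, %rcx` when rex_w is set."""
    prefix = REX_W if rex_w else b""
    # The immediate stays 4 bytes with or without REX.W
    return prefix + bytes([0xF7, 0xC1]) + struct.pack("<I", imm)


def encode_mov_imm_cx_reg(imm: int, rex_w: bool) -> bytes:
    """`movl $imm, %ecx`, or `movabsq $imm, %rcx` when rex_w is set."""
    prefix = REX_W if rex_w else b""
    # Here REX.W widens the immediate from 4 bytes to 8
    return prefix + bytes([0xB9]) + struct.pack("<Q" if rex_w else "<I", imm)


test_growth = len(encode_test_cx_reg(3, True)) - len(encode_test_cx_reg(3, False))
mov_growth = len(encode_mov_imm_cx_reg(3, True)) - len(encode_mov_imm_cx_reg(3, False))
print("TEST grows by", test_growth, "byte(s); MOV grows by", mov_growth)
```

For TEST the only growth is the prefix byte itself, while for the MOV immediate form the prefix also lengthens the bytes that follow it.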
    
    This patch does not add instructions which would use a 0x66 prefix,
    code chan...