1. 03 May, 2010 1 commit
    • Brian Gerst's avatar
      x86-32: Rework cache flush denied handler · 40d2e763
      Brian Gerst authored
      The cache flush denied error is an erratum on some AMD 486 clones.  If an invd
      instruction is executed in userspace, the processor calls exception 19 (13 hex)
      instead of #GP (13 decimal).  On cpus where XMM is not supported, redirect
      exception 19 to do_general_protection().  Also, remove die_if_kernel(), since
      this was the last user.
      Signed-off-by: default avatarBrian Gerst <brgerst@gmail.com>
      LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      40d2e763
  2. 10 Dec, 2009 8 commits
  3. 14 Oct, 2009 1 commit
    • Steven Rostedt's avatar
      function-graph/x86: Replace unbalanced ret with jmp · 194ec341
      Steven Rostedt authored
      The function graph tracer replaces the return address with a hook
      to trace the exit of the function call. This hook will finish by
      returning to the real location the function should return to.
      
      But the current implementation uses a ret to jump to the real
      return location. This causes a imbalance between calls and ret.
      That is the original function does a call, the ret goes to the
      handler and then the handler does a ret without a matching call.
      
      Although the function graph tracer itself still breaks the branch
      predictor by replacing the original ret, by using a second ret and
      causing an imbalance, it breaks the predictor even more.
      
      This patch replaces the ret with a jmp to keep the calls and ret
      balanced. I tested this on one box and it showed a 1.7% increase in
      performance. Another box only showed a small 0.3% increase. But no
      box that I tested this on showed a decrease in performance by
      making this change.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091013203425.042034383@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      194ec341
  4. 11 Sep, 2009 1 commit
    • Masami Hiramatsu's avatar
      kprobes/x86-32: Move irq-exit functions to kprobes section · a00e817f
      Masami Hiramatsu authored
      Move irq-exit functions to .kprobes.text section to protect against
      kprobes recursion.
      
      When I ran kprobe stress test on x86-32, I found below symbols
      cause unrecoverable recursive probing:
      
      	ret_from_exception
      	ret_from_intr
      	check_userspace
      	restore_all
      	restore_all_notrace
      	restore_nocheck
      	irq_return
      
      And also, I found some interrupt/exception entry points that
      cause similar problems.
      
      This patch moves those symbols (including their container functions)
      to .kprobes.text section to prevent any kprobes probing.
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <20090908164755.24050.81182.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      a00e817f
  5. 18 Jun, 2009 4 commits
    • Steven Rostedt's avatar
      function-graph: add stack frame test · 71e308a2
      Steven Rostedt authored
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86. Where it passes in the stack
      frame of the parent function, and will test that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the funtion, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was suppose to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parsic arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      71e308a2
    • Alexander van Heukelum's avatar
      x86: de-assembler-ize asm/desc.h · bc3f5d3d
      Alexander van Heukelum authored
      asm/desc.h is included in three assembly files, but the only macro
      it defines, GET_DESC_BASE, is never used. This patch removes the
      includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
      from asm/desc.h.
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      bc3f5d3d
    • Alexander van Heukelum's avatar
      i386: fix/simplify espfix stack switching, move it into assembly · dc4c2a0a
      Alexander van Heukelum authored
      The espfix code triggers if we have a protected mode userspace
      application with a 16-bit stack. On returning to userspace, with iret,
      the CPU doesn't restore the high word of the stack pointer. This is an
      "official" bug, and the work-around used in the kernel is to temporarily
      switch to a 32-bit stack segment/pointer pair where the high word of the
      pointer is equal to the high word of the userspace stackpointer.
      
      The current implementation uses THREAD_SIZE to determine the cut-off,
      but there is no good reason not to use the more natural 64kb... However,
      implementing this by simply substituting THREAD_SIZE with 65536 in
      patch_espfix_desc crashed the test application. patch_espfix_desc tries
      to do what is described above, but gets it subtly wrong if the userspace
      stack pointer is just below a multiple of THREAD_SIZE: an overflow
      occurs to bit 13... With a bit of luck, when the kernelspace
      stackpointer is just below a 64kb-boundary, the overflow then ripples
      trough to bit 16 and userspace will see its stack pointer changed by
      65536.
      
      This patch moves all espfix code into entry_32.S. Selecting a 16-bit
      cut-off simplifies the code. The game with changing the limit dynamically
      is removed too. It complicates matters and I see no value in it. Changing
      only the top 16-bit word of ESP is one instruction and it also implies
      that only two bytes of the ESPFIX GDT entry need to be changed and this
      can be implemented in just a handful simple to understand instructions.
      As a side effect, the operation to compute the original ESP from the
      ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
      three instructions have been expanded inline in entry_32.S.
      
      impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
      stack segment
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: default avatarStas Sergeev <stsp@aknet.ru>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      dc4c2a0a
    • Alexander van Heukelum's avatar
      i386: fix return to 16-bit stack from NMI handler · 2e04bc76
      Alexander van Heukelum authored
      Returning to a task with a 16-bit stack requires special care: the iret
      instruction does not restore the high word of esp in that case. The
      espfix code fixes this, but currently is not invoked on NMIs. This means
      that a running task gets the upper word of esp clobbered due intervening
      NMIs. To reproduce, compile and run the following program with the nmi
      watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
      see that the high bits of esp contain garbage, while the low bits are
      still correct.
      
      This patch puts the espfix code back into the NMI code path.
      
      The patch is slightly complicated due to the irqtrace infrastructure not
      being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
      Otherwise, the tail of the normal iret-code is correct for the nmi code
      path too. To be able to share this code-path, the TRACE_IRQS_IRET was
      move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
      this code explicitly disables interrupts. This short interrupts-off
      section is now not traced anymore. The return-to-kernel path now always
      includes the preliminary test to decide if the espfix code should be
      called. This is never the case, but doing it this way keeps the patch as
      simple as possible and the few extra instructions should not affect
      timing in any significant way.
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <sys/types.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <sys/syscall.h>
       #include <asm/ldt.h>
      
      int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }
      
      /* this is assumed to be usable */
       #define SEGBASEADDR 0x10000
       #define SEGLIMIT 0x20000
      
      /* 16-bit segment */
      struct user_desc desc = {
              .entry_number = 0,
              .base_addr = SEGBASEADDR,
              .limit = SEGLIMIT,
              .seg_32bit = 0,
              .contents = 0, /* ??? */
              .read_exec_only = 0,
              .limit_in_pages = 0,
              .seg_not_present = 0,
              .useable = 1
      };
      
      int main(void)
      {
              setvbuf(stdout, NULL, _IONBF, 0);
      
              /* map a 64 kb segment */
              char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                              PROT_EXEC|PROT_READ|PROT_WRITE,
                              MAP_SHARED|MAP_ANONYMOUS, -1, 0);
              if (pointer == NULL) {
                      printf("could not map space\n");
                      return 0;
              }
      
              /* write ldt, new mode */
              int err = modify_ldt(0x11, &desc, sizeof(desc));
              if (err) {
                      printf("error modifying ldt: %i\n", err);
                      return 0;
              }
      
              for (int i=0; i<1000; i++) {
              asm volatile (
                      "pusha\n\t"
                      "mov %ss, %eax\n\t" /* preserve ss:esp */
                      "mov %esp, %ebp\n\t"
                      "push $7\n\t" /* index 0, ldt, user mode */
                      "push $65536-4096\n\t" /* esp */
                      "lss (%esp), %esp\n\t" /* switch to new stack */
                      "push %eax\n\t" /* save old ss:esp on new stack */
                      "push %ebp\n\t"
                      "add $17*65536, %esp\n\t" /* set high bits */
                      "mov %esp, %edx\n\t"
      
                      "mov $10000000, %ecx\n\t" /* wait... */
                      "1: loop 1b\n\t" /* ... a bit */
      
                      "cmp %esp, %edx\n\t"
                      "je 1f\n\t"
                      "ud2\n\t" /* esp changed inexplicably! */
                      "1:\n\t"
                      "sub $17*65536, %esp\n\t" /* restore high bits */
                      "lss (%esp), %esp\n\t" /* restore old ss:esp */
                      "popa\n\t");
      
                      printf("\rx%ix", i);
              }
      
              return 0;
      }
      Signed-off-by: default avatarAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: default avatarStas Sergeev <stsp@aknet.ru>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      2e04bc76
  6. 14 Mar, 2009 1 commit
    • Jaswinder Singh Rajput's avatar
      x86: entry_32.S fix compile warnings - fix work mask bit width · 88200bc2
      Jaswinder Singh Rajput authored
      Fix:
      
       arch/x86/kernel/entry_32.S:446: Warning: 00000000080001d1 shortened to 00000000000001d1
       arch/x86/kernel/entry_32.S:457: Warning: 000000000800feff shortened to 000000000000feff
       arch/x86/kernel/entry_32.S:527: Warning: 00000000080001d1 shortened to 00000000000001d1
       arch/x86/kernel/entry_32.S:541: Warning: 000000000800feff shortened to 000000000000feff
       arch/x86/kernel/entry_32.S:676: Warning: 0000000008000091 shortened to 0000000000000091
      
      TIF_SYSCALL_FTRACE is 0x08000000 and until now we checked the
      first 16 bits of the work mask - bit 27 falls outside of that.
      
      Update the entry_32.S code to check the full 32-bit mask.
      
      [ %cx => %ecx fix from Cyrill Gorcunov <gorcunov@gmail.com> ]
      Signed-off-by: default avatarJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "H. Peter Anvin" <hpa@kernel.org>
      LKML-Reference: <1237012693.18733.3.camel@ht.satnam>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      88200bc2
  7. 23 Feb, 2009 1 commit
    • Stas Sergeev's avatar
      x86: minor cleanup in the espfix code · bda3a897
      Stas Sergeev authored
      Impact: Cleanup
      
      Checkin be44d2aa eliminates the use of
      a 16-bit stack for espfix.  However, at least one instruction remained
      that only operated on the low 16 bits of %esp.
      
      This is not a bug per se because the kernel stack is always an aligned
      4K or 8K block.  Therefore it cannot cross 64K boundaries; this code,
      in fact, relies strictly on that fact.
      
      However, it's a lot cleaner (and, for that matter, smaller) to operate
      on the entire 32-bit register.
      Signed-off-by: default avatarStas Sergeev <stsp@aknet.ru>
      CC: Zachary Amsden <zach@vmware.com>
      CC: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      bda3a897
  8. 13 Feb, 2009 1 commit
  9. 11 Feb, 2009 1 commit
  10. 09 Feb, 2009 3 commits
    • Tejun Heo's avatar
      x86: implement x86_32 stack protector · 60a5317f
      Tejun Heo authored
      Impact: stack protector for x86_32
      
      Implement stack protector for x86_32.  GDT entry 28 is used for it.
      It's set to point to stack_canary-20 and have the length of 24 bytes.
      CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
      to the stack canary segment on entry.  As %gs is otherwise unused by
      the kernel, the canary can be anywhere.  It's defined as a percpu
      variable.
      
      x86_32 exception handlers take register frame on stack directly as
      struct pt_regs.  With -fstack-protector turned on, gcc copies the
      whole structure after the stack canary and (of course) doesn't copy
      back on return thus losing all changed.  For now, -fno-stack-protector
      is added to all files which contain those functions.  We definitely
      need something better.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      60a5317f
    • Tejun Heo's avatar
      x86: make lazy %gs optional on x86_32 · ccbeed3a
      Tejun Heo authored
      Impact: pt_regs changed, lazy gs handling made optional, add slight
              overhead to SAVE_ALL, simplifies error_code path a bit
      
      On x86_32, %gs hasn't been used by kernel and handled lazily.  pt_regs
      doesn't have place for it and gs is saved/loaded only when necessary.
      In preparation for stack protector support, this patch makes lazy %gs
      handling optional by doing the followings.
      
      * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
      
      * Save and restore %gs along with other registers in entry_32.S unless
        LAZY_GS.  Note that this unfortunately adds "pushl $0" on SAVE_ALL
        even when LAZY_GS.  However, it adds no overhead to common exit path
        and simplifies entry path with error code.
      
      * Define different user_gs accessors depending on LAZY_GS and add
        lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS.  The
        lazy_*_gs() ops are used to save, load and clear %gs lazily.
      
      * Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
      
      xen and lguest changes need to be verified.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ccbeed3a
    • Tejun Heo's avatar
      x86: use asm .macro instead of cpp #define in entry_32.S · f0d96110
      Tejun Heo authored
      Impact: cleanup
      
      Use .macro instead of cpp #define where approriate.  This cleans up
      code and will ease future changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f0d96110
  11. 29 Jan, 2009 1 commit
  12. 21 Jan, 2009 1 commit
    • Tejun Heo's avatar
      x86: make x86_32 use tlb_64.c · 02cf94c3
      Tejun Heo authored
      Impact: less contention when issuing invalidate IPI, cleanup
      
      Make x86_32 use the same tlb code as 64bit.  The 64bit code uses
      multiple IPI vectors for tlb shootdown to reduce contention.  This
      patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
      code paths.
      
      Note that the usage of asmlinkage is inconsistent for x86_32 and 64
      and calls for further cleanup.  This has been noted with a FIXME
      comment in tlb_64.c.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      02cf94c3
  13. 12 Jan, 2009 1 commit
  14. 03 Dec, 2008 2 commits
  15. 27 Nov, 2008 1 commit
  16. 26 Nov, 2008 3 commits
  17. 23 Nov, 2008 1 commit
  18. 16 Nov, 2008 1 commit
    • Frederic Weisbecker's avatar
      tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e
      Frederic Weisbecker authored
      This patch adds the support for dynamic tracing on the function return tracer.
      The whole difference with normal dynamic function tracing is that we don't need
      to hook on a particular callback. The only pro that we want is to nop or set
      dynamically the calls to ftrace_caller (which is ftrace_return_caller here).
      
      Some security checks ensure that we are not trying to launch dynamic tracing for
      return tracing while normal function tracing is already running.
      
      An example of trace with getnstimeofday set as a filter:
      
      ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e7d3737e
  19. 12 Nov, 2008 2 commits
  20. 11 Nov, 2008 3 commits
    • H. Peter Anvin's avatar
      x86: 32 bits: shrink and align IRQ stubs · b7c6244f
      H. Peter Anvin authored
      Shrink the IRQ stubs on 32 bits down to just over four bytes per (we
      fit seven into a 32-byte chunk.)  This shrinks the total icache
      consumption of the IRQ stubs down to an even kilobyte, if all of them
      are in active use.
      
      The downside is that we end up with a double jump, which could have a
      negative effect on some pipelines.  The double jump is always inside
      the same cacheline on any modern chips (the exception being
      486/Elan/Geode which have only 16-byte cachelines, but are unlikely to
      have too many interrupt sources.)
      
      To get the most effect, cache-align the IRQ stubs.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      b7c6244f
    • H. Peter Anvin's avatar
      x86: 32 bit: interrupt stub consistency with 64 bit · 4687518c
      H. Peter Anvin authored
      Don't generate interrupt stubs for interrupt vectors below
      FIRST_EXTERNAL_VECTOR, and make the table of interrupt vectors
      (interrupt[]) __initconst.  Both of these changes both conserve memory
      and improve consistency with 64 bits.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      4687518c
    • Frederic Weisbecker's avatar
      tracing, x86: add low level support for ftrace return tracing · caf4b323
      Frederic Weisbecker authored
      Impact: add infrastructure for function-return tracing
      
      Add low level support for ftrace return tracing.
      
      This plug-in stores return addresses on the thread_info structure of
      the current task.
      
      The index of the current return address is initialized when the task
      is the first one (init) and when a process forks (the child). It is
      not needed when a task does a sys_execve because after this syscall,
      it still needs to return on the kernel functions it called.
      
      Note that the code of return_to_handler has been suggested by Steven
      Rostedt as almost all of the ideas of improvements in this V3.
      
      For purpose of security, arch/x86/kernel/process_32.c is not traced
      because __switch_to() changes the current task during its execution.
      That could cause inconsistency in the stored return address of this
      function even if I didn't have any crash after testing with tracing on
      this function enabled.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      caf4b323
  21. 06 Nov, 2008 1 commit
    • Steven Rostedt's avatar
      ftrace: add quick function trace stop · 60a7ecf4
      Steven Rostedt authored
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      It's main purpose is to help other tracers/debuggers start and stop tracing
      fuctions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      60a7ecf4
  22. 22 Oct, 2008 1 commit