1. 10 Jul, 2017 1 commit
  2. 17 Feb, 2017 1 commit
    • bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann authored
      A long-standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same applies to address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See the offwaketime example on
      dumping stacks from a map.
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in the fast path and that is also shared for symbol/size/offset
      lookup of a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file is done via an RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aid
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that a lot of
      systems still ship with world read permissions on kallsyms, so addresses
      should not suddenly get exposed for them. If that situation gets
      much better in the future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such program types relevant in this context are root-only anyway.
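As a rough illustration of the fast-path lookup described above (address in, `bpf_prog_<tag>+offset/size` out), the C sketch below keeps JITed images in an array sorted by start address and binary-searches it. This is a simplified user-space model under stated assumptions: the kernel uses a latched RB tree read under RCU rather than a sorted array, and the names `jit_image`, `jit_lookup` and `jit_symbolize` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one JITed program image. */
struct jit_image {
	uintptr_t start;  /* first byte of the image */
	size_t size;      /* image length in bytes */
	uint8_t tag[8];   /* program tag, printed as 16 hex digits */
};

/* Binary search over images sorted by start address (a stand-in for the
 * kernel's latched RB tree); returns the image containing addr, or NULL. */
static const struct jit_image *jit_lookup(const struct jit_image *imgs,
					  size_t n, uintptr_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < imgs[mid].start)
			hi = mid;
		else if (addr >= imgs[mid].start + imgs[mid].size)
			lo = mid + 1;
		else
			return &imgs[mid];
	}
	return NULL;
}

/* Writes "bpf_prog_<tag>+0x<off>/0x<size>" into buf (at least 64 bytes).
 * Returns 0 on success, -1 if addr is not inside any JITed image. */
static int jit_symbolize(const struct jit_image *imgs, size_t n,
			 uintptr_t addr, char *buf, size_t len)
{
	const struct jit_image *img = jit_lookup(imgs, n, addr);
	int pos;

	assert(len >= 64);  /* 9 + 16 + 3 + 16 + 3 + 16 chars + '\0' */
	if (!img)
		return -1;
	pos = snprintf(buf, len, "bpf_prog_");
	for (int i = 0; i < 8; i++)
		pos += snprintf(buf + pos, len - (size_t)pos, "%02x",
				(unsigned int)img->tag[i]);
	snprintf(buf + pos, len - (size_t)pos, "+0x%zx/0x%zx",
		 (size_t)(addr - img->start), img->size);
	return 0;
}
```

With two images at 0x1000 (size 0x100) and 0x2000 (size 0x80), address 0x1028 resolves to `bpf_prog_<tag>+0x28/0x100`, while 0x1100 falls between the images and is not attributed to any program.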
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 15 Mar, 2016 1 commit
    • kallsyms: add support for relative offsets in kallsyms address table · 2213e9a6
      Ard Biesheuvel authored
      Similar to how relative extables are implemented, it is possible to emit
      the kallsyms table in such a way that it contains offsets relative to
      some anchor point in the kernel image rather than absolute addresses.
      On 64-bit architectures, it cuts the size of the kallsyms address table
      in half, since offsets between kernel symbols can typically be expressed
      in 32 bits.  This saves several hundreds of kilobytes of permanent
      .rodata on average.  In addition, the kallsyms address table is no
      longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
      effect, so the relocation work done after decompression now doesn't have
      to do relocation updates for all these values.  This saves up to 24
      bytes (i.e., the size of an ELF64 RELA relocation table entry) per value,
      which easily adds up to a couple of megabytes of uncompressed __init
      data on ppc64 or arm64.  Even if these relocation entries typically
      compress well, the combined size reduction of 2.8 MB uncompressed for a
      ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
      KB space saving in the compressed image.
      Since it is useful for some architectures (like x86) to retain the
      ability to emit absolute values as well, this patch also adds support
      for capturing both absolute and relative values when
      KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
      addresses as positive 32-bit values, and addresses relative to the
      lowest encountered relative symbol as negative values, which are
      subtracted from the runtime address of this base symbol to produce the
      actual address.
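The mixed absolute/relative encoding described above can be modelled as follows. This is a sketch under assumptions: the function names are hypothetical, and the extra `- 1` bias (so that the base symbol itself does not encode as 0, which would read as an absolute value) reflects my understanding of how kernel/kallsyms.c decodes negative entries rather than something this commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one address as a 32-bit table entry. Absolute (per-cpu)
 * values become non-negative entries; everything else becomes a
 * negative entry relative to `base`, the lowest relative symbol.
 * The extra -1 keeps `base` itself from encoding as 0, which would
 * be indistinguishable from an absolute value of 0. */
static int32_t encode_entry(uint64_t addr, int is_absolute, uint64_t base)
{
	if (is_absolute) {
		assert(addr <= INT32_MAX);
		return (int32_t)addr;
	}
	assert(addr >= base && addr - base <= INT32_MAX);
	return (int32_t)(-1 - (int64_t)(addr - base));
}

/* Decode at run time: non-negative entries are used as-is, negative
 * entries are subtracted from the runtime base (base - 1 - entry). */
static uint64_t decode_entry(int32_t entry, uint64_t runtime_base)
{
	if (entry >= 0)
		return (uint64_t)entry;
	return runtime_base + (uint64_t)(-1 - (int64_t)entry);
}
```

Because relative entries are resolved against the runtime address of the base symbol, relocating the kernel only changes `runtime_base`; the 32-bit table itself needs no relocation updates, and each entry takes 4 bytes instead of 8 on a 64-bit build.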
      Support for the above is enabled by default for all architectures except
      IA-64 and Tile-GX, whose symbols are too far apart to capture in this
      manner.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Kees Cook <keescook@chromium.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 14 Oct, 2014 1 commit
  5. 08 Aug, 2014 1 commit
  6. 07 Apr, 2014 1 commit
  7. 15 Apr, 2013 1 commit
    • kernel: kallsyms: memory override issue, need check destination buffer length · e3f26752
      Chen Gang authored
        We don't export any symbols > 128 characters, but if we did then
        kallsyms_expand_symbol() would overflow the buffer handed to it.
        So we need to check the destination buffer length when copying.
        The related test:
          If we define an EXPORTed function whose name is longer than 128
          characters, the kernel panics when init_kprobes calls
          kallsyms_lookup_name during boot.
          With the length check provided by this patch, it is OK.
        The patch adds an additional destination buffer length parameter
        (maxlen); if the uncompressed string is too long (>= maxlen), it is
        truncated. The parameters are not checked for validity, since it is
        a static function.
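A simplified model of the bounded copy: the compressed name is treated as a list of indices into a token table, and characters are copied only while room remains for the terminating '\0'. The token-table layout and the `expand_symbol` name are assumptions for illustration; the real kallsyms_expand_symbol() walks the kernel's packed name and token arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand n token indices into result, writing at most maxlen - 1
 * characters plus a terminating '\0'. Returns the number of
 * characters written (excluding '\0'); overlong names are truncated. */
static size_t expand_symbol(const unsigned char *tokens, size_t n,
			    const char *const *token_table,
			    char *result, size_t maxlen)
{
	size_t len = 0;

	assert(maxlen > 0);
	for (size_t i = 0; i < n; i++) {
		const char *tok = token_table[tokens[i]];

		while (*tok) {
			if (len + 1 >= maxlen)
				goto out;  /* only room for '\0' is left */
			result[len++] = *tok++;
		}
	}
out:
	result[len] = '\0';
	return len;
}
```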
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  8. 29 May, 2012 1 commit
    • vsprintf: fix %ps on non symbols when using kallsyms · 4796dd20
      Stephen Boyd authored
      Using %ps in a printk format will sometimes fail silently and print the
      empty string if the address passed in does not match a symbol that
      kallsyms knows about.  But using %pS will fall back to printing the full
      address if kallsyms can't find the symbol.  Make %ps act the same as %pS
      by falling back to printing the address.
      While we're here also make %ps print the module that a symbol comes from
      so that it matches what %pS already does.  Take this simple function for
      example (in a module):
      	static void test_printk(void)
      	{
      		int test;
      		pr_info("with pS: %pS\n", &test);
      		pr_info("with ps: %ps\n", &test);
      	}
      Before this patch:
       with pS: 0xdff7df44
       with ps:
      After this patch:
       with pS: 0xdff7df44
       with ps: 0xdff7df44
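The fixed fallback rule can be modelled in user space: resolve the address, and if no symbol is found, print the address in hex rather than an empty string. `lookup_fn`, `demo_lookup` and `format_symbol` are hypothetical; in the kernel this logic lives in vsprintf.c and uses kallsyms.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver: returns the symbol name for addr, or NULL. */
typedef const char *(*lookup_fn)(unsigned long addr);

/* Hypothetical one-entry symbol table used for the demonstration. */
static const char *demo_lookup(unsigned long addr)
{
	return addr == 0x1000 ? "test_printk" : NULL;
}

/* Model of the fixed %ps behaviour: if the lookup fails, print the
 * address in hex instead of silently producing an empty string. */
static void format_symbol(char *buf, size_t len, unsigned long addr,
			  lookup_fn lookup)
{
	const char *name = lookup(addr);

	assert(len > 0);
	if (name)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "0x%lx", addr);
}
```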
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 24 Mar, 2011 1 commit
    • vsprintf: Introduce %pB format specifier · 0f77a8d3
      Namhyung Kim authored
      The %pB format specifier is for stack backtrace. Its handler
      sprint_backtrace() does symbol lookup using (address-1) to
      ensure the address will not point outside of the function.
      If there is a tail call to a function marked "noreturn", gcc
      optimizes out the code after the call, which causes the saved
      return address to point outside of the function (i.e. to the start
      of the next function) and so pollutes the call trace somewhat.
      This patch adds the %pB printk mechanism that allows architecture
      call-trace printout functions to improve backtrace printouts.
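The effect of looking up `address - 1` can be shown with a toy symbol table in which `caller` ends with a call to a noreturn function, so its saved return address equals the start of `next_fn`. The table layout and function names are hypothetical; only the `- 1` adjustment reflects what the commit describes for sprint_backtrace().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table entry covering [start, start + size). */
struct sym {
	const char *name;
	unsigned long start;
	unsigned long size;
};

/* Linear lookup of the function containing addr; NULL if none. */
static const char *sym_lookup(const struct sym *tab, size_t n,
			      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tab[i].start && addr < tab[i].start + tab[i].size)
			return tab[i].name;
	return NULL;
}

/* Backtrace-style lookup: a saved return address points just past the
 * call instruction, so look up ret_addr - 1, which stays inside the
 * caller even when the call was the caller's last instruction. */
static const char *backtrace_lookup(const struct sym *tab, size_t n,
				    unsigned long ret_addr)
{
	return sym_lookup(tab, n, ret_addr - 1);
}
```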
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-arch@vger.kernel.org
      LKML-Reference: <1300934550-21394-1-git-send-email-namhyung@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 23 Mar, 2011 2 commits
  11. 19 Nov, 2010 1 commit
    • Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" · 33e0d57f
      Linus Torvalds authored
      This reverts commit 59365d13.
      It turns out that this can break certain existing user land setups.
      Quoth Sarah Sharp:
       "On Wednesday, I updated my branch to commit 460781b5 from linus' tree,
        and my box would not boot.  klogd segfaulted, which stalled the whole
        system.
        At first I thought it actually hung the box, but it continued booting
        after 5 minutes, and I was able to log in.  It dropped back to the
        text console instead of the graphical bootup display for that period
        of time.  dmesg surprisingly still works.  I've bisected the problem
        down to this commit (commit 59365d13)
        The box is running klogd 1.5.5ubuntu3 (from Jaunty).  Yes, I know
        that's old.  I read the bit in the commit about changing the
        permissions of kallsyms after boot, but if I can't boot that doesn't
        help."
      So let's just keep the old default, and encourage distributions to do
      the "chmod -r /proc/kallsyms" in their bootup scripts.  This is not
      worth a kernel option to change default behavior, since it's so easily
      done in user space.
      Reported-and-bisected-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Cc: Marcus Meissner <meissner@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 17 Nov, 2010 1 commit
    • kernel: make /proc/kallsyms mode 400 to reduce ease of attacking · 59365d13
      Marcus Meissner authored
      Making /proc/kallsyms readable only for root by default makes it
      slightly harder for attackers to write generic kernel exploits by
      removing one source of knowledge where things are in the kernel.
      This is the second submission; discussion of the first submission
      mostly raised the concern that this is just one hole in the sieve ...
      but it is one of the bigger ones.
      Changing the permissions of at least System.map and vmlinux is also
      required to close the same set of leaks, but that is a packaging issue.
      Target of this starter patch and follow ups is removing any kind of
      kernel space address information leak from the kernel.
      [ Side note: the default of root-only reading is the "safe" value, and
        it's easy enough to then override at any time after boot.  The /proc
        filesystem allows root to change the permissions with a regular
        chmod, so you can "revert" this at run-time by simply doing
          chmod og+r /proc/kallsyms
        as root if you really want regular users to see the kernel symbols.
        It does help some tools like "perf" figure them out without any
        setup, so it may well make sense in some situations.  - Linus ]
      Signed-off-by: Marcus Meissner <meissner@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Eugene Teo <eugeneteo@kernel.org>
      Reviewed-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 21 May, 2010 1 commit
  14. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      The script does the following.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  15. 10 Nov, 2009 1 commit
  16. 23 Sep, 2009 1 commit
  17. 09 Jun, 2009 1 commit
  18. 31 Mar, 2009 1 commit
  19. 14 Jan, 2009 1 commit
  20. 19 Dec, 2008 1 commit
    • allow stripping of generated symbols under CONFIG_KALLSYMS_ALL · 9bb48247
      Jan Beulich authored
      Building upon parts of the module stripping patch, this patch
      introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
      Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
      CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
      kernels I tested with.
      The patch also does away with the need to special case the kallsyms-
      internal symbols by making them available even in the first linking
      stage.
      While it is a generated file, the patch includes the changes to
      scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
      here is.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  21. 20 Nov, 2008 1 commit
  22. 16 Oct, 2008 1 commit
  23. 25 Jul, 2008 1 commit
  24. 29 Apr, 2008 1 commit
  25. 06 Feb, 2008 1 commit
  26. 29 Jan, 2008 1 commit
  27. 29 Nov, 2007 1 commit
  28. 17 Jul, 2007 1 commit
    • kallsyms: make KSYM_NAME_LEN include space for trailing '\0' · 9281acea
      Tejun Heo authored
      KSYM_NAME_LEN is peculiar in that it does not include the space for the
      trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
      a buffer.  This is nonsense and error-prone.  Moreover, when the caller
      forgets that, it's very likely to subtly bite back by corrupting the stack
      because the last position of the buffer is always cleared to zero.
      This patch increments KSYM_NAME_LEN by one and updates code accordingly.
      * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
        is fixed.
      * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
        MODULE_NAME_LEN was treated as if it didn't include space for the
        trailing '\0'.  Fix it.
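Under the new convention a buffer declared as `char buf[KSYM_NAME_LEN]` already has room for the terminator. The sketch below illustrates the combined `module:symbol` case, assuming `KSYM_NAME_LEN` is 128 and `MODULE_NAME_LEN` is 56 purely for illustration (check the kernel headers for the real values): the longest module name (MODULE_NAME_LEN - 1 characters), a ':', the longest symbol (KSYM_NAME_LEN - 1 characters) and one '\0' add up to exactly MODULE_NAME_LEN + KSYM_NAME_LEN bytes. `format_mod_sym` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative values only; both lengths include room for '\0'. */
#define KSYM_NAME_LEN   128
#define MODULE_NAME_LEN 56

/* Builds "module:symbol" in buf; returns 0 if it fit, -1 if truncated. */
static int format_mod_sym(char buf[MODULE_NAME_LEN + KSYM_NAME_LEN],
			  const char *mod, const char *sym)
{
	/* Longest legal inputs are MODULE_NAME_LEN - 1 and KSYM_NAME_LEN - 1
	 * characters; with ':' and '\0' that is exactly the buffer size. */
	int n = snprintf(buf, MODULE_NAME_LEN + KSYM_NAME_LEN, "%s:%s",
			 mod, sym);

	return (n >= 0 && n < MODULE_NAME_LEN + KSYM_NAME_LEN) ? 0 : -1;
}
```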
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Paulo Marques <pmarques@grupopie.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 16 Jul, 2007 1 commit
  30. 30 May, 2007 1 commit
  31. 08 May, 2007 6 commits
  32. 30 Apr, 2007 1 commit
  33. 08 Dec, 2006 1 commit
  34. 07 Dec, 2006 1 commit