1. 10 Jul, 2019 4 commits
    • Petr Mladek's avatar
      ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() · 0c0b5477
      Petr Mladek authored
      commit d5b844a2cf507fc7642c9ae80a9d585db3065c28 upstream.
      
      The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
      permissions race") causes a possible deadlock between register_kprobe()
      and ftrace_run_update_code() when ftrace is using stop_machine().
      
      The existing dependency chain (in reverse order) is:
      
      -> #1 (text_mutex){+.+.}:
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             __mutex_lock+0x88/0x908
             mutex_lock_nested+0x32/0x40
             register_kprobe+0x254/0x658
             init_kprobes+0x11a/0x168
             do_one_initcall+0x70/0x318
             kernel_init_freeable+0x456/0x508
             kernel_init+0x22/0x150
             ret_from_fork+0x30/0x34
             kernel_thread_starter+0x0/0xc
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x90c/0xde0
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             cpus_read_lock+0x62/0xd0
             stop_machine+0x2e/0x60
             arch_ftrace_update_code+0x2e/0x40
             ftrace_run_update_code+0x40/0xa0
             ftrace_startup+0xb2/0x168
             register_ftrace_function+0x64/0x88
             klp_patch_object+0x1a2/0x290
             klp_enable_patch+0x554/0x980
             do_one_initcall+0x70/0x318
             do_init_module+0x6e/0x250
             load_module+0x1782/0x1990
             __s390x_sys_finit_module+0xaa/0xf0
             system_call+0xd8/0x2d0
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(text_mutex);
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock(text_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
      It is similar problem that has been solved by the commit 2d1e38f5
      ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
      To be on the safe side, text_mutex must become a low level lock taken
      after cpu_hotplug_lock.rw_sem.
      
      This can't be achieved easily with the current ftrace design.
      For example, arm calls set_all_modules_text_rw() already in
      ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
      This functions is called:
      
        + outside stop_machine() from ftrace_run_update_code()
        + without stop_machine() from ftrace_module_enable()
      
      Fortunately, the problematic fix is needed only on x86_64. It is
      the only architecture that calls set_all_modules_text_rw()
      in ftrace path and supports livepatching at the same time.
      
      Therefore it is enough to move text_mutex handling from the generic
      kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
      
         ftrace_arch_code_modify_prepare()
         ftrace_arch_code_modify_post_process()
      
      This patch basically reverts the ftrace part of the problematic
      commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
      text permissions race"). And provides x86_64 specific-fix.
      
      Some refactoring of the ftrace code will be needed when livepatching
      is implemented for arm or nds32. These architectures call
      set_all_modules_text_rw() and use stop_machine() at the same time.
      
      Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
      
      Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
      Acked-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: 's avatarMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: 's avatarMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: 's avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: 's avatarPetr Mladek <pmladek@suse.com>
      [
        As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
        ftrace_run_update_code() as it is a void function.
      ]
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c0b5477
    • Eiichi Tsukata's avatar
      tracing/snapshot: Resize spare buffer if size changed · 90b89546
      Eiichi Tsukata authored
      commit 46cc0b44428d0f0e81f11ea98217fc0edfbeab07 upstream.
      
      Current snapshot implementation swaps two ring_buffers even though their
      sizes are different from each other, that can cause an inconsistency
      between the contents of buffer_size_kb file and the current buffer size.
      
      For example:
      
        # cat buffer_size_kb
        7 (expanded: 1408)
        # echo 1 > events/enable
        # grep bytes per_cpu/cpu0/stats
        bytes: 1441020
        # echo 1 > snapshot             // current:1408, spare:1408
        # echo 123 > buffer_size_kb     // current:123,  spare:1408
        # echo 1 > snapshot             // current:1408, spare:123
        # grep bytes per_cpu/cpu0/stats
        bytes: 1443700
        # cat buffer_size_kb
        123                             // != current:1408
      
      And also, a similar per-cpu case hits the following WARNING:
      
      Reproducer:
      
        # echo 1 > per_cpu/cpu0/snapshot
        # echo 123 > buffer_size_kb
        # echo 1 > per_cpu/cpu0/snapshot
      
      WARNING:
      
        WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
        Modules linked in:
        CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
        Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
        RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
        RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
        RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
        RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
        R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
        R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
        FS:  00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
        Call Trace:
         ? trace_array_printk_buf+0x140/0x140
         ? __mutex_lock_slowpath+0x10/0x10
         tracing_snapshot_write+0x4c8/0x7f0
         ? trace_printk_init_buffers+0x60/0x60
         ? selinux_file_permission+0x3b/0x540
         ? tracer_preempt_off+0x38/0x506
         ? trace_printk_init_buffers+0x60/0x60
         __vfs_write+0x81/0x100
         vfs_write+0x1e1/0x560
         ksys_write+0x126/0x250
         ? __ia32_sys_read+0xb0/0xb0
         ? do_syscall_64+0x1f/0x390
         do_syscall_64+0xc1/0x390
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This patch adds resize_buffer_duplicate_size() to check if there is a
      difference between current/spare buffer sizes and resize a spare buffer
      if necessary.
      
      Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com
      
      Cc: stable@vger.kernel.org
      Fixes: ad909e21 ("tracing: Add internal tracing_snapshot() functions")
      Signed-off-by: 's avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90b89546
    • Wei Li's avatar
      ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() · f6880316
      Wei Li authored
      [ Upstream commit 04e03d9a616c19a47178eaca835358610e63a1dd ]
      
      The mapper may be NULL when called from register_ftrace_function_probe()
      with probe->data == NULL.
      
      This issue can be reproduced as follow (it may be covered by compiler
      optimization sometime):
      
      / # cat /sys/kernel/debug/tracing/set_ftrace_filter
      #### all functions enabled ####
      / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
      [  206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  206.952402] Mem abort info:
      [  206.952819]   ESR = 0x96000006
      [  206.955326]   Exception class = DABT (current EL), IL = 32 bits
      [  206.955844]   SET = 0, FnV = 0
      [  206.956272]   EA = 0, S1PTW = 0
      [  206.956652] Data abort info:
      [  206.957320]   ISV = 0, ISS = 0x00000006
      [  206.959271]   CM = 0, WnR = 0
      [  206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
      [  206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
      [  206.964953] Internal error: Oops: 96000006 [#1] SMP
      [  206.971122] Dumping ftrace buffer:
      [  206.973677]    (ftrace buffer empty)
      [  206.975258] Modules linked in:
      [  206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
      [  206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
      [  206.978955] Hardware name: linux,dummy-virt (DT)
      [  206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
      [  206.980874] lr : ftrace_count_free+0x68/0x80
      [  206.982539] sp : ffff0000182f3ab0
      [  206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
      [  206.983632] x27: ffff000013054b40 x26: 0000000000000001
      [  206.984000] x25: ffff00001385f000 x24: 0000000000000000
      [  206.984394] x23: ffff000013453000 x22: ffff000013054000
      [  206.984775] x21: 0000000000000000 x20: ffff00001385fe28
      [  206.986575] x19: ffff000013872c30 x18: 0000000000000000
      [  206.987111] x17: 0000000000000000 x16: 0000000000000000
      [  206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
      [  206.987850] x13: 000000000017430e x12: 0000000000000580
      [  206.988251] x11: 0000000000000000 x10: cccccccccccccccc
      [  206.988740] x9 : 0000000000000000 x8 : ffff000013917550
      [  206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
      [  206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
      [  206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
      [  206.991771] x1 : 0000000000000000 x0 : 0000000000000000
      [  206.992557] Call trace:
      [  206.993101]  free_ftrace_func_mapper+0x2c/0x118
      [  206.994827]  ftrace_count_free+0x68/0x80
      [  206.995238]  release_probe+0xfc/0x1d0
      [  206.995555]  register_ftrace_function_probe+0x4a8/0x868
      [  206.995923]  ftrace_trace_probe_callback.isra.4+0xb8/0x180
      [  206.996330]  ftrace_dump_callback+0x50/0x70
      [  206.996663]  ftrace_regex_write.isra.29+0x290/0x3a8
      [  206.997157]  ftrace_filter_write+0x44/0x60
      [  206.998971]  __vfs_write+0x64/0xf0
      [  206.999285]  vfs_write+0x14c/0x2f0
      [  206.999591]  ksys_write+0xbc/0x1b0
      [  206.999888]  __arm64_sys_write+0x3c/0x58
      [  207.000246]  el0_svc_common.constprop.0+0x408/0x5f0
      [  207.000607]  el0_svc_handler+0x144/0x1c8
      [  207.000916]  el0_svc+0x8/0xc
      [  207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
      [  207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
      [  207.010126] Kernel panic - not syncing: Fatal exception
      [  207.011322] SMP: stopping secondary CPUs
      [  207.013956] Dumping ftrace buffer:
      [  207.014595]    (ftrace buffer empty)
      [  207.015632] Kernel Offset: disabled
      [  207.017187] CPU features: 0x002,20006008
      [  207.017985] Memory Limit: none
      [  207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.comSigned-off-by: 's avatarWei Li <liwei391@huawei.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      f6880316
    • Josh Poimboeuf's avatar
      module: Fix livepatch/ftrace module text permissions race · 2d1a9468
      Josh Poimboeuf authored
      [ Upstream commit 9f255b632bf12c4dd7fc31caee89aa991ef75176 ]
      
      It's possible for livepatch and ftrace to be toggling a module's text
      permissions at the same time, resulting in the following panic:
      
        BUG: unable to handle page fault for address: ffffffffc005b1d9
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
        Oops: 0003 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339ca5 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
        RIP: 0010:apply_relocate_add+0xbe/0x14c
        Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
        RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
        RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
        RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
        RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
        R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
        R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
        FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         klp_init_object_loaded+0x10f/0x219
         ? preempt_latency_start+0x21/0x57
         klp_enable_patch+0x662/0x809
         ? virt_to_head_page+0x3a/0x3c
         ? kfree+0x8c/0x126
         patch_init+0x2ed/0x1000 [livepatch_test02]
         ? 0xffffffffc0060000
         do_one_initcall+0x9f/0x1c5
         ? kmem_cache_alloc_trace+0xc4/0xd4
         ? do_init_module+0x27/0x210
         do_init_module+0x5f/0x210
         load_module+0x1c41/0x2290
         ? fsnotify_path+0x3b/0x42
         ? strstarts+0x2b/0x2b
         ? kernel_read+0x58/0x65
         __do_sys_finit_module+0x9f/0xc3
         ? __do_sys_finit_module+0x9f/0xc3
         __x64_sys_finit_module+0x1a/0x1c
         do_syscall_64+0x52/0x61
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above panic occurs when loading two modules at the same time with
      ftrace enabled, where at least one of the modules is a livepatch module:
      
      CPU0					CPU1
      klp_enable_patch()
        klp_init_object_loaded()
          module_disable_ro()
          					ftrace_module_enable()
      					  ftrace_arch_code_modify_post_process()
      				    	    set_all_modules_text_ro()
            klp_write_object_relocations()
              apply_relocate_add()
      	  *patches read-only code* - BOOM
      
      A similar race exists when toggling ftrace while loading a livepatch
      module.
      
      Fix it by ensuring that the livepatch and ftrace code patching
      operations -- and their respective permissions changes -- are protected
      by the text_mutex.
      
      Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.comReported-by: 's avatarJohannes Erdfelt <johannes@erdfelt.com>
      Fixes: 444d13ff ("modules: add ro_after_init support")
      Acked-by: 's avatarJessica Yu <jeyu@kernel.org>
      Reviewed-by: 's avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: 's avatarMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: 's avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      2d1a9468
  2. 03 Jul, 2019 1 commit
    • Sasha Levin's avatar
      Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP" · a8a53c76
      Sasha Levin authored
      This reverts commit 8190d6fb, which was
      upstream commit 4a6c91fbdef846ec7250b82f2eeeb87ac5f18cf9.
      
      On Tue, Jun 25, 2019 at 09:39:45AM +0200, Sebastian Andrzej Siewior wrote:
      >Please backport commit e74deb11931ff682b59d5b9d387f7115f689698e to
      >stable _or_ revert the backport of commit 4a6c91fbdef84 ("x86/uaccess,
      >ftrace: Fix ftrace_likely_update() vs. SMAP"). It uses
      >user_access_{save|restore}() which has been introduced in the following
      >commit.
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      a8a53c76
  3. 25 Jun, 2019 1 commit
    • Miguel Ojeda's avatar
      tracing: Silence GCC 9 array bounds warning · 50bbae7d
      Miguel Ojeda authored
      commit 0c97bf863efce63d6ab7971dad811601e6171d2f upstream.
      
      Starting with GCC 9, -Warray-bounds detects cases when memset is called
      starting on a member of a struct but the size to be cleared ends up
      writing over further members.
      
      Such a call happens in the trace code to clear, at once, all members
      after and including `seq` on struct trace_iterator:
      
          In function 'memset',
              inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
          ./include/linux/string.h:344:9: warning: '__builtin_memset' offset
          [8505, 8560] from the object at 'iter' is out of the bounds of
          referenced subobject 'seq' with type 'struct trace_seq' at offset
          4368 [-Warray-bounds]
            344 |  return __builtin_memset(p, c, size);
                |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      In order to avoid GCC complaining about it, we compute the address
      ourselves by adding the offsetof distance instead of referring
      directly to the member.
      
      Since there are two places doing this clear (trace.c and trace_kdb.c),
      take the chance to move the workaround into a single place in
      the internal header.
      
      Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.comSigned-off-by: 's avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      [ Removed unnecessary parenthesis around "iter" ]
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50bbae7d
  4. 31 May, 2019 1 commit
  5. 25 May, 2019 1 commit
    • Elazar Leibovich's avatar
      tracing: Fix partial reading of trace event's id file · cfa82959
      Elazar Leibovich authored
      commit cbe08bcbbe787315c425dde284dcb715cfbf3f39 upstream.
      
      When reading only part of the id file, the ppos isn't tracked correctly.
      This is taken care by simple_read_from_buffer.
      
      Reading a single byte, and then the next byte would result EOF.
      
      While this seems like not a big deal, this breaks abstractions that
      reads information from files unbuffered. See for example
      https://github.com/golang/go/issues/29399
      
      This code was mentioned as problematic in
      commit cd458ba9
      ("tracing: Do not (ab)use trace_seq in event_id_read()")
      
      An example C code that show this bug is:
      
        #include <stdio.h>
        #include <stdint.h>
      
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <unistd.h>
      
        int main(int argc, char **argv) {
          if (argc < 2)
            return 1;
          int fd = open(argv[1], O_RDONLY);
          char c;
          read(fd, &c, 1);
          printf("First  %c\n", c);
          read(fd, &c, 1);
          printf("Second %c\n", c);
        }
      
      Then run with, e.g.
      
        sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id
      
      You'll notice you're getting the first character twice, instead of the
      first two characters in the id file.
      
      Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com
      
      Cc: Orit Wasserman <orit.was@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 23725aee ("ftrace: provide an id file for each event")
      Signed-off-by: 's avatarElazar Leibovich <elazar@lightbitslabs.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfa82959
  6. 16 May, 2019 1 commit
    • Steven Rostedt (VMware)'s avatar
      tracing/fgraph: Fix set_graph_function from showing interrupts · 02cf6220
      Steven Rostedt (VMware) authored
      [ Upstream commit 5cf99a0f3161bc3ae2391269d134d6bf7e26f00e ]
      
      The tracefs file set_graph_function is used to only function graph functions
      that are listed in that file (or all functions if the file is empty). The
      way this is implemented is that the function graph tracer looks at every
      function, and if the current depth is zero and the function matches
      something in the file then it will trace that function. When other functions
      are called, the depth will be greater than zero (because the original
      function will be at depth zero), and all functions will be traced where the
      depth is greater than zero.
      
      The issue is that when a function is first entered, and the handler that
      checks this logic is called, the depth is set to zero. If an interrupt comes
      in and a function in the interrupt handler is traced, its depth will be
      greater than zero and it will automatically be traced, even if the original
      function was not. But because the logic only looks at depth it may trace
      interrupts when it should not be.
      
      The recent design change of the function graph tracer to fix other bugs
      caused the depth to be zero while the function graph callback handler is
      being called for a longer time, widening the race of this happening. This
      bug was actually there for a longer time, but because the race window was so
      small it seldom happened. The Fixes tag below is for the commit that widen
      the race window, because that commit belongs to a series that will also help
      fix the original bug.
      
      Cc: stable@kernel.org
      Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
      Reported-by: 's avatarJoe Lawrence <joe.lawrence@redhat.com>
      Tested-by: 's avatarJoe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarSasha Levin <alexander.levin@microsoft.com>
      02cf6220
  7. 04 May, 2019 1 commit
  8. 02 May, 2019 3 commits
    • Peter Zijlstra's avatar
      trace: Fix preempt_enable_no_resched() abuse · 2c7bedd0
      Peter Zijlstra authored
      commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.
      
      Unless the very next line is schedule(), or implies it, one must not use
      preempt_enable_no_resched(). It can cause a preemption to go missing and
      thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
      
      Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net
      
      Cc: Waiman Long <longman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: huang ying <huang.ying.caritas@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: stable@vger.kernel.org
      Fixes: 2c2d7329 ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
      Signed-off-by: 's avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c7bedd0
    • Jann Horn's avatar
      tracing: Fix buffer_ref pipe ops · aec0d4aa
      Jann Horn authored
      commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.
      
      This fixes multiple issues in buffer_pipe_buf_ops:
      
       - The ->steal() handler must not return zero unless the pipe buffer has
         the only reference to the page. But generic_pipe_buf_steal() assumes
         that every reference to the pipe is tracked by the page's refcount,
         which isn't true for these buffers - buffer_pipe_buf_get(), which
         duplicates a buffer, doesn't touch the page's refcount.
         Fix it by using generic_pipe_buf_nosteal(), which refuses every
         attempted theft. It should be easy to actually support ->steal, but the
         only current users of pipe_buf_steal() are the virtio console and FUSE,
         and they also only use it as an optimization. So it's probably not worth
         the effort.
       - The ->get() and ->release() handlers can be invoked concurrently on pipe
         buffers backed by the same struct buffer_ref. Make them safe against
         concurrency by using refcount_t.
       - The pointers stored in ->private were only zeroed out when the last
         reference to the buffer_ref was dropped. As far as I know, this
         shouldn't be necessary anyway, but if we do it, let's always do it.
      
      Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: 's avatarJann Horn <jannh@google.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aec0d4aa
    • Wenwen Wang's avatar
      tracing: Fix a memory leak by early error exit in trace_pid_write() · 5cc7ae88
      Wenwen Wang authored
      commit 91862cc7867bba4ee5c8fcf0ca2f1d30427b6129 upstream.
      
      In trace_pid_write(), the buffer for trace parser is allocated through
      kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
      it is then freed through kfree() in trace_parser_put(). However, it is
      possible that trace_pid_write() is terminated due to unexpected errors,
      e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
      is a memory leak bug.
      
      To fix this issue, free the allocated buffer when an error is encountered.
      
      Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
      
      Fixes: f4d34a87 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
      Cc: stable@vger.kernel.org
      Signed-off-by: 's avatarWenwen Wang <wang6495@umn.edu>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cc7ae88
  9. 27 Apr, 2019 1 commit
  10. 05 Apr, 2019 1 commit
    • Douglas Anderson's avatar
      tracing: kdb: Fix ftdump to not sleep · 2d412eb3
      Douglas Anderson authored
      [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
      
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.orgReported-by: 's avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: 's avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      2d412eb3
  11. 23 Mar, 2019 2 commits
  12. 27 Feb, 2019 1 commit
  13. 20 Feb, 2019 1 commit
    • Andreas Ziegler's avatar
      tracing/uprobes: Fix output for multiple string arguments · 092fc755
      Andreas Ziegler authored
      commit 0722069a5374b904ec1a67f91249f90e1cfae259 upstream.
      
      When printing multiple uprobe arguments as strings the output for the
      earlier arguments would also include all later string arguments.
      
      This is best explained in an example:
      
      Consider adding a uprobe to a function receiving two strings as
      parameters which is at offset 0xa0 in strlib.so and we want to print
      both parameters when the uprobe is hit (on x86_64):
      
      $ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
          /sys/kernel/debug/tracing/uprobe_events
      
      When the function is called as func("foo", "bar") and we hit the probe,
      the trace file shows a line like the following:
      
        [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
      
      Note the extra "bar" printed as part of arg1. This behaviour stacks up
      for additional string arguments.
      
      The strings are stored in a dynamically growing part of the uprobe
      buffer by fetch_store_string() after copying them from userspace via
      strncpy_from_user(). The return value of strncpy_from_user() is then
      directly used as the required size for the string. However, this does
      not take the terminating null byte into account as the documentation
      for strncpy_from_user() cleary states that it "[...] returns the
      length of the string (not including the trailing NUL)" even though the
      null byte will be copied to the destination.
      
      Therefore, subsequent calls to fetch_store_string() will overwrite
      the terminating null byte of the most recently fetched string with
      the first character of the current string, leading to the
      "accumulation" of strings in earlier arguments in the output.
      
      Fix this by incrementing the return value of strncpy_from_user() by
      one if we did not hit the maximum buffer size.
      
      Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 5baaa59e ("tracing/probes: Implement 'memory' fetch method for uprobes")
      Acked-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: 's avatarAndreas Ziegler <andreas.ziegler@fau.de>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      092fc755
  14. 15 Feb, 2019 1 commit
  15. 21 Dec, 2018 2 commits
  16. 17 Dec, 2018 1 commit
  17. 29 Sep, 2018 1 commit
  18. 23 Sep, 2018 4 commits
    • Philippe Gerum's avatar
      ftrace: ipipe: rely on fully atomic stop_machine() handler · 02637b95
      Philippe Gerum authored
      Now that stop_machine() guarantees fully atomic execution of the stop
      routine via hard interrupt disabling, there is no point in using
      ipipe_critical_enter/exit() for the same purpose in order to patch the
      kernel text.
      02637b95
    • Philippe Gerum's avatar
      lockdep: ipipe: improve detection of out-of-band contexts · 8f474779
      Philippe Gerum authored
      trace_hardirqs_on_virt[_caller]() must be invoked instead of
      trace_hardirqs_on[_caller]() from assembly sites before returning from
      an interrupt/fault, so that the virtual IRQ disable state is checked
      for before switching the tracer's logic state to ON.
      
      This is required as an interrupt may be received and handled by the
      pipeline core although not forwarded to the root domain, when
      interrupts are virtually disabled. In such a case, we want to
      reconcile the tracer's logic with the effect of interrupt pipelining.
      8f474779
    • Philippe Gerum's avatar
      lockdep: ipipe: make the logic aware of interrupt pipelining · a8c071b1
      Philippe Gerum authored
      The lockdep engine will check for the current interrupt state as part
      of the locking validation process, which must encompass:
      
      - the CPU interrupt state
      - the current pipeline domain
      - the virtual interrupt disable flag
      
      so that we can traverse the tracepoints from any context sanely and
      safely.
      
      In addition trace_hardirqs_on_virt_caller() should be called by the
      arch-dependent code when tracking the interrupt state before returning
      to user-space after a kernel entry (exceptions, IRQ). This makes sure
      that the tracking logic only applies to the root domain, and considers
      the virtual disable flag exclusively.
      
      For instance, the kernel may be entered when interrupts are (only)
      virtually disabled for the root domain (i.e. stalled), and we should
      tell the IRQ tracing logic that IRQs are about to be enabled back only
      if the root domain is unstalled before leaving to user-space. In such
      a context, the state of the interrupt bit in the CPU would be
      irrelevant.
      a8c071b1
    • Philippe Gerum's avatar
      ftrace: ipipe: enable tracing from the head domain · e10253e0
      Philippe Gerum authored
      Enabling ftrace for a co-kernel running in the head domain of a
      pipelined interrupt context means to:
      
      - make sure that ftrace's live kernel code patching still runs
        unpreempted by any head domain activity (so that the latter can't
        tread on invalid or half-baked changes in the .text section).
      
      - allow the co-kernel code running in the head domain to traverse
        ftrace's tracepoints safely.
      
      The changes introduced by this commit ensure this by fixing up some
      key critical sections so that interrupts are still disabled in the
      CPU, undoing the interrupt flag virtualization in those particular
      cases.
      e10253e0
  19. 09 Sep, 2018 3 commits
  20. 05 Sep, 2018 1 commit
    • Petr Mladek's avatar
      printk/nmi: Prevent deadlock when accessing the main log buffer in NMI · cd71265a
      Petr Mladek authored
      commit 03fc7f9c upstream.
      
      The commit 719f6a70 ("printk: Use the main logbuf in NMI
      when logbuf_lock is available") brought back the possible deadlocks
      in printk() and NMI.
      
      The check of logbuf_lock is done only in printk_nmi_enter() to prevent
      mixed output. But another CPU might take the lock later, enter NMI, and:
      
            + Both NMIs might be serialized by yet another lock, for example,
      	the one in nmi_cpu_backtrace().
      
            + The other CPU might get stopped in NMI, see smp_send_stop()
      	in panic().
      
      The only safe solution is to use trylock when storing the message
      into the main log-buffer. It might cause reordering when some lines
      go to the main lock buffer directly and others are delayed via
      the per-CPU buffer. It means that it is not useful in general.
      
      This patch replaces the problematic NMI deferred context with NMI
      direct context. It can be used to mark a code that might produce
      many messages in NMI and the risk of losing them is more critical
      than problems with eventual reordering.
      
      The context is then used when dumping trace buffers on oops. It was
      the primary motivation for the original fix. Also the reordering is
      even smaller issue there because some traces have their own time stamps.
      
      Finally, nmi_cpu_backtrace() need not longer be serialized because
      it will always us the per-CPU buffers again.
      
      Fixes: 719f6a70 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Acked-by: 's avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: 's avatarPetr Mladek <pmladek@suse.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd71265a
  21. 24 Aug, 2018 1 commit
  22. 09 Aug, 2018 1 commit
    • Masami Hiramatsu's avatar
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 60baabc3
      Masami Hiramatsu authored
      commit 73c8d894 upstream.
      
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but snapshot changes tracing_on
      setting each time. This is an anomaly, because user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      the snapshot is done by swapping ring buffers.
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60baabc3
  23. 03 Aug, 2018 4 commits
    • Steven Rostedt (VMware)'s avatar
      tracing: Quiet gcc warning about maybe unused link variable · 4681e882
      Steven Rostedt (VMware) authored
      commit 2519c1bb upstream.
      
      Commit 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on
      enable_trace_kprobe() failure") added an if statement that depends on another
      if statement that gcc doesn't see will initialize the "link" variable and
      gives the warning:
      
       "warning: 'link' may be used uninitialized in this function"
      
      It is really a false positive, but to quiet the warning, and also to make
      sure that it never actually is used uninitialized, initialize the "link"
      variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
      thinks it could be used uninitialized.
      
      Cc: stable@vger.kernel.org
      Fixes: 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
      Reported-by: 's avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4681e882
    • Artem Savkov's avatar
      tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure · 86428ec1
      Artem Savkov authored
      commit 57ea2a34 upstream.
      
      If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
      it returns an error, but does not unset the tp flags it set previously.
      This results in a probe being considered enabled and failures like being
      unable to remove the probe through kprobe_events file since probes_open()
      expects every probe to be disabled.
      
      Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
      Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 41a7dd42 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
      Acked-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: 's avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: 's avatarArtem Savkov <asavkov@redhat.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86428ec1
    • Steven Rostedt (VMware)'s avatar
      tracing: Fix possible double free in event_enable_trigger_func() · 10419b0c
      Steven Rostedt (VMware) authored
      commit 15cc7864 upstream.
      
      There was a case that triggered a double free in event_trigger_callback()
      due to the called reg() function freeing the trigger_data and then it
      getting freed again by the error return by the caller. The solution there
      was to up the trigger_data ref count.
      
      Code inspection found that event_enable_trigger_func() has the same issue,
      but is not as easy to trigger (requires harder to trigger failures). It
      needs to be solved slightly different as it needs more to clean up when the
      reg() function fails.
      
      Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 7862ad18 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
      Reivewed-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10419b0c
    • Steven Rostedt (VMware)'s avatar
      tracing: Fix double free of event_trigger_data · 9158a7de
      Steven Rostedt (VMware) authored
      commit 1863c387 upstream.
      
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo 500000 > buffer_size_kb
      [ Or some other number that takes up most of memory ]
       # echo snapshot > events/sched/sched_switch/trigger
      
      Triggers the following bug:
      
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:296!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:kfree+0x16c/0x180
       Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
       RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
       RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
       RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
       RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
       R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
       R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
       FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
       Call Trace:
        event_trigger_callback+0xee/0x1d0
        event_trigger_write+0xfc/0x1a0
        __vfs_write+0x33/0x190
        ? handle_mm_fault+0x115/0x230
        ? _cond_resched+0x16/0x40
        vfs_write+0xb0/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x5a/0x160
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f363e16ab50
       Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
       RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
       RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
       RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
       R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
       Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
      86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
       ---[ end trace d301afa879ddfa25 ]---
      
      The cause is because the register_snapshot_trigger() call failed to
      allocate the snapshot buffer, and then called unregister_trigger()
      which freed the data that was passed to it. Then on return to the
      function that called register_snapshot_trigger(), as it sees it
      failed to register, it frees the trigger_data again and causes
      a double free.
      
      By calling event_trigger_init() on the trigger_data (which only ups
      the reference counter for it), and then event_trigger_free() afterward,
      the trigger_data would not get freed by the registering trigger function
      as it would only up and lower the ref count for it. If the register
      trigger function fails, then the event_trigger_free() called after it
      will free the trigger data normally.
      
      Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
      
      Cc: stable@vger.kerne.org
      Fixes: 93e31ffb ("tracing: Add 'snapshot' event trigger command")
      Reported-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: 's avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9158a7de
  24. 17 Jul, 2018 1 commit
    • Joel Fernandes (Google)'s avatar
      tracing: Reorder display of TGID to be after PID · 36244e3a
      Joel Fernandes (Google) authored
      commit f8494fa3 upstream.
      
      Currently ftrace displays data in trace output like so:
      
                                             _-----=> irqs-off
                                            / _----=> need-resched
                                           | / _---=> hardirq/softirq
                                           || / _--=> preempt-depth
                                           ||| /     delay
                  TASK-PID   CPU    TGID   ||||    TIMESTAMP  FUNCTION
                     | |       |      |    ||||       |         |
                  bash-1091  [000] ( 1091) d..2    28.313544: sched_switch:
      
      However Android's trace visualization tools expect a slightly different
      format due to an out-of-tree patch patch that was been carried for a
      decade, notice that the TGID and CPU fields are reversed:
      
                                             _-----=> irqs-off
                                            / _----=> need-resched
                                           | / _---=> hardirq/softirq
                                           || / _--=> preempt-depth
                                           ||| /     delay
                  TASK-PID    TGID   CPU   ||||    TIMESTAMP  FUNCTION
                     | |        |      |   ||||       |         |
                  bash-1091  ( 1091) [002] d..2    64.965177: sched_switch:
      
      From kernel v4.13 onwards, during which TGID was introduced, tracing
      with systrace on all Android kernels will break (most Android kernels
      have been on 4.9 with Android patches, so this issues hasn't been seen
      yet). From v4.13 onwards things will break.
      
      The chrome browser's tracing tools also embed the systrace viewer which
      uses the legacy TGID format and updates to that are known to be
      difficult to make.
      
      Considering this, I suggest we make this change to the upstream kernel
      and backport it to all Android kernels. I believe this feature is merged
      recently enough into the upstream kernel that it shouldn't be a problem.
      Also logically, IMO it makes more sense to group the TGID with the
      TASK-PID and the CPU after these.
      
      Link: http://lkml.kernel.org/r/20180626000822.113931-1-joel@joelfernandes.org
      
      Cc: jreck@google.com
      Cc: tkjos@google.com
      Cc: stable@vger.kernel.org
      Fixes: 441dae8f ("tracing: Add support for display of tgid in trace output")
      Signed-off-by: 's avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36244e3a
  25. 11 Jul, 2018 1 commit
    • Changbin Du's avatar
      tracing: Fix missing return symbol in function_graph output · 54f1da1f
      Changbin Du authored
      commit 1fe4293f upstream.
      
      The function_graph tracer does not show the interrupt return marker for the
      leaf entry. On leaf entries, we see an unbalanced interrupt marker (the
      interrupt was entered, but nevern left).
      
      Before:
       1)               |  SyS_write() {
       1)               |    __fdget_pos() {
       1)   0.061 us    |      __fget_light();
       1)   0.289 us    |    }
       1)               |    vfs_write() {
       1)   0.049 us    |      rw_verify_area();
       1) + 15.424 us   |      __vfs_write();
       1)   ==========> |
       1)   6.003 us    |      smp_apic_timer_interrupt();
       1)   0.055 us    |      __fsnotify_parent();
       1)   0.073 us    |      fsnotify();
       1) + 23.665 us   |    }
       1) + 24.501 us   |  }
      
      After:
       0)               |  SyS_write() {
       0)               |    __fdget_pos() {
       0)   0.052 us    |      __fget_light();
       0)   0.328 us    |    }
       0)               |    vfs_write() {
       0)   0.057 us    |      rw_verify_area();
       0)               |      __vfs_write() {
       0)   ==========> |
       0)   8.548 us    |      smp_apic_timer_interrupt();
       0)   <========== |
       0) + 36.507 us   |      } /* __vfs_write */
       0)   0.049 us    |      __fsnotify_parent();
       0)   0.066 us    |      fsnotify();
       0) + 50.064 us   |    }
       0) + 50.952 us   |  }
      
      Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel.com
      
      Cc: stable@vger.kernel.org
      Fixes: f8b755ac ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return")
      Signed-off-by: 's avatarChangbin Du <changbin.du@intel.com>
      Signed-off-by: 's avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54f1da1f