1. 10 Jul, 2019 1 commit
    • Jann Horn's avatar
      ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME · bf71ef96
      Jann Horn authored
      commit 6994eefb0053799d2e07cd140df6c2ea106c41ee upstream.
      
      Fix two issues:
      
      When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
      reference to the parent's objective credentials, then give that pointer
      to get_cred().  However, the object lifetime rules for things like
      struct cred do not permit unconditionally turning an RCU reference into
      a stable reference.
      
      PTRACE_TRACEME records the parent's credentials as if the parent was
      acting as the subject, but that's not the case.  If a malicious
      unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
      at a later point, the parent process becomes attacker-controlled
      (because it drops privileges and calls execve()), the attacker ends up
      with control over two processes with a privileged ptrace relationship,
      which can be abused to ptrace a suid binary and obtain root privileges.
      
      Fix both of these by always recording the credentials of the process
      that is requesting the creation of the ptrace relationship:
      current_cred() can't change under us, and current is the proper subject
      for access control.
      
      This change is theoretically userspace-visible, but I am not aware of
      any code that it will actually break.
      
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf71ef96
  2. 19 Jun, 2019 2 commits
    • Jann Horn's avatar
      ptrace: restore smp_rmb() in __ptrace_may_access() · 7013ea8d
      Jann Horn authored
      commit f6581f5b55141a95657ef5742cf6a6bfa20a109f upstream.
      
      Restore the read memory barrier in __ptrace_may_access() that was deleted
      a couple years ago. Also add comments on this barrier and the one it pairs
      with to explain why they're there (as far as I understand).
      
      Fixes: bfedb589 ("mm: Add a user_ns owner to mm_struct and fix ptrace permission checks")
      Cc: stable@vger.kernel.org
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7013ea8d
    • Eric W. Biederman's avatar
      signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO · 50f806a5
      Eric W. Biederman authored
      [ Upstream commit f6e2aa91a46d2bc79fce9b93a988dbe7655c90c0 ]
      
      Recently syzbot in conjunction with KMSAN reported that
      ptrace_peek_siginfo can copy an uninitialized siginfo to userspace.
      Inspecting ptrace_peek_siginfo confirms this.
      
      The problem is that off when initialized from args.off can be
      initialized to a negaive value.  At which point the "if (off >= 0)"
      test to see if off became negative fails because off started off
      negative.
      
      Prevent the core problem by adding a variable found that is only true
      if a siginfo is found and copied to a temporary in preparation for
      being copied to userspace.
      
      Prevent args.off from being truncated when being assigned to off by
      testing that off is <= the maximum possible value of off.  Convert off
      to an unsigned long so that we should not have to truncate args.off,
      we have well defined overflow behavior so if we add another check we
      won't risk fighting undefined compiler behavior, and so that we have a
      type whose maximum value is easy to test for.
      
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+0d602a1b0d8c95bdf299@syzkaller.appspotmail.com
      Fixes: 84c751bd ("ptrace: add ability to retrieve signals without removing from a queue (v4)")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      50f806a5
  3. 04 May, 2019 1 commit
  4. 31 Jan, 2019 1 commit
  5. 05 Dec, 2018 2 commits
    • Thomas Gleixner's avatar
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · dae4d590
      Thomas Gleixner authored
      commit 46f7ecb1e7359f183f5bbd1e08b90e10e52164f9 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dae4d590
    • Jiri Kosina's avatar
      x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · 4741e319
      Jiri Kosina authored
      commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream
      
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processess (such as GPG) from data leaking into a different
      userspace process via spectre v2.
      
      This is however completely insufficient to provide proper userspace-to-userpace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processess where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pmSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4741e319
  6. 24 Jul, 2017 1 commit
    • Eric W. Biederman's avatar
      signal: Remove kernel interal si_code magic · cc731525
      Eric W. Biederman authored
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause the
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user.  As si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of union
      members.
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is only problem
      architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Escept where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations has the
      interesting property that several of them perviously should never have
      worked as the __SI_ values they depended up where kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      cc731525
  7. 23 May, 2017 1 commit
    • Eric W. Biederman's avatar
      ptrace: Properly initialize ptracer_cred on fork · c70d9d80
      Eric W. Biederman authored
      When I introduced ptracer_cred I failed to consider the weirdness of
      fork where the task_struct copies the old value by default.  This
      winds up leaving ptracer_cred set even when a process forks and
      the child process does not wind up being ptraced.
      
      Because ptracer_cred is not set on non-ptraced processes whose
      parents were ptraced this has broken the ability of the enlightenment
      window manager to start setuid children.
      
      Fix this by properly initializing ptracer_cred in ptrace_init_task
      
      This must be done with a little bit of care to preserve the current value
      of ptracer_cred when ptrace carries through fork.  Re-reading the
      ptracer_cred from the ptracing process at this point is inconsistent
      with how PT_PTRACE_CAP has been maintained all of these years.
      Tested-by: default avatarTakashi Iwai <tiwai@suse.de>
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      c70d9d80
  8. 08 Apr, 2017 1 commit
  9. 02 Mar, 2017 3 commits
  10. 22 Nov, 2016 3 commits
    • Eric W. Biederman's avatar
      ptrace: Don't allow accessing an undumpable mm · 84d77d3f
      Eric W. Biederman authored
      It is the reasonable expectation that if an executable file is not
      readable there will be no way for a user without special privileges to
      read the file.  This is enforced in ptrace_attach but if ptrace
      is already attached before exec there is no enforcement for read-only
      executables.
      
      As the only way to read such an mm is through access_process_vm
      spin a variant called ptrace_access_vm that will fail if the
      target process is not being ptraced by the current process, or
      the current process did not have sufficient privileges when ptracing
      began to read the target processes mm.
      
      In the ptrace implementations replace access_process_vm by
      ptrace_access_vm.  There remain several ptrace sites that still use
      access_process_vm as they are reading the target executables
      instructions (for kernel consumption) or register stacks.  As such it
      does not appear necessary to add a permission check to those calls.
      
      This bug has always existed in Linux.
      
      Fixes: v1.0
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      84d77d3f
    • Eric W. Biederman's avatar
      ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 64b875f7
      Eric W. Biederman authored
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granulariy.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentaly
      this change allows what is allowable at all times, by using the correct
      information in it's decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      64b875f7
    • Eric W. Biederman's avatar
      mm: Add a user_ns owner to mm_struct and fix ptrace permission checks · bfedb589
      Eric W. Biederman authored
      During exec dumpable is cleared if the file that is being executed is
      not readable by the user executing the file.  A bug in
      ptrace_may_access allows reading the file if the executable happens to
      enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
      unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
      
      This problem is fixed with only necessary userspace breakage by adding
      a user namespace owner to mm_struct, captured at the time of exec, so
      it is clear in which user namespace CAP_SYS_PTRACE must be present in
      to be able to safely give read permission to the executable.
      
      The function ptrace_may_access is modified to verify that the ptracer
      has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
      This ensures that if the task changes it's cred into a subordinate
      user namespace it does not become ptraceable.
      
      The function ptrace_attach is modified to only set PT_PTRACE_CAP when
      CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
      PT_PTRACE_CAP is to be a flag to note that whatever permission changes
      the task might go through the tracer has sufficient permissions for
      it not to be an issue.  task->cred->user_ns is always the same
      as or descendent of mm->user_ns.  Which guarantees that having
      CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
      credentials.
      
      To prevent regressions mm->dumpable and mm->user_ns are not considered
      when a task has no mm.  As simply failing ptrace_may_attach causes
      regressions in privileged applications attempting to read things
      such as /proc/<pid>/stat
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Tested-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Fixes: 8409cca7 ("userns: allow ptrace from non-init user namespaces")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      bfedb589
  11. 19 Oct, 2016 1 commit
  12. 11 Oct, 2016 1 commit
  13. 04 Aug, 2016 1 commit
    • Masahiro Yamada's avatar
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada authored
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: Masahiro Yamada's avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
  14. 22 Mar, 2016 2 commits
  15. 21 Jan, 2016 2 commits
    • Jann Horn's avatar
      ptrace: use fsuid, fsgid, effective creds for fs access checks · caaee623
      Jann Horn authored
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      caaee623
    • Oleg Nesterov's avatar
      ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable · 7c3b00e0
      Oleg Nesterov authored
      ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
      tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.
      
      This doesn't really solve the problem(s) and we probably need to fix the
      freezer.  In particular, note that this means that pm freezer will fail if
      it races attach-to-stopped-task.
      
      And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
      not clear if we really need to hide this transition from debugger, WNOHANG
      after PTRACE_ATTACH can fail anyway if it races with SIGCONT.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c3b00e0
  16. 28 Oct, 2015 1 commit
    • Tycho Andersen's avatar
      seccomp, ptrace: add support for dumping seccomp filters · f8e529ed
      Tycho Andersen authored
      This patch adds support for dumping a process' (classic BPF) seccomp
      filters via ptrace.
      
      PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
      seccomp filters. addr should be an integer which represents the ith seccomp
      filter (0 is the most recently installed filter). data should be a struct
      sock_filter * with enough room for the ith filter, or NULL, in which case
      the filter is not saved. The return value for this command is the number of
      BPF instructions the program represents, or negative in the case of errors.
      Command specific errors are ENOENT: which indicates that there is no ith
      filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
      filter was not installed as a classic BPF filter.
      
      A caveat with this approach is that there is no way to get explicitly at
      the heirarchy of seccomp filters, and users need to memcmp() filters to
      decide which are inherited. This means that a task which installs two of
      the same filter can potentially confuse users of this interface.
      
      v2: * make save_orig const
          * check that the orig_prog exists (not necessary right now, but when
             grows eBPF support it will be)
          * s/n/filter_off and make it an unsigned long to match ptrace
          * count "down" the tree instead of "up" when passing a filter offset
      
      v3: * don't take the current task's lock for inspecting its seccomp mode
          * use a 0x42** constant for the ptrace command value
      
      v4: * don't copy to userspace while holding spinlocks
      
      v5: * add another condition to WARN_ON
      
      v6: * rebase on net-next
      Signed-off-by: default avatarTycho Andersen <tycho.andersen@canonical.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      CC: Will Drewry <wad@chromium.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      CC: Alexei Starovoitov <ast@kernel.org>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f8e529ed
  17. 15 Jul, 2015 1 commit
    • Tycho Andersen's avatar
      seccomp: add ptrace options for suspend/resume · 13c4a901
      Tycho Andersen authored
      This patch is the first step in enabling checkpoint/restore of processes
      with seccomp enabled.
      
      One of the things CRIU does while dumping tasks is inject code into them
      via ptrace to collect information that is only available to the process
      itself. However, if we are in a seccomp mode where these processes are
      prohibited from making these syscalls, then what CRIU does kills the task.
      
      This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
      a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
      filters to disable (and re-enable) seccomp filters for another task so that
      they can be successfully dumped (and restored). We restrict the set of
      processes that can disable seccomp through ptrace because although today
      ptrace can be used to bypass seccomp, there is some discussion of closing
      this loophole in the future and we would like this patch to not depend on
      that behavior and be future proofed for when it is removed.
      
      Note that seccomp can be suspended before any filters are actually
      installed; this behavior is useful on criu restore, so that we can suspend
      seccomp, restore the filters, unmap our restore code from the restored
      process' address space, and then resume the task by detaching and have the
      filters resumed as well.
      
      v2 changes:
      
      * require that the tracer have no seccomp filters installed
      * drop TIF_NOTSC manipulation from the patch
      * change from ptrace command to a ptrace option and use this ptrace option
        as the flag to check. This means that as soon as the tracer
        detaches/dies, seccomp is re-enabled and as a corrollary that one can not
        disable seccomp across PTRACE_ATTACHs.
      
      v3 changes:
      
      * get rid of various #ifdefs everywhere
      * report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
        used
      
      v4 changes:
      
      * get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
        directly
      
      v5 changes:
      
      * check that seccomp is not enabled (or suspended) on the tracer
      Signed-off-by: default avatarTycho Andersen <tycho.andersen@canonical.com>
      CC: Will Drewry <wad@chromium.org>
      CC: Roland McGrath <roland@hack.frob.com>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@amacapital.net>
      [kees: access seccomp.mode through seccomp_mode() instead]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      13c4a901
  18. 17 Apr, 2015 2 commits
    • Oleg Nesterov's avatar
      ptrace: ptrace_detach() can no longer race with SIGKILL · 64a4096c
      Oleg Nesterov authored
      ptrace_detach() re-checks ->ptrace under tasklist lock and calls
      release_task() if __ptrace_detach() returns true.  This was needed because
      the __TASK_TRACED tracee could be killed/untraced, and it could even pass
      exit_notify() before we take tasklist_lock.
      
      But this is no longer possible after 9899d11f "ptrace: ensure
      arch_ptrace/ptrace_request can never race with SIGKILL".  We can turn
      these checks into WARN_ON() and remove release_task().
      
      While at it, document the setting of child->exit_code.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Pavel Labath <labath@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64a4096c
    • Oleg Nesterov's avatar
      ptrace: fix race between ptrace_resume() and wait_task_stopped() · b72c1869
      Oleg Nesterov authored
      ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
      tracee->exit_code and then wake_up_state() changes tracee->state.  If the
      tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
      wrongly looks like another report from tracee.
      
      This confuses debugger, and since wait_task_stopped() clears ->exit_code
      the tracee can miss a signal.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	#include <sys/ptrace.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	int pid;
      
      	void *waiter(void *arg)
      	{
      		int stat;
      
      		for (;;) {
      			assert(pid == wait(&stat));
      			assert(WIFSTOPPED(stat));
      			if (WSTOPSIG(stat) == SIGHUP)
      				continue;
      
      			assert(WSTOPSIG(stat) == SIGCONT);
      			printf("ERR! extra/wrong report:%x\n", stat);
      		}
      	}
      
      	int main(void)
      	{
      		pthread_t thread;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			for (;;)
      				kill(getpid(), SIGHUP);
      		}
      
      		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
      
      		for (;;)
      			ptrace(PTRACE_CONT, pid, 0, SIGCONT);
      
      		return 0;
      	}
      
      Note for stable: the bug is very old, but without 9899d11f "ptrace:
      ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
      should use lock_task_sighand(child).
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarPavel Labath <labath@google.com>
      Tested-by: default avatarPavel Labath <labath@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b72c1869
  19. 17 Feb, 2015 1 commit
  20. 11 Dec, 2014 1 commit
  21. 16 Jul, 2014 1 commit
    • NeilBrown's avatar
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown authored
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      74316201
  22. 06 Mar, 2014 1 commit
  23. 13 Nov, 2013 1 commit
    • Kees Cook's avatar
      exec/ptrace: fix get_dumpable() incorrect tests · d049f74f
      Kees Cook authored
      The get_dumpable() return value is not boolean.  Most users of the
      function actually want to be testing for non-SUID_DUMP_USER(1) rather than
      SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
      protected state.  Almost all places did this correctly, excepting the two
      places fixed in this patch.
      
      Wrong logic:
          if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
              or
          if (dumpable == 0) { /* be protective */ }
              or
          if (!dumpable) { /* be protective */ }
      
      Correct logic:
          if (dumpable != SUID_DUMP_USER) { /* be protective */ }
              or
          if (dumpable != 1) { /* be protective */ }
      
      Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
      user was able to ptrace attach to processes that had dropped privileges to
      that user.  (This may have been partially mitigated if Yama was enabled.)
      
      The macros have been moved into the file that declares get/set_dumpable(),
      which means things like the ia64 code can see them too.
      
      CVE-2013-2929
      Reported-by: default avatarVasily Kulikov <segoon@openwall.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d049f74f
  24. 11 Sep, 2013 1 commit
  25. 06 Aug, 2013 1 commit
  26. 09 Jul, 2013 2 commits
    • Oleg Nesterov's avatar
      ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child) · fab840fc
      Oleg Nesterov authored
      Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child).  This
      frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
      ensures that the tracee won't be killed by SIGTRAP triggered by the
      active breakpoints.
      
      Test-case:
      
      	unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
      	{
      		unsigned long dr7;
      
      		dr7 = ((len | type) & 0xf)
      			<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
      		if (enable)
      			dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
      		return dr7;
      	}
      
      	int write_dr(int pid, int dr, unsigned long val)
      	{
      		return ptrace(PTRACE_POKEUSER, pid,
      				offsetof (struct user, u_debugreg[dr]),
      				val);
      	}
      
      	void func(void)
      	{
      	}
      
      	int main(void)
      	{
      		int pid, stat;
      		unsigned long dr7;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			kill(getpid(), SIGHUP);
      
      			func();
      			return 0x13;
      		}
      
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(WSTOPSIG(stat) == SIGHUP);
      
      		assert(write_dr(pid, 0, (long)func) == 0);
      		dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
      		assert(write_dr(pid, 7, dr7) == 0);
      
      		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Before this patch the child is killed after PTRACE_DETACH.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fab840fc
    • Oleg Nesterov's avatar
      ptrace: revert "Prepare to fix racy accesses on task breakpoints" · 7c8df286
      Oleg Nesterov authored
      This reverts commit bf26c018 ("Prepare to fix racy accesses on task
      breakpoints").
      
      The patch was fine but we can no longer race with SIGKILL after commit
      9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race
      with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
      ->ptrace_bps[] can't go away.
      
      Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
      we can kill them and remove task->ptrace_bp_refcnt.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMichael Neuling <mikey@neuling.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c8df286
  27. 03 Jul, 2013 1 commit
    • Andrey Vagin's avatar
      ptrace: add ability to get/set signal-blocked mask · 29000cae
      Andrey Vagin authored
      crtools uses a parasite code for dumping processes.  The parasite code is
      injected into a process with help PTRACE_SEIZE.
      
      Currently crtools blocks signals from a parasite code.  If a process has
      pending signals, crtools wait while a process handles these signals.
      
      This method is not suitable for stopped tasks.  A stopped task can have a
      few pending signals, when we will try to execute a parasite code, we will
      need to drop SIGSTOP, but all other signals must remain pending, because a
      state of processes must not be changed during checkpointing.
      
      This patch adds two ptrace commands to set/get signal-blocked mask.
      
      I think gdb can use this commands too.
      
      [akpm@linux-foundation.org: be consistent with brace layout]
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      29000cae
  28. 29 Jun, 2013 1 commit
  29. 08 May, 2013 1 commit
  30. 01 May, 2013 1 commit
    • Andrey Vagin's avatar
      ptrace: add ability to retrieve signals without removing from a queue (v4) · 84c751bd
      Andrey Vagin authored
      This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
      
      This request is used to retrieve information about pending signals
      starting with the specified sequence number.  Siginfo_t structures are
      copied from the child into the buffer starting at "data".
      
      The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
      struct ptrace_peeksiginfo_args {
      	u64 off;	/* from which siginfo to start */
      	u32 flags;
      	s32 nr;		/* how may siginfos to take */
      };
      
      "nr" has type "s32", because ptrace() returns "long", which has 32 bits on
      i386 and a negative values is used for errors.
      
      Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping
      signals from process-wide queue.  If this flag is not set, signals are
      read from a per-thread queue.
      
      The request PTRACE_PEEKSIGINFO returns a number of dumped signals.  If a
      signal with the specified sequence number doesn't exist, ptrace returns
      zero.  The request returns an error, if no signal has been dumped.
      
      Errors:
      EINVAL - one or more specified flags are not supported or nr is negative
      EFAULT - buf or addr is outside your accessible address space.
      
      A result siginfo contains a kernel part of si_code which usually striped,
      but it's required for queuing the same siginfo back during restore of
      pending signals.
      
      This functionality is required for checkpointing pending signals.  Pedro
      Alves suggested using it in "gdb" to peek at pending signals.  gdb already
      uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
      dequeued.  This functionality allows gdb to look at the pending signals
      which were not reported yet.
      
      The prototype of this code was developed by Oleg Nesterov.
      Signed-off-by: default avatarAndrew Vagin <avagin@openvz.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Alves <palves@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      84c751bd