1. 14 May, 2019 1 commit
    • cpu/speculation: Add 'mitigations=' cmdline option · ed1dfe83
      Josh Poimboeuf authored
      commit 98af8452945c55652de68536afdde3b520fec429 upstream
      
      Keeping track of the number of mitigations for all the CPU speculation
      bugs has become overwhelming for many users.  It's getting more and more
      complicated to decide which mitigations are needed for a given
      architecture.  Complicating matters is the fact that each arch tends to
      have its own custom way to mitigate the same vulnerability.
      
      Most users fall into a few basic categories:
      
      a) they want all mitigations off;
      
      b) they want all reasonable mitigations on, with SMT enabled even if
         it's vulnerable; or
      
      c) they want all reasonable mitigations on, with SMT disabled if
         vulnerable.
      
      Define a set of curated, arch-independent options, each of which is an
      aggregation of existing options:
      
      - mitigations=off: Disable all mitigations.
      
      - mitigations=auto: [default] Enable all the default mitigations, but
        leave SMT enabled, even if it's vulnerable.
      
      - mitigations=auto,nosmt: Enable all the default mitigations, disabling
        SMT if needed by a mitigation.
      
      Currently, these options are placeholders which don't actually do
      anything.  They will be fleshed out in upcoming patches.
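      As an illustration, the cmdline hook boils down to something like the
      following (a minimal sketch modelled on the upstream patch, reproduced
      from memory rather than verbatim):

        enum cpu_mitigations {
                CPU_MITIGATIONS_OFF,
                CPU_MITIGATIONS_AUTO,
                CPU_MITIGATIONS_AUTO_NOSMT,
        };

        static enum cpu_mitigations cpu_mitigations __ro_after_init =
                CPU_MITIGATIONS_AUTO;

        static int __init mitigations_parse_cmdline(char *arg)
        {
                if (!strcmp(arg, "off"))
                        cpu_mitigations = CPU_MITIGATIONS_OFF;
                else if (!strcmp(arg, "auto"))
                        cpu_mitigations = CPU_MITIGATIONS_AUTO;
                else if (!strcmp(arg, "auto,nosmt"))
                        cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT;

                return 0;
        }
        early_param("mitigations", mitigations_parse_cmdline);

        /* Arch code then only needs small helpers, e.g.: */
        static inline bool cpu_mitigations_off(void)
        {
                return cpu_mitigations == CPU_MITIGATIONS_OFF;
        }
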
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-s390@vger.kernel.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-arch@vger.kernel.org
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Phil Auld <pauld@redhat.com>
      Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed1dfe83
  2. 05 Apr, 2019 1 commit
    • cpu/hotplug: Mute hotplug lockdep during init · 2ae0dd16
      Valentin Schneider authored
      [ Upstream commit ce48c457b95316b9a01b5aa9d4456ce820df94b4 ]
      
      Since we've had:
      
        commit cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
      
      we've been getting some lockdep warnings during init, such as on HiKey960:
      
      [    0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48
      [    0.820498] Modules linked in:
      [    0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S                4.20.0-rc5-00051-g4cae42a #34
      [    0.820511] Hardware name: HiKey960 (DT)
      [    0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO)
      [    0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48
      [    0.820523] lr : lockdep_assert_cpus_held+0x38/0x48
      [    0.820526] sp : ffff00000a9cbe50
      [    0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000
      [    0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0
      [    0.820537] x25: ffff000008ba69e0 x24: 0000000000000001
      [    0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8
      [    0.820545] x21: 0000000000000001 x20: 0000000000000003
      [    0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff
      [    0.820552] x17: 0000000000000000 x16: 0000000000000000
      [    0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34
      [    0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8
      [    0.820564] x11: 0000000000000000 x10: ffff00000958f848
      [    0.820567] x9 : ffff000009592000 x8 : ffff00000958f848
      [    0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000
      [    0.820574] x5 : 0000000000000000 x4 : 0000000000000001
      [    0.820578] x3 : 0000000000000000 x2 : 0000000000000001
      [    0.820582] x1 : 00000000ffffffff x0 : 0000000000000000
      [    0.820587] Call trace:
      [    0.820591]  lockdep_assert_cpus_held+0x3c/0x48
      [    0.820598]  static_key_enable_cpuslocked+0x28/0xd0
      [    0.820606]  arch_timer_check_ool_workaround+0xe8/0x228
      [    0.820610]  arch_timer_starting_cpu+0xe4/0x2d8
      [    0.820615]  cpuhp_invoke_callback+0xe8/0xd08
      [    0.820619]  notify_cpu_starting+0x80/0xb8
      [    0.820625]  secondary_start_kernel+0x118/0x1d0
      
      We've also had a similar warning in sched_init_smp() for every
      asymmetric system that would enable the sched_asym_cpucapacity static
      key, although that was singled out in:
      
        commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
      
      Those warnings are actually harmless, since we cannot have hotplug
      operations at the time they appear. Instead of starting to sprinkle
      useless hotplug lock operations in the init codepaths, mute the
      warnings until they start warning about real problems.
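      The muting itself is a one-liner in lockdep_assert_cpus_held() (sketch of
      the upstream change, from memory):

        void lockdep_assert_cpus_held(void)
        {
                /*
                 * We can't have hotplug operations before userspace starts
                 * running, and some init codepaths will knowingly not take
                 * the hotplug lock, so don't warn until the system is fully
                 * up and running.
                 */
                if (system_state < SYSTEM_RUNNING)
                        return;

                percpu_rwsem_assert_held(&cpu_hotplug_lock);
        }
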
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: cai@gmx.us
      Cc: daniel.lezcano@linaro.org
      Cc: dietmar.eggemann@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: longman@redhat.com
      Cc: marc.zyngier@arm.com
      Cc: mark.rutland@arm.com
      Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2ae0dd16
  3. 03 Apr, 2019 1 commit
    • cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n · 5f6b5b8b
      Thomas Gleixner authored
      commit 206b92353c839c0b27a0b9bec24195f93fd6cf7a upstream.
      
      Tianyu reported a crash in a CPU hotplug teardown callback when booting a
      kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
      parameter.
      
      It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
      forever whenever a bringup callback fails. Unfortunately this issue was
      not recognized when the CPU hotplug code was reworked, so the shortcoming
      just stayed in place.
      
      When a bringup callback fails, the CPU hotplug code rolls back the
      operation and takes the CPU offline.
      
      The 'nosmt' command line argument uses a bringup failure to abort the
      bringup of SMT sibling CPUs. This partial bringup is required due to the
      MCE misdesign on Intel CPUs.
      
      With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
      CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
      teardown of a CPU including the synchronizations in various facilities like
      RCU, NOHZ and others.
      
      As a consequence the teardown callbacks, which must be executed on the
      outgoing CPU within stop machine with interrupts disabled, are executed on
      the control CPU in interrupt-enabled, preemptible context, causing the
      kernel to crash and burn. The pre-state-machine code has a different
      failure mode which is more subtle, resulting in a less obvious use-after-free
      crash because the control side frees resources which are still in use
      by the undead CPU.
      
      But this is not an x86-only problem. Any architecture which supports the
      SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
      likely to be triggered because in 99.99999% of the cases all bringup
      callbacks succeed.
      
      The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
      all architectures as the following architectures have either no hotplug
      support at all or not all subarchitectures support it:
      
       alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
      
      Crashing the kernel in such a situation is not an acceptable state
      either.
      
      Implement a minimal rollback variant by limiting the teardown to the point
      where all regular teardown callbacks have been invoked and leave the CPU in
      the 'dead' idle state. This has the following consequences:
      
       - the CPU is brought down to the point where the stop_machine takedown
         would happen.
      
       - the CPU stays there forever and is idle
      
       - The CPU is cleared in the CPU active mask, but not in the CPU online
         mask which is a legit state.
      
       - Interrupts are not forced away from the CPU
      
       - All facilities which only look at online mask would still see it, but
         that is the case during normal hotplug/unplug operations as well. It's
         just a (way) longer time frame.
      
      This will expose issues which haven't been exposed before, or only seldom,
      because now the normally transient state of being non-active but online is
      a permanent state. In testing this already exposed an issue vs. workqueues,
      where the vmstat code schedules work on the almost dead CPU which ends up
      in an unbound workqueue and triggers 'preemptible context' warnings. This is
      not a problem of this change; it merely exposes an already existing issue.
      Still, this is better than crashing fully without a chance to debug it.
      
      This is mainly intended as a workaround for those architectures which do not
      support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
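      The core of the rollback limit can be sketched as follows (function and
      state names as remembered from the patch, not a literal quote):

        static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
        {
                if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
                        return true;
                /*
                 * Without HOTPLUG_CPU the low level takedown machinery is not
                 * available. Only roll back if the CPU did not get past
                 * CPUHP_BRINGUP_CPU; otherwise it has to stay around in its
                 * current, online but inactive, state.
                 */
                return st->state <= CPUHP_BRINGUP_CPU;
        }

        /* in the bringup error path: */
        st->target = prev_state;
        if (can_rollback_cpu(st))
                undo_cpu_up(cpu, st);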
      
      Fixes: 2e1a3483 ("cpu/hotplug: Split out the state walk into functions")
      Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michael Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f6b5b8b
  4. 12 Feb, 2019 1 commit
    • cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · 5e1f1c1f
      Josh Poimboeuf authored
      commit b284909abad48b07d3071a9fc9b5692b3e64914b upstream.
      
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems that commit attempted to
      solve, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to be exported, instead of 'cpu_smt_control'; see the sketch below.
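      For illustration, point b) amounts to something along these lines in
      vmx_vm_init() (sketch only; the surrounding L1TF mitigation handling is
      elided):

        if (boot_cpu_has(X86_BUG_L1TF) && enable_ept &&
            (l1tf_mitigation == L1TF_MITIGATION_FLUSH_NOSMT ||
             l1tf_mitigation == L1TF_MITIGATION_FULL)) {
                /* Look at the real SMT state, not the sysfs control value */
                if (sched_smt_active())
                        pr_warn_once(L1TF_MSG_SMT);
        }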
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      5e1f1c1f
  5. 05 Dec, 2018 2 commits
    • x86/speculation: Rework SMT state change · 36a4c5fc
      Thomas Gleixner authored
      commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
      
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow fine-grained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
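      Schematically (not the literal diff), the hook moves from the sysfs knob
      into the hotplug core itself:

        /* kernel/cpu.c -- weak default, overridden by the x86 bugs code */
        void __weak arch_smt_update(void) { }

        /* ... and both _cpu_up() and _cpu_down() now end with: */
        arch_smt_update();
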
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      36a4c5fc
    • x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · 300a6f27
      Jiri Kosina authored
      commit 53c613fe6349994f023245519265999eed75957f upstream
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
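      The resulting update logic looks roughly like this (sketch from memory;
      stibp_needed() and update_stibp_msr() are small helpers added by the same
      patch):

        void arch_smt_update(void)
        {
                u64 mask;

                if (!stibp_needed())
                        return;

                mutex_lock(&spec_ctrl_mutex);

                mask = x86_spec_ctrl_base;
                if (cpu_smt_control == CPU_SMT_ENABLED)
                        mask |= SPEC_CTRL_STIBP;
                else
                        mask &= ~SPEC_CTRL_STIBP;

                if (mask != x86_spec_ctrl_base) {
                        x86_spec_ctrl_base = mask;
                        /* propagate the new SPEC_CTRL value to all online CPUs */
                        on_each_cpu(update_stibp_msr, NULL, 1);
                }

                mutex_unlock(&spec_ctrl_mutex);
        }
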
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      300a6f27
  6. 23 Nov, 2018 1 commit
  7. 13 Nov, 2018 1 commit
    • x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · 8a13906a
      Jiri Kosina authored
      commit 53c613fe6349994f023245519265999eed75957f upstream.
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a13906a
  8. 19 Sep, 2018 2 commits
  9. 15 Aug, 2018 12 commits
  10. 02 Jan, 2018 1 commit
  11. 14 Dec, 2017 1 commit
  12. 21 Oct, 2017 1 commit
    • cpu/hotplug: Reset node state after operation · 1f7c70d6
      Thomas Gleixner authored
      The recent rework of the cpu hotplug internals changed the usage of the per
      cpu state->node field, but failed to clean it up after use.
      
      So subsequent hotplug operations use the stale pointer from a previous
      operation and hand it into the callback functions. The callbacks then
      dereference a pointer which either belongs to a different facility or
      points to freed and potentially reused memory. In either case data
      corruption and crashes are the obvious consequence.
      
      Reset the node and the last pointers in the per cpu state to NULL after the
      operation which set them has completed.
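      In other words, once the (multi-)instance operation has finished, something
      along the lines of:

        /* don't leak the instance cursor into the next hotplug operation */
        st->node = NULL;
        st->last = NULL;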
      
      Fixes: 96abb968 ("smp/hotplug: Allow external multi-instance rollback")
      Reported-by: Tvrtko Ursulin <tursulin@ursulin.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos
      1f7c70d6
  13. 25 Sep, 2017 6 commits
    • smp/hotplug: Hotplug state fail injection · 1db49484
      Peter Zijlstra authored
      Add a sysfs file to one-time fail a specific state. This can be used
      to test the state rollback code paths.
      
      Something like this (hotplug-up.sh):
      
        #!/bin/bash
      
        echo 0 > /debug/sched_debug
        echo 1 > /debug/tracing/events/cpuhp/enable
      
        ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
        STATES=${1:-$ALL_STATES}
      
        for state in $STATES
        do
      	  echo 0 > /sys/devices/system/cpu/cpu1/online
      	  echo 0 > /debug/tracing/trace
      	  echo Fail state: $state
      	  echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
      	  cat /sys/devices/system/cpu/cpu1/hotplug/fail
      	  echo 1 > /sys/devices/system/cpu/cpu1/online
      
      	  cat /debug/tracing/trace > hotfail-${state}.trace
      
      	  sleep 1
        done
      
      Can be used to test for all possible rollback (barring multi-instance)
      scenarios on CPU-up; CPU-down is a trivial modification of the above.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
      
      1db49484
    • smp/hotplug: Differentiate the AP completion between up and down · 5ebe7742
      Peter Zijlstra authored
      With lockdep-crossrelease we get deadlock reports that span cpu-up and
      cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
      and cpu-down are globally serialized.
      
        takedown_cpu()
          irq_lock_sparse()
          wait_for_completion(&st->done)
      
                                      cpuhp_thread_fun
                                        cpuhp_up_callback
                                          cpuhp_invoke_callback
                                            irq_affinity_online_cpu
                                              irq_lock_sparse()
                                              irq_unlock_sparse()
                                        complete(&st->done)
      
      Now that we have consistent AP state, we can trivially separate the
      AP completion between up and down using st->bringup.
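      A sketch of the split (modelled on the patch, from memory):

        struct cpuhp_cpu_state {
                /* ... */
                struct completion       done_up;
                struct completion       done_down;
        };

        static void wait_for_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
        {
                wait_for_completion(bringup ? &st->done_up : &st->done_down);
        }

        static void complete_ap_thread(struct cpuhp_cpu_state *st, bool bringup)
        {
                complete(bringup ? &st->done_up : &st->done_down);
        }
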
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: max.byungchul.park@gmail.com
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Link: https://lkml.kernel.org/r/20170920170546.872472799@infradead.org
      
      5ebe7742
    • smp/hotplug: Differentiate the AP-work lockdep class between up and down · 5f4b55e1
      Peter Zijlstra authored
      With lockdep-crossrelease we get deadlock reports that span cpu-up and
      cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
      and cpu-down are globally serialized.
      
        CPU0                  CPU1                    CPU2
        cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:
      
        cpuhp_state
                              irq_lock_sparse()
          irq_lock_sparse()
                              wait_for_completion()
                                                      cpuhp_state
                                                      complete()
      
      Now that we have consistent AP state, we can trivially separate the
      AP-work class between up and down using st->bringup.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: max.byungchul.park@gmail.com
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Link: https://lkml.kernel.org/r/20170920170546.922524234@infradead.org
      
      5f4b55e1
    • smp/hotplug: Callback vs state-machine consistency · 724a8688
      Peter Zijlstra authored
      While the generic callback functions have an 'int' return and thus
      appear to be allowed to return error, this is not true for all states.
      
      Specifically, what used to be STARTING/DYING are run with IRQs
      disabled from critical parts of CPU bringup/teardown and are not
      allowed to fail. Add WARNs to enforce this rule.
      
      But since some callbacks are indeed allowed to fail, we have the
      situation where a state-machine rollback encounters a failure; in that
      case we're stuck: we can't go forward and we can't go back. Also add a
      WARN for that case.
      
      AFAICT this is a fundamental 'problem' with no real obvious solution.
      We want the 'prepare' callbacks to allow failure on either up or down.
      Typically on prepare-up this would be things like -ENOMEM from
      resource allocations, and the typical usage in prepare-down would be
      something like -EBUSY to avoid CPUs being taken away.
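      Schematically (illustration only, names as remembered):

        /* STARTING/DYING sections run with IRQs off and cannot be rolled back */
        static bool cpuhp_is_atomic_state(enum cpuhp_state state)
        {
                return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
        }

        /* after invoking a callback: */
        WARN_ON_ONCE(ret && cpuhp_is_atomic_state(state));
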
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.819539119@infradead.org
      
      724a8688
    • smp/hotplug: Rewrite AP state machine core · 4dddfb5f
      Peter Zijlstra authored
      There is currently no explicit state change on rollback. That is,
      st->bringup, st->rollback and st->target are not consistent when doing
      the rollback.
      
      Rework the AP state handling to be more coherent. This does mean we
      have to do a second AP kick-and-wait for rollback, but since rollback
      is the slow path of a slowpath, this really should not matter.
      
      Take this opportunity to simplify the AP thread function to only run a
      single callback per invocation. This unifies the three single/up/down
      modes it supports. The looping it used to do for up/down is achieved
      by retaining should_run and relying on the main smpboot_thread_fn()
      loop.
      
      (I have most of a patch that does the same for the BP state handling,
      but that's not critical and gets a little complicated because
      CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets
      recursive @st usage; I still have to de-fugly that.)
      
      [ tglx: Move cpuhp_down_callbacks() et al. into the HOTPLUG_CPU section to
        	avoid gcc complaining about unused functions. Make the HOTPLUG_CPU
        	one piece instead of having two consecutive ifdef sections of the
        	same type. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.769658088@infradead.org
      
      4dddfb5f
    • smp/hotplug: Allow external multi-instance rollback · 96abb968
      Peter Zijlstra authored
      Currently the rollback of multi-instance states is handled inside
      cpuhp_invoke_callback(). The problem is that when we want to allow an
      explicit state change for rollback, we need to return from the
      function without doing the rollback.
      
      Change cpuhp_invoke_callback() to optionally return the multi-instance
      state, such that rollback can be done from a subsequent call.
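      In practice this means an extra output parameter (sketch):

        /* the caller learns where the multi-instance walk stopped and can do
           (or defer) the rollback itself */
        static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
                                         bool bringup, struct hlist_node *node,
                                         struct hlist_node **lastp);
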
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.720361181@infradead.org
      
      96abb968
  14. 14 Sep, 2017 1 commit
    • watchdog/hardlockup/perf: Prevent CPU hotplug deadlock · 941154bd
      Thomas Gleixner authored
      The following deadlock is possible in the watchdog hotplug code:
      
        cpus_write_lock()
          ...
            takedown_cpu()
              smpboot_park_threads()
                smpboot_park_thread()
                  kthread_park()
                    ->park() := watchdog_disable()
                      watchdog_nmi_disable()
                        perf_event_release_kernel();
                          put_event()
                            _free_event()
                              ->destroy() := hw_perf_event_destroy()
                                x86_release_hardware()
                                  release_ds_buffers()
                                    get_online_cpus()
      
      when a per cpu watchdog perf event is destroyed which drops the last
      reference to the PMU hardware. The cleanup code there invokes
      get_online_cpus() which instantly deadlocks because the hotplug percpu
      rwsem is write locked.
      
      To solve this add a deferring mechanism:
      
        cpus_write_lock()
      			   kthread_park()
      			    watchdog_nmi_disable(deferred)
      			      perf_event_disable(event);
      			      move_event_to_deferred(event);
      			   ....
        cpus_write_unlock()
        cleanup_deferred_events()
          perf_event_release_kernel()
      
      This is still properly serialized against concurrent hotplug via the
      cpu_add_remove_lock, which is held by the task which initiated the hotplug
      event.
      
      This is also used to handle event destruction when the watchdog threads are
      parked via other mechanisms than CPU hotplug.
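      The deferred release side then looks roughly like this (names as
      remembered from the patch):

        void hardlockup_detector_perf_cleanup(void)
        {
                int cpu;

                for_each_cpu(cpu, &dead_events_mask) {
                        struct perf_event *event = per_cpu(watchdog_ev, cpu);

                        /* released outside of the hotplug lock, see above */
                        if (event)
                                perf_event_release_kernel(event);
                        per_cpu(watchdog_ev, cpu) = NULL;
                }
                cpumask_clear(&dead_events_mask);
        }
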
      Analyzed-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      941154bd
  15. 25 Jul, 2017 1 commit
    • rcu: Migrate callbacks earlier in the CPU-offline timeline · a58163d8
      Paul E. McKenney authored
      RCU callbacks must be migrated away from an outgoing CPU, and this is
      done near the end of the CPU-hotplug operation, after the outgoing CPU is
      long gone.  Unfortunately, this means that other CPU-hotplug callbacks
      can execute while the outgoing CPU's callbacks are still immobilized
      on the long-gone CPU's callback lists.  If any of these CPU-hotplug
      callbacks must wait, either directly or indirectly, for the invocation
      of any of the immobilized RCU callbacks, the system will hang.
      
      This commit avoids such hangs by migrating the callbacks away from the
      outgoing CPU immediately upon its departure, shortly after the return
      from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
      callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
      callbacks to wait on these RCU callbacks without risk of a hang.
      
      While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
      and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
      dead code on the one hand and to avoid define-without-use warnings on the
      other hand.
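      In takedown_cpu() this amounts to (sketch):

        /* This actually kills the CPU. */
        __cpu_die(cpu);

        /* migrate the outgoing CPU's callbacks right away instead of leaving
           them stranded until RCU's own much later CPU-dead processing */
        rcutree_migrate_callbacks(cpu);
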
      Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
      Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Richard Weinberger <richard@nod.at>
      a58163d8
  16. 20 Jul, 2017 1 commit
    • smp/hotplug: Handle removal correctly in cpuhp_store_callbacks() · 0c96b273
      Ethan Barnes authored
      If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or
      CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated
      states, then cpuhp_store_callbacks() allocates a new dynamic state. The
      first allocation in each range returns CPUHP_AP_ONLINE_DYN or
      CPUHP_BP_PREPARE_DYN.
      
      If cpuhp_remove_state() is invoked for one of these states, then there is
      no protection against the allocation mechanism. So the removal, which
      should clear the callbacks and the name, gets a new state assigned and
      clears that one.
      
      As a consequence the state which should be cleared stays initialized. A
      consecutive CPU hotplug operation dereferences the state callbacks and
      accesses either freed or reused memory, resulting in crashes.
      
      Add a protection against this by checking the name argument for NULL. If
      it's NULL it's a removal. If not, it's an allocation.
      
      [ tglx: Added a comment and massaged changelog ]
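      Sketch of the added check in cpuhp_store_callbacks() (as remembered, not a
      literal quote of the patch):

        /*
         * If name is NULL, this is a removal: never allocate a new dynamic
         * state for it, otherwise the wrong state gets cleared.
         */
        if (name && (state == CPUHP_AP_ONLINE_DYN ||
                     state == CPUHP_BP_PREPARE_DYN)) {
                int ret = cpuhp_reserve_state(state);

                if (ret < 0)
                        return ret;
                state = ret;
        }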
      
      Fixes: 5b7aa87e ("cpu/hotplug: Implement setup/removal interface")
      Signed-off-by: Ethan Barnes <ethan.barnes@sandisk.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.or>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Sebastian Siewior <bigeasy@linutronix.d>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com
      0c96b273
  17. 11 Jul, 2017 1 commit
  18. 06 Jul, 2017 1 commit
    • smp/hotplug: Move unparking of percpu threads to the control CPU · 9cd4f1a4
      Thomas Gleixner authored
      Vikram reported the following backtrace:
      
         BUG: scheduling while atomic: swapper/7/0/0x00000002
         CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
         schedule
         schedule_hrtimeout_range_clock
         schedule_hrtimeout
         wait_task_inactive
         __kthread_bind_mask
         __kthread_bind
         __kthread_unpark
         kthread_unpark
         cpuhp_online_idle
         cpu_startup_entry
         secondary_start_kernel
      
      He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
      was still on the runqueue when the CPU came back online and tried to unpark
      it. This causes the thread which invoked kthread_unpark() to call
      wait_task_inactive() and subsequently schedule() with preemption disabled.
      His proposed workaround was to "make sure" that a parked thread has
      scheduled out when the CPU goes offline, so the situation cannot happen.
      
      But that's still wrong because the root cause is neither the fact that the
      percpu thread is still on the runqueue nor that preemption is
      disabled, which could simply be solved by enabling preemption before
      calling kthread_unpark().
      
      The real issue is that the calling thread is the idle task of the upcoming
      CPU, which is not supposed to call anything which might sleep.  The moron
      who wrote that code completely missed that kthread_unpark() might end up
      in schedule().
      
      The solution is simpler than expected. The thread which controls the
      hotplug operation is waiting for the CPU to call complete() on the hotplug
      state completion. So the idle task of the upcoming CPU can set its state to
      CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
      task on a different CPU, which then can safely do the unpark and kick the
      now unparked hotplug thread of the upcoming CPU to complete the bringup to
      the final target state.
      
      Control CPU                     AP
      
      bringup_cpu();
        __cpu_up()  ------------>
      				bringup_ap();
        bringup_wait_for_ap()
          wait_for_completion();
                                      cpuhp_online_idle();
                      <------------    complete();
          unpark(AP->stopper);
          unpark(AP->hotplugthread);
                                      while(1)
                                        do_idle();
          kick(AP->hotplugthread);
          wait_for_completion();	hotplug_thread()
      				  run_online_callbacks();
      				  complete();
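      On the AP side this boils down to (sketch, from memory):

        void cpuhp_online_idle(enum cpuhp_state state)
        {
                struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);

                /* Happens for the boot cpu */
                if (state != CPUHP_AP_ONLINE_IDLE)
                        return;

                st->state = CPUHP_AP_ONLINE_IDLE;
                /* wake the control CPU, which does the unparking from there */
                complete(&st->done);
        }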
      
      Fixes: 8df3e07e ("cpu/hotplug: Let upcoming cpu bring itself fully up")
      Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9cd4f1a4
  19. 30 Jun, 2017 1 commit
  20. 22 Jun, 2017 1 commit
    • genirq/cpuhotplug: Handle managed IRQs on CPU hotplug · c5cb83bb
      Thomas Gleixner authored
      If a CPU goes offline, interrupts affine to the CPU are moved away. If the
      outgoing CPU is the last CPU in the affinity mask, the migration code breaks
      the affinity and sets it to all online cpus.
      
      This is a problem for affinity managed interrupts as CPU hotplug is often
      used for power management purposes. If the affinity is broken, the
      interrupt is no longer affine to the CPUs to which it was allocated.
      
      Affinity spreading allows multi queue devices to be laid out in a way that
      they are assigned to a single CPU or a group of CPUs. If the last CPU goes
      offline, then the queue is no longer used, so the interrupt can be
      shut down gracefully and parked until one of the assigned CPUs comes online
      again.
      
      Add a graceful shutdown mechanism into the irq affinity breaking code path,
      mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified.
      
      In the online path, scan the active interrupts for managed interrupts. If
      the interrupt is functional and the newly online CPU is part of the
      affinity mask, restart the interrupt if it is marked MANAGED_SHUTDOWN, or, if
      the interrupt is already started up, try to add the CPU back to the effective
      affinity mask.
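      The online-path scan becomes a new cpuhp callback, roughly (names as
      remembered; per-descriptor details elided):

        int irq_affinity_online_cpu(unsigned int cpu)
        {
                struct irq_desc *desc;
                unsigned int irq;

                irq_lock_sparse();
                for_each_active_irq(irq) {
                        desc = irq_to_desc(irq);
                        raw_spin_lock_irq(&desc->lock);
                        /* restart MANAGED_SHUTDOWN irqs, fix up effective affinity */
                        irq_restore_affinity_of_irq(desc, cpu);
                        raw_spin_unlock_irq(&desc->lock);
                }
                irq_unlock_sparse();

                return 0;
        }
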
      Originally-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170619235447.273417334@linutronix.de
      c5cb83bb
  21. 12 Jun, 2017 1 commit
    • cpu/hotplug: Remove unused check_for_tasks() function · 57de7212
      Arnd Bergmann authored
      clang -Wunused-function found one remaining function that was
      apparently meant to be removed in a recent code cleanup:
      
      kernel/cpu.c:565:20: warning: unused function 'check_for_tasks' [-Wunused-function]
      
      Sebastian explained: The function became unused unintentionally, but there
      is already a failure check in the scheduler code for when a task cannot be
      removed from the outgoing cpu, so bringing it back would not really give any
      extra value.
      
      Fixes: 530e9b76 ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Link: http://lkml.kernel.org/r/20170608085544.2257132-1-arnd@arndb.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      57de7212
  22. 03 Jun, 2017 1 commit