Commits (1101)

Too many changes to show.

To preserve performance only 1000 of 1000+ files are displayed.

......@@ -380,6 +380,7 @@ What: /sys/devices/system/cpu/vulnerabilities
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
......@@ -392,8 +393,7 @@ Description: Information about CPU vulnerabilities
"Vulnerable" CPU is affected and no mitigation in effect
"Mitigation: $M" CPU is affected and mitigation $M is in effect
Details about the l1tf file can be found in
See also: Documentation/admin-guide/hw-vuln/index.rst
What: /sys/devices/system/cpu/smt
Hardware vulnerabilities
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
.. toctree::
:maxdepth: 1
......@@ -445,6 +445,7 @@ The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
.. _mitigation_selection:
Mitigation selection guide
......@@ -556,7 +557,7 @@ When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it wiil:
bare metal hypervisor it will:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not
This diff is collapsed.
......@@ -17,14 +17,12 @@ etc.
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
This section describes CPU vulnerabilities and their mitigations.
.. toctree::
:maxdepth: 1
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.
......@@ -1971,7 +1971,7 @@
Default is 'flush'.
For details see: Documentation/admin-guide/l1tf.rst
For details see: Documentation/admin-guide/hw-vuln/l1tf.rst
l2cr= [PPC]
......@@ -2214,6 +2214,32 @@
Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
mds= [X86,INTEL]
Control mitigation for the Micro-architectural Data
Sampling (MDS) vulnerability.
Certain CPUs are vulnerable to an exploit against CPU
internal buffers which can forward information to a
disclosure gadget under certain conditions.
In vulnerable processors, the speculatively
forwarded data can be used in a cache side channel
attack, to access data to which the attacker does
not have direct access.
This parameter controls the MDS mitigation. The
options are:
full - Enable MDS mitigation on vulnerable CPUs
full,nosmt - Enable MDS mitigation and disable
SMT on vulnerable CPUs
off - Unconditionally disable MDS mitigation
Not specifying this option is equivalent to
For details see: Documentation/admin-guide/hw-vuln/mds.rst
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used when the kernel is not able
to see the whole system memory or for test.
......@@ -2362,6 +2388,40 @@
in the "bleeding edge" mini2440 support kernel at
[X86,PPC,S390] Control optional mitigations for CPU
vulnerabilities. This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
Disable all optional CPU mitigations. This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
nospectre_v1 [PPC]
nobp=0 [S390]
nospectre_v2 [X86,PPC,S390]
spectre_v2_user=off [X86]
spec_store_bypass_disable=off [X86,PPC]
l1tf=off [X86]
mds=off [X86]
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
enabled, even if it's vulnerable. This is for
users who don't want to be surprised by SMT
getting disabled across kernel upgrades, or who
have other ways of avoiding SMT-based attacks.
Equivalent to: (default behavior)
Mitigate all CPU vulnerabilities, disabling SMT
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
Equivalent to: l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
parameter allows control of the logging verbosity for
......@@ -2680,7 +2740,11 @@
nosmt=force: Force disable SMT, cannot be undone
via the sysfs control file.
nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds
check bypass). With this option data leaks are possible
in the system.
nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2
(indirect branch prediction) vulnerability. System may
allow data leaks with this option, which is equivalent
to spectre_v2=off.
......@@ -370,11 +370,15 @@ autosuspend the interface's device. When the usage counter is = 0
then the interface is considered to be idle, and the kernel may
autosuspend the device.
Drivers need not be concerned about balancing changes to the usage
counter; the USB core will undo any remaining "get"s when a driver
is unbound from its interface. As a corollary, drivers must not call
any of the ``usb_autopm_*`` functions after their ``disconnect``
routine has returned.
Drivers must be careful to balance their overall changes to the usage
counter. Unbalanced "get"s will remain in effect when a driver is
unbound from its interface, preventing the device from going into
runtime suspend should the interface be bound to a driver again. On
the other hand, drivers are allowed to achieve this balance by calling
the ``usb_autopm_*`` functions even after their ``disconnect`` routine
has returned -- say from within a work-queue routine -- provided they
retain an active reference to the interface (via ``usb_get_intf`` and
Drivers using the async routines are responsible for their own
synchronization and mutual exclusion.
......@@ -86,6 +86,7 @@ implementation.
:maxdepth: 2
Korean translations
This diff is collapsed.
......@@ -402,6 +402,7 @@ tcp_min_rtt_wlen - INTEGER
minimum RTT when it is moved to a longer path (e.g., due to traffic
engineering). A longer window makes the filter more resistant to RTT
inflations such as transient congestion. The unit is seconds.
Possible values: 0 - 86400 (1 day)
Default: 300
tcp_moderate_rcvbuf - BOOLEAN
# -*- coding: utf-8; mode: python -*-
project = "X86 architecture specific documentation"
latex_documents = [
('index', 'x86.tex', project,
'The kernel development community', 'manual'),
x86 architecture specifics
.. toctree::
:maxdepth: 1
Microarchitectural Data Sampling (MDS) mitigation
.. _mds:
Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:
- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
- Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)
MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.
MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.
MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.
MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
Exposure assumptions
It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:
- to control the load to trigger a fault or assist
- to have a disclosure gadget which exposes the speculatively accessed
data for consumption through a side channel.
- to control the pointer through which the disclosure gadget exposes the
The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremly unlikely.
There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.
Mitigation strategy
All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.
This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.
For virtualization there are two ways to achieve CPU buffer
clearing. Either the modified VERW instruction or via the L1D Flush
command. The latter is issued when L1TF mitigation is enabled so the extra
VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to
be issued.
If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.
This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-thread when one of the Hyper-Threads
enters a C-state.
The kernel provides a function to invoke the buffer clearing:
The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.
Kernel internal mitigation modes
======= ============================================================
off Mitigation is disabled. Either the CPU is not affected or
mds=off is supplied on the kernel command line
full Mitigation is enabled. CPU is affected and MD_CLEAR is
advertised in CPUID.
vmwerv Mitigation is enabled. CPU is affected and MD_CLEAR is not
advertised in CPUID. That is mainly for virtualization
scenarios where the host has the updated microcode but the
hypervisor does not expose MD_CLEAR in CPUID. It's a best
effort approach without guarantee.
======= ============================================================
If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
Mitigation points
1. Return to user space
When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The migitation is enabled through the static key
The mitigation is invoked in prepare_exit_to_usermode() which covers
all but one of the kernel to user space transitions. The exception
is when we return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().
(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
2. C-State transition
When a CPU goes idle and enters a C-State the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.
When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline CPU buffer clearing is not required.
The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
The invocation is controlled by the static key mds_idle_clear which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
The buffer clear is only invoked before entering the C-State to prevent
that stale data from the idling CPU from spilling to the Hyper-Thread
sibling after the store buffer got repartitioned and all entries are
available to the non idle sibling.
When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. The back from idle CPU could be then
speculatively exposed to contents of the sibling. The buffers are
flushed either on exit to user space or on VMENTER so malicious code
in user space or the guest cannot speculatively access them.
The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
has been superseded by the intel_idle driver around 2010 and is
preferred on all affected CPUs which are expected to gain the MD_CLEAR
functionality in microcode. Aside of that the IO-Port mechanism is a
legacy interface which is only used on older systems which are either
not affected or do not receive microcode updates anymore.
# SPDX-License-Identifier: GPL-2.0
NAME = Petit Gorille
......@@ -480,7 +480,7 @@ endif
ifeq ($(cc-name),clang)
ifneq ($(CROSS_COMPILE),)
CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
......@@ -653,8 +653,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)
KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias)
KBUILD_CFLAGS += $(call cc-option,-Oz,-Os)
KBUILD_CFLAGS += $(call cc-disable-warning,maybe-uninitialized,)
KBUILD_CFLAGS += -Os $(call cc-disable-warning,maybe-uninitialized,)
KBUILD_CFLAGS += -O2 $(call cc-disable-warning,maybe-uninitialized,)
......@@ -9,6 +9,7 @@ CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_PID_NS is not set
......@@ -107,6 +107,7 @@ ENTRY(stext)
; r2 = pointer to uboot provided cmdline or external DTB in mem
; These are handled later in handle_uboot_args()
st r0, [@uboot_tag]
st r1, [@uboot_magic]
st r2, [@uboot_arg]
......@@ -35,6 +35,7 @@ unsigned int intr_to_DE_cnt;
/* Part of U-boot ABI: see head.S */
int __initdata uboot_tag;
int __initdata uboot_magic;
char __initdata *uboot_arg;
const struct machine_desc *machine_desc;
......@@ -433,6 +434,8 @@ static inline bool uboot_arg_invalid(unsigned long addr)
#define UBOOT_TAG_NONE 0
#define UBOOT_TAG_DTB 2
/* We always pass 0 as magic from U-boot */
void __init handle_uboot_args(void)
......@@ -448,6 +451,11 @@ void __init handle_uboot_args(void)
goto ignore_uboot_args;
if (uboot_magic != UBOOT_MAGIC_VALUE) {
pr_warn(IGNORE_ARGS "non zero uboot magic\n");
goto ignore_uboot_args;
if (uboot_tag != UBOOT_TAG_NONE &&
uboot_arg_invalid((unsigned long)uboot_arg)) {
pr_warn(IGNORE_ARGS "invalid uboot arg: '%px'\n", uboot_arg);
......@@ -1393,7 +1393,21 @@ ENTRY(efi_stub_entry)
@ Preserve return value of efi_entry() in r4
mov r4, r0
bl cache_clean_flush
@ our cache maintenance code relies on CP15 barrier instructions
@ but since we arrived here with the MMU and caches configured
@ by UEFI, we must check that the CP15BEN bit is set in SCTLR.
@ Note that this bit is RAO/WI on v6 and earlier, so the ISB in
@ the enable path will be executed on v7+ only.
mrc p15, 0, r1, c1, c0, 0 @ read SCTLR
tst r1, #(1 << 5) @ CP15BEN bit set?
bne 0f
orr r1, r1, #(1 << 5) @ CP15 barrier instructions
mcr p15, 0, r1, c1, c0, 0 @ write SCTLR
ARM( .inst 0xf57ff06f @ v7+ isb )
THUMB( isb )
0: bl cache_clean_flush