Commit 9bdcb44e authored by Rafael J. Wysocki's avatar Rafael J. Wysocki

cpufreq: schedutil: New governor based on scheduler utilization data

Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit 34e2c555 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the the utilization data passed to cpufreq_update_util() by CFS.

The new governor is relatively simple.

The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant.  In the frequency-invariant
case the new CPU frequency is given by

	next_freq = 1.25 * max_freq * util / max

where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:

	next_freq = 1.25 * curr_freq * util / max

The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8.

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).

The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers.  If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.

The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.
Signed-off-by: default avatarRafael J. Wysocki <>
Acked-by: default avatarViresh Kumar <>
Acked-by: default avatarPeter Zijlstra (Intel) <>
parent b7898fda
......@@ -107,6 +107,16 @@ config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
Be aware that not all cpufreq drivers support the conservative
governor. If unsure have a look at the help section of the
driver. Fallback governor will be the performance governor.
bool "schedutil"
Use the 'schedutil' CPUFreq governor by default. If unsure,
have a look at the help section of that governor. The fallback
governor will be 'performance'.
......@@ -188,6 +198,26 @@ config CPU_FREQ_GOV_CONSERVATIVE
If in doubt, say N.
tristate "'schedutil' cpufreq policy governor"
depends on CPU_FREQ
select IRQ_WORK
This governor makes decisions based on the utilization data provided
by the scheduler. It sets the CPU frequency to be proportional to
the utilization/capacity ratio coming from the scheduler. If the
utilization is frequency-invariant, the new frequency is also
proportional to the maximum available frequency. If that is not the
case, it is proportional to the current frequency of the CPU. The
frequency tipping point is at utilization/capacity equal to 80% in
both cases.
To compile this driver as a module, choose M here: the module will
be called cpufreq_schedutil.
If in doubt, say N.
comment "CPU frequency scaling drivers"
......@@ -24,3 +24,4 @@ obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
obj-$(CONFIG_CPU_FREQ) += cpufreq.o
obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
This diff is collapsed.
......@@ -1842,6 +1842,14 @@ static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned lo
static inline void cpufreq_trigger_update(u64 time) {}
#endif /* CONFIG_CPU_FREQ */
#ifdef arch_scale_freq_capacity
#ifndef arch_scale_freq_invariant
#define arch_scale_freq_invariant() (true)
#else /* arch_scale_freq_capacity */
#define arch_scale_freq_invariant() (false)
static inline void account_reset_rq(struct rq *rq)
......@@ -15,5 +15,6 @@
Markdown is supported
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment