include/cobalt/kernel/sched.h · f5a67a249e59d09f5a99ae9d43fd442ac1eea182 · xenomai / xenomai

cobalt/sched: improve watchdog accuracy · f5a67a24

Philippe Gerum authored Sep 04, 2018 and Jan Kiszka committed Sep 06, 2018



The original watchdog mechanism was based on sampling: every second
(a built-in value), it checked the runtime mode of the task preempted
by the tick on the current CPU. Each time rt/primary mode was
detected, a per-CPU counter was incremented by one and compared
against the trigger limit (CONFIG_XENO_OPT_WATCHDOG_TIMEOUT).
Otherwise, the counter was reset to zero.
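
The sampling logic described above can be modelled in a few lines of
user-space C. This is only an illustrative sketch, not the kernel
code: struct sampled_watchdog and sampled_watchdog_tick() are
hypothetical names, and the timeout argument stands in for
CONFIG_XENO_OPT_WATCHDOG_TIMEOUT.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state used by the sampling method. */
struct sampled_watchdog {
	unsigned int hits;	/* consecutive 1s ticks that observed primary mode */
};

/*
 * Model of one watchdog tick (assumed to run once per second).
 * 'timeout' stands in for CONFIG_XENO_OPT_WATCHDOG_TIMEOUT.
 * Returns true when the watchdog would fire.
 */
static bool sampled_watchdog_tick(struct sampled_watchdog *wd,
				  bool in_primary_mode,
				  unsigned int timeout)
{
	if (!in_primary_mode) {
		/* Secondary mode observed: start counting from scratch. */
		wd->hits = 0;
		return false;
	}

	/* Primary mode observed: one more hit, then check the limit. */
	return ++wd->hits >= timeout;
}
```

In this model, a timeout of 1 fires on the very first tick that
happens to land on a task in primary mode, no matter how long that
task has actually been running.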

With this fairly naive approach, it only takes a single hit with
CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=1 to trigger the watchdog, i.e. if
the system-fixed 1s watchdog tick preempts any Xenomai task while it
is running in primary mode on the current CPU, the watchdog fires,
even if that task has only just started running.

The default value of 4s papered over the inherent imprecision of such
a coarse-grained method, making false positive triggers less likely
without ruling them out.

To improve the accuracy of the watchdog, arm the watchdog timer to
fire at the final trigger date directly, right before switching the
CPU to primary mode (leave_root()), disarming it when the CPU is about
to switch back to secondary mode (enter_root()).
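
The arm/disarm scheme can be sketched the same way. Again, this is a
user-space model under stated assumptions rather than the actual
implementation: struct armed_watchdog, wd_leave_root(),
wd_enter_root() and wd_expired() are hypothetical names that only
mirror the roles of leave_root() and enter_root(), and time is passed
in explicitly as nanoseconds instead of being read from a real clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per-CPU watchdog timer with an absolute deadline. */
struct armed_watchdog {
	bool armed;
	uint64_t deadline_ns;	/* final trigger date */
};

/* CPU is about to switch to primary mode: arm at the final trigger date. */
static void wd_leave_root(struct armed_watchdog *wd, uint64_t now_ns,
			  uint64_t timeout_ns)
{
	wd->deadline_ns = now_ns + timeout_ns;
	wd->armed = true;
}

/* CPU is about to switch back to secondary mode: disarm. */
static void wd_enter_root(struct armed_watchdog *wd)
{
	wd->armed = false;
}

/* True if the timer would have fired by 'now_ns'. */
static bool wd_expired(const struct armed_watchdog *wd, uint64_t now_ns)
{
	return wd->armed && now_ns >= wd->deadline_ns;
}
```

With this model, the watchdog can only expire once the CPU has spent
the full timeout in primary mode without returning to secondary mode,
which is the accuracy gain the sampling method could not provide.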

Better accuracy comes at the expense of slightly more overhead when
transitioning between primary and secondary modes, which should be
acceptable for a debug feature that does not affect the hot path
anyway (i.e. there is no added cost for strictly rt context switches).

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
