Commit 3f16906c authored by Philippe Gerum, committed by Jan Kiszka

cobalt/thread: fix scheduler breakage on thread suspension

In the event that xnthread_suspend() adds blocking bit(s) to a thread
which is running in primary mode on a remote CPU at the time of the
call, a spurious rescheduling call is performed, causing the local CPU
to pick its next thread from the remote CPU's runqueue.

CPU0                              CPU1
----                              ----
                                  t2: ...
t1: suspend(t2)                       ...
       |
       |
       +------> t2 == t2->sched->curr (i.e. running primary)
                           |
                           |
                           +----> __xnsched_run(t2->sched);
                                        |
                                        |
                                        v
                          CPU0: ___xnsched_run(CPU1.sched);

IOW, CPU0 would pick the next thread from CPU1's runqueue. Conditions
for observing this bug:

- t1 is running in primary mode on the local CPU (i.e.
  CPU0:sched->curr == t1), so that the rescheduling request needs no
  prior escalation to the head stage, allowing ___xnsched_run() to
  execute immediately using the sched slot pointer received from
  __xnsched_run().

- t2 is running in primary mode on a remote CPU (i.e.
  CPU1:sched->curr == t2) when t1 attempts to suspend it via a call to
  xnthread_suspend().

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
parent e3a8e572
@@ -962,12 +962,15 @@ void xnthread_suspend(struct xnthread *thread, int mask,
 	 * opportunity for interrupt delivery right before switching
 	 * context, which shortens the uninterruptible code path.
 	 *
-	 * We have to shut irqs off before __xnsched_run() though: if
-	 * an interrupt could preempt us in ___xnsched_run() right
-	 * after the call to xnarch_escalate() but before we grab the
-	 * nklock, we would enter the critical section in
-	 * xnsched_run() while running in secondary mode, which would
-	 * defeat the purpose of xnarch_escalate().
+	 * We have to shut irqs off before calling __xnsched_run()
+	 * though: if an interrupt could preempt us right after
+	 * xnarch_escalate() is passed but before the nklock is
+	 * grabbed, we would enter the critical section in
+	 * ___xnsched_run() from the root domain, which would defeat
+	 * the purpose of escalating the request.
+	 *
+	 * NOTE: using __xnsched_run() for rescheduling allows us to
+	 * break the scheduler lock temporarily.
 	 */
 	if (likely(thread == sched->curr)) {
 		xnsched_set_resched(sched);
@@ -978,10 +981,13 @@ void xnthread_suspend(struct xnthread *thread, int mask,
 			return;
 		}
 		/*
-		 * If the thread is runnning on another CPU,
-		 * xnsched_run will trigger the IPI as required.
+		 * If the thread is runnning on a remote CPU,
+		 * xnsched_run() will trigger the IPI as required. In
+		 * this case, sched refers to a remote runqueue, so
+		 * make sure to always kick the rescheduling procedure
+		 * for the local one.
 		 */
-		__xnsched_run(sched);
+		__xnsched_run(xnsched_current());
 		goto out;
 	}
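
The effect of the one-line change can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cobalt code: every `toy_*` name below is hypothetical, locking, interrupt masking and stage escalation are left out, and the rescheduling procedure is reduced to reporting which CPU's runqueue it was handed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 2

struct toy_thread;

/* Hypothetical per-CPU scheduler slot, loosely modeled on struct xnsched. */
struct toy_sched {
	int cpu;                  /* CPU owning this runqueue */
	struct toy_thread *curr;  /* thread currently running on that CPU */
	int resched_pending;      /* stands in for the resched bit / IPI request */
};

struct toy_thread {
	struct toy_sched *sched;  /* runqueue the thread is attached to */
	int blocked;
};

static struct toy_sched toy_scheds[TOY_NR_CPUS] = { { .cpu = 0 }, { .cpu = 1 } };
static int toy_local_cpu;         /* CPU executing the current call */

static struct toy_sched *toy_current_sched(void)
{
	return &toy_scheds[toy_local_cpu];
}

/*
 * Stand-in for the rescheduling procedure: it always executes on the
 * local CPU, but works on whichever slot it is handed. Returns the CPU
 * whose runqueue would be searched for the next thread.
 */
static int toy_sched_run(struct toy_sched *sched)
{
	return sched->cpu;
}

/* Suspend a thread; 'fixed' selects the behavior before (0) or after (1) the patch. */
static int toy_suspend(struct toy_thread *thread, int fixed)
{
	struct toy_sched *sched = thread->sched;

	assert(sched != NULL);
	thread->blocked = 1;

	if (thread == sched->curr) {
		sched->resched_pending = 1;   /* the owning CPU gets kicked */
		return toy_sched_run(fixed ? toy_current_sched() : sched);
	}

	return toy_sched_run(toy_current_sched());
}

/* t1 runs on CPU0 and suspends t2, which runs on CPU1. */
static int toy_demo(int fixed)
{
	struct toy_thread t1 = { .sched = &toy_scheds[0] };
	struct toy_thread t2 = { .sched = &toy_scheds[1] };

	toy_scheds[0].curr = &t1;
	toy_scheds[1].curr = &t2;
	toy_local_cpu = 0;

	return toy_suspend(&t2, fixed);
}
```

Calling `toy_demo(0)` from a `main()` models the pre-patch behavior, where CPU0 ends up working on CPU1's slot; `toy_demo(1)` models the patched behavior, where CPU0 reschedules from its own slot while CPU1 is still flagged for rescheduling.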