1. 01 Nov, 2018 40 commits
    • Philippe Gerum's avatar
      sched/core: ipipe: do not panic on failed migration to the head stage · 89ec8d23
      Philippe Gerum authored
      __ipipe_migrate_head() should not BUG() unconditionally when failing
      to schedule out a thread, but rather let the real-time core handle the
      situation a bit more gracefully.
    • Philippe Gerum's avatar
      ipipe: timer: prevent double-ack if host timer is not grabbed · a79165b1
      Philippe Gerum authored
      Only timers stolen away from the host kernel should be early acked by
      the pipeline core. Otherwise, the regular IRQ handler associated with
      the timer would duplicate the action. The IRQ line is left masked,
      waiting for the IRQ flow handler to unmask it eventually.
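
      The rule above can be illustrated with a small user-space model. This is
      only a sketch: `struct model_timer`, its `grabbed` flag and the function
      names are hypothetical and do not come from the I-pipe sources. It shows
      that a tick is acknowledged exactly once whether or not the timer was
      grabbed.

```c
#include <stdbool.h>

/* Hypothetical model of a timer; not the I-pipe data structure. */
struct model_timer {
	bool grabbed;   /* true once stolen away from the host kernel */
	int ack_count;  /* number of acknowledgements performed so far */
};

/* Pipeline entry: ack early only when the timer was grabbed. */
void model_pipeline_timer_irq(struct model_timer *t)
{
	if (t->grabbed)
		t->ack_count++;
}

/* Regular host handler: always acks, as it would without pipelining. */
void model_host_timer_handler(struct model_timer *t)
{
	t->ack_count++;
}

/* One tick: the host handler only runs for timers left to the host. */
void model_deliver_tick(struct model_timer *t)
{
	model_pipeline_timer_irq(t);
	if (!t->grabbed)
		model_host_timer_handler(t);
}
```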
    • Philippe Gerum's avatar
      ipipe: timer: notify co-kernel about entering ONESHOT_STOPPED mode · 2259d365
      Philippe Gerum authored
      Although we don't want to disable the hardware, so as not to wreck the
      outstanding timing requests managed by the co-kernel, we should
      nevertheless notify it about entering the ONESHOT_STOPPED mode, so
      that it may disable the host tick emulation.
    • Philippe Gerum's avatar
      ipipe: timer: do not interpose on undefined handlers · 80cf3074
      Philippe Gerum authored
      There is no point in interposing on clock chip handlers for which
      there was no support originally. In some cases (oneshot_stopped), we
      may even get a kernel fault, jumping to a NULL address.
      Interpose on non-NULL original handlers only.
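
      A minimal sketch of the "non-NULL only" rule, assuming a simplified
      descriptor. `struct model_clockevent`, `struct model_saved_ops` and the
      proxy functions are hypothetical stand-ins; only the two handler names
      mirror fields of the kernel's clock event device.

```c
#include <stddef.h>

/* Hypothetical stand-in for a clock event device descriptor. */
struct model_clockevent {
	int (*set_state_oneshot)(struct model_clockevent *);
	int (*set_state_oneshot_stopped)(struct model_clockevent *);
};

/* Saved originals, so that a proxy could chain to them later. */
struct model_saved_ops {
	int (*set_state_oneshot)(struct model_clockevent *);
	int (*set_state_oneshot_stopped)(struct model_clockevent *);
};

int model_proxy_oneshot(struct model_clockevent *ce) { (void)ce; return 0; }
int model_proxy_oneshot_stopped(struct model_clockevent *ce) { (void)ce; return 0; }

/* Example driver handler, used to populate a descriptor. */
int model_driver_oneshot(struct model_clockevent *ce) { (void)ce; return 1; }

/* Replace a handler only if the driver provided one originally. */
void model_interpose(struct model_clockevent *ce, struct model_saved_ops *saved)
{
	saved->set_state_oneshot = ce->set_state_oneshot;
	saved->set_state_oneshot_stopped = ce->set_state_oneshot_stopped;

	if (ce->set_state_oneshot != NULL)
		ce->set_state_oneshot = model_proxy_oneshot;
	if (ce->set_state_oneshot_stopped != NULL)
		ce->set_state_oneshot_stopped = model_proxy_oneshot_stopped;
}
```

      A handler left at NULL stays NULL after interposition, so the clock event
      core keeps treating the operation as unsupported.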
    • Philippe Gerum's avatar
      ipipe: timer: resume hardware operations in oneshot handler · c23aae1e
      Philippe Gerum authored
      Although we won't allow disabling the hardware when the clock event
      logic switches a device to stopped mode (so that we won't affect the
      timer logic running on the head stage unexpectedly), we still have to
      enable the hardware when switched (back) to oneshot mode, since it may
      have been stopped prior to interposing on the device in
      Failing to do so would leave the hardware shut down for both regular
      and Xenomai operations, with no means to bring it up again.
    • Philippe Gerum's avatar
      sched: idle: ipipe: drop spurious check · a24a1f5b
      Philippe Gerum authored
    • Philippe Gerum's avatar
      printk: ipipe: defer vprintk() output · 17183846
      Philippe Gerum authored
    • Philippe Gerum's avatar
      ipipe: tick: revive the host tick after device grab · fb24f93f
      Philippe Gerum authored
      Once the device was grabbed by ipipe_timer_start(), any pending host
      tick programmed in the hardware is basically lost, unknown to the
      co-kernel implementing the proxy handlers.
      Schedule a host event with the latest target time programmed to have
      the co-kernel know about the pending tick.
    • Philippe Gerum's avatar
      PM: ipipe: converge to Dovetail's CPUIDLE management · cb5702e0
      Philippe Gerum authored
      Handle requests for transitioning to deeper C-states the way Dovetail
      does, which prevents us from losing the timer when grabbed by a
      co-kernel, in presence of a CPUIDLE driver.
    • Philippe Gerum's avatar
      ipipe: tick: cap timer_set op to device supported max · 78580ae1
      Philippe Gerum authored
      While at it, switch the min_delay_tick value to unsigned long to
      match the corresponding clockevent definition.
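
      The cap named in the commit title amounts to clamping the requested delay
      to what the device can handle. The sketch below assumes a hypothetical
      `struct model_tick_dev` with a `max_delay_ticks` field; the actual I-pipe
      field and function names may differ.

```c
/* Hypothetical tick device model: only the upper bound matters here. */
struct model_tick_dev {
	unsigned long max_delay_ticks;
};

/* Never program the hardware with more ticks than it supports. */
unsigned long model_cap_delay(const struct model_tick_dev *dev,
			      unsigned long ticks)
{
	return ticks > dev->max_delay_ticks ? dev->max_delay_ticks : ticks;
}
```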
    • Philippe Gerum's avatar
      ipipe: tick: out-of-band devices require GENERIC_CLOCKEVENTS · a1d64704
      Philippe Gerum authored
      Drop the legacy support for architectures not enabling the generic
      clock event framework, which would only provide periodic timing.
      We don't support any of those archs, and there is no point in running
      a Xenomai co-kernel on a hardware not capable of handling oneshot
      timing requests.
    • Philippe Gerum's avatar
      ftrace: ipipe: rely on fully atomic stop_machine() handler · 92783263
      Philippe Gerum authored
      Now that stop_machine() guarantees fully atomic execution of the stop
      routine via hard interrupt disabling, there is no point in using
      ipipe_critical_enter/exit() for the same purpose in order to patch the
      kernel text.
    • Philippe Gerum's avatar
      stop_machine: ipipe: ensure atomic stop-context operations · aed125e2
      Philippe Gerum authored
      stop_machine() guarantees that all online CPUs are spinning
      non-preemptible in a known code location before a subset of them may
      safely run a stop-context function. This service is typically useful
      for live patching the kernel code, or changing global memory mappings,
      so that no activity could run in parallel until the system has
      returned to a stable state after all stop-context operations have
      completed.
      When interrupt pipelining is enabled, we have to provide the same
      guarantee by restoring hard interrupt disabling where virtualizing the
      interrupt disable flag would defeat it.
    • Philippe Gerum's avatar
      lockdep: ipipe: improve detection of out-of-band contexts · 3e6b6433
      Philippe Gerum authored
      trace_hardirqs_on_virt[_caller]() must be invoked instead of
      trace_hardirqs_on[_caller]() from assembly sites before returning from
      an interrupt/fault, so that the virtual IRQ disable state is checked
      for before switching the tracer's logic state to ON.
      This is required as an interrupt may be received and handled by the
      pipeline core although not forwarded to the root domain, when
      interrupts are virtually disabled. In such a case, we want to
      reconcile the tracer's logic with the effect of interrupt pipelining.
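
      As a rough model of that check, assuming hypothetical names throughout
      (`struct model_cpu`, `model_trace_hardirqs_on_virt()`), the virtual
      variant only flips the tracer's state to ON when the root domain is not
      stalled:

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; the names are illustrative only. */
struct model_cpu {
	bool root_stalled;    /* virtual IRQ disable flag of the root domain */
	bool tracer_irqs_on;  /* what the IRQ tracer currently believes */
};

/* Model of the *_virt variant: switch to ON only if unstalled. */
void model_trace_hardirqs_on_virt(struct model_cpu *cpu)
{
	if (!cpu->root_stalled)
		cpu->tracer_irqs_on = true;
}
```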
    • Philippe Gerum's avatar
      lockdep: ipipe: make the logic aware of interrupt pipelining · 393ce342
      Philippe Gerum authored
      The lockdep engine will check for the current interrupt state as part
      of the locking validation process, which must encompass:
      - the CPU interrupt state
      - the current pipeline domain
      - the virtual interrupt disable flag
      so that we can traverse the tracepoints from any context sanely and
      safely.
      In addition trace_hardirqs_on_virt_caller() should be called by the
      arch-dependent code when tracking the interrupt state before returning
      to user-space after a kernel entry (exceptions, IRQ). This makes sure
      that the tracking logic only applies to the root domain, and considers
      the virtual disable flag exclusively.
      For instance, the kernel may be entered when interrupts are (only)
      virtually disabled for the root domain (i.e. stalled), and we should
      tell the IRQ tracing logic that IRQs are about to be enabled back only
      if the root domain is unstalled before leaving to user-space. In such
      a context, the state of the interrupt bit in the CPU would be
      irrelevant.
    • Philippe Gerum's avatar
      ipipe: add cpuidle control interface · caf90a26
      Philippe Gerum authored
      Add a kernel interface for sharing CPU idling control between the host
      kernel and a co-kernel. The former invokes ipipe_cpuidle_control()
      which the latter should implement, for determining whether entering a
      sleep state is ok. This hook should return boolean true if so.
      The co-kernel may veto such entry if need be, in order to prevent
      latency spikes, as exiting sleep states might be costly depending on
      the CPU idling operation being used.
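
      A sketch of how such a veto could work, under stated assumptions: the
      hook below is a simplified model, not the real ipipe_cpuidle_control()
      signature, and the latency budget is an invented illustration of why a
      co-kernel might refuse a sleep state.

```c
#include <stdbool.h>

/* Hypothetical idle state descriptor; only the exit latency matters. */
struct model_idle_state {
	unsigned int exit_latency_us;
};

/* Illustrative latency budget the co-kernel is willing to tolerate. */
enum { MODEL_LATENCY_BUDGET_US = 10 };

/* Co-kernel side: return true if entering the sleep state is ok. */
bool model_cpuidle_control(const struct model_idle_state *state)
{
	return state->exit_latency_us <= MODEL_LATENCY_BUDGET_US;
}

/* Host side: enter the state only if the co-kernel agrees. */
bool model_host_enter_idle(const struct model_idle_state *state)
{
	if (!model_cpuidle_control(state))
		return false;  /* vetoed: stay out of this sleep state */
	/* The real kernel would enter the sleep state here. */
	return true;
}
```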
    • Philippe Gerum's avatar
      ftrace: ipipe: enable tracing from the head domain · 1430346b
      Philippe Gerum authored
      Enabling ftrace for a co-kernel running in the head domain of a
      pipelined interrupt context means to:
      - make sure that ftrace's live kernel code patching still runs
        unpreempted by any head domain activity (so that the latter can't
        tread on invalid or half-baked changes in the .text section).
      - allow the co-kernel code running in the head domain to traverse
        ftrace's tracepoints safely.
      The changes introduced by this commit ensure this by fixing up some
      key critical sections so that interrupts are still disabled in the
      CPU, undoing the interrupt flag virtualization in those particular
      cases.
    • Philippe Gerum's avatar
      fork: ipipe: announce mm dismantling · 398ab46e
      Philippe Gerum authored
      IPIPE_KEVT_CLEANUP is emitted before a process memory context is
      entirely dropped, after all the mappings have been exited. Per-process
      resources which might be maintained by the co-kernel could be released
      there, as all tasks have exited.
    • Philippe Gerum's avatar
      sched: ipipe: announce CPU affinity change · d1e6470d
      Philippe Gerum authored
      Emit IPIPE_KEVT_SETAFFINITY to the co-kernel when the target task is
      about to move to another CPU.
      CPU migration can only take place from the root domain: the pipeline
      does not provide any support for migrating tasks from the head domain,
      and derives several key assumptions based on this invariant.
    • Philippe Gerum's avatar
      sched: ipipe: announce signal receipt · 1e6986cc
      Philippe Gerum authored
      Emit IPIPE_KEVT_SIGWAKE when the target task is about to receive a
      (regular) signal. The co-kernel may decide to schedule a transition of
      the recipient to the root domain in order to have it handle that
      signal asap, which is commonly required for keeping the kernel sane.
      This notification is always sent from the context of the issuer.
    • Philippe Gerum's avatar
      sched: ipipe: announce task exit · 4491df7d
      Philippe Gerum authored
      Emit IPIPE_KEVT_EXIT from do_exit() to the co-kernel before the
      current task has dropped the files and mappings it owns.
    • Philippe Gerum's avatar
      KVM: ipipe: keep hypervisor state consistent across domain preemption · 1db0d31d
      Philippe Gerum authored
      In order for the hypervisor to operate properly in presence of a
      co-kernel, we need:
      - the virtualization core to know when the hypervisor stalls due
        to a preemption by the co-kernel.
      - to know when the VM enters and leaves guest mode.
    • Philippe Gerum's avatar
      sched: ipipe: add domain debug checks to common scheduling paths · 5de5c14f
      Philippe Gerum authored
      Catch invalid calls of root-only code from the head domain from common
      paths which may lead to blocking the current task linux-wise. Checks
      are enabled by CONFIG_IPIPE_DEBUG_CONTEXT.
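
      The kind of check involved can be sketched as follows. All names are
      hypothetical, and a plain variable stands in for the
      CONFIG_IPIPE_DEBUG_CONTEXT build option; the model merely counts
      violations where a kernel would warn.

```c
#include <stdbool.h>

enum model_domain { MODEL_ROOT, MODEL_HEAD };

/* Hypothetical global state for the model. */
enum model_domain model_current_domain = MODEL_ROOT;
bool model_debug_context = true;  /* stands in for CONFIG_IPIPE_DEBUG_CONTEXT */
int model_violations;

/* Call at the entry of root-only paths which may block the current task. */
void model_root_only(void)
{
	if (model_debug_context && model_current_domain != MODEL_ROOT)
		model_violations++;
}
```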
    • Philippe Gerum's avatar
      sched: ipipe: enable task migration between domains · 957ac4c9
      Philippe Gerum authored
      This is the basic code enabling alternate control of tasks between the
      regular kernel and an embedded co-kernel. The changes cover the
      following aspects:
      - extend the per-thread information block with a private area usable
        by the co-kernel for storing additional state information
      - provide the API enabling a scheduler exchange mechanism, so that
        tasks can run under the control of either kernel alternatively. This
        includes a service to move the current task to the head domain under
        the control of the co-kernel, and the converse service to re-enter
        the root domain once the co-kernel has released such task.
      - ensure the generic context switching code can be used from any
        domain, serializing execution as required.
      These changes have to be paired with arch-specific code further
      enabling context switching from the head domain.
    • Philippe Gerum's avatar
      clockevents: ipipe: connect clock chips to abstract tick device · 1befe95a
      Philippe Gerum authored
      Announce all clock event chips as they are registered to the
      out-of-band tick device infrastructure, so that we can interpose on
      key handlers in their descriptors.
    • Philippe Gerum's avatar
      timekeeping: ipipe: forward clock shift value to DSO helpers · 9e2ee43a
      Philippe Gerum authored
      In order to propagate the "host real-time update" event to a co-kernel
      (IPIPE_KEVT_HOSTRT), we need the clock shift value of the monotonic
      clock to be passed to the legacy vDSO handler, for (re)calculating the
      new wall clock time which is eventually announced to the co-kernel.
      Only architectures which still implement the legacy
      update_vsyscall_old() interface need this change.
    • Philippe Gerum's avatar
      ipipe: add kernel event notifiers · 96df83cb
      Philippe Gerum authored
      Add the core API for enabling (regular) kernel event notifications to
      a co-kernel running over the head domain. For instance, such a
      co-kernel may need to know when a task is about to be resumed upon
      signal receipt, or when it gets an access fault trap.
      This commit adds the client-side API for enabling such notification
      for classes of events, but does not provide the notification points per
      se, which come later.
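
      A minimal sketch of per-class enabling, assuming an invented API: the
      class flags, `struct model_notifier` and both functions are hypothetical
      and only illustrate the idea of a client opting in to classes of events
      before any notification point fires.

```c
#include <stddef.h>

/* Hypothetical event classes, one bit each. */
enum {
	MODEL_CLASS_SYSCALL = 1u << 0,
	MODEL_CLASS_KEVENT  = 1u << 1,
	MODEL_CLASS_TRAP    = 1u << 2,
};

typedef int (*model_handler)(unsigned int event_class, void *data);

struct model_notifier {
	unsigned int enabled;   /* classes the co-kernel asked for */
	model_handler handler;  /* co-kernel callback */
};

/* Client side: opt in to one or more classes of events. */
void model_enable_classes(struct model_notifier *n, unsigned int classes)
{
	n->enabled |= classes;
}

/* Notification point: forward only events of an enabled class. */
int model_notify(struct model_notifier *n, unsigned int event_class, void *data)
{
	if (n->handler == NULL || !(n->enabled & event_class))
		return 0;
	return n->handler(event_class, data);
}

/* Example handler counting the events it receives. */
int model_count_events(unsigned int event_class, void *data)
{
	(void)event_class;
	(*(int *)data)++;
	return 1;
}
```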
    • Philippe Gerum's avatar
      printk: ipipe: add raw console channel · 4cd03bbb
      Philippe Gerum authored
      A raw output handler (.write_raw) is added to the console descriptor
      for writing (short) text output unmodified, without any logging,
      header or preparation whatsoever, usable from any pipeline domain.
      The dedicated raw_printk() variant formats the output message then
      passes it on to the handler holding a hard spinlock, irqs off.
      This is a very basic debug channel for situations when resorting to
      the fairly complex printk() handling is not an option. Unlike early
      consoles, regular consoles can provide a raw output service past the
      boot sequence. Raw output handlers are typically provided by serial
      console devices.
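
      The format-then-write path can be modelled in user space as below. This
      is a sketch under assumptions: `struct model_console` and its capture
      buffer are hypothetical, and the hard spinlock taken by the kernel
      variant is only indicated by a comment.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical console descriptor with a raw output handler. */
struct model_console {
	void (*write_raw)(struct model_console *con, const char *s,
			  unsigned int count);
	char sink[128];     /* captures the output for demonstration */
	unsigned int used;
};

/* Example raw handler: append the bytes unmodified to the sink. */
void model_sink_write(struct model_console *con, const char *s,
		      unsigned int count)
{
	unsigned int room = (unsigned int)sizeof(con->sink) - 1 - con->used;

	if (count > room)
		count = room;
	memcpy(con->sink + con->used, s, count);
	con->used += count;
	con->sink[con->used] = '\0';
}

/* Format into a small stack buffer, then hand the bytes over as-is. */
int model_raw_printk(struct model_console *con, const char *fmt, ...)
{
	char buf[64];
	va_list args;
	int n;

	if (con->write_raw == NULL)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	if ((size_t)n >= sizeof(buf))
		n = (int)sizeof(buf) - 1;  /* the output was truncated */

	/* The kernel variant would hold a hard spinlock, IRQs off, here. */
	con->write_raw(con, buf, (unsigned int)n);
	return n;
}
```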
    • Philippe Gerum's avatar
      dump_stack: ipipe: make dump_stack() domain-aware · 9ac7f531
      Philippe Gerum authored
      When dumping a stack backtrace, we neither need nor want to disable
      root stage IRQs over the head stage, where CPU migration can't
      happen.
      Conversely, we neither need nor want to disable hard IRQs from the
      head stage, so that latency won't skyrocket either.
    • Philippe Gerum's avatar
      printk: ipipe: defer printk() from head domain · 5f8436dc
      Philippe Gerum authored
      The printk() machinery cannot immediately invoke the console driver(s)
      when called from the head domain, since such driver code belongs to
      the root domain and cannot be shared between domains.
      Output issued from the head domain is formatted then logged into a
      staging buffer, and a dedicated virtual IRQ is posted to the root
      domain for notification. When the virtual IRQ handler runs, the
      contents of the staging buffer are flushed to the printk() interface
      anew, which may eventually pass the output on to the console drivers
      from such a context.
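
      The staging scheme can be sketched with a simple user-space model. The
      structure and function names are hypothetical, a boolean stands in for
      the posted virtual IRQ, and a second buffer stands in for the console
      drivers.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_STAGING_SIZE 256

struct model_printk {
	char staging[MODEL_STAGING_SIZE];  /* filled from the head domain */
	size_t fill;
	bool virq_pending;                 /* models the posted virtual IRQ */
	char console[MODEL_STAGING_SIZE];  /* stands in for console drivers */
	size_t console_len;
};

/* Head domain side: log into the staging buffer, post the virtual IRQ. */
void model_head_printk(struct model_printk *p, const char *msg)
{
	size_t len = strlen(msg);
	size_t room = MODEL_STAGING_SIZE - 1 - p->fill;

	if (len > room)
		len = room;  /* drop what does not fit */
	memcpy(p->staging + p->fill, msg, len);
	p->fill += len;
	p->virq_pending = true;
}

/* Root domain side: the virtual IRQ handler flushes the staging buffer. */
void model_root_virq_handler(struct model_printk *p)
{
	size_t room = MODEL_STAGING_SIZE - 1 - p->console_len;
	size_t len = p->fill < room ? p->fill : room;

	if (!p->virq_pending)
		return;
	memcpy(p->console + p->console_len, p->staging, len);
	p->console_len += len;
	p->console[p->console_len] = '\0';
	p->fill = 0;
	p->virq_pending = false;
}
```

      Nothing reaches the console model until the root-side handler runs,
      which mirrors the deferral described above.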
    • Philippe Gerum's avatar
      PM / hibernate: ipipe: protect against out-of-band interrupts · c9a32947
      Philippe Gerum authored
      We must not allow out-of-band activity to resume while we are busy
      suspending the devices in the system, until the PM sleep state has
      been fully entered.
      Pair existing virtual IRQ disabling calls which only apply to the root
      domain with hard ones.
    • Philippe Gerum's avatar
      module: ipipe: enable try_module_get() from hard atomic context · 0c527f67
      Philippe Gerum authored
      We might have out-of-band code calling try_module_get() from the head
      domain, or from a section covered by a hard spinlock where the root
      domain must not reschedule. This requires the preemption management
      calls in try_module_get() (and the converse module_put()) to be
      converted to their hard variant.
      REVISIT: using try_module_get() from such contexts is questionable;
      client domains should be fixed.
    • Philippe Gerum's avatar
      KGDB: ipipe: enable debugging over the head domain · 4e5d5bc7
      Philippe Gerum authored
      Make the KGDB stub runnable over the head domain since we may take
      traps and interrupts from that context too, by converting the locks to
      hard spinlocks.
    • Philippe Gerum's avatar
      context_tracking: ipipe: do not track over the head domain · 58992a82
      Philippe Gerum authored
      Context tracking is a prerequisite for FULL_NOHZ, so that the RCU
      subsystem can detect CPU idleness without relying on the (regular)
      timer tick.
      Out-of-band activity running over the head domain should by definition
      not be involved in such detection logic, as the root domain has no
      knowledge of what happens - and when - on the head domain whatsoever.
    • Philippe Gerum's avatar
      ipipe: add latency tracer · 20700661
      Philippe Gerum authored
      The latency tracer is a variant of ftrace's 'function' tracer
      providing detailed information about the current interrupt state at
      each function entry (i.e. virtual interrupt flag and CPU interrupt
      disable bit). This commit introduces the generic tracer code, which
      builds upon the regular ftrace API.
      The arch-specific code should provide ipipe_read_tsc(), a helper
      routine returning a 64-bit monotonic time value for timestamping
      purposes. HAVE_IPIPE_TRACER_SUPPORT should be selected by the
      arch-specific code for enabling the tracer, which in turn makes
      CONFIG_IPIPE_TRACE available from the Kconfig interface.
    • Philippe Gerum's avatar
      ipipe: add out-of-band tick device · 88f3dede
      Philippe Gerum authored
      The out-of-band tick device manages the timer hardware by interposing
      on selected clockevent handlers transparently, so that a client domain
      (e.g. a co-kernel) eventually controls such hardware for scheduling
      the high-precision timer events it needs to. Those events are
      delivered to out-of-band activities running on the head stage,
      unimpeded by (only virtually) interrupt-free sections of the regular
      kernel code.
      This commit introduces the generic API for controlling the out-of-band
      tick device from a co-kernel. It also provides the internal API
      clock event chip drivers should use for enabling high-precision
      timing for their hardware.
    • Philippe Gerum's avatar
      locking: ipipe: add hard lock alternative to regular spinlocks · 7c28f350
      Philippe Gerum authored
      Hard spinlocks manipulate the CPU interrupt mask, without affecting
      the kernel preemption state in locking/unlocking operations.
      This type of spinlock is useful for implementing a critical section to
      serialize concurrent accesses from both in-band and out-of-band
      contexts, i.e. from root and head stages.
      Hard spinlocks exclusively depend on the pre-existing arch-specific
      bits which implement regular spinlocks. They can be seen as basic
      spinlocks still affecting the CPU's interrupt state when all other
      spinlock types only deal with the virtual interrupt flag managed by
      the pipeline core - i.e. only disable interrupts for the regular
      in-band kernel activity.
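
      The difference between the two lock families can be modelled with two
      flags. This is a sketch with hypothetical names, not the I-pipe
      implementation: regular locks only touch the virtual flag, hard locks
      change the CPU interrupt mask, and only the latter keeps head stage
      interrupts out.

```c
#include <stdbool.h>

/* Hypothetical per-CPU interrupt state. */
struct model_cpu_state {
	bool hard_irqs_off;  /* CPU interrupt mask */
	bool root_stalled;   /* virtual interrupt flag of the root stage */
};

/* Regular spinlock, IRQs "off": only the virtual flag is set. */
void model_spin_lock_irq(struct model_cpu_state *c)
{
	c->root_stalled = true;
}

void model_spin_unlock_irq(struct model_cpu_state *c)
{
	c->root_stalled = false;
}

/* Hard spinlock: the CPU interrupt mask itself is changed. */
void model_hard_lock_irq(struct model_cpu_state *c)
{
	c->hard_irqs_off = true;
}

void model_hard_unlock_irq(struct model_cpu_state *c)
{
	c->hard_irqs_off = false;
}

/* Can an out-of-band (head stage) interrupt preempt right now? */
bool model_head_can_preempt(const struct model_cpu_state *c)
{
	return !c->hard_irqs_off;
}
```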