IRQ-domain.txt 12.9 KB
Newer Older
1 2 3
The irq_domain interrupt number mapping library
4 5 6 7 8 9 10 11

The current design of the Linux kernel uses a single large number
space where each separate IRQ source is assigned a different number.
This is simple when there is only one interrupt controller, but in
systems with multiple interrupt controllers the kernel must ensure
that each one gets assigned non-overlapping allocations of Linux
IRQ numbers.

12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
The number of interrupt controllers registered as unique irqchips
show a rising tendency: for example subdrivers of different kinds
such as GPIO controllers avoid reimplementing identical callback
mechanisms as the IRQ core system by modelling their interrupt
handlers as irqchips, i.e. in effect cascading interrupt controllers.

Here the interrupt number loose all kind of correspondence to
hardware interrupt numbers: whereas in the past, IRQ numbers could
be chosen so they matched the hardware IRQ line into the root
interrupt controller (i.e. the component actually fireing the
interrupt line to the CPU) nowadays this number is just a number.

For this reason we need a mechanism to separate controller-local
interrupt numbers, called hardware irq's, from Linux IRQ numbers.

27 28 29 30 31 32 33 34 35 36
The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
irq numbers, but they don't provide any support for reverse mapping of
the controller-local IRQ (hwirq) number into the Linux IRQ number

The irq_domain library adds mapping between hwirq and IRQ numbers on
top of the irq_alloc_desc*() API.  An irq_domain to manage mapping is
preferred over interrupt controller drivers open coding their own
reverse mapping scheme.

37 38 39
irq_domain also implements translation from an abstract irq_fwspec
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
be easily extended to support other IRQ topology data sources.

41 42 43
irq_domain usage

44 45 46 47
An interrupt controller driver creates and registers an irq_domain by
calling one of the irq_domain_add_*() functions (each mapping method
has a different allocator function, more on that later).  The function
will return a pointer to the irq_domain on success.  The caller must
provide the allocator function with an irq_domain_ops structure.
49 50 51 52 53 54 55 56 57 58 59 60

In most cases, the irq_domain will begin empty without any mappings
between hwirq and IRQ numbers.  Mappings are added to the irq_domain
by calling irq_create_mapping() which accepts the irq_domain and a
hwirq number as arguments.  If a mapping for the hwirq doesn't already
exist then it will allocate a new Linux irq_desc, associate it with
the hwirq, and call the .map() callback so the driver can perform any
required hardware setup.

When an interrupt is received, irq_find_mapping() function should
be used to find the Linux IRQ number from the hwirq number.

61 62 63 64
The irq_create_mapping() function must be called *atleast once*
before any call to irq_find_mapping(), lest the descriptor will not
be allocated.

65 66 67 68
If the driver has the Linux IRQ number or the irq_data pointer, and
needs to know the associated hwirq number (such as in the irq_chip
callbacks) then it can be directly obtained from irq_data->hwirq.

69 70 71
Types of irq_domain mappings

72 73 74 75 76
There are several mechanisms available for reverse mapping from hwirq
to Linux irq, and each mechanism uses a different allocation function.
Which reverse map type should be used depends on the use case.  Each
of the reverse map types are described below:

77 78 79 80 81 82 83


84 85 86 87 88 89 90 91 92 93 94

The linear reverse map maintains a fixed size table indexed by the
hwirq number.  When a hwirq is mapped, an irq_desc is allocated for
the hwirq, and the IRQ number is stored in the table.

The Linear map is a good choice when the maximum number of hwirqs is
fixed and a relatively small number (~ < 256).  The advantages of this
map are fixed time lookup for IRQ numbers, and irq_descs are only
allocated for in-use IRQs.  The disadvantage is that the table must be
as large as the largest possible hwirq number.

95 96 97 98 99
irq_domain_add_linear() and irq_domain_create_linear() are functionally
equivalent, except for the first argument is different - the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

100 101
The majority of drivers should use the linear map.

102 103 104 105 106 107 108


109 110 111 112 113 114 115 116 117 118

The irq_domain maintains a radix tree map from hwirq numbers to Linux
IRQs.  When an hwirq is mapped, an irq_desc is allocated and the
hwirq is used as the lookup key for the radix tree.

The tree map is a good choice if the hwirq number can be very large
since it doesn't need to allocate a table as large as the largest
hwirq number.  The disadvantage is that hwirq to IRQ number lookup is
dependent on how many entries are in the table.

119 120 121 122 123
irq_domain_add_tree() and irq_domain_create_tree() are functionally
equivalent, except for the first argument is different - the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

Very few drivers should need this mapping.

126 127 128 129 130 131
No Map


132 133 134 135 136 137 138 139 140 141

The No Map mapping is to be used when the hwirq number is
programmable in the hardware.  In this case it is best to program the
Linux IRQ number into the hardware itself so that no mapping is
required.  Calling irq_create_direct_mapping() will allocate a Linux
IRQ number and call the .map() callback so that driver can program the
Linux IRQ number into the hardware.

Most drivers cannot use this mapping.

142 143 144 145 146 147 148 149


150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169

The Legacy mapping is a special case for drivers that already have a
range of irq_descs allocated for the hwirqs.  It is used when the
driver cannot be immediately converted to use the linear mapping.  For
example, many embedded system board support files use a set of #defines
for IRQ numbers that are passed to struct device registrations.  In that
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
mapping should be used.

The legacy map assumes a contiguous range of IRQ numbers has already
been allocated for the controller and that the IRQ number can be
calculated by adding a fixed offset to the hwirq number, and
visa-versa.  The disadvantage is that it requires the interrupt
controller to manage IRQ allocations and it requires an irq_desc to be
allocated for every hwirq, even if it is unused.

The legacy map should only be used if fixed IRQ mappings must be
supported.  For example, ISA controllers would use the legacy map for
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
170 171 172

Most users of legacy mappings should use irq_domain_add_simple() which
will use a legacy domain only if an IRQ range is supplied by the
173 174 175
system and will otherwise use a linear domain mapping. The semantics
of this call are such that if an IRQ range is specified then
descriptors will be allocated on-the-fly for it, and if no range is
Xishi Qiu's avatar
Xishi Qiu committed
specified it will fall through to irq_domain_add_linear() which means
177 178 179 180 181 182 183 184 185 186
*no* irq descriptors will be allocated.

A typical use case for simple domains is where an irqchip provider
is supporting both dynamic and static IRQ assignments.

In order to avoid ending up in a situation where a linear domain is
used and no descriptor gets allocated it is very important to make sure
that the driver using the simple domain call irq_create_mapping()
before any irq_find_mapping() since the latter will actually work
for the static IRQ assignment case.

188 189 190
Hierarchy IRQ domain

191 192
On some architectures, there may be multiple interrupt controllers
involved in delivering an interrupt from the device to the target CPU.
Let's look at a typical interrupt delivering path on x86 platforms::

  Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
196 197

There are three interrupt controllers involved:

199 200 201 202 203 204 205 206 207
1) IOAPIC controller
2) Interrupt remapping controller
3) Local APIC controller

To support such a hardware topology and make software architecture match
hardware architecture, an irq_domain data structure is built for each
interrupt controller and those irq_domains are organized into hierarchy.
When building irq_domain hierarchy, the irq_domain near to the device is
child and the irq_domain near to CPU is parent. So a hierarchy structure
208 209
as below will be built for the example above::

210 211 212 213 214 215 216 217 218
	CPU Vector irq_domain (root irq_domain to manage CPU vectors)
	Interrupt Remapping irq_domain (manage irq_remapping entries)
	IOAPIC irq_domain (manage IOAPIC delivery entries/pins)

There are four major interfaces to use hierarchy irq_domain:

220 221 222 223 224 225
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
   controller related resources to deliver these interrupts.
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
   related resources associated with these interrupts.
3) irq_domain_activate_irq(): activate interrupt controller hardware to
   deliver the interrupt.
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
227 228
   to stop delivering the interrupt.

229 230
Following changes are needed to support hierarchy irq_domain:

231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
1) a new field 'parent' is added to struct irq_domain; it's used to
   maintain irq_domain hierarchy information.
2) a new field 'parent_data' is added to struct irq_data; it's used to
   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
   is used to store irq_domain pointer and hardware irq number.
3) new callbacks are added to struct irq_domain_ops to support hierarchy
   irq_domain operations.

With support of hierarchy irq_domain and hierarchy irq_data ready, an
irq_domain structure is built for each interrupt controller, and an
irq_data structure is allocated for each irq_domain associated with an
IRQ. Now we could go one step further to support stacked(hierarchy)
irq_chip. That is, an irq_chip is associated with each irq_data along
the hierarchy. A child irq_chip may implement a required action by
itself or by cooperating with its parent irq_chip.

With stacked irq_chip, interrupt controller driver only needs to deal
with the hardware managed by itself and may ask for services from its
parent irq_chip when needed. So we could achieve a much cleaner
software architecture.

For an interrupt controller driver to support hierarchy irq_domain, it
needs to:

255 256 257 258 259 260 261 262
1) Implement irq_domain_ops.alloc and
2) Optionally implement irq_domain_ops.activate and
3) Optionally implement an irq_chip to manage the interrupt controller
4) No need to implement and irq_domain_ops.unmap,
   they are unused with hierarchy irq_domain.

263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc.

=== Debugging ===

If you switch on CONFIG_IRQ_DOMAIN_DEBUG (which depends on
CONFIG_IRQ_DOMAIN and CONFIG_DEBUG_FS), you will find a new file in
your debugfs mount point, called irq_domain_mapping. This file
contains a live snapshot of all the IRQ domains in the system:

 name              mapped  linear-max  direct-max  devtree-node
 pl061                  8           8           0  /smb/gpio@e0080000
 pl061                  8           8           0  /smb/gpio@e1050000
 pMSI                   0           0           0  /interrupt-controller@e1101000/v2m@e0080000
 MSI                   37           0           0  /interrupt-controller@e1101000/v2m@e0080000
 GICv2m                37           0           0  /interrupt-controller@e1101000/v2m@e0080000
 GICv2                448         448           0  /interrupt-controller@e1101000

it also iterates over the interrupts to display their mapping in the
domains, and makes the domain stacking visible:

irq    hwirq    chip name        chip data           active  type            domain
    1  0x00019  GICv2            0xffff00000916bfd8     *    LINEAR          GICv2
    2  0x0001d  GICv2            0xffff00000916bfd8          LINEAR          GICv2
    3  0x0001e  GICv2            0xffff00000916bfd8     *    LINEAR          GICv2
    4  0x0001b  GICv2            0xffff00000916bfd8     *    LINEAR          GICv2
    5  0x0001a  GICv2            0xffff00000916bfd8          LINEAR          GICv2
   96  0x81808  MSI              0x          (null)           RADIX          MSI
   96+ 0x00063  GICv2m           0xffff8003ee116980           RADIX          GICv2m
   96+ 0x00063  GICv2            0xffff00000916bfd8          LINEAR          GICv2
   97  0x08800  MSI              0x          (null)     *     RADIX          MSI
   97+ 0x00064  GICv2m           0xffff8003ee116980     *     RADIX          GICv2m
   97+ 0x00064  GICv2            0xffff00000916bfd8     *    LINEAR          GICv2

Here, interrupts 1-5 are only using a single domain, while 96 and 97
are build out of a stack of three domain, each level performing a
particular function.