Skip to content
  • David Howells's avatar
    fscache: Fix dead object requeue · e26bfebd
    David Howells authored
    
    
    Under some circumstances, an fscache object can become queued such that it
    fscache_object_work_func() can be called once the object is in the
    OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
    invoke the handler for the state (which is hard coded to 0x2).
    
    The way this comes about is something like the following:
    
     (1) The object dispatcher is processing a work state for an object.  This
         is done in workqueue context.
    
     (2) An out-of-band event comes in that isn't masked, causing the object to
         be queued, say EV_KILL.
    
     (3) The object dispatcher finishes processing the current work state on
         that object and then sees there's another event to process, so,
         without returning to the workqueue core, it processes that event too.
         It then follows the chain of events that initiates until we reach
         OBJECT_DEAD without going through a wait state (such as
         WAIT_FOR_CLEARANCE).
    
         At this point, object->events may be 0, object->event_mask will be 0
         and oob_event_mask will be 0.
    
     (4) The object dispatcher returns to the workqueue processor, and in due
         course, this sees that the object's work item is still queued and
         invokes it again.
    
     (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
         jumps to it - resulting in an OOPS.
    
    When I'm seeing this, the work state in (1) appears to have been either
    LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
    fscache_osm_lookup_oob).
    
    The window for (2) is very small:
    
     (A) object->event_mask is cleared whilst the event dispatch process is
         underway - though there's no memory barrier to force this to the top
         of the function.
    
         The window, therefore is from the time the object was selected by the
         workqueue processor and made requeueable to the time the mask was
         cleared.
    
     (B) fscache_raise_event() will only queue the object if it manages to set
         the event bit and the corresponding event_mask bit was set.
    
         The enqueuement is then deferred slightly whilst we get a ref on the
         object and get the per-CPU variable for workqueue congestion.  This
         slight deferral slightly increases the probability by allowing extra
         time for the workqueue to make the item requeueable.
    
    Handle this by giving the dead state a processor function and checking the
    for the dead state address rather than seeing if the processor function is
    address 0x2.  The dead state processor function can then set a flag to
    indicate that it's occurred and give a warning if it occurs more than once
    per object.
    
    If this race occurs, an oops similar to the following is seen (note the RIP
    value):
    
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
    IP: [<0000000000000002>] 0x1
    PGD 0
    Oops: 0010 [#1] SMP
    Modules linked in: ...
    CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
    Workqueue: fscache_object fscache_object_work_func [fscache]
    task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
    RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
    RSP: 0018:ffff880717547df8  EFLAGS: 00010202
    RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
    RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
    RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
    R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
    R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
    FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
     ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
     ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
     ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
    Call Trace:
     [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff816460d8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    
    Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    Acked-by: default avatarJeremy McNicoll <jeremymc@redhat.com>
    Tested-by: default avatarFrank Sorenson <sorenson@redhat.com>
    Tested-by: default avatarBenjamin Coddington <bcodding@redhat.com>
    Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
    Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    e26bfebd