    workqueue: allow rescuer thread to do more work. · 008847f6
    NeilBrown authored
    
    
    When there is serious memory pressure, all workers in a pool could be
    blocked, and a new thread cannot be created because it requires memory
    allocation.
    
    In this situation a WQ_MEM_RECLAIM workqueue will wake up the
    rescuer thread to do some work.
    
    The rescuer will only handle requests that are already on ->worklist.
    If max_requests is 1, that means it will handle a single request.
    
    The rescuer will be woken again in 100ms to handle another max_requests
    requests.
    
    I've seen a machine (running a 3.0 based "enterprise" kernel) with
    thousands of requests queued for xfslogd, which has a max_requests of
    1, and is needed for retiring all 'xfs' write requests.  When one of
    the worker pools gets into this state, it progresses extremely slowly
    and possibly never recovers (I only waited an hour or two).
    
    With this patch we leave a pool_workqueue on the mayday list
    until it is clearly no longer in need of assistance.  This allows
    all requests to be handled in a timely fashion.
    
    We keep each pool_workqueue on the mayday list until
    need_to_create_worker() is false, and no work for this workqueue is
    found in the pool.
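
    The effect can be illustrated with a small user-space model.  This
    is only a sketch, not the kernel code: struct toy_pwq,
    toy_rescue_pass() and toy_wakeups_to_drain() are hypothetical names,
    the pool is assumed to stay starved (need_to_create_worker() stays
    true) for the whole run, and completing a batch is assumed to
    activate up to max_requests delayed items.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one pool_workqueue; not a kernel structure. */
struct toy_pwq {
	int max_requests;	/* limit on items on ->worklist at once */
	int on_worklist;	/* items currently on the pool's ->worklist */
	int delayed;		/* items queued but not yet on ->worklist */
};

/*
 * One rescuer pass: handle everything currently on the worklist.
 * Completing those items lets up to max_requests delayed items
 * move onto the worklist.  Returns the number of items handled.
 */
static int toy_rescue_pass(struct toy_pwq *pwq)
{
	int handled = pwq->on_worklist;
	int activate;

	assert(pwq->max_requests > 0);

	activate = pwq->delayed < pwq->max_requests ?
		   pwq->delayed : pwq->max_requests;
	pwq->on_worklist = activate;
	pwq->delayed -= activate;
	return handled;
}

/*
 * Count how many times the rescuer must be woken to drain the queue.
 * Without keep_on_mayday, each wakeup performs a single pass and the
 * next one only comes after the mayday interval.  With keep_on_mayday,
 * the pwq stays on the mayday list while work for it is still found,
 * so the rescuer keeps making passes within the same wakeup.
 * Assumption: the pool remains starved throughout.
 */
static int toy_wakeups_to_drain(struct toy_pwq pwq, bool keep_on_mayday)
{
	int wakeups = 0;

	while (pwq.on_worklist > 0) {
		wakeups++;
		toy_rescue_pass(&pwq);
		while (keep_on_mayday && pwq.on_worklist > 0)
			toy_rescue_pass(&pwq);
	}
	return wakeups;
}
```

    In this model, a queue with max_requests of 1 and 1000 outstanding
    requests needs 1000 rescuer wakeups, each separated by the 100ms
    interval, without the change, and a single wakeup with it.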
    
    I have tested this in combination with a (hackish) patch which forces
    all work items to be handled by the rescuer thread.  In that context
    it significantly improves performance.  A similar patch for a 3.0
    kernel significantly improved performance on a heavy workload.
    
    Thanks to Jan Kara for some design ideas, and to Dongsu Park for
    some comments and testing.
    
    tj: Inverted the lock order between wq_mayday_lock and pool->lock with
        a preceding patch and simplified this patch.  Added comment and
        updated changelog accordingly.  Dongsu spotted missing get_pwq()
        in the simplified code.
    
    Cc: Dongsu Park <dongsu.park@profitbricks.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Tejun Heo <tj@kernel.org>