    percpu: remove spurious lock dependency between percpu and sched · 5f7dc5a4
    John Sperbeck authored
    [ Upstream commit 198790d9 ]
    
    In free_percpu() we sometimes call pcpu_schedule_balance_work() to
    queue a work item (which does a wakeup) while holding pcpu_lock.
    This creates an unnecessary lock dependency between pcpu_lock and
    the scheduler's pi_lock.  There are other places where we call
    pcpu_schedule_balance_work() without holding pcpu_lock, and this case
    doesn't need to be different.
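
    For illustration, a minimal sketch of that pattern (hypothetical
    function name and condition, assuming the mm/percpu.c context where
    pcpu_lock and pcpu_schedule_balance_work() are defined; not the
    verbatim diff): the decision to schedule balance work is recorded
    while pcpu_lock is held, but the work item is only queued after the
    lock has been released.

        /*
         * Hypothetical sketch of the locking pattern, not the actual
         * mm/percpu.c change: decide under pcpu_lock, queue without it.
         */
        static void pcpu_free_example(bool chunk_now_empty)
        {
                unsigned long flags;
                bool need_balance = false;

                spin_lock_irqsave(&pcpu_lock, flags);
                /* ... return the allocation to its chunk ... */
                if (chunk_now_empty)
                        need_balance = true;    /* only record the decision */
                spin_unlock_irqrestore(&pcpu_lock, flags);

                /*
                 * The wakeup done while queueing the work item now runs
                 * with pcpu_lock released, so pcpu_lock no longer nests
                 * inside the workqueue pool->lock.
                 */
                if (need_balance)
                        pcpu_schedule_balance_work();
        }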
    
    Moving the call outside the lock prevents the following lockdep splat
    when running tools/testing/selftests/bpf/{test_maps,test_progs} in
    sequence with lockdep enabled:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    5.1.0-dbg-DEV #1 Not tainted
    ------------------------------------------------------
    kworker/23:255/18872 is trying to acquire lock:
    000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
    
    but task is already holding lock:
    00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #4 (pcpu_lock){..-.}:
           lock_acquire+0x9e/0x180
           _raw_spin_lock_irqsave+0x3a/0x50
           pcpu_alloc+0xfa/0x780
           __alloc_percpu_gfp+0x12/0x20
           alloc_htab_elem+0x184/0x2b0
           __htab_percpu_map_update_elem+0x252/0x290
           bpf_percpu_hash_update+0x7c/0x130
           __do_sys_bpf+0x1912/0x1be0
           __x64_sys_bpf+0x1a/0x20
           do_syscall_64+0x59/0x400
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    -> #3 (&htab->buckets[i].lock){....}:
           lock_acquire+0x9e/0x180
           _raw_spin_lock_irqsave+0x3a/0x50
           htab_map_update_elem+0x1af/0x3a0
    
    -> #2 (&rq->lock){-.-.}:
           lock_acquire+0x9e/0x180
           _raw_spin_lock+0x2f/0x40
           task_fork_fair+0x37/0x160
           sched_fork+0x211/0x310
           copy_process.part.43+0x7b1/0x2160
           _do_fork+0xda/0x6b0
           kernel_thread+0x29/0x30
           rest_init+0x22/0x260
           arch_call_rest_init+0xe/0x10
           start_kernel+0x4fd/0x520
           x86_64_start_reservations+0x24/0x26
           x86_64_start_kernel+0x6f/0x72
           secondary_startup_64+0xa4/0xb0
    
    -> #1 (&p->pi_lock){-.-.}:
           lock_acquire+0x9e/0x180
           _raw_spin_lock_irqsave+0x3a/0x50
           try_to_wake_up+0x41/0x600
           wake_up_process+0x15/0x20
           create_worker+0x16b/0x1e0
           workqueue_init+0x279/0x2ee
           kernel_init_freeable+0xf7/0x288
           kernel_init+0xf/0x180
           ret_from_fork+0x24/0x30
    
    -> #0 (&(&pool->lock)->rlock){-.-.}:
           __lock_acquire+0x101f/0x12a0
           lock_acquire+0x9e/0x180
           _raw_spin_lock+0x2f/0x40
           __queue_work+0xb2/0x520
           queue_work_on+0x38/0x80
           free_percpu+0x221/0x260
           pcpu_freelist_destroy+0x11/0x20
           stack_map_free+0x2a/0x40
           bpf_map_free_deferred+0x3c/0x50
           process_one_work+0x1f7/0x580
           worker_thread+0x54/0x410
           kthread+0x10f/0x150
           ret_from_fork+0x24/0x30
    
    other info that might help us debug this:
    
    Chain exists of:
      &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(pcpu_lock);
                                   lock(&htab->buckets[i].lock);
                                   lock(pcpu_lock);
      lock(&(&pool->lock)->rlock);
    
     *** DEADLOCK ***
    
    3 locks held by kworker/23:255/18872:
     #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
         at: process_one_work+0x17a/0x580
     #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
         at: process_one_work+0x17a/0x580
     #2: 00000000e3e7a6aa (pcpu_lock){..-.},
         at: free_percpu+0x36/0x260
    
    stack backtrace:
    CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
    Hardware name: ...
    Workqueue: events bpf_map_free_deferred
    Call Trace:
     dump_stack+0x67/0x95
     print_circular_bug.isra.38+0x1c6/0x220
     check_prev_add.constprop.50+0x9f6/0xd20
     __lock_acquire+0x101f/0x12a0
     lock_acquire+0x9e/0x180
     _raw_spin_lock+0x2f/0x40
     __queue_work+0xb2/0x520
     queue_work_on+0x38/0x80
     free_percpu+0x221/0x260
     pcpu_freelist_destroy+0x11/0x20
     stack_map_free+0x2a/0x40
     bpf_map_free_deferred+0x3c/0x50
     process_one_work+0x1f7/0x580
     worker_thread+0x54/0x410
     kthread+0x10f/0x150
     ret_from_fork+0x24/0x30
    
    Signed-off-by: John Sperbeck <jsperbeck@google.com>
    Signed-off-by: Dennis Zhou <dennis@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>