1. 22 Feb, 2018 1 commit
  2. 10 Oct, 2017 1 commit
    • locking/lockdep: Fix stacktrace mess · 8b405d5c
      Peter Zijlstra authored
      There is some complication between check_prevs_add() and
      check_prev_add() wrt. saving stack traces. The problem is that we want
      to be frugal with saving stack traces, since it consumes static
      resources.
      
      We'll only know in check_prev_add() if we need the trace, but we can
      call into it multiple times. So we want to do on-demand and re-use.
      
      A further complication is that check_prev_add() can drop graph_lock
      and mess with our static resources.
      
      In any case, the current state, after commit:
      
        ce07a941 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
      
      is that we'll assume the trace contains valid data once
      check_prev_add() returns '2'. However, as noted by Josh, this is
      false: check_prev_add() can return '2' before having saved a trace,
      which can then result in the use of uninitialized data.
      Testing, as reported by Wu, shows a NULL deref.
      
      So simplify.
      
      Since the graph_lock() thing is a debug path that hasn't
      really been used in a long while, take it out back and avoid the
      headache.
      
      Further, initialize the stack_trace to a known 'empty' state; as long
      as nr_entries == 0, nothing should dereference entries. We can then use
      the 'entries == NULL' test for a valid trace / on-demand saving.
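A minimal userspace sketch of the 'empty' state and the on-demand, save-once rule described above. Everything here is hypothetical: struct trace, trace_save_once() and the small static pool only stand in for lockdep's stack_trace handling and its limited static storage; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct stack_trace. */
struct trace {
	unsigned int nr_entries;
	unsigned int max_entries;
	unsigned long *entries;		/* NULL means "not saved yet" */
};

#define POOL_SIZE 64
static unsigned long trace_pool[POOL_SIZE];	/* shared static storage */
static unsigned int pool_used;
static unsigned int nr_saves;			/* how often we really saved */

/* Known 'empty' state: nr_entries == 0, so nobody dereferences entries. */
static void trace_init(struct trace *t)
{
	t->nr_entries = 0;
	t->max_entries = 0;
	t->entries = NULL;
}

/* Save on demand, at most once; later callers re-use the saved trace. */
static int trace_save_once(struct trace *t, const unsigned long *frames,
			   unsigned int n)
{
	if (t->entries)				/* already valid: re-use */
		return 1;
	if (pool_used + n > POOL_SIZE)		/* static resources exhausted */
		return 0;
	t->entries = &trace_pool[pool_used];
	t->max_entries = n;
	for (unsigned int i = 0; i < n; i++)
		t->entries[i] = frames[i];
	t->nr_entries = n;
	pool_used += n;
	nr_saves++;
	return 1;
}
```

A second call finds entries != NULL and re-uses the saved trace without consuming more of the pool.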
      
      Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ce07a941 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 29 Aug, 2017 1 commit
  4. 25 Aug, 2017 1 commit
    • locking/lockdep: Fix workqueue crossrelease annotation · e6f3faa7
      Peter Zijlstra authored
      The new completion/crossrelease annotations interact unfavourably with
      the extant flush_work()/flush_workqueue() annotations.
      
      The problem is that when a single work class does:
      
        wait_for_completion(&C)
      
      and
      
        complete(&C)
      
      in different executions, we'll build dependencies like:
      
        lock_map_acquire(W)
        complete_acquire(C)
      
      and
      
        lock_map_acquire(W)
        complete_release(C)
      
      which results in the dependency chain W->C->W. Lockdep thinks this
      spells deadlock, even though there is no deadlock potential, since
      works are run concurrently.
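To see why lockdep objects, here is a toy dependency graph (hypothetical, not lockdep's implementation) with the two edges from the listing: W->C from the first execution and C->W from the crossrelease commit of complete(&C) in the second. Before adding an edge, it checks whether the reverse path already exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy graph: dep[a][b] means "b was acquired while a was held". */
enum { W, C, NR_CLASSES };
static bool dep[NR_CLASSES][NR_CLASSES];

static void add_dep(int held, int next)
{
	dep[held][next] = true;
}

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_CLASSES; n++)
		if (dep[from][n] && !seen[n] && reachable(n, to, seen))
			return true;
	return false;
}

/* Adding held -> next closes a cycle if held is reachable from next. */
static bool would_deadlock(int held, int next)
{
	bool seen[NR_CLASSES] = { false };

	return reachable(next, held, seen);
}
```

The second edge closes the cycle, so the model reports a deadlock even though, with works running concurrently, neither execution ever waits for the other. That is the false positive this commit avoids by disregarding these locks for crossrelease.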
      
      One possibility would be to change the work 'lock' to recursive-read,
      but that would mean hitting a lockdep limitation on recursive locks.
      Also, unconditionally switching to recursive-read here would fail to
      detect the actual deadlock on single-threaded workqueues, which do
      have a problem with this.
      
      For now, forcefully disregard these locks for crossrelease.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boqun.feng@gmail.com
      Cc: byungchul.park@lge.com
      Cc: david@fromorbit.com
      Cc: johannes@sipsolutions.net
      Cc: oleg@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 14 Aug, 2017 2 commits
  6. 10 Aug, 2017 10 commits
    • locking/lockdep: Make print_circular_bug() aware of crossrelease · 383a4bc8
      Byungchul Park authored
      print_circular_bug(), which reports circular bugs, assumes that the
      target hlock is owned by the current task. However, in crossrelease,
      the target hlock can be owned by a task other than the current one.
      So the report format needs to be changed to reflect this.
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-9-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Handle non(or multi)-acquisition of a crosslock · 28a903f6
      Byungchul Park authored
      No acquisition might be in progress when a crosslock is committed.
      Completion operations, which enable crossrelease, are such a case:
      
         CONTEXT X                         CONTEXT Y
         ---------                         ---------
         trigger completion context
                                           complete AX
                                              commit AX
         wait_for_complete AX
            acquire AX
            wait
      
         where AX is a crosslock.
      
      When no acquisition is in progress, we should not perform a commit
      because the lock does not exist, which might cause incorrect memory
      access. So we have to track the number of acquisitions of a crosslock.
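A minimal sketch of the counting described above, assuming a hypothetical struct xlock that only tracks in-progress acquisitions; the real crossrelease bookkeeping carries much more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical crosslock bookkeeping: in-progress acquisitions only. */
struct xlock {
	int nr_acquire;
};

/* e.g. wait_for_completion() starts waiting */
static void xlock_acquire(struct xlock *x)
{
	x->nr_acquire++;
}

/* the waiter is done */
static void xlock_release(struct xlock *x)
{
	assert(x->nr_acquire > 0);
	x->nr_acquire--;
}

/* complete(): only commit dependencies if someone is actually waiting. */
static bool xlock_commit(const struct xlock *x)
{
	if (x->nr_acquire == 0)
		return false;	/* no acquisition in progress: skip */
	/* ... build the dependencies here ... */
	return true;
}
```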
      
      Moreover, consider the case where more than one acquisition of a
      crosslock overlaps, as in:
      
         CONTEXT W        CONTEXT X        CONTEXT Y        CONTEXT Z
         ---------        ---------        ---------        ---------
         acquire AX (gen_id: 1)
                                           acquire A
                          acquire AX (gen_id: 10)
                                           acquire B
                                           commit AX
                                                            acquire C
                                                            commit AX
      
         where A, B and C are typical locks and AX is a crosslock.
      
      The current crossrelease code performs the commits in Y and Z with
      gen_id = 10. However, we can use gen_id = 1 instead, since not only
      'acquire AX in X' but also 'acquire AX in W' depends on each
      acquisition in Y and Z until their commits. So use gen_id = 1 instead
      of 10 for those commits, which adds an additional dependency 'AX -> A'
      in the example above.
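The gen_id rule can be illustrated with a global generation counter: every acquisition takes the next generation, a crosslock keeps the generation of its first overlapping acquisition, and a commit adds 'AX -> L' for every history entry L newer than that. This is a hypothetical model; the names and the counter are assumptions, and the generations differ from the 1 and 10 of the example, since only their ordering matters.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int gen_counter;	/* bumped on every acquisition */

struct hist_entry {
	char name;		/* which typical lock was acquired */
	unsigned int gen_id;	/* generation at acquisition time */
};

struct xlock {
	int nr_acquire;
	unsigned int gen_id;	/* generation of the first overlapping acquire */
};

static unsigned int next_gen(void)
{
	return ++gen_counter;
}

/* Keep the gen_id of the *first* acquisition while others overlap it. */
static void xlock_acquire(struct xlock *x)
{
	unsigned int g = next_gen();

	if (x->nr_acquire++ == 0)
		x->gen_id = g;
}

/* Collect history entries that gain an 'AX -> entry' dependency. */
static int commit_deps(const struct xlock *x, const struct hist_entry *h,
		       int n, char *out)
{
	int k = 0;

	for (int i = 0; i < n; i++)
		if (h[i].gen_id > x->gen_id)
			out[k++] = h[i].name;
	return k;
}
```

Had ax.gen_id followed the later acquisition (generation 3 in the test), only B would qualify and the 'AX -> A' dependency would be lost.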
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Detect and handle hist_lock ring buffer overwrite · 23f873d8
      Byungchul Park authored
      The ring buffer can be overwritten by hardirq/softirq/work contexts.
      Those cases must be considered on rollback or commit. For example,
      
                |<------ hist_lock ring buffer size ----->|
                ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
      wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
      
                where 'p' represents an acquisition in process context,
                'i' represents an acquisition in irq context.
      
      On irq exit, crossrelease tries to roll back idx to its original
      position, but it should not, because the entry has already been
      invalidated by being overwritten with 'i'. Avoid rollback or commit
      for overwritten entries.
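One way to detect the overwrite is to tag every entry with a monotonically increasing id and, on irq entry, remember both the position and the id of the newest entry. The sketch below is a loose, hypothetical model of that idea (tiny ring, made-up names), not the crossrelease code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical, tiny ring so that it wraps quickly */

struct hist_lock {
	unsigned int hist_id;	/* 0 marks an invalid entry */
	char ctx;		/* 'p' process context, 'i' irq context */
};

static struct hist_lock ring[RING_SIZE];
static unsigned int ring_idx;		/* total entries pushed so far */
static unsigned int next_hist_id;

static void push(char ctx)
{
	struct hist_lock *h = &ring[ring_idx++ % RING_SIZE];

	h->hist_id = ++next_hist_id;
	h->ctx = ctx;
}

/* Saved on irq entry: ring position and the id of the newest entry. */
struct hist_save {
	unsigned int idx;
	unsigned int hist_id;
};

static struct hist_save hist_irq_enter(void)
{
	struct hist_save s = { ring_idx, 0 };

	if (ring_idx)
		s.hist_id = ring[(ring_idx - 1) % RING_SIZE].hist_id;
	return s;
}

/*
 * On irq exit, roll back to the saved position. If the irq wrapped the
 * ring, the entry there was overwritten: invalidate it so a commit that
 * walks backwards stops there instead of using stale data.
 * Returns true if the saved history survived intact.
 */
static bool hist_irq_exit(struct hist_save s)
{
	struct hist_lock *last;

	ring_idx = s.idx;
	if (s.idx == 0)
		return true;	/* nothing from process context to protect */
	last = &ring[(s.idx - 1) % RING_SIZE];
	if (last->hist_id != s.hist_id) {
		last->hist_id = 0;
		return false;
	}
	return true;
}
```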
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Implement the 'crossrelease' feature · b09be676
      Byungchul Park authored
      Lockdep is a runtime locking correctness validator that detects and
      reports a deadlock or its possibility by checking dependencies between
      locks. It's useful since it does not report just an actual deadlock but
      also the possibility of a deadlock that has not actually happened yet.
      That enables problems to be fixed before they affect real systems.
      
      However, this facility is only applicable to typical locks, such as
      spinlocks and mutexes, which are normally released within the context in
      which they were acquired. Synchronization primitives like page locks or
      completions, which are allowed to be released in any context, also
      create dependencies and can cause a deadlock.
      
      So lockdep should track these primitives as well to do a better job.
      The 'crossrelease' implementation makes it possible to track them.
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Make check_prev_add() able to handle external stack_trace · ce07a941
      Byungchul Park authored
      Currently, the space for the stack_trace is pinned in check_prev_add(),
      which makes it impossible to use an external stack_trace. The simplest
      way to allow that is to pass an external stack_trace as an argument.
      
      A more suitable solution is to additionally pass a callback along with
      the stack_trace, so that callers can decide how to save it, or whether
      to save it at all. Crossrelease actually needs to do something other
      than saving a stack_trace. So pass both a stack_trace and a callback
      to handle it to check_prev_add().
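The shape of the change, reduced to a hypothetical userspace model: the caller passes in both the trace and a callback deciding how (or whether) to fill it. The names, the fake frames and the return convention here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct trace {
	unsigned int nr_entries;
	unsigned long *entries;
};

/* Caller-chosen policy for obtaining a trace; NULL means "already have one". */
typedef int (*save_fn)(struct trace *trace);

static unsigned long fake_frames[2] = { 0xabc, 0xdef };
static int nr_saves;

/* One possible policy: capture a (fake) trace right now. */
static int save_now(struct trace *t)
{
	t->entries = fake_frames;
	t->nr_entries = 2;
	nr_saves++;
	return 1;
}

/* The caller supplies both the trace storage and the save policy. */
static int add_dependency(struct trace *trace, save_fn save)
{
	if (save && !save(trace))
		return 0;	/* could not obtain a trace: give up */
	/* ... record the new dependency, pointing at *trace ... */
	return 2;		/* added; *trace is valid from here on */
}
```

Passing NULL for the callback is how a caller that already holds a valid trace, such as one supplied externally, avoids saving again.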
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Change the meaning of check_prev_add()'s return value · 70911fdc
      Byungchul Park authored
      Firstly, return 1 instead of 2 when the 'prev -> next' dependency
      already exists. Since the value 2 is not referenced anywhere, just
      return 1 to indicate success in this case.
      
      Secondly, return 2 instead of 1 when a lock_list entry has been
      successfully added and a stack_trace saved. With that, a caller can
      avoid a redundant save_trace() at the call site.
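A sketch of how a caller can use the new return values, assuming hypothetical names for them: 1 means the dependency already existed, 2 means it was added and a trace is now available, so later additions in the same loop can re-use that trace instead of calling the save routine again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three return values described above. */
enum { ADD_FAILED = 0, ADD_EXISTS = 1, ADD_SAVED = 2 };

#define NR_CLASSES 4
static bool edge[NR_CLASSES][NR_CLASSES];
static int nr_traces;		/* stand-in for save_trace() calls */

static int try_add(int prev, int next, bool save)
{
	if (prev == next)
		return ADD_FAILED;	/* stand-in for a reported problem */
	if (edge[prev][next])
		return ADD_EXISTS;	/* success, nothing new to record */
	if (save)
		nr_traces++;
	edge[prev][next] = true;
	return ADD_SAVED;		/* added, and a trace is now available */
}

/* Add 'held -> next' for every held lock, saving the trace at most once. */
static bool add_all(const int *held, int n, int next)
{
	bool have_trace = false;

	for (int i = 0; i < n; i++) {
		int ret = try_add(held[i], next, !have_trace);

		if (ret == ADD_FAILED)
			return false;
		if (ret == ADD_SAVED)
			have_trace = true;	/* re-use it for the rest */
	}
	return true;
}
```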
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Add a function building a chain between two classes · 49347a98
      Byungchul Park authored
      Crossrelease needs to build a chain between two classes regardless of
      their contexts. However, add_chain_cache() cannot be used for that
      purpose since it assumes that it's called in the acquisition context
      of the hlock. So this patch introduces a new function doing it.
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-3-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Refactor lookup_chain_cache() · 545c23f2
      Byungchul Park authored
      Currently, lookup_chain_cache() provides both 'lookup' and 'add'
      functionality in one function. However, each is useful on its own. So
      this patch makes lookup_chain_cache() do only the 'lookup' and makes
      add_chain_cache() do only the 'add'. This is also more readable than
      before.
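A minimal hash-table sketch of the split, using hypothetical names and sizes: lookup never allocates, and add assumes the caller already checked that the key is absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 16
#define MAX_CHAINS 32

struct chain {
	uint64_t key;
	struct chain *next;	/* next entry in the same bucket */
};

static struct chain chains[MAX_CHAINS];
static int nr_chains;
static struct chain *buckets[NR_BUCKETS];

/* 'lookup' only: never allocates. */
static struct chain *lookup_chain(uint64_t key)
{
	for (struct chain *c = buckets[key % NR_BUCKETS]; c; c = c->next)
		if (c->key == key)
			return c;
	return NULL;
}

/* 'add' only: the caller has already checked that the key is absent. */
static struct chain *add_chain(uint64_t key)
{
	struct chain *c;

	if (nr_chains >= MAX_CHAINS)
		return NULL;	/* static table exhausted */
	c = &chains[nr_chains++];
	c->key = key;
	c->next = buckets[key % NR_BUCKETS];
	buckets[key % NR_BUCKETS] = c;
	return c;
}
```

A caller that only needs to know whether a chain exists calls lookup_chain(); one that wants to insert does if (!lookup_chain(key)) add_chain(key).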
      
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/1502089981-21272-2-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Avoid creating redundant links · ae813308
      Peter Zijlstra authored
      Two boots + a make defconfig, the first didn't have the redundant bit
      in, the second did:
      
       lock-classes:                         1168       1169 [max: 8191]
       direct dependencies:                  7688       5812 [max: 32768]
       indirect dependencies:               25492      25937
       all direct dependencies:            220113     217512
       dependency chains:                    9005       9008 [max: 65536]
       dependency chain hlocks:             34450      34366 [max: 327680]
       in-hardirq chains:                      55         51
       in-softirq chains:                     371        378
       in-process chains:                    8579       8579
       stack-trace entries:                108073      88474 [max: 524288]
       combined max dependencies:       178738560  169094640
      
       max locking depth:                      15         15
       max bfs queue depth:                   320        329
      
       cyclic checks:                        9123       9190
      
       redundant checks:                                5046
       redundant links:                                 1828
      
       find-mask forwards checks:            2564       2599
       find-mask backwards checks:          39521      39789
      
      So it saves nearly 2k links and a fair chunk of stack-trace entries, but
      as expected, makes no real difference on the indirect dependencies.
      
      At the same time, you see the max BFS depth increase, which is also
      expected, although it could easily be boot variance -- these numbers are
      not entirely stable between boots.
      
      The downside is that the cycles in the graph become larger and thus
      the reports harder to read.
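The idea behind avoiding redundant links can be sketched as: before storing a direct link prev -> next, search the existing graph, and skip the link if next is already reachable from prev. This is a hypothetical model with a fixed-size adjacency matrix, not lockdep's __bfs() code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLASSES 8
static bool direct[NR_CLASSES][NR_CLASSES];	/* direct dependency links */
static int nr_links, nr_redundant;

/* Breadth-first search over direct links: is 'to' reachable from 'from'? */
static bool reaches(int from, int to)
{
	int queue[NR_CLASSES], head = 0, tail = 0;
	bool seen[NR_CLASSES] = { false };

	queue[tail++] = from;
	seen[from] = true;
	while (head < tail) {
		int cur = queue[head++];

		if (cur == to)
			return true;
		for (int n = 0; n < NR_CLASSES; n++) {
			if (direct[cur][n] && !seen[n]) {
				seen[n] = true;	/* each class is queued once */
				queue[tail++] = n;
			}
		}
	}
	return false;
}

/* Only store prev -> next if the graph cannot already get there. */
static void add_link(int prev, int next)
{
	if (reaches(prev, next)) {
		nr_redundant++;
		return;
	}
	direct[prev][next] = true;
	nr_links++;
}
```

In the test, a report involving 0 and 2 would now have to show the path 0 -> 1 -> 2 rather than a direct shortcut, which is the 'larger cycles' downside mentioned above.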
      
      XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL?
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: iamjoonsoo.kim@lge.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Rework FS_RECLAIM annotation · d92a8cfc
      Peter Zijlstra authored
      A while ago someone, and I cannot find the email just now, asked if we
      could not implement the RECLAIM_FS inversion stuff with a 'fake' lock
      like we use for other things like workqueues etc. I think this should
      be possible, which allows reducing the 'irq' states and will reduce the
      number of __bfs() lookups we do.
      
      Removing the 1 IRQ state results in 4 fewer __bfs() walks per
      dependency, improving lockdep performance. And by moving this
      annotation out of the lockdep code it becomes easier for the mm people
      to extend.
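The 'fake lock' idea can be illustrated in userspace. In this hypothetical sketch (the names and the GFP_FS_BIT value are illustrative assumptions, not the kernel's), reclaim runs with a fake lock held, and any allocation that may enter FS reclaim briefly takes the same fake lock. An allocation under a lock that reclaim also takes then shows up as an ordinary lock-order inversion, without a dedicated RECLAIM_FS state.

```c
#include <assert.h>
#include <stdbool.h>

#define GFP_FS_BIT 0x1u		/* hypothetical stand-in for __GFP_FS */

enum { FAKE_RECLAIM, INODE_LOCK, NR_LOCKS };
static bool dep[NR_LOCKS][NR_LOCKS];	/* dep[a][b]: b taken while a held */
static bool held[NR_LOCKS];

static void acquire(int l)
{
	for (int h = 0; h < NR_LOCKS; h++)
		if (held[h])
			dep[h][l] = true;
	held[l] = true;
}

static void release(int l)
{
	held[l] = false;
}

/* An allocation that may enter FS reclaim "takes" the fake lock briefly. */
static void alloc_annotate(unsigned int gfp)
{
	if (!(gfp & GFP_FS_BIT))
		return;		/* cannot recurse into the FS */
	acquire(FAKE_RECLAIM);
	release(FAKE_RECLAIM);
}

/* A two-lock inversion: both orders have been observed. */
static bool inversion(int a, int b)
{
	return dep[a][b] && dep[b][a];
}
```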
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: boqun.feng@gmail.com
      Cc: iamjoonsoo.kim@lge.com
      Cc: kernel-team@lge.com
      Cc: kirill@shutemov.name
      Cc: npiggin@gmail.com
      Cc: walken@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 09 Jun, 2017 2 commits
  8. 03 May, 2017 3 commits
    • mm: introduce memalloc_nofs_{save,restore} API · 7dea19f9
      Michal Hocko authored
      GFP_NOFS context is used for the following 5 reasons currently:
      
       - to prevent deadlocks when the lock held by the allocation
         context would be needed during the memory reclaim
      
       - to prevent stack overflows during the reclaim because the
         allocation is performed from a deep context already
      
       - to prevent lockups when the allocation context depends on other
         reclaimers to make forward progress indirectly
      
       - just in case because this would be safe from the fs POV
      
       - silence lockdep false positives
      
      Unfortunately, overuse of this allocation context brings some problems
      to the MM.  Memory reclaim is much weaker (especially during heavy FS
      metadata workloads), and the OOM killer cannot be invoked because the MM
      layer doesn't have enough information about how much memory is freeable
      by the FS layer.
      
      In many cases it is far from clear why the weaker context is even used
      and so it might be used unnecessarily.  We would like to get rid of
      those as much as possible.  One way to do that is to use the flag in
      scopes rather than isolated cases.  Such a scope is declared when really
      necessary, tracked per task and all the allocation requests from within
      the context will simply inherit the GFP_NOFS semantic.
      
      Not only is this easier to understand and maintain, because there are
      far fewer problematic contexts than specific allocation requests, but
      it also helps code paths where the FS layer interacts with other layers
      (e.g. crypto, security modules, MM etc...) and there is no easy way to
      convey the allocation context between the layers.
      
      Introduce memalloc_nofs_{save,restore} API to control the scope of
      GFP_NOFS allocation context.  This basically copies the
      memalloc_noio_{save,restore} API we have for the other restricted
      allocation context, GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists
      and is just an alias for PF_FSTRANS, which has been xfs specific until
      recently.  There are no PF_FSTRANS users anymore, so let's just drop it.
      
      PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
      implicitly, the same way PF_MEMALLOC_NOIO drops __GFP_IO.
      memalloc_noio_flags is renamed to current_gfp_context because it now
      cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  XFS
      code paths preserve their semantics.  kmem_flags_convert() doesn't need
      to evaluate the flag anymore.
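A userspace model of the scope API and of the current_gfp_context() behaviour described above. The bit values, task_flags and function names are hypothetical; the point is the save/restore pattern, where save returns the previous state so that scopes nest correctly.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's real constants differ. */
#define GFP_IO_BIT 0x1u
#define GFP_FS_BIT 0x2u
#define PF_NOIO    0x10u
#define PF_NOFS    0x20u

static unsigned int task_flags;		/* stand-in for current->flags */

/* Enter a NOFS scope; return the previous NOFS state for restore. */
static unsigned int nofs_save(void)
{
	unsigned int old = task_flags & PF_NOFS;

	task_flags |= PF_NOFS;
	return old;
}

/* Leave the scope: put back exactly what nofs_save() found. */
static void nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_NOFS) | old;
}

/* Scope flags implicitly weaken every allocation made inside the scope. */
static unsigned int gfp_context(unsigned int gfp)
{
	if (task_flags & PF_NOIO)
		gfp &= ~(GFP_IO_BIT | GFP_FS_BIT);
	else if (task_flags & PF_NOFS)
		gfp &= ~GFP_FS_BIT;
	return gfp;
}
```

Because the inner nofs_restore() puts back the state seen by its own nofs_save(), leaving a nested scope does not end the outer one.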
      
      This patch shouldn't introduce any functional changes.
      
      Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
      usage as much as possible and only use properly documented
      memalloc_nofs_{save,restore} checkpoints where they are appropriate.
      
      [akpm@linux-foundation.org: fix comment typo, reflow comment]
      Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lockdep: allow to disable reclaim lockup detection · 7e784422
      Michal Hocko authored
      The current implementation of the reclaim lockup detection can lead to
      false positives. Those do happen, and they usually lead to tweaking the
      code to silence lockdep by using GFP_NOFS, even though the context can
      use __GFP_FS just fine.
      
      See
      
        http://lkml.kernel.org/r/20160512080321.GA18496@dastard
      
      as an example.
      
        =================================
        [ INFO: inconsistent lock state ]
        4.5.0-rc2+ #4 Tainted: G           O
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
      
        (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]
      
        {RECLAIM_FS-ON-R} state was registered at:
          mark_held_locks+0x79/0xa0
          lockdep_trace_alloc+0xb3/0x100
          kmem_cache_alloc+0x33/0x230
          kmem_zone_alloc+0x81/0x120 [xfs]
          xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
          __xfs_refcount_find_shared+0x75/0x580 [xfs]
          xfs_refcount_find_shared+0x84/0xb0 [xfs]
          xfs_getbmap+0x608/0x8c0 [xfs]
          xfs_vn_fiemap+0xab/0xc0 [xfs]
          do_vfs_ioctl+0x498/0x670
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
               CPU0
               ----
          lock(&xfs_nondir_ilock_class);
          <Interrupt>
            lock(&xfs_nondir_ilock_class);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/543:
      
        stack backtrace:
        CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
        Call Trace:
         lock_acquire+0xd8/0x1e0
         down_write_nested+0x5e/0xc0
         xfs_ilock+0x177/0x200 [xfs]
         xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
         xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
         evict+0xc5/0x190
         dispose_list+0x39/0x60
         prune_icache_sb+0x4b/0x60
         super_cache_scan+0x14f/0x1a0
         shrink_slab.part.63.constprop.79+0x1e9/0x4e0
         shrink_zone+0x15e/0x170
         kswapd+0x4f1/0xa80
         kthread+0xf2/0x110
         ret_from_fork+0x3f/0x70
      
      To quote Dave:
       "Ignoring whether reflink should be doing anything or not, that's a
        "xfs_refcountbt_init_cursor() gets called both outside and inside
        transactions" lockdep false positive case. The problem here is lockdep
        has seen this allocation from within a transaction, hence a GFP_NOFS
        allocation, and now it's seeing it in a GFP_KERNEL context. Also note
        that we have an active reference to this inode.
      
        So, because the reclaim annotations overload the interrupt level
        detections and it's seen the inode ilock been taken in reclaim
        ("interrupt") context, this triggers a reclaim context warning where
        it thinks it is unsafe to do this allocation in GFP_KERNEL context
        holding the inode ilock..."
      
      This sounds like a fundamental problem of the reclaim lock detection.
      It is really impossible to annotate such a special use case IMHO unless
      the reclaim lockup detection is reworked completely.  Until then it is
      much better to provide an "I know what I am doing" flag and mark the
      problematic places.  This would prevent abuse of the GFP_NOFS flag,
      which has a runtime effect even on configurations which have lockdep
      disabled.
      
      Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
      skip the current allocation request.
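A minimal sketch of the skip, with made-up bit values standing in for __GFP_FS and __GFP_NOLOCKDEP and a hypothetical counter for what lockdep would record.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define GFP_FS_BIT        0x2u
#define GFP_NOLOCKDEP_BIT 0x4u

static int tracked;	/* allocations recorded as possible FS reclaim */

static void trace_alloc(unsigned int gfp)
{
	if (gfp & GFP_NOLOCKDEP_BIT)
		return;		/* "I know what I am doing": skip tracking */
	if (!(gfp & GFP_FS_BIT))
		return;		/* cannot recurse into the FS anyway */
	tracked++;
}
```

Unlike clearing __GFP_FS, the extra bit does not change what the allocation may do; in this model it only affects the bookkeeping.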
      
      While we are at it, also make sure that the radix tree doesn't
      accidentally override tags stored in the upper part of the gfp_mask.
      
      Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lockdep: teach lockdep about memalloc_noio_save · 6d7225f0
      Nikolay Borisov authored
      Patch series "scope GFP_NOFS api", v5.
      
      This patch (of 7):
      
      Commit 21caf2fc ("mm: teach mm by current context info to not do I/O
      during memory allocation") added the memalloc_noio_(save|restore)
      functions to enable people to modify the MM behavior by disabling I/O
      during memory allocation.
      
      This was further extended in commit 934f3072 ("mm: clear __GFP_FS
      when PF_MEMALLOC_NOIO is set").
      
      memalloc_noio_* functions prevent allocation paths recursing back into
      the filesystem without explicitly changing the flags for every
      allocation site.
      
      However, lockdep hasn't been keeping up with the changes and it entirely
      misses handling the memalloc_noio adjustments.  Instead, it is left to
      the callers of __lockdep_trace_alloc to call the function after they
      have shaven the respective GFP flags which can lead to false positives:
      
        =================================
         [ INFO: inconsistent lock state ]
         4.10.0-nbor #134 Not tainted
         ---------------------------------
         inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
         fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes:
          (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
         {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x62a/0x17c0
           lock_acquire+0xc5/0x220
           down_write_nested+0x4f/0x90
           xfs_ilock+0x141/0x230
           xfs_reclaim_inode+0x12a/0x320
           xfs_reclaim_inodes_ag+0x2c8/0x4e0
           xfs_reclaim_inodes_nr+0x33/0x40
           xfs_fs_free_cached_objects+0x19/0x20
           super_cache_scan+0x191/0x1a0
           shrink_slab+0x26f/0x5f0
           shrink_node+0xf9/0x2f0
           kswapd+0x356/0x920
           kthread+0x10c/0x140
           ret_from_fork+0x31/0x40
         irq event stamp: 173777
         hardirqs last  enabled at (173777): __local_bh_enable_ip+0x70/0xc0
         hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0
         softirqs last  enabled at (173776): _xfs_buf_find+0x67a/0xb70
         softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&xfs_nondir_ilock_class);
           <Interrupt>
             lock(&xfs_nondir_ilock_class);
      
          *** DEADLOCK ***
      
         4 locks held by fsstress/3365:
          #0:  (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50
          #1:  (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0
          #2:  (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140
          #3:  (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
      
         stack backtrace:
         CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
         Call Trace:
          kmem_cache_alloc_node_trace+0x3a/0x2c0
          vm_map_ram+0x2a1/0x510
          _xfs_buf_map_pages+0x77/0x140
          xfs_buf_get_map+0x185/0x2a0
          xfs_attr_rmtval_set+0x233/0x430
          xfs_attr_leaf_addname+0x2d2/0x500
          xfs_attr_set+0x214/0x420
          xfs_xattr_set+0x59/0xb0
          __vfs_setxattr+0x76/0xa0
          __vfs_setxattr_noperm+0x5e/0xf0
          vfs_setxattr+0xae/0xb0
          setxattr+0x15e/0x1a0
          path_setxattr+0x8f/0xc0
          SyS_lsetxattr+0x11/0x20
          entry_SYSCALL_64_fastpath+0x23/0xc6
      
      Let's fix this by making lockdep explicitly shave the respective GFP
      flags.
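      Below is a minimal userspace sketch of the idea. It assumes, as the Fixes: tag suggests, that the allocation in the report ran inside a PF_MEMALLOC_NOIO scope. All names and bit values are hypothetical stand-ins, not the kernel's real __GFP_IO, __GFP_FS or PF_MEMALLOC_NOIO definitions. The single helper also collapses several checks lockdep makes before it marks held locks RECLAIM_FS-ON-W. The point it shows is narrow: once the task's NOIO scope is applied, the effective mask no longer carries the FS bit, so the allocation should not count as one that can recurse into filesystem reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only; the kernel's real
 * __GFP_IO, __GFP_FS and PF_MEMALLOC_NOIO values differ. */
#define SKETCH_GFP_IO            0x1u
#define SKETCH_GFP_FS            0x2u
#define SKETCH_GFP_KERNEL        (SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_PF_MEMALLOC_NOIO  0x1u

struct sketch_task {
	unsigned int flags;     /* per-task flags, e.g. the NOIO scope bit */
};

/* Shave IO and FS from the request when the task is in a NOIO scope,
 * mirroring what the allocator itself does with such a request. */
static unsigned int sketch_effective_gfp(const struct sketch_task *t,
					 unsigned int gfp)
{
	assert(t != NULL);
	if (t->flags & SKETCH_PF_MEMALLOC_NOIO)
		gfp &= ~(SKETCH_GFP_IO | SKETCH_GFP_FS);
	return gfp;
}

/* Should locks held across this allocation be marked as "held while FS
 * reclaim may recurse"? Only if the effective mask still allows it. */
static bool sketch_marks_reclaim_fs(const struct sketch_task *t,
				    unsigned int gfp)
{
	return (sketch_effective_gfp(t, gfp) & SKETCH_GFP_FS) != 0;
}
```

      With the NOIO bit set, a GFP_KERNEL-style request in this model is no longer FS-reclaim capable, so no RECLAIM_FS-ON-W marking would be recorded.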
      
      Fixes: 934f3072 ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set")
      Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.org
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d7225f0
  9. 19 Apr, 2017 1 commit
  10. 16 Mar, 2017 4 commits
  11. 14 Mar, 2017 1 commit
    • Daniel Vetter
      drm/i915: annote drop_caches debugfs interface with lockdep · 05df49e7
      Daniel Vetter authored
      The trouble we have is that we can't really test all the shrinker
      recursion stuff exhaustively in BAT because any kind of thrashing
      stress test just takes too long.
      
      But that leaves a really big gap open, since shrinker recursions are
      one of the most annoying bugs. Now lockdep already has support for
      checking allocation deadlocks:
      
      - Direct reclaim paths are marked up with
        lockdep_set_current_reclaim_state() and
        lockdep_clear_current_reclaim_state().
      
      - All allocation paths are marked with lockdep_trace_alloc().
      
      If we simply mark up our debugfs interface with the reclaim
      annotations, any code and locks taken in there will automatically
      complete the picture with any allocation paths we already have, as
      long as we have a simple testcase in BAT which throws out a few
      objects using this interface. No stress test or thrashing is needed
      at all.
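      The following toy userspace model shows why the annotation alone is enough. It is not lockdep's implementation, and every name in it is hypothetical. Each lock class carries just two bits: "taken inside a reclaim-annotated section" and "held across an allocation". A class that ends up with both bits set is the inconsistency lockdep reports. The two halves can be recorded by different, independently exercised code paths, so neither path needs to be stressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Hypothetical toy lock class with only the two bits that matter here. */
struct toy_lock_class {
	const char *name;
	bool used_in_reclaim;   /* acquired inside a reclaim section */
	bool held_over_alloc;   /* held while an allocation could reclaim */
};

struct toy_task {
	bool in_reclaim;
	struct toy_lock_class *held[TOY_MAX_HELD];
	size_t nr_held;
};

/* Stand-ins for entering and leaving a reclaim-annotated section. */
static void toy_set_reclaim_state(struct toy_task *t)   { t->in_reclaim = true; }
static void toy_clear_reclaim_state(struct toy_task *t) { t->in_reclaim = false; }

static void toy_lock(struct toy_task *t, struct toy_lock_class *c)
{
	assert(t->nr_held < TOY_MAX_HELD);
	if (t->in_reclaim)
		c->used_in_reclaim = true;
	t->held[t->nr_held++] = c;
}

/* Release the most recently acquired lock (strict LIFO, for brevity). */
static void toy_unlock(struct toy_task *t)
{
	assert(t->nr_held > 0);
	t->nr_held--;
}

/* Stand-in for the allocation hook: every lock held now is "held over alloc". */
static void toy_trace_alloc(struct toy_task *t)
{
	for (size_t i = 0; i < t->nr_held; i++)
		t->held[i]->held_over_alloc = true;
}

/* Both usages recorded for one class: the inconsistency to report. */
static bool toy_inconsistent(const struct toy_lock_class *c)
{
	return c->used_in_reclaim && c->held_over_alloc;
}
```

      In this model, a normal path that allocates while holding the lock is harmless by itself. A debugfs-style path that takes the same lock once under the reclaim annotation is also harmless by itself. Together they flag the class.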
      
      v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module.
      
      v3: Fixup rebase fail (spotted by Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170312205340.16202-1-daniel.vetter@ffwll.ch
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
  12. 02 Mar, 2017 3 commits
  13. 10 Feb, 2017 1 commit
  14. 23 Jan, 2017 1 commit
  15. 06 Dec, 2016 1 commit
    • Dmitry Vyukov
      lockdep: Fix report formatting · f943fe0f
      Dmitry Vyukov authored
      Since commit:
      
        4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      
      printk() requires KERN_CONT to continue log messages. Many printk()
      calls in lockdep.c and in print_ip_sym() lack it. As a result, lockdep
      reports are completely messed up.
      
      Add missing KERN_CONT and inline print_ip_sym() where necessary.
      
      Example of a messed up report:
      
        0-rc5+ #41 Not tainted
        -------------------------------------------------------
        syz-executor0/5036 is trying to acquire lock:
         (
        rtnl_mutex
        ){+.+.+.}
        , at:
        [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
        but task is already holding lock:
         (
        &net->packet.sklist_lock
        ){+.+...}
        , at:
        [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3
         (
        &net->packet.sklist_lock
        +.+...}
        ...
      
      Without this patch, all scripts that parse kernel bug reports are broken.
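      Here is a toy userspace model of the breakage. It simplifies the real printk() record logic by assuming that any message without the continuation prefix starts a new line. The prefix string "\001c" mirrors how the kernel encodes KERN_CONT (KERN_SOH "c"). The logger and all its names are hypothetical. Emitting a lock name in three pieces without the prefix reproduces the split " (" / "rtnl_mutex" / "){+.+.+.}" lines from the report above, and prefixing the continuations rejoins them.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the kernel's KERN_CONT encoding (KERN_SOH "c"). */
#define TOY_KERN_CONT "\001c"

/* Hypothetical toy log; zero-initialise before use. */
struct toy_log {
	char buf[512];
	size_t len;
};

/* Append s, truncating instead of overflowing the buffer. */
static void toy_append(struct toy_log *log, const char *s)
{
	size_t room = sizeof(log->buf) - 1 - log->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(log->buf + log->len, s, n);
	log->len += n;
	log->buf[log->len] = '\0';
}

/* A message without the KERN_CONT prefix starts a new record (line);
 * with the prefix, it is glued onto the current line. */
static void toy_printk(struct toy_log *log, const char *msg)
{
	size_t cont_len = strlen(TOY_KERN_CONT);

	assert(msg != NULL);
	if (strncmp(msg, TOY_KERN_CONT, cont_len) == 0)
		msg += cont_len;
	else if (log->len > 0 && log->buf[log->len - 1] != '\n')
		toy_append(log, "\n");
	toy_append(log, msg);
}
```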
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andreyknvl@google.com
      Cc: aryabinin@virtuozzo.com
      Cc: joe@perches.com
      Cc: syzkaller@googlegroups.com
      Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 30 Nov, 2016 1 commit
  17. 11 Nov, 2016 1 commit
  18. 08 Jun, 2016 1 commit
  19. 05 May, 2016 1 commit
    • Peter Zijlstra
      locking/lockdep, sched/core: Implement a better lock pinning scheme · e7904a28
      Peter Zijlstra authored
      The problem with the existing lock pinning is that each pin has a
      value of 1; this means you can simply unpin if you know it's pinned,
      without having any extra information.
      
      This scheme generates a random (16-bit) cookie for each pin and
      requires this same cookie to unpin. This means you have to keep the
      cookie in context.
      
      No objsize difference for !LOCKDEP kernels.
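      The following is a userspace sketch of the cookie scheme, loosely modelled on the description above. The type and function names are hypothetical, and rand() stands in for the kernel's random source. Each pin adds a nonzero cookie (1 plus 16 random bits) to the held lock's pin count, and unpin subtracts the caller's cookie. An unpin with the wrong cookie therefore has one of two effects. It may drive the count negative, which this sketch catches immediately. Or it may leave the lock still looking pinned, which a later release-time check could catch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for lockdep's pin cookie and held-lock record. */
struct sketch_pin_cookie { unsigned int val; };
struct sketch_held_lock  { int pin_count; };

static struct sketch_pin_cookie sketch_pin(struct sketch_held_lock *h)
{
	/* 1 + 16 random bits: never zero, and hard to guess. */
	struct sketch_pin_cookie c = { 1u + ((unsigned int)rand() & 0xffffu) };

	h->pin_count += (int)c.val;
	return c;
}

/* Returns false when the unpin is detectably wrong. */
static bool sketch_unpin(struct sketch_held_lock *h, struct sketch_pin_cookie c)
{
	if (h->pin_count == 0)
		return false;           /* unpinning an unpinned lock */
	h->pin_count -= (int)c.val;
	if (h->pin_count < 0) {
		h->pin_count = 0;       /* count corrupted: reset and report */
		return false;
	}
	return true;
}

static bool sketch_is_pinned(const struct sketch_held_lock *h)
{
	assert(h != NULL);
	return h->pin_count != 0;
}
```

      Using a 16-bit cookie rather than a full 32-bit value presumably leaves headroom in the counter for a few nested pins.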
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 23 Apr, 2016 2 commits
  21. 13 Apr, 2016 1 commit