    ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE · 2e7e8bd8
    Gang He authored
    commit ff26cc10 upstream.
    
    If we can't get the inode lock immediately in
    ocfs2_inode_lock_with_page() when reading a page, we should not return
    directly, since this leads to a softlockup when the kernel is built
    without CONFIG_PREEMPT.  Instead, take a blocking lock and unlock it
    immediately before returning.  This avoids wasting CPU on repeated
    retries, improves fairness in lock acquisition among multiple nodes,
    and increases efficiency when the same file is modified frequently
    from multiple nodes.
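    
    The idea can be sketched in user space.  This is only an analogy under
    stated assumptions, not the kernel change itself: a pthread mutex
    stands in for the inode cluster lock, and RETRY_FROM_TOP,
    lock_or_wait_then_retry(), holder() and demo_retries() are
    hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for AOP_TRUNCATED_PAGE: "drop state and retry". */
#define RETRY_FROM_TOP 1

/* Hypothetical user-space analogue of the inode cluster lock. */
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int holder_has_lock;

/*
 * Try the lock without blocking.  On contention, wait for it once with a
 * blocking acquire, release it straight away, and ask the caller to retry.
 * The blocking acquire sleeps until the holder is done, so the caller's
 * retry loop does not spin on the CPU.
 */
static int lock_or_wait_then_retry(pthread_mutex_t *lock)
{
	int ret = pthread_mutex_trylock(lock);

	if (ret == EBUSY) {
		if (pthread_mutex_lock(lock) == 0)
			pthread_mutex_unlock(lock);
		return RETRY_FROM_TOP;
	}
	return ret; /* 0 means the caller now holds the lock */
}

/* Simulates another node holding the lock for about 50 ms. */
static void *holder(void *arg)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	(void)arg;
	pthread_mutex_lock(&inode_lock);
	atomic_store(&holder_has_lock, 1);
	nanosleep(&ts, NULL);
	pthread_mutex_unlock(&inode_lock);
	return NULL;
}

/* Returns how many retries the caller needed before it got the lock. */
static int demo_retries(void)
{
	pthread_t t;
	int retries = 0;
	int rc;

	atomic_store(&holder_has_lock, 0);
	rc = pthread_create(&t, NULL, holder, NULL);
	assert(rc == 0);
	(void)rc;

	/* Make sure the holder really owns the lock before we contend. */
	while (!atomic_load(&holder_has_lock))
		sched_yield();

	while (lock_or_wait_then_retry(&inode_lock) == RETRY_FROM_TOP)
		retries++;

	pthread_mutex_unlock(&inode_lock);
	pthread_join(t, NULL);
	return retries;
}
```

    In this sketch the contended caller retries once, after sleeping in
    the blocking acquire.  If the blocking acquire were dropped, the same
    loop would call the trylock repeatedly for as long as the holder
    keeps the lock.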
    
    The softlockup crash (with /proc/sys/kernel/softlockup_panic set to 1)
    looks like:
    
      Kernel panic - not syncing: softlockup: hung tasks
      CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
        <IRQ>
        dump_stack+0x5c/0x82
        panic+0xd5/0x21e
        watchdog_timer_fn+0x208/0x210
        __hrtimer_run_queues+0xcc/0x200
        hrtimer_interrupt+0xa6/0x1f0
        smp_apic_timer_interrupt+0x34/0x50
        apic_timer_interrupt+0x96/0xa0
        </IRQ>
       RIP: 0010:unlock_page+0x17/0x30
       RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
       RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
       RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
       RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
       R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
       R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
        ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
        ocfs2_readpage+0x41/0x2d0 [ocfs2]
        filemap_fault+0x12b/0x5c0
        ocfs2_fault+0x29/0xb0 [ocfs2]
        __do_fault+0x1a/0xa0
        __handle_mm_fault+0xbe8/0x1090
        handle_mm_fault+0xaa/0x1f0
        __do_page_fault+0x235/0x4b0
        trace_do_page_fault+0x3c/0x110
        async_page_fault+0x28/0x30
       RIP: 0033:0x7fa75ded638e
       RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
       RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
       RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
       RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
       R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
       R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000
    
    As for performance, the test run time is reduced and CPU utilization
    decreases; the detailed data follows.  I ran the multi_mmap test case
    from the ocfs2-test package on a three-node cluster.
    
    Before applying this patch:
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
       2754 ocfs2te+  20   0  170248   6980   4856 D 80.73 0.341   0:18.71 multi_mmap
       1505 root      rt   0  222236 123060  97224 S 2.658 6.015   0:01.44 corosync
          5 root      20   0       0      0      0 S 1.329 0.000   0:00.19 kworker/u8:0
         95 root      20   0       0      0      0 S 1.329 0.000   0:00.25 kworker/u8:1
       2728 root      20   0       0      0      0 S 0.997 0.000   0:00.24 jbd2/sda1-33
       2721 root      20   0       0      0      0 S 0.664 0.000   0:00.07 ocfs2dc-3C8CFD4
       2750 ocfs2te+  20   0  142976   4652   3532 S 0.664 0.227   0:00.28 mpirun
    
      ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
      Tests with "-b 4096 -C 32768"
      Thu Dec 28 14:44:52 CST 2017
      multi_mmap..................................................Passed.
      Runtime 783 seconds.
    
    After applying this patch:
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
       2508 ocfs2te+  20   0  170248   6804   4680 R 54.00 0.333   0:55.37 multi_mmap
        155 root      20   0       0      0      0 S 2.667 0.000   0:01.20 kworker/u8:3
         95 root      20   0       0      0      0 S 2.000 0.000   0:01.58 kworker/u8:1
       2504 ocfs2te+  20   0  142976   4604   3480 R 1.667 0.225   0:01.65 mpirun
          5 root      20   0       0      0      0 S 1.000 0.000   0:01.36 kworker/u8:0
       2482 root      20   0       0      0      0 S 1.000 0.000   0:00.86 jbd2/sda1-33
        299 root       0 -20       0      0      0 S 0.333 0.000   0:00.13 kworker/2:1H
        335 root       0 -20       0      0      0 S 0.333 0.000   0:00.17 kworker/1:1H
        535 root      20   0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
       1282 root      rt   0  222284 123108  97224 S 0.333 6.017   0:01.33 corosync
    
      ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
      Tests with "-b 4096 -C 32768"
      Thu Dec 28 15:04:12 CST 2017
      multi_mmap..................................................Passed.
      Runtime 487 seconds.
    
    Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com
    Fixes: 1cce4df0 ("ocfs2: do not lock/unlock() inode DLM lock")
    Signed-off-by: Gang He <ghe@suse.com>
    Reviewed-by: Eric Ren <zren@suse.com>
    Acked-by: alex chen <alex.chen@huawei.com>
    Acked-by: piaojun <piaojun@huawei.com>
    Cc: Mark Fasheh <mfasheh@versity.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <jiangqi903@gmail.com>
    Cc: Changwei Ge <ge.changwei@h3c.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>