  • openvswitch: Optimize operations for OvS flow_stats. · c4b2bf6b
    Tonghao Zhang authored
    
    
    When flow_free() frees a flow, it calls cpumask_next() once for
    every CPU in cpu_possible_mask (e.g. 128 by default). This consumes
    noticeable CPU time if flow_free() is called frequently, for
    example when all packets are sent to userspace via upcall and OvS
    sends them back via netlink to ovs_packet_cmd_execute(), which
    calls flow_free().
    
    The test topology is shown below. VM01 sends TCP packets to VM02,
    and OvS forwards the packets. During the test, we use perf to
    report the system performance.
    
    VM01 --- OvS-VM --- VM02
    
    Without this patch, perf top shows the following; flow_free()
    accounts for 3.02% of CPU usage.
    
    	4.23%  [kernel]            [k] _raw_spin_unlock_irqrestore
    	3.62%  [kernel]            [k] __do_softirq
    	3.16%  [kernel]            [k] __memcpy
    	3.02%  [kernel]            [k] flow_free
    	2.42%  libc-2.17.so        [.] __memcpy_ssse3_back
    	2.18%  [kernel]            [k] copy_user_generic_unrolled
    	2.17%  [kernel]            [k] find_next_bit
    
    With this patch applied, perf top shows the following; flow_free()
    no longer appears in the list.
    
    	4.11%  [kernel]            [k] _raw_spin_unlock_irqrestore
    	3.79%  [kernel]            [k] __do_softirq
    	3.46%  [kernel]            [k] __memcpy
    	2.73%  libc-2.17.so        [.] __memcpy_ssse3_back
    	2.25%  [kernel]            [k] copy_user_generic_unrolled
    	1.89%  libc-2.17.so        [.] _int_malloc
    	1.53%  ovs-vswitchd        [.] xlate_actions
    
    With this patch, the TCP throughput between the VMs (without using
    the Megaflow Cache and Microflow Cache) rises from 1.18Gbs/sec to
    1.30Gbs/sec (roughly a 10% performance improvement).
    
    This patch adds a cpumask struct, cpu_used_mask, which stores the
    IDs of the CPUs that the flow has used. We then only check the
    flow_stats on the CPUs that were used; it is unnecessary to check
    all possible CPUs when getting, clearing, and updating the
    flow_stats. Adding cpu_used_mask to the sw_flow struct doesn't
    increase the number of cachelines.
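
The idea above can be sketched as a small userspace model. This is only an illustration, not the code in this patch: the `_model` types, the `NR_CPUS_MODEL` constant, and the helpers `flow_init()`, `next_used_cpu()`, `flow_stats_update()` and `flow_stats_get()` are hypothetical stand-ins, and a plain array of 64-bit words is assumed in place of the kernel's cpumask. It shows that reading the stats only visits the CPUs recorded in the used mask.

```c
/* Userspace model only: simplified stand-ins, not the kernel definitions. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_MODEL 128                 /* assumed; matches the "128" example */
#define MASK_WORDS (NR_CPUS_MODEL / 64)

struct flow_stats_model {
	uint64_t packet_count;
	uint64_t byte_count;
};

struct sw_flow_model {
	uint64_t cpu_used_mask[MASK_WORDS];   /* bit n set: CPU n holds stats */
	struct flow_stats_model stats[NR_CPUS_MODEL];
};

static void flow_init(struct sw_flow_model *flow)
{
	memset(flow, 0, sizeof(*flow));
}

/* Record one packet on `cpu` and mark that CPU as used by this flow. */
static void flow_stats_update(struct sw_flow_model *flow, unsigned int cpu,
			      uint64_t len)
{
	assert(cpu < NR_CPUS_MODEL);
	flow->cpu_used_mask[cpu / 64] |= UINT64_C(1) << (cpu % 64);
	flow->stats[cpu].packet_count += 1;
	flow->stats[cpu].byte_count += len;
}

/* Return the first used CPU >= start, or NR_CPUS_MODEL if there is none. */
static unsigned int next_used_cpu(const struct sw_flow_model *flow,
				  unsigned int start)
{
	unsigned int cpu = start;

	while (cpu < NR_CPUS_MODEL) {
		uint64_t word = flow->cpu_used_mask[cpu / 64] >> (cpu % 64);

		if (word == 0) {
			cpu = (cpu / 64 + 1) * 64; /* rest of this word is empty */
			continue;
		}
		while (!(word & 1)) {
			word >>= 1;
			cpu++;
		}
		return cpu;
	}
	return NR_CPUS_MODEL;
}

/* Sum the stats over used CPUs only; return how many slots were visited. */
static unsigned int flow_stats_get(const struct sw_flow_model *flow,
				   struct flow_stats_model *total)
{
	unsigned int visited = 0;
	unsigned int cpu;

	total->packet_count = 0;
	total->byte_count = 0;
	for (cpu = next_used_cpu(flow, 0); cpu < NR_CPUS_MODEL;
	     cpu = next_used_cpu(flow, cpu + 1)) {
		total->packet_count += flow->stats[cpu].packet_count;
		total->byte_count += flow->stats[cpu].byte_count;
		visited++;
	}
	return visited;
}
```

In this model, a flow that only ever saw packets on two CPUs causes `flow_stats_get()` to visit two per-CPU slots rather than all 128; the same walk would apply when clearing or freeing the per-CPU stats.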
    
    Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>