Atomic vs Mutex

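When it comes to mutex versus atomic, intuition says atomic must be faster: a mutex needs two operations (lock and unlock), while an atomic increment needs only one. But how much faster? I never had hard numbers, so today I measured it.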

Test method

Two variables:

volatile unsigned int count;
std::atomic<int> a_count;

A single thread loops 10 million times and increments each variable; every increment of count is wrapped in a mutex lock/unlock pair. The code:

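#include <pthread.h>

#include <atomic>
#include <chrono>
#include <iostream>

const int kNum = 10000000;

int main() {
  volatile unsigned int count;  // plain counter, protected by the mutex
  pthread_mutex_t mutex;

  std::atomic<int> a_count;     // atomic counter, no lock needed

  pthread_mutex_init(&mutex, nullptr);
  count = 0;
  a_count.store(0);

  // Mutex version: lock, increment, unlock on every iteration.
  auto begin = std::chrono::steady_clock::now();
  while (count < static_cast<unsigned int>(kNum)) {
    pthread_mutex_lock(&mutex);
    count++;
    pthread_mutex_unlock(&mutex);
  }
  auto end = std::chrono::steady_clock::now();
  // std::chrono::nanoseconds has a 64-bit rep, so runs longer than ~2.1 s
  // cannot overflow (duration<int, std::nano> would).
  auto diff = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
  std::cout << "count = " << count << ", Mutex used: "
            << diff.count() << std::endl;

  // Atomic version: one atomic read-modify-write per iteration.
  // fetch_add returns the value *before* the increment, so the loop stops
  // right after the increment that takes a_count from kNum-1 to kNum.
  begin = std::chrono::steady_clock::now();
  while (a_count.fetch_add(1, std::memory_order_relaxed) < kNum - 1) {
    // For a pure counter, std::memory_order_relaxed is sufficient. It can be
    // cheaper than the default seq_cst on weakly ordered CPUs such as ARM;
    // on x86 both compile to the same locked instruction.
  }
  end = std::chrono::steady_clock::now();
  diff = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
  std::cout << "a_count = " << a_count.load() << ", Atomic used: "
            << diff.count() << std::endl;

  pthread_mutex_destroy(&mutex);
  return 0;
}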

I ran the single-threaded benchmark three times; the results (in nanoseconds):

count = 10000000, Mutex used: 268739652
a_count = 10000000, Atomic used: 114195449
count = 10000000, Mutex used: 271568003
a_count = 10000000, Atomic used: 114331315
count = 10000000, Mutex used: 271557151
a_count = 10000000, Atomic used: 112725464

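So the mutex version takes roughly 2.4 times as long as the atomic version (about 27 ns vs. about 11 ns per iteration). That is in line with expectations; a test someone wrote back in 2008 that I once read also found the mutex taking about 3 times as long as the atomic.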

Profiling with perf gives the following:

Samples: 1K of event 'cycles', Event count (approx.): 1160274395
  Children      Self  Command  Shared Object       Symbol
+   99.63%     0.00%  a.out    libc-2.12.so        [.] __libc_start_main                                                                                                                                                                   
+   37.55%     5.42%  a.out    a.out               [.] main                                                                                                                                                                                
+   35.14%    35.08%  a.out    libpthread-2.12.so  [.] pthread_mutex_unlock                                                                                                                                                                
+   32.72%     3.76%  a.out    a.out               [.] std::atomic_fetch_add<int>                                                                                                                                                          
+   28.14%     3.81%  a.out    a.out               [.] std::atomic_fetch_add_explicit<int>                                                                                                                                                 
+   25.39%    25.39%  a.out    libpthread-2.12.so  [.] pthread_mutex_lock                                                                                                                                                                  
+   25.16%    25.16%  a.out    a.out               [.] std::__atomic_base<int>::fetch_add                                                                                                                                                  
+    1.01%     1.01%  a.out    a.out               [.] pthread_mutex_lock@plt                                                                                                                                                              
+    0.17%     0.00%  a.out    ld-2.12.so          [.] _dl_sysdep_start                                                                                                                                                                    
+    0.17%     0.00%  a.out    ld-2.12.so          [.] dl_main                                                                                                                                                                             
+    0.17%     0.00%  a.out    ld-2.12.so          [.] _dl_relocate_object                                                                                                                                                                 
+    0.17%     0.17%  a.out    [kernel.vmlinux]    [k] page_fault                                                                                                                                                                          
+    0.12%     0.00%  a.out    [kernel.vmlinux]    [k] apic_timer_interrupt                                                                                                                                                                
+    0.12%     0.00%  a.out    [kernel.vmlinux]    [k] smp_apic_timer_interrupt                                                                                                                                                            
+    0.09%     0.00%  a.out    [unknown]           [k] 0x00000032b42acde7                                                                                                                                                                  
+    0.09%     0.00%  a.out    [kernel.vmlinux]    [k] stub_execve                                                                                                                                                                         
+    0.09%     0.00%  a.out    [kernel.vmlinux]    [k] sys_execve                                                                                                                                                                          
+    0.09%     0.00%  a.out    [kernel.vmlinux]    [k] do_execve                                                                                                                                                                           
+    0.09%     0.00%  a.out    [kernel.vmlinux]    [k] search_binary_handler                                                                                                                                                               
+    0.09%     0.00%  a.out    [kernel.vmlinux]    [k] load_elf_binary                                                                                                                                                                     
+    0.08%     0.00%  a.out    [kernel.vmlinux]    [k] elf_map                                                                                                                                                                             
+    0.08%     0.00%  a.out    [kernel.vmlinux]    [k] do_mmap_pgoff                                                                                                                                                                       
+    0.08%     0.00%  a.out    [kernel.vmlinux]    [k] mmap_region                                                                                                                                                                         
+    0.08%     0.00%  a.out    [kernel.vmlinux]    [k] perf_event_mmap                                                                                                                                                                     
+    0.08%     0.08%  a.out    [kernel.vmlinux]    [k] kfree                                                                                                                                                                               
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] hrtimer_interrupt                                                                                                                                                                   
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] __run_hrtimer                                                                                                                                                                       
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] tick_sched_timer                                                                                                                                                                    
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] tick_do_update_jiffies64                                                                                                                                                            
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] do_timer                                                                                                                                                                            
+    0.06%     0.06%  a.out    [kernel.vmlinux]    [k] update_wall_time                                                                                                                                                                    
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] irq_exit                                                                                                                                                                            
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] do_softirq                                                                                                                                                                          
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] call_softirq                                                                                                                                                                        
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] __do_softirq                                                                                                                                                                        
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] rcu_process_callbacks                                                                                                                                                               
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] __rcu_process_callbacks                                                                                                                                                             
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] force_quiescent_state                                                                                                                                                               
+    0.06%     0.00%  a.out    [kernel.vmlinux]    [k] rcu_process_dyntick                                                                                                                                                                 
+    0.06%     0.06%  a.out    [kernel.vmlinux]    [k] dyntick_save_progress_counter                                                                                                                                                       
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] setup_new_exec                                                                                                                                                                      
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] set_task_comm                                                                                                                                                                       
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] perf_event_comm                                                                                                                                                                     
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] perf_event_context_sched_in                                                                                                                                                         
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] perf_pmu_enable                                                                                                                                                                     
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] x86_pmu_enable
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] intel_pmu_enable_all                                                                                                                                                                
+    0.00%     0.00%  a.out    [kernel.vmlinux]    [k] native_write_msr_safe

Only three entries really matter:

+   35.14%    35.08%  a.out    libpthread-2.12.so  [.] pthread_mutex_unlock
+   32.72%     3.76%  a.out    a.out               [.] std::atomic_fetch_add<int>
+   25.39%    25.39%  a.out    libpthread-2.12.so  [.] pthread_mutex_lock

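pthread_mutex_lock and pthread_mutex_unlock together account for about 60% of the samples (unlock is about 10 percentage points higher than lock; why?), while atomic_fetch_add accounts for about 33%.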

Finally, the flame graph:

[Figure: flame graph]

Summary

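The results match expectations, and now I have a quantitative comparison: in this single-threaded, uncontended test the atomic increment was about 2.4 times faster. From now on, when either atomic or mutex would work, I'll prefer atomic.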

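In fact, RocksDB used exactly this kind of optimization on LevelDB: LevelDB's Get() takes a mutex right at the start, and RocksDB replaced that with atomics plus thread-local storage. After the change, reads rarely contend on a mutex, and performance improved considerably.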