perf list

    Develop>perf list

    List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles                               [Hardware event]
      instructions                                       [Hardware event]
      cache-references                                   [Hardware event]
      cache-misses                                       [Hardware event]
      branch-instructions OR branches                    [Hardware event]
      branch-misses                                      [Hardware event]
      bus-cycles                                         [Hardware event]
      ref-cycles                                         [Hardware event]

      cpu-clock                                          [Software event]
      task-clock                                         [Software event]
      page-faults OR faults                              [Software event]
      context-switches OR cs                             [Software event]
      cpu-migrations OR migrations                       [Software event]
      minor-faults                                       [Software event]
      major-faults                                       [Software event]
      alignment-faults                                   [Software event]
      emulation-faults                                   [Software event]
      dummy                                              [Software event]

      L1-dcache-loads                                    [Hardware cache event]
      L1-dcache-stores                                   [Hardware cache event]
      L1-icache-loads                                    [Hardware cache event]
      L1-icache-load-misses                              [Hardware cache event]
      LLC-loads                                          [Hardware cache event]
      LLC-load-misses                                    [Hardware cache event]
      LLC-stores                                         [Hardware cache event]
      LLC-store-misses                                   [Hardware cache event]
      dTLB-loads                                         [Hardware cache event]
      dTLB-load-misses                                   [Hardware cache event]
      dTLB-stores                                        [Hardware cache event]
      dTLB-store-misses                                  [Hardware cache event]
      iTLB-loads                                         [Hardware cache event]
      iTLB-load-misses                                   [Hardware cache event]
      branch-loads                                       [Hardware cache event]
      branch-load-misses                                 [Hardware cache event]

      branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
      branch-misses OR cpu/branch-misses/                [Kernel PMU event]
      bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
      cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
      cache-references OR cpu/cache-references/          [Kernel PMU event]
      cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
      instructions OR cpu/instructions/                  [Kernel PMU event]
      ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]

      rNNN                                               [Raw hardware event descriptor]
      cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
       (see 'man perf-list' on how to encode it)

      mem:<addr>[:access]                                [Hardware breakpoint]

      [ Tracepoints not available: No such file or directory ]
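task-clock: the processor time actually consumed by the target task, in milliseconds; we call this the task's execution time. It is followed by the task's CPU utilization, i.e. the ratio of execution time to duration, where duration is the total time from the start of the task to its end (perf stat prints this total when it finishes). For a multithreaded process the ratio can exceed 1, as in the 1.248 CPUs utilized below.

context-switches: the number of context switches. The first figure is the count; the second is the average number per second of task-clock time (M = 10^6).

cpu-migrations: processor migrations. To keep the load balanced across processors, Linux moves a task from one processor to another under certain conditions; each such move counts as one processor migration.

page-faults: page faults. The Linux memory-management subsystem uses paging. A page fault is raised when the requested page has not been set up yet, is not in memory, or is in memory but the mapping between its virtual and physical address has not been established yet.

cycles: the number of processor cycles consumed by the task.

instructions: the number of instructions executed while the task ran. IPC (instructions per cycle) is an important metric for judging processor and application performance, since many instructions take several cycles to complete. A higher IPC is better: it means the program is making good use of the processor's features.

branches: the number of branch instructions encountered during execution.

branch-misses: the number of mispredicted branch instructions.

cache-misses: the number of cache misses.

cache-references: the number of cache accesses (hits plus misses), not the number of hits.

    perf stat ./a.out

    Develop>perf stat -d -p 6371
    ^C
     Performance counter stats for process id '6371':

          12179.626710 task-clock (msec)         #    1.248 CPUs utilized           [100.00%]
                 96673 context-switches          #    0.008 M/sec                   [100.00%]
                     0 cpu-migrations            #    0.000 K/sec                   [100.00%]
                    32 page-faults               #    0.003 K/sec
           20442151906 cycles                    #    1.678 GHz                     [29.64%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
            7297770919 instructions              #    0.36  insns per cycle         [44.12%]
            1236856463 branches                  #  101.551 M/sec                   [43.73%]
              61040864 branch-misses             #    4.94% of all branches         [42.98%]  // number of mispredicted branches
            3268288054 L1-dcache-loads           #  268.341 M/sec                   [27.91%]
       <not supported> L1-dcache-load-misses
             105416823 LLC-loads                 #    8.655 M/sec                   [27.47%]
               6321634 LLC-load-misses           #    6.00% of all LL-cache hits    [28.42%]

           9.758418636 seconds time elapsed  // how long the counters were collected

    perf top -p 6371 -K

    PerfTop:    7027 irqs/sec  kernel:30.8%  exact:  0.0% [4000Hz cycles],  (target_pid: 6371)
    -----------------------------------------------------------------------------------------------------------------------

        34.03%  server         [.] dpdk::run_dp_thread(void*)
        16.35%  server         [.] dpdk::capture::dp_process_cycle()
        10.61%  server         [.] dpdk::Interface::send_burst()
         9.45%  server         [.] CQdiscHtb::Dequeue()
         4.34%  server         [.] update_cconfig_thread_readflag()
         4.33%  server         [.] update_cconfig_thread_vsys_readflag(unsigned short)
         4.02%  server         [.] _ZN4dpdk7capture10dp_processEv.clone.47
         3.10%  server         [.] dpdk::Interface::recv_burst()
         3.08%  [vdso]         [.] __vdso_clock_gettime
         2.65%  server         [.] CQdiscCtrl::Qos_Stream_Control(rte_mbuf*)
         2.24%  server         [.] eth_em_recv_scattered_pkts
         1.77%  librt-2.15.so  [.] clock_gettime
         1.21%  server         [.] ipflow_new::flow_ha::dequeue()
         0.88%  server         [.] DP_update_vsys_read_flag()

    # perf record [options] [<command>]
    # perf record [options] -- <command> [options]

- ‘-e’: the performance event to sample (default: cycles)
- ‘-p’: PID of the process to analyze
- ‘-t’: TID of the thread to analyze
- ‘-a’: analyze the performance of the whole system (default)
- ‘-c’: sampling period of the event
- ‘-o’: output file (default: perf.data)
- ‘-g’: record the call relationships between functions
- ‘-r <priority>’: run perf record as a real-time task with priority <priority>
- ‘-u <uid>’: only analyze processes created by user <uid>

    perf record -g -e cpu-clock ./a.out
    perf record -g -e cpu-clock -p 6371

    # perf report [-i <file> | --input=file]

- ‘-i’: input file name
- ‘-v’: show the address of each symbol
- ‘-d <dso>’: only show symbols from the given dso
- ‘-n’: show the number of events for each symbol
- ‘--comms=<comm>’: only show information for the given comm
- ‘-S <symbol name>’: only consider the given symbol
- ‘-U’: only show resolved symbols
- ‘-g [type,min]’: display the function call graph in the way given by [type,min]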
    perf report -g fractal -i perf.data

    # To display the perf.data header info, please use --header/--header-only options.
    #
    # Samples: 20K of event 'cpu-clock'
    # Event count (approx.): 5111500000
    #
    # Children      Self  Command  Shared Object      Symbol
    # ........  ........  .......  .................  ............................
    #
       100.00%     0.00%  a.out    libc-2.15.so       [.] __libc_start_main
                |
                --- __libc_start_main

       100.00%     0.00%  a.out    a.out              [.] main
                |
                --- main
                    __libc_start_main

       100.00%    99.99%  a.out    a.out              [.] test
                |
                --- test
                    main
                    __libc_start_main

         0.00%     0.00%  a.out    [unknown]          [.] 0xec81485354415541
                |
                --- 0xec81485354415541

         0.00%     0.00%  a.out    ld-2.15.so         [.] 0x0000000000014092
                |
                --- 0x7fd281be9092
                    0xec81485354415541

         0.00%     0.00%  a.out    ld-2.15.so         [.] 0x000000000000325a
                |
                --- 0x7fd281bd825a
                    0x7fd281be9092
                    0xec81485354415541

         0.00%     0.00%  a.out    [kernel.kallsyms]  [k] 0x0000000000813c8a
                |
                --- 0xffffffff82613c8a
                    test
                    main
                    __libc_start_main

         0.00%     0.00%  a.out    ld-2.15.so         [.] 0x0000000000016897
                |
                --- 0x7fd281beb897
                    0x7fd281bd825a
                    0x7fd281be9092
                    0xec81485354415541

         0.00%     0.00%  a.out    [kernel.kallsyms]  [k] smp_apic_timer_interrupt
                |
                --- smp_apic_timer_interrupt
                    0xffffffff82613c8a
                    test
                    main
                    __libc_start_main

         0.00%     0.00%  a.out    [kernel.kallsyms]  [k] 0x0000000000813152

Source of the test program (a.out) profiled above:

    #include <stdlib.h>

    /* Busy loop that never returns, so nearly every sample lands in test(). */
    int test(void)
    {
        unsigned int i = 0;
        while (1)
            i++;    /* unsigned overflow wraps around, so this is well defined */
    }

    int main(void)
    {
        test();
        return 0;
    }

Original article: http://www.cnblogs.com/yml435/p/6914467.html