Hardware cache events and perf

User @Margaret points towards a reasonable answer in the comments: read the kernel source to see the mapping for the PMU events. We can check arch/x86/events/intel/core.c for the event definitions. I don’t actually know if “core” here refers to the Core architecture, or just that this is the core file with most definitions …
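To make the kernel mapping a bit more concrete, here is a minimal sketch of how an Intel raw event code is put together from an event select and a unit mask. The helper names (`raw_config`, `decode`) are made up for illustration; the two event/umask pairs are the architectural LLC Reference and LLC Misses events, which is what I'd expect to find behind cache-references and cache-misses in that file, but do check your kernel version.

```python
# Sketch only: on Intel x86 the raw perf config keeps the event select
# in bits 0-7 and the unit mask in bits 8-15. Helper names are hypothetical.

def raw_config(event_select: int, umask: int) -> int:
    """Combine event select and unit mask into a raw config value."""
    return (umask << 8) | event_select


def decode(config: int) -> tuple[int, int]:
    """Split a raw config value back into (event_select, umask)."""
    return config & 0xFF, (config >> 8) & 0xFF


# Architectural cache events (assumed mapping, verify against your kernel).
ARCH_CACHE_EVENTS = {
    "cache-references": (0x2E, 0x4F),  # LLC Reference
    "cache-misses": (0x2E, 0x41),      # LLC Misses
}

for name, (event_select, umask) in ARCH_CACHE_EVENTS.items():
    # perf accepts raw events in the form rNNNN (hex), e.g. "perf stat -e r412e"
    print(f"{name}: r{raw_config(event_select, umask):04x}")
```

The printed codes can be passed to `perf stat -e` as raw events, which is a handy way to confirm that a named event and its raw encoding count the same thing on your machine.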

How does Linux perf calculate the cache-references and cache-misses events

The built-in perf events that you are interested in map to the following hardware performance monitoring events on your processor:

      523,288,816  cache-references         (architectural event: LLC Reference)
      205,331,370  cache-misses             (architectural event: LLC Misses)
      237,794,728  L1-dcache-load-misses    L1D.REPLACEMENT
    3,495,080,007  L1-dcache-loads          MEM_INST_RETIRED.ALL_LOADS
    2,039,344,725  L1-dcache-stores         MEM_INST_RETIRED.ALL_STORES
      531,452,853  L1-icache-load-misses    ICACHE_64B.IFTAG_MISS
       77,062,627  LLC-loads                OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
       27,462,249  LLC-load-misses          …
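As a rough illustration of what these counts say, the sketch below divides each miss counter by its matching access counter, using only the numbers quoted above. The `miss_ratio` helper is hypothetical, and the L1 figure is approximate at best, since the miss counter there is backed by L1D.REPLACEMENT rather than a pure load-miss count.

```python
# Sketch only: miss ratios derived from the counts quoted in the answer.
counts = {
    "cache-references": 523_288_816,
    "cache-misses": 205_331_370,
    "L1-dcache-loads": 3_495_080_007,
    "L1-dcache-load-misses": 237_794_728,
    "LLC-loads": 77_062_627,
    "LLC-load-misses": 27_462_249,
}


def miss_ratio(miss_event: str, access_event: str) -> float:
    """Return misses divided by accesses for the two named counters."""
    return counts[miss_event] / counts[access_event]


pairs = [
    ("cache-misses", "cache-references"),
    ("L1-dcache-load-misses", "L1-dcache-loads"),
    ("LLC-load-misses", "LLC-loads"),
]

for miss_event, access_event in pairs:
    ratio = miss_ratio(miss_event, access_event)
    print(f"{miss_event} / {access_event}: {ratio:.2%}")
```

Keep in mind that the generic and the LLC-specific counters come from different hardware events, so their ratios need not agree with each other.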

linux perf: how to interpret and find hotspots

As of Linux 3.7, perf is finally able to use DWARF information to generate the call graph:

    perf record --call-graph dwarf -- yourapp
    perf report -g graph --no-children

Neat, but the curses GUI is horrible compared to VTune, KCacheGrind or similar. I recommend trying FlameGraphs instead, which is a pretty neat visualization: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Note: In …
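For a feel of what a flame graph is built from, here is a small sketch of the "folded stack" idea: each sampled call stack becomes one line of semicolon-joined frames (root first) followed by a sample count. The `fold_stacks` helper and the sample data are invented for illustration; in practice the FlameGraph scripts (stackcollapse-perf.pl and flamegraph.pl) do this work on the output of perf script.

```python
# Sketch only: collapse sampled call stacks into folded-stack lines.
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> list[str]:
    """Count identical stacks and emit 'frame;frame;frame count' lines."""
    folded = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


# Hypothetical samples, root frame first.
samples = [
    ["main", "parse", "read_line"],
    ["main", "parse", "read_line"],
    ["main", "compute"],
]

for line in fold_stacks(samples):
    print(line)
```

The width of each box in the resulting graph corresponds to these counts, so the widest towers are the hotspots worth looking at first.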