Viewing page faults with perf

# References

https://lrita.github.io/2019/09/27/systemtap-profiling-pagefault/

# Background

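Page faults are a non-negligible performance cost in workloads that use large amounts of memory. They are also transparent to user-space code, which makes them hard to analyze and measure, so external tools are needed to study them.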

# perf

# Options

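`perf <subcommand> ...`: for example, `perf stat ...` prints counter statistics, and `perf record ...` records samples that can be inspected afterwards with `perf script` or `perf report`.

- `-a`: collect on all CPUs; use `-C` to select specific CPUs
- `-I 1000`: with `perf stat`, print the counts every 1000 ms
- `-p`: attach to the given process
- `-e`: events to count; `perf list` shows the supported events, and multiple events are separated by commas

> perf stat -p 15251 -e major-faults,minor-faults,LLC-load-misses -I 1000
> perf stat -p 15251 -e major-faults,minor-faults,LLC-load-misses -- sleep 30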

# Events

LLC-load-misses: load misses in the last-level cache (LLC), which is the L3 cache on most current x86 CPUs.

# perf record examples

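With perf it is easy to find which code frequently triggers page faults, and what share of the total each part accounts for.

First, the following commands count how many page faults occur.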

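# -a collects events on all CPUs
> perf stat -e page-faults -I 1000 -a
# Count only process 10102; -a is dropped here because perf ignores it when -p is given
> perf stat -e page-faults,major-faults,minor-faults,LLC-load-misses -I 1000 -p 10102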

Alternatively, FlameGraph gives a more intuitive view of how much each part of the code contributes to page faults:

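# Record 30 seconds of page-fault data for process 10102
> perf record -e page-faults,dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p 10102 -g -- sleep 30

# Export the raw data. This step must run on the machine where the data was recorded, because it needs to resolve symbols.
> perf script > out.stacks

# Run `perf report` to open the interactive interface and inspect the recorded details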

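After out.stacks has been rendered into out.svg with the FlameGraph scripts, open out.svg in a browser to explore the result visually.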
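The text mentions out.svg but does not show how it is produced. The following is a minimal sketch of the whole pipeline, assuming the FlameGraph repository (with its `stackcollapse-perf.pl` and `flamegraph.pl` scripts) has been cloned locally. The function name `pagefault_flamegraph`, the `FLAMEGRAPH_DIR` variable, and the title and count-name strings are illustrative choices, not part of the original note.

```shell
#!/bin/sh
# Sketch: record page-fault call stacks for one process and render a flame graph.
# Assumption: the FlameGraph scripts live in $FLAMEGRAPH_DIR (default ./FlameGraph).

pagefault_flamegraph() {
    pf_pid="$1"
    pf_seconds="${2:-30}"
    pf_fg="${FLAMEGRAPH_DIR:-./FlameGraph}"

    if [ -z "$pf_pid" ]; then
        echo "usage: pagefault_flamegraph <pid> [seconds]" >&2
        return 2
    fi

    # Sample the call stack on page faults of the target process for the given duration.
    perf record -e page-faults -g -p "$pf_pid" -o perf.data -- sleep "$pf_seconds" || return 1

    # Resolve symbols on the machine where the data was recorded.
    perf script -i perf.data > out.stacks || return 1

    # Fold the stacks and draw the SVG; this part can run on another machine.
    "$pf_fg/stackcollapse-perf.pl" < out.stacks \
        | "$pf_fg/flamegraph.pl" --color=mem --title="Page Fault Flame Graph" --countname="pages" \
        > out.svg
}

# Example: pagefault_flamegraph 10102 30
```

Only the `page-faults` event is recorded here so that the graph counts faults alone; recording the TLB events in the same run would mix their samples into the same stacks file.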

# SystemTap

The following script prints, every 10 seconds, the type and latency of every page fault triggered by the target process:

#!/usr/bin/stap

/**
 * Tested on Linux 3.10 (CentOS 7)
 */

global fault_entry_time, fault_latency_all, fault_latency_type

function vm_fault_str(fault_type: long) {
    if(vm_fault_contains(fault_type, VM_FAULT_OOM))
        return "OOM";
    else if(vm_fault_contains(fault_type, VM_FAULT_SIGBUS))
        return "SIGBUS";
    else if(vm_fault_contains(fault_type, VM_FAULT_MINOR))
        return "MINOR";
    else if(vm_fault_contains(fault_type, VM_FAULT_MAJOR))
        return "MAJOR";
    else if(vm_fault_contains(fault_type, VM_FAULT_NOPAGE))
        return "NOPAGE";
    else if(vm_fault_contains(fault_type, VM_FAULT_LOCKED))
        return "LOCKED";
    else if(vm_fault_contains(fault_type, VM_FAULT_ERROR))
        return "ERROR";
    return "???";
}

probe vm.pagefault {
	if (pid() == target()) {
		fault_entry_time[tid()] = gettimeofday_us()
	}
}

probe vm.pagefault.return {
	if (!(tid() in fault_entry_time)) next
	latency = gettimeofday_us() - fault_entry_time[tid()]
	# drop the entry so a stale timestamp is never reused for this thread id
	delete fault_entry_time[tid()]
	fault_latency_all <<< latency
	fault_latency_type[vm_fault_str(fault_type)] <<< latency
}

probe timer.s(10) {
	print("All:\n")
	print(@hist_log(fault_latency_all))
	delete(fault_latency_all)

	foreach (type in fault_latency_type+) {
		print(type, ":\n")
		print(@hist_log(fault_latency_type[type]))
	}
	delete(fault_latency_type)
}
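Because the script filters on `target()`, it needs to be told which process to watch when it is started. A minimal sketch of launching it, assuming the script has been saved as `pagefault.stp` (a hypothetical file name) and that `stap` is installed with matching kernel debug information:

```shell
#!/bin/sh
# Sketch: run the page-fault latency script against one process.
# Assumption: the script above is saved as pagefault.stp (hypothetical name).

run_pagefault_stap() {
    stap_pid="$1"
    stap_script="${2:-pagefault.stp}"

    if [ -z "$stap_pid" ]; then
        echo "usage: run_pagefault_stap <pid> [script.stp]" >&2
        return 2
    fi

    # -x sets the pid that target() returns inside the script.
    stap -x "$stap_pid" "$stap_script"
}

# Example (usually requires root privileges):
#   run_pagefault_stap 10102
```

The histograms are printed every 10 seconds until the run is interrupted with Ctrl-C.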

# Precise counting from code with perf_event_open()

https://blog.csdn.net/a515983690/article/details/51504789

Last updated: 2023/05/07, 17:27:54