Reading the current process's symbol table from its ELF file and obtaining symbol version information
Reposted from https://blog.csdn.net/modisir/article/details/117958192
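Linux uses ELF as the format for its linkable and executable files, and provides tools such as nm for parsing ELF symbol tables. Consider the following program (vim test.cc):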
#include <iostream>
#include <pthread.h>

int main()
{
    pthread_cond_t cond;
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    int ret = pthread_cond_init(&cond, &attr);
    if (ret != 0) {
        std::cout << "call_pthread_cond_init failed." << std::endl;
        return ret;
    }
    pthread_cond_destroy(&cond);
    return 0;
}
Compile it and inspect the executable's symbol table with nm:
[Linux] $ g++ test.cc -lpthread
[Linux] $ nm -g a.out
0000000000400b88 R _IO_stdin_used
w _Jv_RegisterClasses
U _ZNSolsEPFRSoS_E@@GLIBCXX_3.4
U _ZNSt8ios_base4InitC1Ev@@GLIBCXX_3.4
U _ZNSt8ios_base4InitD1Ev@@GLIBCXX_3.4
00000000006012a0 B _ZSt4cout@@GLIBCXX_3.4
U _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@@GLIBCXX_3.4
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@@GLIBCXX_3.4
0000000000601020 D __DTOR_END__
0000000000601284 A __bss_start
U __cxa_atexit@@GLIBC_2.2.5
0000000000601280 D __data_start
0000000000400b90 R __dso_handle
w __gmon_start__
U __gxx_personality_v0@@CXXABI_1.3
0000000000400aa0 T __libc_csu_fini
0000000000400ab0 T __libc_csu_init
U __libc_start_main@@GLIBC_2.2.5
0000000000601284 A _edata
00000000006013c8 A _end
0000000000400b78 T _fini
0000000000400808 T _init
00000000004008f0 T _start
0000000000601280 W data_start
00000000004009d4 T main
U pthread_cond_destroy@@GLIBC_2.3.2
U pthread_cond_init@@GLIBC_2.3.2
U pthread_condattr_init@@GLIBC_2.2.5
U pthread_condattr_setclock@@GLIBC_2.3.3
As you can see, every glibc function is followed by version information, which brings in the symbol versioning mechanism. Symbol versioning was introduced in 1995 by the Solaris link editor and ld.so. MaskRay's Zhihu article "All about symbol versioning" describes GNU symbol versioning in considerable detail and is a good reference. The symbol versioning entries are also described in the ELF man page.
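Concretely, the versioning data lives in two sections: .gnu.version holds one 16-bit Elf64_Versym index per .dynsym entry, and .gnu.version_r holds a chain of Elf64_Verneed records (one per needed library), each followed by vn_cnt Elf64_Vernaux entries whose vna_other field is the version index and whose vna_name is an offset into .dynstr. The sketch below walks such a chain in memory. It assumes a 64-bit ELF whose .gnu.version_r and .dynstr contents have already been read into buffers; the function name collect_verneed is our own, not a library API.

```cpp
#include <elf.h>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Walk the Elf64_Verneed chain of a .gnu.version_r section and map each
// version index (vna_other) to its name, e.g. 2 -> "GLIBC_2.3.2".
// 'verneed' and 'dynstr' are assumed to hold the raw section contents.
std::map<Elf64_Half, std::string> collect_verneed(const char *verneed,
                                                  std::size_t verneed_size,
                                                  const char *dynstr) {
    std::map<Elf64_Half, std::string> versions;
    std::size_t off = 0;
    while (off + sizeof(Elf64_Verneed) <= verneed_size) {
        Elf64_Verneed vn;
        std::memcpy(&vn, verneed + off, sizeof vn);  // avoid misaligned reads
        std::size_t aux_off = off + vn.vn_aux;       // first Vernaux entry
        for (Elf64_Half i = 0; i < vn.vn_cnt &&
                 aux_off + sizeof(Elf64_Vernaux) <= verneed_size; i++) {
            Elf64_Vernaux vna;
            std::memcpy(&vna, verneed + aux_off, sizeof vna);
            versions[vna.vna_other] = dynstr + vna.vna_name;
            if (vna.vna_next == 0)
                break;
            aux_off += vna.vna_next;                 // relative to this entry
        }
        if (vn.vn_next == 0)
            break;
        off += vn.vn_next;                           // next needed library
    }
    return versions;
}
```

With this map, a symbol's version name is versions[versym & 0x7fff] for any index of 2 or more.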
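Take one of the glibc functions referenced by the program above as an example. At link time, pthread_cond_init was bound to version GLIBC_2.3.2. In fact, however, the pthread library contains two versions of pthread_cond_init: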
[Linux] $ nm -g /lib64/libpthread.so.0 | grep pthread_cond_init
000000390280b0b0 T pthread_cond_init@@GLIBC_2.3.2
000000390280c030 T pthread_cond_init@GLIBC_2.2.5
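nm derives the @@ and @ markers from each symbol's .gnu.version entry: bit 15 of the Elf64_Versym value marks a hidden (non-default) version, printed with a single @, and the low 15 bits are the version index, where 0 and 1 mean local and global (no version). Below is a simplified model of that notation for defined symbols, assuming the version name has already been resolved from the index; versioned_name is a hypothetical helper, not how nm is actually implemented.

```cpp
#include <elf.h>
#include <string>

// Bit 15 of an Elf64_Versym marks a hidden (non-default) version;
// the remaining 15 bits are the version index.
constexpr Elf64_Versym kVersymHidden = 0x8000;
constexpr Elf64_Versym kVersymIndexMask = 0x7fff;

// Format a defined symbol in nm style: "name@@ver" for the default
// version, "name@ver" for a hidden one, plain "name" when unversioned.
std::string versioned_name(const std::string &name, Elf64_Versym versym,
                           const std::string &version) {
    Elf64_Versym index = versym & kVersymIndexMask;
    if (index == VER_NDX_LOCAL || index == VER_NDX_GLOBAL)
        return name;  // indices 0 and 1 carry no version
    return name + ((versym & kVersymHidden) ? "@" : "@@") + version;
}
```

So the two pthread_cond_init entries differ only in their version index and in whether the hidden bit is set; only the GLIBC_2.3.2 one is the default that new links bind to.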
Besides including <pthread.h> and calling pthread_cond_init directly, glibc also provides dlsym, which can retrieve a pointer to pthread_cond_init from libpthread.so:
#include <pthread.h>
#include <dlfcn.h>
#include <iostream>

typedef int (*cond_init_func_t)(pthread_cond_t *cond,
                                const pthread_condattr_t *attr);

extern "C" {
// Override the library function: this definition in the executable
// takes precedence over the one in libpthread.
int pthread_cond_init(pthread_cond_t *cond,
                      const pthread_condattr_t *attr)
{
    return 0;
}
}

static int call_pthread_cond_init(pthread_cond_t *cond,
                                  pthread_condattr_t *attr)
{
    // RTLD_NEXT: find the next definition after this object, i.e. glibc's
    cond_init_func_t func =
        (cond_init_func_t) dlsym(RTLD_NEXT, "pthread_cond_init");
    if (!func)
        return -1;
    return func(cond, attr);
}

int main()
{
    pthread_cond_t cond;
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    int ret = call_pthread_cond_init(&cond, &attr);
    if (ret != 0) {
        std::cout << "call_pthread_cond_init failed." << std::endl;
        return ret;
    }
    pthread_cond_destroy(&cond);
    return 0;
}
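As shown above, we modified the earlier program in two ways. First, it overrides the library function pthread_cond_init: a function with the same name defined in the executable takes precedence over the glibc definition, because the dynamic linker binds a symbol to the first definition it finds in its search order, which starts with the executable. Second, it uses dlsym to obtain glibc's original pthread_cond_init and calls it.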
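The "call the original" half of this pattern can be wrapped in a small typed helper. This is a minimal sketch, assuming glibc with g++ (which defines _GNU_SOURCE and therefore exposes RTLD_NEXT) and, on glibc releases before 2.34, linking with -ldl; the helper name next_symbol is hypothetical. It inherits the limitation discussed in this article: for a versioned symbol, plain dlsym is not guaranteed to return the default version.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Look up the next definition of 'name' after the calling object in the
// dynamic linker's search order, i.e. the definition our override hides.
// F is the function-pointer type of the symbol.
template <typename F>
F next_symbol(const char *name) {
    dlerror();  // clear any stale error state
    void *sym = dlsym(RTLD_NEXT, name);
    if (const char *err = dlerror()) {
        std::fprintf(stderr, "dlsym(%s): %s\n", name, err);
        return nullptr;
    }
    return reinterpret_cast<F>(sym);
}
```

Checking dlerror() rather than only the returned pointer distinguishes a failed lookup from a symbol whose value is legitimately null.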
We expect call_pthread_cond_init to obtain a pointer to pthread_cond_init@@GLIBC_2.3.2 and run normally. What actually happens? Let's compile and run it:
[Linux] $ g++ test.cc -lpthread -ldl
[Linux] $ ./a.out
call_pthread_cond_init failed.
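Clearly this is not what we expected. The pointer returned by dlsym is not the GLIBC_2.3.2 version but pthread_cond_init@GLIBC_2.2.5, and the 2.2.5 version of pthread_cond_init does not yet support clocks of type CLOCK_MONOTONIC. In other words, dlsym did not return the default version of this symbol; this is a known glibc issue.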
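Since version 2.1, glibc has provided a function named dlvsym, which lets you specify the symbol version when obtaining a function pointer. Let's modify call_pthread_cond_init in the program above again: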
static int call_pthread_cond_init(pthread_cond_t *cond,
                                  pthread_condattr_t *attr)
{
    cond_init_func_t func =
        (cond_init_func_t) dlvsym(RTLD_NEXT,
                                  "pthread_cond_init",
                                  "GLIBC_2.3.2");
    return func(cond, attr);
}
Compile and run it:
[Linux] $ g++ test.cc -lpthread -ldl
[Linux] $ ./a.out
As you can see, once we obtain a pointer to the correct version, the program no longer reports an error (it prints nothing).
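Because dlvsym returns NULL when the requested version does not exist, calling the result blindly (as the short example does) crashes instead of reporting the problem. A defensive wrapper is sketched below, assuming glibc's dlvsym and RTLD_DEFAULT (GNU extensions declared in <dlfcn.h> when _GNU_SOURCE is defined, which g++ does by default). The helper name lookup_versioned and its fall-back-to-dlsym policy are our own choices, not glibc behavior.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Resolve 'name' at a specific symbol version with dlvsym. If 'version' is
// null or empty, fall back to plain dlsym (which may not pick the default
// version). Returns nullptr and prints dlerror() on failure.
void *lookup_versioned(void *handle, const char *name, const char *version) {
    dlerror();  // clear any stale error state
    void *sym = (version && *version) ? dlvsym(handle, name, version)
                                      : dlsym(handle, name);
    if (!sym) {
        const char *err = dlerror();
        std::fprintf(stderr, "lookup of %s@%s failed: %s\n", name,
                     version ? version : "", err ? err : "unknown error");
    }
    return sym;
}
```

In the program above, lookup_versioned(RTLD_NEXT, "pthread_cond_init", "GLIBC_2.3.2") could replace the bare dlvsym call, with the caller returning an error code when it yields nullptr.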
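But how can we both override a glibc function and still obtain the default version of the original glibc function? One way is to use a shared library: override the glibc function in a shared library loaded through the LD_PRELOAD mechanism (similar to how the jemalloc and tcmalloc libraries are used), and have that library read the version information from the current executable's ELF file while it is being loaded. First, restore the program that calls pthread_cond_init (test.cc) to its original form: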
#include <iostream>
#include <pthread.h>

int main()
{
    pthread_cond_t cond;
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    int ret = pthread_cond_init(&cond, &attr);
    if (ret != 0) {
        std::cout << "call_pthread_cond_init failed." << std::endl;
        return ret;
    }
    pthread_cond_destroy(&cond);
    return 0;
}
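Next, we implement a shared library (readelf.cc) that resolves the current executable's path through "/proc/self/exe" and reads the symbol table from that file's ELF data. It also overrides pthread_cond_init: using the version obtained from the ELF file, it calls dlvsym to get the default version of glibc's original function and then calls it: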
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
#include <link.h>       // ElfW() and the ELF structure definitions
#include <dlfcn.h>      // dlsym, dlvsym, RTLD_NEXT (GNU extensions; g++ defines _GNU_SOURCE)
#include <string.h>
#include <pthread.h>
#include <string>
#include <iostream>
#include <algorithm>
#include <unordered_map>

typedef int (*cond_init_func_t)(pthread_cond_t *cond,
                                const pthread_condattr_t *attr);

using namespace std;

/*
  Symbol version map.
  Key: symbol name.
  Value: version string.
  Constructed on first use: GCC leaves the order between constructor
  functions and C++ static objects unspecified.
*/
static unordered_map<string, string> &sym_versions()
{
    static unordered_map<string, string> versions;
    return versions;
}

static int call_pthread_cond_init(pthread_cond_t *cond,
                                  const pthread_condattr_t *attr)
{
    cond_init_func_t func = NULL;
    unordered_map<string, string>::const_iterator it =
        sym_versions().find("pthread_cond_init");
    // Request the version the executable was linked against
    if (it != sym_versions().end() && it->second != "NONE")
        func = (cond_init_func_t) dlvsym(RTLD_NEXT, "pthread_cond_init",
                                         it->second.c_str());
    // No usable version recorded: fall back to plain dlsym
    if (!func)
        func = (cond_init_func_t) dlsym(RTLD_NEXT, "pthread_cond_init");
    return func ? func(cond, attr) : EINVAL;
}

extern "C" {
int pthread_cond_init(pthread_cond_t *cond,
                      const pthread_condattr_t *attr)
{
    int ret = call_pthread_cond_init(cond, attr);
    cout << "Override pthread_cond_init (ret = " << ret << ")." << endl;
    return ret;
}
}

__attribute__((constructor))
void readelf()
{
    ElfW(Ehdr) ehdr;                 // ELF header
    ElfW(Phdr) *phdrs = NULL;        // Program headers
    ElfW(Shdr) *shdrs = NULL;        // Section headers
    ElfW(Dyn) *dyns = NULL;          // Dynamic entries
    ElfW(Sym) *syms = NULL;          // Dynamic symbol table (.dynsym)
    ElfW(Word) sym_cnt = 0;          // Number of symbol entries
    char *strtab = NULL;             // String table
    ElfW(Word) strtab_sz = 0;        // Size in bytes of .dynstr
    ElfW(Off) strtab_off = 0;        // File offset of .dynstr
    ElfW(Versym) *versyms = NULL;    // .gnu.version
    ElfW(Verneed) *verneeds = NULL;  // .gnu.version_r
    unordered_map<ElfW(Half), ElfW(Word)> vermap; // version index -> name offset
    char buffer[PATH_MAX];
    // Get the absolute path of the current executable.
    // readlink() does not NUL-terminate its output, so do it here.
    ssize_t res = readlink("/proc/self/exe", buffer, sizeof(buffer) - 1);
    if (res < 0) {
        cout << "Failed to read /proc/self/exe." << endl;
        return;
    }
    buffer[res] = '\0';
    FILE *fp = fopen(buffer, "rb");
    if (!fp) {
        cout << "Failed to open file: " << buffer << endl;
        return;
    }
    // Read the ELF header and check the ELF magic numbers
    if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1 ||
        memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
        cout << "Failed to check ELF magic numbers." << endl;
        goto out;
    }
    // Read the program headers
    phdrs = new ElfW(Phdr)[ehdr.e_phnum];
    fseek(fp, ehdr.e_phoff, SEEK_SET);
    fread(phdrs, sizeof(ElfW(Phdr)), ehdr.e_phnum, fp);
    cout << "Read " << ehdr.e_phnum << " program headers." << endl;
    for (int phdr_index = 0; phdr_index < ehdr.e_phnum; phdr_index++) {
        ElfW(Phdr) *phdr = &phdrs[phdr_index];
        if (phdr->p_type != PT_DYNAMIC)
            continue;
        cout << "Got the dynamic program header." << endl;
        dyns = (ElfW(Dyn) *) malloc(phdr->p_filesz);
        fseek(fp, phdr->p_offset, SEEK_SET);
        fread(dyns, 1, phdr->p_filesz, fp);
        // DT_STRSZ gives the size of the dynamic string table
        for (ElfW(Dyn) *dyn = dyns; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_STRSZ) {
                strtab_sz = dyn->d_un.d_val;
                cout << "DT_STRSZ value: " << strtab_sz << "." << endl;
            }
        }
        break;
    }
    // Read the section headers
    shdrs = new ElfW(Shdr)[ehdr.e_shnum];
    fseek(fp, ehdr.e_shoff, SEEK_SET);
    fread(shdrs, sizeof(ElfW(Shdr)), ehdr.e_shnum, fp);
    cout << "Read " << ehdr.e_shnum << " section headers." << endl;
    // Load the section name string table; the buffer is sized so that it
    // can be reused for .dynstr afterwards
    strtab = new char[std::max((ElfW(Word)) shdrs[ehdr.e_shstrndx].sh_size,
                               strtab_sz)];
    fseek(fp, shdrs[ehdr.e_shstrndx].sh_offset, SEEK_SET);
    fread(strtab, 1, shdrs[ehdr.e_shstrndx].sh_size, fp);
    // Locate the dynamic symbol and versioning sections by name
    for (int s_idx = 0; s_idx < ehdr.e_shnum; s_idx++) {
        ElfW(Shdr) *sh = &shdrs[s_idx];
        const char *name = strtab + sh->sh_name;
        if (!strcmp(name, ".dynsym")) {
            sym_cnt = sh->sh_size / sizeof(ElfW(Sym));
            syms = new ElfW(Sym)[sym_cnt];
            fseek(fp, sh->sh_offset, SEEK_SET);
            fread(syms, sizeof(ElfW(Sym)), sym_cnt, fp);
            cout << ".dynsym: got " << sym_cnt << " symbols." << endl;
        } else if (!strcmp(name, ".dynstr")) {
            cout << ".dynstr: offset " << sh->sh_offset
                 << " size " << sh->sh_size << "." << endl;
            strtab_off = sh->sh_offset;
        } else if (!strcmp(name, ".gnu.version_r")) {
            cout << ".gnu.version_r: verneed offset " << sh->sh_offset
                 << " size " << sh->sh_size << "." << endl;
            verneeds = (ElfW(Verneed) *) malloc(sh->sh_size);
            fseek(fp, sh->sh_offset, SEEK_SET);
            fread(verneeds, 1, sh->sh_size, fp);
        } else if (!strcmp(name, ".gnu.version")) {
            cout << ".gnu.version: versym offset " << sh->sh_offset
                 << " size " << sh->sh_size << "." << endl;
            versyms = (ElfW(Versym) *) malloc(sh->sh_size);
            fseek(fp, sh->sh_offset, SEEK_SET);
            fread(versyms, 1, sh->sh_size, fp);
        }
    }
    if (!syms || !versyms || !verneeds) {
        cout << "No symbol versioning information found." << endl;
        goto out;
    }
    // Replace the section names with the dynamic symbol string table
    fseek(fp, strtab_off, SEEK_SET);
    fread(strtab, 1, strtab_sz, fp);
    // Walk .gnu.version_r: map each version index (vna_other) to its name
    for (ElfW(Verneed) *vn = verneeds; ; ) {
        cout << "verneed " << ":"
             << " vn_version " << vn->vn_version
             << " vn_cnt " << vn->vn_cnt
             << " vn_file " << strtab + vn->vn_file
             << " vn_aux " << vn->vn_aux
             << " vn_next " << vn->vn_next
             << "." << endl;
        ElfW(Vernaux) *vna = (ElfW(Vernaux) *) ((char *) vn + vn->vn_aux);
        for (ElfW(Half) i = 0; i < vn->vn_cnt; i++) {
            cout << " aux " << i << ": "
                 << " vna_name " << strtab + vna->vna_name
                 << " vna_other " << vna->vna_other
                 << "." << endl;
            vermap[vna->vna_other] = vna->vna_name;
            vna = (ElfW(Vernaux) *) ((char *) vna + vna->vna_next);
        }
        if (vn->vn_next == 0)
            break;
        vn = (ElfW(Verneed) *) ((char *) vn + vn->vn_next);
    }
    // Walk .gnu.version: look up the version of each dynamic symbol
    for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++) {
        ElfW(Sym) *sym = &syms[sym_index];
        // Bit 15 is the "hidden" flag; indices 0 and 1 mean no version
        ElfW(Versym) ver_idx = versyms[sym_index] & 0x7fff;
        const char *ver_name = "NONE";
        if (ver_idx > VER_NDX_GLOBAL && vermap.count(ver_idx))
            ver_name = strtab + vermap[ver_idx];
        sym_versions()[strtab + sym->st_name] = ver_name;
        cout << "symbol " << strtab + sym->st_name
             << " version " << ver_name
             << "." << endl;
    }
out:
    fclose(fp);
    delete [] phdrs;
    delete [] shdrs;
    free(dyns);
    delete [] syms;
    delete [] strtab;
    free(verneeds);
    free(versyms);
}
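The first step of the constructor is easy to get subtly wrong: readlink(2) does not append a terminating NUL and silently truncates a target that does not fit in the buffer. A standalone sketch of that step is below, assuming Linux procfs; the helper name self_exe_path is hypothetical.

```cpp
#include <unistd.h>
#include <limits.h>
#include <string>

// Return the absolute path of the running executable via /proc/self/exe,
// or an empty string on error. readlink() does not NUL-terminate its
// output, so the returned length is used to build the string.
std::string self_exe_path() {
    char buf[PATH_MAX];
    ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf));
    if (len < 0 || static_cast<size_t>(len) == sizeof(buf))
        return std::string();  // error, or the target may be truncated
    return std::string(buf, static_cast<size_t>(len));
}
```

A length equal to the buffer size is treated as failure because readlink cannot signal truncation any other way.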
Compile both files, then run the program with LD_PRELOAD:
[Linux] $ g++ test.cc -lpthread
[Linux] $ g++ -shared -fPIC -o readelf.so readelf.cc -ldl
[Linux] $ LD_PRELOAD=./readelf.so ./a.out
Read 8 program headers.
Got the dynamic program header.
DT_STRSZ value: 489.
Read 30 section headers.
.dynsym: got 16 symbols.
.dynstr: offset 1048 size 489.
.gnu.version: versym offset 1538 size 32.
.gnu.version_r: verneed offset 1576 size 144.
verneed : vn_version 1 vn_cnt 1 vn_file libc.so.6 vn_aux 16 vn_next 32.
aux 0: vna_name GLIBC_2.2.5 vna_other 4.
verneed : vn_version 1 vn_cnt 2 vn_file libstdc++.so.6 vn_aux 16 vn_next 48.
aux 0: vna_name CXXABI_1.3 vna_other 7.
aux 1: vna_name GLIBCXX_3.4 vna_other 3.
verneed : vn_version 1 vn_cnt 3 vn_file libpthread.so.0 vn_aux 16 vn_next 0.
aux 0: vna_name GLIBC_2.3.3 vna_other 6.
aux 1: vna_name GLIBC_2.2.5 vna_other 5.
aux 2: vna_name GLIBC_2.3.2 vna_other 2.
symbol version NONE.
symbol pthread_cond_destroy version GLIBC_2.3.2.
symbol __gmon_start__ version NONE.
symbol _Jv_RegisterClasses version NONE.
symbol _ZNSt8ios_base4InitC1Ev version GLIBCXX_3.4.
symbol __libc_start_main version GLIBC_2.2.5.
symbol __cxa_atexit version GLIBC_2.2.5.
symbol _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc version GLIBCXX_3.4.
symbol pthread_cond_init version GLIBC_2.3.2.
symbol pthread_condattr_init version GLIBC_2.2.5.
symbol pthread_condattr_setclock version GLIBC_2.3.3.
symbol _ZNSolsEPFRSoS_E version GLIBCXX_3.4.
symbol _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ version GLIBCXX_3.4.
symbol _ZNSt8ios_base4InitD1Ev version GLIBCXX_3.4.
symbol _ZSt4cout version GLIBCXX_3.4.
symbol __gxx_personality_v0 version CXXABI_1.3.
Override pthread_cond_init (ret = 0).
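As shown above, the versions of all symbols the current process depends on are retrieved correctly: pthread_cond_init goes through our hook, and no error is reported.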
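The section lookup used by readelf.cc (matching names through the section header string table at index e_shstrndx) can be isolated into a reusable helper. This is a minimal sketch, assuming a 64-bit ELF file and std::ifstream for I/O; the name find_section is hypothetical, and the extended-index case (e_shstrndx == SHN_XINDEX) is simply rejected rather than handled.

```cpp
#include <elf.h>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Find the section header named 'name' in a 64-bit ELF file.
// Returns true and fills 'out' on success.
bool find_section(const std::string &path, const std::string &name,
                  Elf64_Shdr &out) {
    std::ifstream in(path, std::ios::binary);
    Elf64_Ehdr ehdr;
    if (!in.read(reinterpret_cast<char *>(&ehdr), sizeof ehdr) ||
        std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_shstrndx >= ehdr.e_shnum)
        return false;

    // Read all section headers.
    std::vector<Elf64_Shdr> shdrs(ehdr.e_shnum);
    in.seekg(static_cast<std::streamoff>(ehdr.e_shoff));
    if (!in.read(reinterpret_cast<char *>(shdrs.data()),
                 shdrs.size() * sizeof(Elf64_Shdr)))
        return false;

    // Section names live in the string table at index e_shstrndx.
    const Elf64_Shdr &shstr = shdrs[ehdr.e_shstrndx];
    std::vector<char> names(shstr.sh_size + 1, '\0');  // extra NUL guard
    in.seekg(static_cast<std::streamoff>(shstr.sh_offset));
    if (!in.read(names.data(), shstr.sh_size))
        return false;

    for (const Elf64_Shdr &sh : shdrs) {
        if (sh.sh_name < shstr.sh_size && name == &names[sh.sh_name]) {
            out = sh;
            return true;
        }
    }
    return false;
}
```

For example, find_section("/proc/self/exe", ".gnu.version_r", sh) yields the offset and size that readelf.cc prints for that section.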