Linux中誰來呼叫C語言中的main?

記得很久以前聽說在Linux執行檔案時,真正的起始點並不是main,加上之前有看到單純ld會幫你偷偷link一些沒看過的object檔案,所以這次就來看到底真相為何?

測試環境

因為很假掰想要順便接觸一下ARM的組語,所以這次測試就使用Qemu跑ARM的Debian。

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.0 (jessie)
Release:    8.0
Codename:   jessie

$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=571db48d9c9e4625b7da206e748e41c237f2b202, stripped

測試原始碼,一樣是大家熟悉的Hellow world

hello1.c
#include <stdio.h>

int main()
{
    printf("Hello World\n");

    return 0;
}

不知道各位還記得前面有提過,執行檔中有.text的section。要執行的機械碼會放在這邊。我們先來看看hello1執行檔會從那邊開始?

$ readelf -h hello1
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x102f0
  Start of program headers:          52 (bytes into file)
...
Section header string table index: 33

從readelf可以看到起始點為0x102f0,那麼0x102f0是在那邊呢?我們再去看symbol table可以看到很巧的就是.text的起始點。

$ objdump -t hello1

hello1:     file format elf32-littlearm

SYMBOL TABLE:
00010134 l    d  .interp    00000000              .interp
...
000102f0 l    d  .text  00000000              .text

好了,那麼.text這邊起始的程式是什麼?

Disassembly of section .text:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

很有趣,沒看到main(),反而看到_start。到底是_start是什麼呢?還記得Linker script嗎?裏面有一個ENTRY指令,可以指定程式從那邊開始跑,先來看一下預設的ENTRY是不是也是_start?

$ ld --verbose | grep ENTRY
ENTRY(_start)

目前我們只知道執行檔起始點是_start,而不是main,那顯然有人幫你把執行檔加碼,以至於你的執行檔出現了_start。最偷懶的方式就是去找binary看看是不是有這樣的symbol。

user@host:/usr/lib$ find -name "*.[ao]" -exec nm -A {} \;  2> /dev/null | grep " _start$"
./arm-linux-gnueabi/crt1.o:00000000 T _start
./arm-linux-gnueabi/gcrt1.o:00000000 T _start
./arm-linux-gnueabi/Scrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/crt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/gcrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/Scrt1.o:00000000 T _start

OK,的確有object檔案裡面有_start,我們再來確認編譯的時候會不會link這些檔案。

$ gcc -v hello1.c 
Using built-in specs.
COLLECT_GCC=gcc
...
COLLECT_GCC_OPTIONS='-v' '-march=armv4t' '-mfloat-abi=soft' 
...
-X --hash-style=gnu -m armelf_linux_eabi
...
/usr/lib/gcc/arm-linux-gnueabi/4.9/../../../arm-linux-gnueabi/crt1.o 
...

_start會呼叫外部函數__libc_start_main,我們透過LD_DEBUG來看一下。

$ LD_DEBUG=all ./hello1 2>&1 |grep __libc_start_main
       890: symbol=__libc_start_main;  lookup in file=./hello1 [0]
       890: symbol=__libc_start_main;  lookup in file=/lib/arm-linux-gnueabi/libc.so.6 [0]
       890: binding file ./hello1 [0] to /lib/arm-linux-gnueabi/libc.so.6 [0]: normal symbol `__libc_start_main' [GLIBC_2.4]

可以看到,在./hello1中有去找__libc_start_main,最後去libc.so.6找,並且找出libc.so.6__libc_start_main的位址(即binding)。而__libc_start_mainprototype如下

int __libc_start_main(int (*main) (int, char **, char **), int argc, char ** ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (*stack_end));

看到有趣的東西嘛?我有看到

  • main函數當作function pointer傳入
  • main函數的參數
  • 其他不知道三小的function pointer
    • init
    • fini
    • rtld_fini

從這邊我可以猜測這個函數就是呼叫一堆callback function,這些callback function就是上面列的死人骨頭。

手冊的說明可以看到__libc_start_main()是用來執行環境的初始化、呼叫main函數並且傳遞參數,當main函數結束後處理回傳值。手冊提到的範例詳細行為有

  • 檢查權限,確保安全性
  • thread subsystem初始化 (我可不知道什麼thread subsystem唷)
  • rtld_fini註冊release callback function,當shared object結束時使用該callback釋放資源
  • 呼叫init callback function
  • 呼叫main callback function並且帶入參數
  • 當main callback function結束後,將回傳值作為參數呼叫exit

我們再回頭看看_start的組合語言:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

有趣的地方是這3個位址

   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

這邊可以看到這3個位址分別是

  • 10320: 000104b4 .word 0x000104b4
    • __libc_csu_fini
  • 10324: 00010420 .word 0x00010420
    • main
  • 10328: 00010448 .word 0x00010448
    • __libc_csu_init

也就是說,main__libc_csu_init分別當作第一和第四參數傳給__libc_start_main,而__libc_csu_fini則被丟到stack,一樣傳給__libc_start_main了。

結論

Linux執行程式的起始點並不是main,而是glibc binary中crt1.o準備的_start。這個start主要將你的main,還有一些hook函數丟給__libc_start_main,接下來libc的__libc_start_main樵好事情後才真正執行你的main,並且還要在main結束後清理戰場。

延伸閱讀

參考資料

完整反組譯程式碼

Hello.dis
$ cat hello1.dis 

hello1:     file format elf32-littlearm


Disassembly of section .init:

0001029c <_init>:
   1029c:   e92d4008    push    {r3, lr}
   102a0:   eb000021    bl  1032c <call_weak_fn>
   102a4:   e8bd4008    pop {r3, lr}
   102a8:   e12fff1e    bx  lr

Disassembly of section .plt:

000102ac <puts@plt-0x14>:
   102ac:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!)
   102b0:   e59fe004    ldr lr, [pc, #4]    ; 102bc <_init+0x20>
   102b4:   e08fe00e    add lr, pc, lr
   102b8:   e5bef008    ldr pc, [lr, #8]!
   102bc:   00010318    .word   0x00010318

000102c0 <puts@plt>:
   102c0:   e28fc600    add ip, pc, #0, 12
   102c4:   e28cca10    add ip, ip, #16, 20 ; 0x10000
   102c8:   e5bcf318    ldr pc, [ip, #792]! ; 0x318

000102cc <__libc_start_main@plt>:
   102cc:   e28fc600    add ip, pc, #0, 12
   102d0:   e28cca10    add ip, ip, #16, 20 ; 0x10000
   102d4:   e5bcf310    ldr pc, [ip, #784]! ; 0x310

000102d8 <__gmon_start__@plt>:
   102d8:   e28fc600    add ip, pc, #0, 12
   102dc:   e28cca10    add ip, ip, #16, 20 ; 0x10000
   102e0:   e5bcf308    ldr pc, [ip, #776]! ; 0x308

000102e4 <abort@plt>:
   102e4:   e28fc600    add ip, pc, #0, 12
   102e8:   e28cca10    add ip, ip, #16, 20 ; 0x10000
   102ec:   e5bcf300    ldr pc, [ip, #768]! ; 0x300

Disassembly of section .text:

000102f0 <_start>:
   102f0:   e3a0b000    mov fp, #0
   102f4:   e3a0e000    mov lr, #0
   102f8:   e49d1004    pop {r1}        ; (ldr r1, [sp], #4)
   102fc:   e1a0200d    mov r2, sp
   10300:   e52d2004    push    {r2}        ; (str r2, [sp, #-4]!)
   10304:   e52d0004    push    {r0}        ; (str r0, [sp, #-4]!)
   10308:   e59fc010    ldr ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:   e52dc004    push    {ip}        ; (str ip, [sp, #-4]!)
   10310:   e59f000c    ldr r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:   e59f300c    ldr r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:   ebffffeb    bl  102cc <__libc_start_main@plt>
   1031c:   ebfffff0    bl  102e4 <abort@plt>
   10320:   000104b4    .word   0x000104b4
   10324:   00010420    .word   0x00010420
   10328:   00010448    .word   0x00010448

0001032c <call_weak_fn>:
   1032c:   e59f3014    ldr r3, [pc, #20]   ; 10348 <call_weak_fn+0x1c>
   10330:   e59f2014    ldr r2, [pc, #20]   ; 1034c <call_weak_fn+0x20>
   10334:   e08f3003    add r3, pc, r3
   10338:   e7932002    ldr r2, [r3, r2]
   1033c:   e3520000    cmp r2, #0
   10340:   012fff1e    bxeq    lr
   10344:   eaffffe3    b   102d8 <__gmon_start__@plt>
   10348:   00010298    .word   0x00010298
   1034c:   0000001c    .word   0x0000001c

00010350 <deregister_tm_clones>:
   10350:   e59f301c    ldr r3, [pc, #28]   ; 10374 <deregister_tm_clones+0x24>
   10354:   e59f001c    ldr r0, [pc, #28]   ; 10378 <deregister_tm_clones+0x28>
   10358:   e0603003    rsb r3, r0, r3
   1035c:   e3530006    cmp r3, #6
   10360:   912fff1e    bxls    lr
   10364:   e59f3010    ldr r3, [pc, #16]   ; 1037c <deregister_tm_clones+0x2c>
   10368:   e3530000    cmp r3, #0
   1036c:   012fff1e    bxeq    lr
   10370:   e12fff13    bx  r3
   10374:   000205ff    .word   0x000205ff
   10378:   000205fc    .word   0x000205fc
   1037c:   00000000    .word   0x00000000

00010380 <register_tm_clones>:
   10380:   e59f1024    ldr r1, [pc, #36]   ; 103ac <register_tm_clones+0x2c>
   10384:   e59f0024    ldr r0, [pc, #36]   ; 103b0 <register_tm_clones+0x30>
   10388:   e0601001    rsb r1, r0, r1
   1038c:   e1a01141    asr r1, r1, #2
   10390:   e0811fa1    add r1, r1, r1, lsr #31
   10394:   e1b010c1    asrs    r1, r1, #1
   10398:   012fff1e    bxeq    lr
   1039c:   e59f3010    ldr r3, [pc, #16]   ; 103b4 <register_tm_clones+0x34>
   103a0:   e3530000    cmp r3, #0
   103a4:   012fff1e    bxeq    lr
   103a8:   e12fff13    bx  r3
   103ac:   000205fc    .word   0x000205fc
   103b0:   000205fc    .word   0x000205fc
   103b4:   00000000    .word   0x00000000

000103b8 <__do_global_dtors_aux>:
   103b8:   e92d4010    push    {r4, lr}
   103bc:   e59f401c    ldr r4, [pc, #28]   ; 103e0 <__do_global_dtors_aux+0x28>
   103c0:   e5d43000    ldrb    r3, [r4]
   103c4:   e3530000    cmp r3, #0
   103c8:   1a000002    bne 103d8 <__do_global_dtors_aux+0x20>
   103cc:   ebffffdf    bl  10350 <deregister_tm_clones>
   103d0:   e3a03001    mov r3, #1
   103d4:   e5c43000    strb    r3, [r4]
   103d8:   e8bd4010    pop {r4, lr}
   103dc:   e12fff1e    bx  lr
   103e0:   000205fc    .word   0x000205fc

000103e4 <frame_dummy>:
   103e4:   e92d4008    push    {r3, lr}
   103e8:   e59f0028    ldr r0, [pc, #40]   ; 10418 <frame_dummy+0x34>
   103ec:   e5903000    ldr r3, [r0]
   103f0:   e3530000    cmp r3, #0
   103f4:   1a000001    bne 10400 <frame_dummy+0x1c>
   103f8:   e8bd4008    pop {r3, lr}
   103fc:   eaffffdf    b   10380 <register_tm_clones>
   10400:   e59f3014    ldr r3, [pc, #20]   ; 1041c <frame_dummy+0x38>
   10404:   e3530000    cmp r3, #0
   10408:   0afffffa    beq 103f8 <frame_dummy+0x14>
   1040c:   e1a0e00f    mov lr, pc
   10410:   e12fff13    bx  r3
   10414:   eafffff7    b   103f8 <frame_dummy+0x14>
   10418:   000204e8    .word   0x000204e8
   1041c:   00000000    .word   0x00000000

00010420 <main>:
   10420:   e92d4800    push    {fp, lr}
   10424:   e28db004    add fp, sp, #4
   10428:   e59f0014    ldr r0, [pc, #20]   ; 10444 <main+0x24>
   1042c:   ebffffa3    bl  102c0 <puts@plt>
   10430:   e3a03000    mov r3, #0
   10434:   e1a00003    mov r0, r3
   10438:   e24bd004    sub sp, fp, #4
   1043c:   e8bd4800    pop {fp, lr}
   10440:   e12fff1e    bx  lr
   10444:   000104c8    .word   0x000104c8

00010448 <__libc_csu_init>:
   10448:   e92d43f8    push    {r3, r4, r5, r6, r7, r8, r9, lr}
   1044c:   e59f6058    ldr r6, [pc, #88]   ; 104ac <__libc_csu_init+0x64>
   10450:   e59f5058    ldr r5, [pc, #88]   ; 104b0 <__libc_csu_init+0x68>
   10454:   e08f6006    add r6, pc, r6
   10458:   e08f5005    add r5, pc, r5
   1045c:   e0656006    rsb r6, r5, r6
   10460:   e1a07000    mov r7, r0
   10464:   e1a08001    mov r8, r1
   10468:   e1a09002    mov r9, r2
   1046c:   ebffff8a    bl  1029c <_init>
   10470:   e1b06146    asrs    r6, r6, #2
   10474:   0a00000a    beq 104a4 <__libc_csu_init+0x5c>
   10478:   e2455004    sub r5, r5, #4
   1047c:   e3a04000    mov r4, #0
   10480:   e2844001    add r4, r4, #1
   10484:   e5b53004    ldr r3, [r5, #4]!
   10488:   e1a00007    mov r0, r7
   1048c:   e1a01008    mov r1, r8
   10490:   e1a02009    mov r2, r9
   10494:   e1a0e00f    mov lr, pc
   10498:   e12fff13    bx  r3
   1049c:   e1540006    cmp r4, r6
   104a0:   1afffff6    bne 10480 <__libc_csu_init+0x38>
   104a4:   e8bd43f8    pop {r3, r4, r5, r6, r7, r8, r9, lr}
   104a8:   e12fff1e    bx  lr
   104ac:   00010088    .word   0x00010088
   104b0:   00010080    .word   0x00010080

000104b4 <__libc_csu_fini>:
   104b4:   e12fff1e    bx  lr

Disassembly of section .fini:

000104b8 <_fini>:
   104b8:   e92d4008    push    {r3, lr}
   104bc:   e8bd4008    pop {r3, lr}
   104c0:   e12fff1e    bx  lr

comments powered by Disqus