stack alignment & argc location in assembled binaries

Hi there,

For a bit of fun, I'm trying to implement some low-level functionality in FreeBSD and Linux. I have a question about some seemingly non-deterministic behavior I came across in the initial stack alignment on FreeBSD and, therefore, in the location of argc and argv relative to %rsp. I don't have a systems programming background, so maybe this issue has an obvious explanation. I looked around the forum, but I couldn't find anything on this topic.

It doesn't seem to be a recent bug, as it's essentially the same problem described in this 10-year-old post on Stack Overflow: https://stackoverflow.com/questions/8177734/freebsd-amd64-assembly-how-to-read-argc

In short, when the executable starts, %rsp does not always point to argc. Instead, argc appears to be located at [rsp+0] roughly half the time and at [rsp+8] the other half. In addition, the stack pointer does not seem to be aligned to 16 bytes when the program starts, which I was under the impression was required by the System V ABI.

Chapter 11 of the FreeBSD developers handbook, though seemingly very out-of-date, does suggest that argc should always be at [rsp+0].

The answer on Stack Overflow seems to be correct in that %rdi contains the location of argc on FreeBSD. In other words, if you subtract %rsp from %rdi, you will get either 0 or 8, randomly. I was wondering if you guys had any further clarification for this seemingly non-deterministic behavior. I'm guessing it has something to do with the transition from a 32-bit to a 64-bit OS. Or maybe it's something wrong with my particular machine/install.

Either way, thanks for taking the time to read my question.

FYI: This issue arises for executables assembled and linked with as/nasm & ld (as shown in the post) and even when I directly create minimal binaries without a linker. I'd be happy to attach any screenshots or terminal outputs.
 
I've also seen this and wondered about it.

My thought was that maybe the top of the stack is not necessarily 16-byte aligned, but argc needs 16-byte alignment. Now why should the stack vary in size?
 
Calling convention? FreeBSD uses the C calling convention by pushing arguments on the stack then executing a syscall. Linux (and others) pass arguments using registers.


I'd make a simple C program that reads argc and argv, compile this and look at the actual assembly code with a debugger.
 
Calling convention? FreeBSD uses the C calling convention by pushing arguments on the stack then executing a syscall. Linux (and others) pass arguments using registers.
The OP is asking about '_start'. That's not a syscall. On amd64 FreeBSD uses the System V AMD64 ABI calling convention, which passes arguments in registers (like all Unix-like OSes as far as I know). i386 uses the stack.

You may have learnt that 'main()' is the entry point for C and C++ applications. No, that's a lie. '_start' is the real entry point. '_start' gets linked by default when you build your executable. It is responsible for stuff like calling rtld to load shared libraries, setting up TLS, doing a bit of processing of argc/argv/envv/auxv and then calls 'main()'.

However, if your executable is standalone (does not link to libraries) then you can write your own '_start' function. Usually only small assembler programs do that, but larger C applications can also use the technique. Valgrind does this so that the host and the guest load their code at different addresses and also so that there are two stacks.
 
The OP is asking about '_start'. That's not a syscall.
I know, but I figured functions would be called in the same way, by passing arguments on the stack instead of registers.

You may have learnt that 'main()' is the entry point for C and C++ applications. No, that's a lie. '_start' is the real entry point. '_start' gets linked by default when you build your executable. It is responsible for stuff like calling rtld to load shared libraries, setting up TLS, doing a bit of processing of argc/argv/envv/auxv and then calls 'main()'.
I know, main() is just a function that gets called from a bit of 'start'/glue code that's linked to it. That said, it would still be interesting to see how that handles them. Because it does do the right thing, always.
 
The OP is asking about '_start'.
Yes, thanks Paul. Sorry, I should have specified this.

Also SirDice, I was under the impression that at least on my x86-64 machine, FreeBSD does not seem to use the stack for syscalls.

Either way, I might be conflating 2 separate issues that may or may not be related. The very minimal code below returns the low nibble of the stack pointer (0-15). If you assemble and link it (or even write the ELF binary from scratch and "chmod +x" it, like I am doing) and check the return value using "echo $?", you will get 8 on FreeBSD and 0 on Linux. I don't know much about this stuff, but to me this means that the stack is 16-byte aligned upon entering _start on Linux, but offset by 8 bytes on FreeBSD. However, although there is a difference between Linux and FreeBSD here, at least this part seems to be deterministic.

Code:
global _start
section .text
_start:
    mov rdi,rsp
    and rdi,0xf
    mov rax,1    ; sys_exit on FreeBSD, use 60 on Linux
    syscall

The questionable part to me is why is argc sometimes located at [rsp] but other times at [rsp+8]. Maybe I'm not seeing things right, but to me it seems to strangely be almost the opposite of what you are saying here Paul:
My thought was that maybe the top of the stack is not necessarily 16-byte aligned, but argc needs 16-byte alignment. Now why should the stack vary in size?

It's almost as if the stack pointer is adjusted to sit 8 bytes off 16-byte alignment (maybe so that we are back to 16-byte alignment when main() is called?), while argc is placed at the original(?) stack location, which may or may not be 16-byte aligned.

That seems to be corroborated by what that stackoverflow post found in the source code at https://github.com/freebsd/freebsd-src/blob/main/sys/amd64/amd64/exec_machdep.c#L394

(and I saw your recent edits on SO Paul :) )

Just to put a few working examples together. The code below, when executed with command line arguments (e.g. ./binary aaa bbb ccc) will either return the number of arguments+1, or zero, randomly, when you check the output with "echo $?".

Code:
global _start
section .text
_start:
    mov rdi,[rsp]
    mov rax,1
    syscall

This code, however, which pulls argc from [rdi] will correctly return the number of arguments+1.
Code:
global _start
section .text
_start:
    mov rdi,[rdi]
    mov rax,1
    syscall
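
For anyone who wants to do more than return argc, here is a rough C sketch of how the block that %rdi points to could be walked. It assumes the usual layout (argc, then the argv pointers, a NULL, then the environment pointers, a NULL). The struct and function names are made up for illustration and are not part of any FreeBSD API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what _start can recover from the block. */
struct start_args {
    long   argc;
    char **argv;
    char **envp;
};

/*
 * 'block' stands in for the pointer found in %rdi at _start on FreeBSD amd64.
 * Assumed layout: block[0] holds argc, block[1..argc] are argv[],
 * block[argc + 1] is NULL, and the environment pointers follow.
 */
static struct start_args parse_start_block(char **block)
{
    struct start_args a;
    a.argc = (long)(intptr_t)block[0];  /* first slot is an integer, not a pointer */
    a.argv = block + 1;
    a.envp = a.argv + a.argc + 1;       /* skip argv[0..argc-1] and its NULL terminator */
    return a;
}
```

A hand-written _start would pass %rdi (not %rsp) to something like this, since only %rdi reliably points at argc here.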

And lastly, this code will return (as the exit status) the byte distance between the locations pointed to by rdi and rsp, which is either 0 or 8.
Code:
global _start
section .text
_start:
    sub rdi,rsp
    mov rax,1
    syscall

I very much appreciate you guys for taking a look into this with me.

-Matt
 
I know, but I figured functions would be called in the same way, by passing arguments on the stack instead of registers.


I know, main() is just a function that gets called from a bit of 'start'/glue code that's linked to it. That said, it would still be interesting to see how that handles them. Because it does do the right thing, always.

How does it help if you only have _start and no main?

Also 'main()' is not just a function. There are at least two differences wrt normal functions:
  • calling 'main()' yourself in any recursive manner is UB
  • implicitly 'main()' returns 0 if no return statement is specified
The question as I see it has 2 parts
  1. Why are both RDI and RSP set?
  2. Why does RSP have an alignment that is an odd multiple of 8?
On Linux amd64, RDI and RSP are the same.
 
I can't see anything in git specific to this. The lines of code setting up RSP and RDI were all added in one go when the amd64 port was first merged in 2003. On i386 'exec_setregs' just sets ESP.
 
Thanks for your help on this Paul. It's good to know I'm not going crazy.

In terms of the second question:
2. Why does RSP have an alignment that is an odd multiple of 8?

I'm not the best with gdb, but I compiled this table for a minimal executable. Maybe it means something to you:

                              $sp at the first instruction (starti)   $sp at _start                          $sp at main
C compiled, Linux             0x7fffffffe120                          0x7fffffffe120 (same as left)          0x7fffffffe010
C compiled, FreeBSD           0x7fffffffda38                          0x7fffffffd9f0 (different from left)   0x7fffffffd9e0
Assembled & linked, Linux     0x7fffffffe100                          0x7fffffffe100 (same as left)          -
Assembled & linked, FreeBSD   0x7fffffffda28                          0x7fffffffda28 (same as left)          -

So it looks to me like FreeBSD is doing something funky before _start in C-compiled binaries (? https://github.com/freebsd/freebsd-src/blob/main/lib/csu/amd64/crt1_c.c) that I am not doing for my assembled binaries.

As for the first question, I'm not sure why
a) both RDI and RSP are set, and
b) why they are set DIFFERENTLY. If I put "mov rsp,rdi" at the top of every _start on FreeBSD, do you think there would be any negative consequences? I don't see any.
 
For your questions:
a) Well, both do need to be set. %rdi is part of the ABI (it passes the first argument, here the pointer to the args), and %rsp has to be set to a valid stack.
As mentioned in the Stack Overflow link, exec_setregs() is doing this. It is called from do_execve(), which eventually sets the registers.

b) Because they represent different things: %rdi is the argument, and %rsp is the first valid usable stack address (aux vectors, argv, env and others sit above it).
Yes, once you have control of the program you can do whatever you want. It depends on what you want to do next.

Now that randomness of the %rsp is really interesting. I modified your code slightly to print argc if %rsp is the same as %rdi, otherwise it prints 0.
Code:
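.section .text
        .globl _start
_start:
        movl (%rsp), %ecx       # load the 32-bit value at (%rsp): argc, or 0 if argc is at 8(%rsp)
        addl $0x0a30, %ecx      # low byte += '0' (ASCII digit), second byte = '\n'
        pushq %rcx              # put the two characters on the stack
        movq %rsp, %rsi         # buf = pointer to them
        movl $1, %edi           # fd = 1 (stdout)
        movl $2, %edx           # nbytes = 2
        movl $4, %eax           # SYS_write on FreeBSD
        syscall

        # movl (0xcafec0de), %eax  # trigger crash
        xorl %edi, %edi         # exit status 0
        movl $1, %eax           # SYS_exit on FreeBSD
        syscall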

When I used it with the crash trigger I was able to compare the core dumps. They are the same, apart from the stack address, which does get randomized. The randomness shows up in the higher bits of the address, but maybe it plays a role in this behavior too (I don't have access to anything older right now).

Speaking of the ABI, the best I could find is the one that has been quoted a few times: https://people.freebsd.org/~obrien/amd64-elf-abi.pdf. From there:
Code:
The end of the input argument area shall be aligned on a 16 byte boundary.
In other words, the value (%rsp - 8) is always a multiple of 16 when control is
transferred to the function entry point.

Which frankly leaves me none the wiser: I'd still expect the stack to be aligned already once the program starts.
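
To make the quoted rule concrete, here is a small C sketch of the two alignment conditions being discussed. The function names are mine, chosen for illustration. The addresses mentioned below are the ones from the gdb table earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when (rsp - 8) is a multiple of 16, i.e. the condition quoted from the ABI draft. */
static bool matches_function_entry_rule(uint64_t rsp)
{
    return ((rsp - 8) & 0xF) == 0;
}

/* True when rsp itself is a multiple of 16. */
static bool is_16_byte_aligned(uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}
```

With the assembled-and-linked values from the table, 0x7fffffffda28 (FreeBSD) satisfies the first condition and 0x7fffffffe100 (Linux) satisfies the second, which would fit the idea that FreeBSD enters _start as if it had been called like a function.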
 
Code:
what@FreeBSD:~% cat file.s
.section .text
.globl main
main:
        movq 8(%rsp), %rdi
        movq $1, %rax
        int $0x80
what@FreeBSD:~% cc file.s
what@FreeBSD:~% ./a.out
[1]what@FreeBSD:~% ./a.out 1 2 3
[4]what@FreeBSD:~% rm a.out

what@FreeBSD:~% as file.s -o file.o
what@FreeBSD:~% ld file.o
ld: warning: cannot find entry symbol _start; not setting start address
what@FreeBSD:~% ld file.o --entry main
what@FreeBSD:~% ./a.out
[1]what@FreeBSD:~% ./a.out 1 2 3
[4]what@FreeBSD:~% ./a.out 1   
[2]what@FreeBSD:~% ./a.out 
[128]what@FreeBSD:~% ????
zsh: no matches found: ????
[1]what@FreeBSD:~% rm a.out
    
what@FreeBSD:~% cc -v file.o
FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git llvmorg-15.0.7-0-g8dfdcc7b7bf6)
Target: x86_64-unknown-freebsd14.0
Thread model: posix
InstalledDir: /usr/bin
 "/usr/bin/ld" --eh-frame-hdr -dynamic-linker /libexec/ld-elf.so.1 --hash-style=both --enable-new-dtags -o a.out /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtbegin.o -L/usr/lib file.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/crtend.o /usr/lib/crtn.o

what@FreeBSD:~% cat file.s
.section .text
.globl main
main:
        movq (%rdi), %rdi
        movq $1, %rax
        int $0x80
what@FreeBSD:~% as file.s -o file.o
what@FreeBSD:~% ld file.o --entry main
what@FreeBSD:~% ./a.out
[1]what@FreeBSD:~% ./a.out 1 2 3
[4]what@FreeBSD:~% ./a.out 1 2 
[3]what@FreeBSD:~% ./a.out 1 
[2]what@FreeBSD:~% ./a.out  
[1]what@FreeBSD:~%

This seems to be related to ld.lld(1) and %rsp
 
xlevidi I think it has something to do with the stack fixup, but I was not able to trace it down yet. I'd put breakpoints around there to see what's happening. It's past 2am here, I'll have a hard time waking up for work ;).
I did a quick 'n dirty dtrace script:
Code:
#!/usr/sbin/dtrace -s

fbt::exec_setregs:entry
{
    printf ("stack: 0x%x", arg2);
}

fbt::elf64_freebsd_fixup:entry
{
    printf ("stack_base entry: 0x%x", arg0);
}

fbt::exec_copyout_strings:entry
{
    printf ("stack_base entry: 0x%x", arg1);
}
The first example is a run where it doesn't align, the second one a run where it does:
Code:
  0  53329       exec_copyout_strings:entry stack_base entry: 0xfffffe00acb67ba8
  0  52230        elf64_freebsd_fixup:entry stack_base entry: 0xfffffe00acb67ba8
  0  66405               exec_setregs:entry stack: 0x820f6fb40
  0  53329       exec_copyout_strings:entry stack_base entry: 0xfffffe00acb67ba8
  0  52230        elf64_freebsd_fixup:entry stack_base entry: 0xfffffe00acb67ba8
  0  66405               exec_setregs:entry stack: 0x820736738

I'm really curious to see why it gets shifted by extra 8 bytes sometimes.
 
As I thought, randomization does play a role here. If you look at exec_map_stack() you'll see the stack is being randomly moved around. I modified the source just so I can see what's happening:
Code:
        printf("stack_top: 0x%lx, stack_addr: 0x%lx\n", stack_top, stack_addr);

        if ((map->flags & MAP_ASLR_STACK) != 0) {
                /* Randomize within the first page of the stack. */
                arc4rand(&stack_off, sizeof(stack_off), 0);
                stack_top -= rounddown2(stack_off & PAGE_MASK, sizeof(void *));

                printf("ASLR stack_top: 0x%lx\n", stack_top);
        }
And the results are:
When %rsp doesn't point to args:
Code:
stack_top: 0x820491000, stack_addr: 0x800491000
ASLR stack_top: 0x820490300
And when %rsp does point to args:
Code:
stack_top: 0x820f24000, stack_addr: 0x800f24000
ASLR stack_top: 0x820f23558

Just to make it clear: I don't expect %rsp to be pointing there; %rdi is. But I was wondering about this source of randomness. When %rsp is aligned on a 16B boundary it doesn't point to args. This makes sense to me: %rsp should be 16B aligned when the kernel hands over control to userspace. We don't care what userspace does, whether it is a plain static binary or uses C (crt = C runtime stuff).

Now I still don't understand what exec_setregs is trying to achieve with that 8B alignment and what role ASLR plays here. It almost seems like an undesired outcome.
 
To expand a bit on my last sentence from above: as hinted, randomness (ASLR) is the key player here. I'm not going to paste the modified code; I modified the functions I mentioned above to see the partial results as stack_base was being adjusted during do_execve(). The first example is a run where %rsp was not the same as %rdi (didn't point to args), the second is a run where it was.

Two runs of a test program:
Code:
exec_map_stack: stack_top: 0x82026e000, stack_addr: 0x80026e000
exec_map_stack: stack_off: 0x6748fc83, going down by: c80, stack_top: 0x82026d380 (PAGE_MASK: fff)
do_execve: stack_base after init: 0x82026ce88
do_execve: stack after fixup: 0x82026ce80
exec_setregs: stack: 0x82026ce80, after roundup: 0x82026ce78

rsp            0x82026ce78         0x82026ce78  ; <-- init value of rsp in test program

Code:
exec_map_stack: stack_top: 0x820e01000, stack_addr: 0x800e01000
exec_map_stack: stack_off: 0x8af660a9, going down by: a8, stack_top: 0x820e00f58 (PAGE_MASK: fff)
do_execve: stack_base after init: 0x820e00a60
do_execve: stack after fixup: 0x820e00a58
exec_setregs: stack: 0x820e00a58, after roundup: 0x820e00a58

rsp            0x820e00a58         0x820e00a58 ; <-- init value of rsp in test program
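
The "going down by" values in these two traces can be reproduced with a few lines of C. This is only a sketch of the arithmetic in the exec_map_stack() snippet quoted earlier, assuming 4 KiB pages (PAGE_MASK 0xfff, as printed in the trace) and 8-byte pointers. The helper names are mine, not the kernel's.

```c
#include <stdint.h>

#define ASSUMED_PAGE_MASK 0xfffULL  /* 4 KiB pages, matching "PAGE_MASK: fff" in the trace */
#define ASSUMED_PTR_SIZE  8ULL      /* sizeof(void *) on amd64 */

/* Round x down to a multiple of y, where y is a power of two (same idea as rounddown2). */
static uint64_t round_down_pow2(uint64_t x, uint64_t y)
{
    return x & ~(y - 1);
}

/* How far the stack top is moved down for a given random stack_off. */
static uint64_t aslr_delta(uint64_t stack_off)
{
    return round_down_pow2(stack_off & ASSUMED_PAGE_MASK, ASSUMED_PTR_SIZE);
}

/* Stack top after the in-page randomization. */
static uint64_t randomized_stack_top(uint64_t stack_top, uint64_t stack_off)
{
    return stack_top - aslr_delta(stack_off);
}
```

The point to notice is that the offset is rounded to 8 bytes, not 16, so the low nibble of the randomized stack top can come out as either 0 or 8 from one exec to the next.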

The stack address is moved down a few times, sometimes by random values. We end up in exec_setregs() with a stack address ending in either 8 or 0 (such as 0x820e00a58, 0x82026ce80). The AND with ~0xf there is what moves %rsp down by 8 or leaves it alone, depending on the stack address.
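
As a sanity check, here is a C model of that last step. The formula is my reading of exec_setregs() in the exec_machdep.c file linked earlier in the thread and it agrees with both traces above, but treat it as an assumption rather than a copy of the kernel code. The struct and function names are invented.

```c
#include <stdint.h>

/* Hypothetical holder for the two registers of interest at _start. */
struct entry_regs {
    uint64_t rsp;
    uint64_t rdi;
};

/* Model of the final register setup for a given post-fixup stack address. */
static struct entry_regs model_exec_setregs(uint64_t stack)
{
    struct entry_regs r;
    /* Largest address <= stack whose low nibble is 8, so (rsp - 8) is 16-byte aligned. */
    r.rsp = ((stack - 8) & ~0xFULL) + 8;
    /* The unmodified address, which is where argc lives. */
    r.rdi = stack;
    return r;
}
```

If the stack address ends in 0, %rsp lands 8 bytes below %rdi and [rsp] is not argc. If it ends in 8, the two registers are equal and [rsp] happens to be argc.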

The reason it aligns %rsp the way it does is what is mentioned in the ABI document I shared: %rsp-8 is always a 16B-aligned address. I re-read this a few times but I still don't get the reasoning. Maybe it would make sense deeper in the program (e.g. in main()), but compilers have their own prologue for main and align the stack there anyway.
 
Very interesting, thank you very much for taking a look into this, Martin.

The reason it aligns %rsp the way it does is what is mentioned in the ABI document I shared: %rsp-8 is always a 16B-aligned address. I re-read this a few times but I still don't get the reasoning. Maybe it would make sense deeper in the program (e.g. in main()), but compilers have their own prologue for main and align the stack there anyway.

I also noticed this, and this totally makes sense per the ABI. What's weird to me is that in later versions of the same System-V ABI: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf , in Figure 3.9: Initial Process Stack (and you can get auto-built versions of this ABI from as recently as last month from their gitlab repo), you can clearly see that %rsp is supposed to point to argc at _start. This table is not present in the older draft ABI that you linked.

Maybe I'm reading this wrong, but it seems to me that x64 FreeBSD does not follow the latest version of the ABI. Do we know if this is true?

And regarding the comment you made earlier:
For your questions:
a) well, both do need to be set . %rdi is part of the ABI - passing 1st argument - args. [...]
I understand that %rdi is set aside for the first argument for function calls in general, but does that convention really extend to _start?

I guess what I'm asking is: is there a documented reason why %rdi points to argc at _start and %rsp does not, even though the ABI seems to have explicitly required this for at least a decade now?

-Matt
 
Yw, I was curious myself to see why it is not deterministic, because even with ASLR I'd expect the offsets to be the same (it's the base that gets randomized). At least that's what I'm used to from Linux.
To close the topic about that alignment: I was able to trace the first occurrence of this on github: releng/5.1 exec_setregs. Here we have the explanation from the author. It's a pity that comment was later removed.

I'm assuming that it's for the implied call of a function from within _start, most likely main on FreeBSD.

As for the ABI: OS developers can create whatever ABI they want, restricted only by the architecture. While they may have been following System V, they are not bound to it.

I understand that %rdi is set aside for the first argument for function calls in general, but does that convention really extend to _start?
It does make sense to me - _start is treated as a function.
 
OK! Thanks Martin, Paul, and others! That pretty much answers all my questions then. I don't know why I expected 2 completely different operating systems to function identically in every way.

I'm also glad to see that Google has already indexed this forum post, so future generations will be able to get some guidance on this, at least until everyone switches over to x86-128 FreeBSD.
 