Someone asked me why you have to include stdbool.h to have bool in C. I opened the header and saw that it does #define bool _Bool. I wondered what _Bool is, so I looked it up: it's a type from C99 that represents booleans. Gotcha.
What makes _Bool different from char? Well, among other things, the C compiler tries to enforce it only having values 0 or 1.
At this point I opened Compiler Explorer.
    void meow() {
        _Bool x = 3;
    }

    "meow":
            push    rbp
            mov     rbp, rsp
            mov     BYTE PTR [rbp-1], 1
            nop
            pop     rbp
            ret
_Bool x = 3; actually becomes an assignment of 1 in the generated assembly. Huh.
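To see how that differs from char in plain C, here's a minimal sketch. The helper names are made up for illustration, and the 256 case assumes 8-bit chars. Storing a value in an unsigned char keeps its low bits, while storing it in a _Bool only records whether it was nonzero.

```c
#include <assert.h>

/* Hypothetical helpers, not from any header. */
int store_in_bool(int v)  { _Bool b = v; return b; }
int store_in_uchar(int v) { unsigned char c = (unsigned char)v; return c; }

/* Small self-check comparing the two storage types. */
void check_bool_vs_char(void) {
    assert(store_in_bool(3) == 1);      /* any nonzero value becomes 1 */
    assert(store_in_uchar(3) == 3);     /* a char keeps the value */
    assert(store_in_bool(256) == 1);    /* still nonzero, so still 1 */
    assert(store_in_uchar(256) == 0);   /* wraps to 0, assuming 8-bit chars */
    assert(store_in_bool(0) == 0);
}
```

Calling check_bool_vs_char() from a main function should pass silently if the compiler follows the usual conversion rules.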
But what if I cast?
    int meow(int x) {
        _Bool y = (_Bool)x;
        return y;
    }

    "meow":
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-20], edi
            cmp     DWORD PTR [rbp-20], 0
            setne   al
            mov     BYTE PTR [rbp-1], al
            movzx   eax, BYTE PTR [rbp-1]
            pop     rbp
            ret
Huh! So it turns the plain assignment into x != 0. Interesting!
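The same "compare against zero" rule applies to other scalar types too, not just int. Here's a small sketch of that, with function names invented for this example. The fun case is 0.5, which becomes 1 as a _Bool even though it would truncate to 0 as an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: each one converts its argument to _Bool. */
int bool_of_int(int x)              { return (_Bool)x; }
int bool_of_double(double d)        { return (_Bool)d; }
int bool_of_pointer(const void *p)  { return (_Bool)p; }

/* Small self-check: every conversion behaves like "value != 0". */
void check_bool_conversions(void) {
    int local = 0;
    assert(bool_of_int(3) == 1);
    assert(bool_of_int(-1) == 1);
    assert(bool_of_int(0) == 0);
    assert(bool_of_double(0.5) == 1);      /* (int)0.5 would be 0 */
    assert(bool_of_double(0.0) == 0);
    assert(bool_of_pointer(&local) == 1);  /* non-null pointer */
    assert(bool_of_pointer(NULL) == 0);
}
```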
And then I thought: but what about reading through a pointer? So I wrote the following program:

    int printf(const char *, ...);

    int meow(char x) {
        _Bool *y = (_Bool *)&x;
        return *y;
    }

    int main(void) {
        printf("%d\n", meow(3));
    }
I then compiled it with gcc meow.c -o meow, ran it, and it printed 1!
At this point I was quite surprised, but when I copied the code into Compiler Explorer, I got... very confused.
    int meow(char x) {
        _Bool *y = (_Bool *)&x;
        return *y;
    }

    "meow":
            push    rbp
            mov     rbp, rsp
            mov     BYTE PTR [rbp-17], dil
            lea     rax, [rbp-17]
            mov     QWORD PTR [rbp-8], rax
            mov     rax, QWORD PTR [rbp-8]
            movzx   eax, BYTE PTR [rax]
            movzx   eax, al
            pop     rbp
            ret
It didn't look like there's anything to convert the 3 into a 1! What's going on here?
I thought maybe there's some weird stuff I'm not understanding, so I went to OnlineGDB, pasted the code, ran it... 3. WHAT!
Well, turns out, I'm doing all of this on a Mac! And macOS hardlinks[1] gcc to clang. If I change the compiler used on Compiler Explorer to clang, I get quite a different output!
    int meow(char x) {
        _Bool *y = (_Bool *)&x;
        return *y;
    }

    meow:
            push    rbp
            mov     rbp, rsp
            mov     al, dil
            mov     byte ptr [rbp - 1], al
            lea     rax, [rbp - 1]
            mov     qword ptr [rbp - 16], rax
            mov     rax, qword ptr [rbp - 16]
            mov     al, byte ptr [rax]
            and     al, 1
            movzx   eax, al
            pop     rbp
            ret
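So it's quite interesting that clang makes sure that reading through the pointer can only return 0 or 1 (the and al, 1 keeps just the lowest bit, so 3 becomes 1), while gcc doesn't seem to care and returns the byte as is! Note that this is a mask rather than the x != 0 check from before, so going by this assembly, a 2 would come out as 0.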
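The two ways of squeezing a byte into 0 or 1 that show up in the listings above (the setne comparison and clang's and al, 1) can be modeled in ordinary C on an unsigned char, without any pointer casting. This is only a sketch of the arithmetic, with made-up function names. It doesn't say anything about what a compiler is required to do when a _Bool holds something other than 0 or 1.

```c
#include <assert.h>

/* Hypothetical helpers modeling the two instruction sequences. */
int low_bit(unsigned char byte)    { return byte & 1; }   /* like `and al, 1` */
int is_nonzero(unsigned char byte) { return byte != 0; }  /* like `cmp` + `setne` */

/* Small self-check: the two agree on 3 but not on 2. */
void check_byte_to_bool(void) {
    assert(low_bit(3) == 1 && is_nonzero(3) == 1);
    assert(low_bit(2) == 0 && is_nonzero(2) == 1);
    assert(low_bit(0) == 0 && is_nonzero(0) == 0);
}
```

If you actually want "is this byte nonzero?", reading it as an unsigned char and comparing against 0 is the version that doesn't depend on the compiler.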
This would be it for my first blog post. Hope you found it interesting!
[1] About macOS toolchain hardlinks
That's not technically accurate, but it was the best way I could think of to explain it in a way that would make many people get what the situation roughly is.
There is actually a shim executable, and every toolchain command-line utility is hardlinked to it. The shim then uses the toolchain selected with xcode-select to actually run the tool. It's kinda sorta somewhat working like busybox.
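One way to check which compiler is really behind a gcc command is to ask the preprocessor. Here's a minimal sketch, assuming the usual predefined macros. clang defines __clang__ (and also __GNUC__ for compatibility), so the clang check has to come first. The function name is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports which compiler built this file. */
const char *compiler_name(void) {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc-compatible";  /* gcc, or another compiler defining __GNUC__ */
#else
    return "unknown";
#endif
}

/* Small self-check: the result is always one of the three labels. */
void check_compiler_name(void) {
    const char *name = compiler_name();
    assert(strcmp(name, "clang") == 0 ||
           strcmp(name, "gcc-compatible") == 0 ||
           strcmp(name, "unknown") == 0);
}
```

Printing compiler_name() from a program built with gcc on a Mac would be expected to say clang, given the shim setup described above.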