osa1 | a month ago
I also finished my language's reference implementation's C backend recently. My #1 tip if you're considering generating C would be: get familiar with statement expressions and their limitations.
(Yes they're a gcc extension, but clang also supports them.)
They're really helpful when compiling conditional or pattern-matching expressions to C. They have a few limitations explained in the link, but the big one for me (which I think isn't actually mentioned in the link) was that you can't generate a return in a statement expression and expect the expression's type to just match the one the context expects. So let's say I'm generating a function call in C: f(arg1, arg2, arg3), and arg2 is an expression in my language that can't be an expression in C, so I use a statement expression like
f(
    arg1,      // compiled to a C expr
    ({ ... }), // statement expression here
    arg3       // compiled to a C expr
)
In that ... part I can have a return, but I still need an expression as the last statement. This isn't valid:
f(
    arg1,
    ({ statement1;
       return 123; }),
    arg3
)
Because that last thing needs to be an expression. So I have to generate a dummy value after the return, with the right type:
f(
    arg1,
    ({ statement1;
       return 123;
       (MyStruct) {0}; }), // the actual dummy value depends on the expected type
    arg3
)
Initially I was YOLOing my way without having the expected type in my expression compiler. When I discovered this I had to manually plumb expression types all the way from the type checker through the monomorphic AST and the lowered AST, and then to the expression compiler...
pervognsen | a month ago
Is there a particular reason you chose to generate nested expressions rather than something more like A-normal form? I too started with nested expressions in an old compiler project that targeted C as its primary backend, which in my case was initially to help with legibility, compactness, and compile times for the generated C code. But I found it eventually became more awkward (and outright restrictive) than helpful and switched to doing ANF-style linearization into statements, like a linear IR. And once you start doing that for some cases, it's natural to do it for everything; I should have done it a lot sooner.
osa1 | a month ago
I had an interpreter already and I wanted to do the simplest thing to generate C, using the existing IRs. The reference implementation will be deleted once the self-hosted compiler works, so I don't want to invest too much in giving it a perfect compilation pipeline.
Initially I thought the interpreter would be enough, but then it started to take ~10 seconds for the self-hosted compiler to just do name resolution, and it became clear that I'd need to either optimize the interpreter somehow, or compile. After some thinking and experimentation (and discovering statement expressions in C) I decided that compiling the lowered AST to C would be the simplest approach and would give me good enough runtime performance, especially if I can do it in a typed way (instead of boxing everything). So far I'm quite happy with the results. The interpreter currently takes about 8 seconds to run the self-hosted compiler. If I compile to C, the generated code runs in 0.2 seconds. My guess is that the final self-hosted compiler will take less than 10 seconds to self-compile, which is good enough for me. So I think/hope that I'm done with the reference implementation.
Just type checking the self-hosted compiler with the reference implementation also takes <0.1 seconds right now, so I'm also pretty sure I can get fast feedback while working on the self-hosted compiler until it's done. (I have a flag that just type checks and stops.)
icefox | a month ago
Counterpoint, I've been generating Rust code as a placeholder for my language backend for years. It's certainly not flawless, but it's stood up to the test of time wayyyyy better than I expected.
jnb | a month ago
That sounds very interesting, did you by chance write anything up about that already or are you considering it?
icefox | a month ago
No, but I could put it on the to-do list. A lot of it is really not very interesting 'cause it's just a placeholder. It's slow, it's inflexible, it generates kinda dogshit Rust code and relies on LLVM to clean it all up. But it's also worked basically flawlessly for years now; breaking compilations are uncommon, miscompilations are basically unknown. Possibly just 'cause I don't have any particularly advanced features yet.
Long story short, a) monomorphize everything to get rid of generics, b) type-erase everything to tuples and value types, and c) just write everything out to a single file.
manuel | a month ago
> always-inline functions remove any possible performance penalty for data abstractions

I find this to be slightly hyperbolic. It's only true if code size is not part of what you're measuring as "performance" (e.g. on the web/Wasm you want to produce as tiny executables as possible). It also ignores downstream effects e.g. on the instruction cache.
[OP] fanf | a month ago
The examples in the blog post are all typesafe accessors which should compile down to less code than a call to a function that does the same thing, even in the absence of a clever optimizer. (I would expect one of these inline functions to compile down to the address part of a load or store instruction, i.e. less than one instruction. It’s hard to make a function call in less than one instruction.) It’s true that you need to be aware of the potential for code bloat, i.e. you should examine the size of the inlined code vs the size of a function call. But inlined code can be smaller, especially when it unlocks further optimizations such as common subexpression elimination.
mort | a month ago
It's in the context of comparing against macros. Macros are always inline. A common reason people cite for using macros (at least historically) is that it guarantees there's no function call overhead, without having to rely on a "sufficiently smart compiler" to perfectly inline all such functions. Always-inline removes any possible function call performance overhead in the same way that a macro does.
Of course, if you use always-inline on a function which would be better left as a function call, it doesn't apply to you. I think context makes this clear enough.