jprotopopov | a day ago
(I am the author of the kefir compiler.) Thank you for bringing up the issue. The __attribute__ issue of <sys/cdefs.h> is perhaps the most "offending", at least in my experience, because it breaks epoll (and anything packed, in general), constructors, and symbol visibility. I ended up bundling the following monkey-patch header with kefir. This is not ideal, but perhaps the sanest thing to do. It actually enabled me to remove the bulk of the custom patches from the external test suite.
Another failure mode I encountered is buggy fallbacks. Certain projects try to detect the compiler and adjust accordingly, but due to a lack of testing on alternative compilers, the fallback code ends up buggy or not properly maintained. As a compiler author, this is much more frustrating than an abrupt failure with an "unsupported compiler" error message, because I end up debugging weird miscompilations caused by e.g. some integer typedef width mismatch between the program and some pre-compiled library it uses.
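To make the "anything packed" breakage concrete, here is a minimal sketch of what changes when a header silently defines __attribute__ away. The struct and function names are hypothetical stand-ins (an epoll_event-like record), not copied from glibc or kefir, and the sketch assumes a compiler that honours the GNU packed attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical epoll_event-like record, laid out with natural alignment.
 * This is what a compiler produces if __attribute__ expands to nothing. */
struct event_plain {
    uint32_t events;
    uint64_t data;
};

/* The same record with the packed attribute honoured. */
struct event_packed {
    uint32_t events;
    uint64_t data;
} __attribute__((packed));

/* Offset of the payload when packed is honoured: directly after the 4-byte field. */
size_t packed_data_offset(void) {
    return offsetof(struct event_packed, data);
}

/* Offset of the payload when the attribute is dropped (ABI-dependent padding). */
size_t plain_data_offset(void) {
    return offsetof(struct event_plain, data);
}

/* Returns 1 if ignoring the attribute would change the layout on this ABI,
 * i.e. the program and a pre-compiled library would disagree about the struct. */
int ignoring_packed_changes_layout(void) {
    return sizeof(struct event_packed) != sizeof(struct event_plain)
        || packed_data_offset() != plain_data_offset();
}
```

On ABIs where uint64_t is 8-byte aligned, the two layouts differ, so code built with the attribute ignored would read the payload from the wrong offset when talking to a library built with it.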
abnercoimbre | a day ago
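> Certain projects try to detect the compiler and adjust accordingly, but due to a lack of testing on alternative compilers, the fallback code ends up buggy or not properly maintained.
Happens with terminals too. I set $TERM to xterm-256color to pretend I'm xterm or all hell would break loose. I really wish I knew how to solve this. I guess our projects just need to become widespread and popular enough? Easy!
icefox | a day ago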
I've been told the solution is to detect features rather than implementations, though I was told this by web devs so up to you how much you trust them.
IMO the solution is to have a compliance test suite so any implementer can know exactly that they do comply with xterm-256color or whatever other test string you may want to query against. That way both sides of an API can actually agree on what it means without having to argue about it, and if someone lies about conforming to the spec it's their problem, not the user's.
edit: Actually, now that I think of it, CPUs have a two-pronged approach: they have a vendor/model string and a set of feature flags. So a user can both ask "does this CPU support feature frobnitz?" and get a y/n answer, and also say "is this CPU an Imtel q475? 'cause I know the frobnitz feature on that one is broken and I need to use a workaround." Though that's somewhat easier with hardware than software, 'cause once hardware is manufactured and distributed it really isn't going to change, and anyone who advertises someone else's CPU models is gonna get some serious side-eye. (Though it does happen, I'm sure.)
Forty-Bot | a day ago
Congratulations, you've invented autoconf.
jmtd | 23 hours ago
Or terminfo
invlpg | 23 hours ago
...which breaks as soon as you need a remote SSH terminal.
SSH desperately needs a remote-terminfo protocol as of several decades ago.
muvlon | 20 hours ago
terminfo is exactly not that! It works by detecting the user's implementation (most commonly by checking $TERM), and then looks it up in a big database of known terminal implementations. It's a "detect features" facade around an ugly "detect implementations" truth. It's actually the worst and needs to be killed with fire.
edit: I just noticed that Browserlist, from the web world, is a bit of the opposite: It acts to the programmer like "detect implementations" but works like "detect features" and I think I like that much better.
kiyurica | 10 hours ago
At least with terminfo, you can specify your own implementation (by adding to said database) relatively easily.
muvlon | 9 hours ago
I'm not sure "easy" is the word I'd use. The format is pretty arcane. That's a lot to learn for somebody who just wants colors to work. It's certainly not easy compared to setting $TERM to xterm-256color. In fact, I think this behavior of terminfo and ncurses, for all their grandstanding about "the proper way", is in large part what caused so many people to resort to that hack.
Lilian | 15 hours ago
This is something compilers could just do for us with the __has_builtin and __has_feature macros.
davidg | 18 hours ago
A similar two-pronged approach is possible with terminals...
DA will return a service class (terminal, printer or something else), a level (1-5) which defines a set of mandatory features and behaviours, plus a list of extensions that add extra features to that level (e.g., rectangular editing or sixel graphics). Though I suspect what these numbers actually mean is not widely known now, and I don't think it's uncommon for terminals to report a higher level than they're actually compatible with.
DECRQSS and DECRQM will return the value of some setting or mode. An application can try changing a setting or mode, then using these to see if it actually worked or not.
Lastly there is XTVERSION (\x1b[>0q) which reports the terminal name and version. I've run into apps that use this alone for feature detection, which I expect will eventually lead to what we saw with browser user agent strings.
davidg | 17 hours ago
I don't think there is really any solution - even popular terminals struggle with this.
For now I've chosen to be idealistic and "do the right thing". The next release of the thing I maintain is adding a new "self" personality (the current release just offers emulations of various hardware or unix console terminals). That "self" personality will set TERM appropriately and there is a matching terminfo file to go with it. Of course this strategy is well known to not work in practice, so for the foreseeable future the default personality will remain VT220 so that it will at least work by default.
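As a rough illustration of the DA reply described a couple of comments up, the sketch below parses a primary DA response of the form ESC [ ? Ps ; Ps ; ... c into a level and a list of extension codes. The struct and function names are hypothetical. It assumes the commonly documented DEC convention that a first parameter of 61 to 65 means level 1 to 5, and that extension 4 means sixel graphics and 28 means rectangular editing. Older replies whose first parameter is outside 61..65 are reported here as level 0.

```c
#include <stdlib.h>
#include <string.h>

#define DA_MAX_EXT 16

/* Hypothetical container for a parsed primary DA reply. */
struct da_reply {
    int level;            /* 1-5, or 0 if the first parameter is not 61..65 */
    int ext[DA_MAX_EXT];  /* extension codes that follow the level */
    int n_ext;
};

/* Parses "ESC [ ? Ps ; Ps ; ... c". Returns 0 on success, -1 if malformed. */
int parse_da(const char *s, struct da_reply *out) {
    int first = 1;

    memset(out, 0, sizeof *out);
    if (strncmp(s, "\x1b[?", 3) != 0)
        return -1;
    s += 3;

    for (;;) {
        char *end;
        long v;

        /* Each parameter must start with a digit (no sign, no whitespace). */
        if (*s < '0' || *s > '9')
            return -1;
        v = strtol(s, &end, 10);

        if (first) {
            out->level = (v >= 61 && v <= 65) ? (int)(v - 60) : 0;
            first = 0;
        } else if (out->n_ext < DA_MAX_EXT) {
            out->ext[out->n_ext++] = (int)v;
        }

        s = end;
        if (*s == ';') {
            s++;
            continue;
        }
        /* The reply must end with the final byte 'c'. */
        return *s == 'c' ? 0 : -1;
    }
}

/* Returns 1 if the reply advertises the given extension code. */
int da_has_ext(const struct da_reply *r, int code) {
    int i;
    for (i = 0; i < r->n_ext; i++)
        if (r->ext[i] == code)
            return 1;
    return 0;
}
```

As noted in the thread, a terminal may report a higher level than it really implements, so an application would still want to confirm individual settings with DECRQM or DECRQSS before relying on them.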
[OP] lemon | a day ago
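Apparently the monkey-patched headers approach is also what slimcc does; it seems like a good compromise.
> As a compiler author, this is much more frustrating than an abrupt failure with an "unsupported compiler" error message, because I end up debugging weird miscompilations caused by e.g. some integer typedef width mismatch between the program and some pre-compiled library it uses.
Ah yeah, I believe I've run into some of these as well, very annoying.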
mcf | a day ago
I mostly develop cproc under linux-musl, so I didn't know about __attribute__ being disabled on glibc for other compilers. That is pretty bad, indeed. The comment says that all their uses of attributes are fine if ignored, but this doesn't consider that most application code gets sys/cdefs.h indirectly, and may use attributes that are not safe to ignore. In addition to packed, aligned and constructor are commonly used.
Is this reported in an issue tracker somewhere? It seems that most of their attribute uses in cdefs.h are guarded by __glibc_has_attribute already, so I'm curious about what the blanket __attribute__ disable actually accomplishes and if it could be removed.
Another issue I'd like to bring up is features used by libc headers that compilers don't have a good way of indicating support for (the way __has_attribute or __has_builtin do for attributes and builtins). The case that comes to mind is __asm__ labels. NetBSD uses this to rename symbols and #errors when __GNUC__ or __PCC__ is not defined. But I'm not sure what to propose they do instead apart from just try it and let it fail if it isn't supported.
I've also encountered issues with __builtin_va_list, where libcs define va_list to void * (or even conflicting definitions) without __GNUC__. This also can't be tested for with __has_builtin. __has_builtin(__builtin_va_arg) is probably a good enough test, but I'm not sure how you'd even go about getting this fixed for macOS.
[OP] lemon | a day ago
I ran a quick search for uses of __attribute__ in /usr/include/sys and /usr/include/bits and found many that are unguarded (mostly for __format__, __aligned__, __noreturn__), so those would need to be fixed too.
glibc does not seem to prioritize compatibility with non-GCC compilers, in general, so I don't know how likely it is that they'd accept those patches. Earlier this year my compiler stopped being able to compile some projects after a system upgrade because glibc added a naked use of __SIZE_TYPE__ to a Linux header. I reported it but it still hasn't been fixed, so in the end I just added __X_TYPE__ style predefined macros to match GCC.
> The case that comes to mind is __asm__ labels. NetBSD uses this to rename symbols and #errors when __GNUC__ or __PCC__ is not defined. But I'm not sure what to propose they do instead apart from just try it and let it fail if it isn't supported.
Yeah, I can't think of a good solution to this either. But if asm renames are 100% necessary to work in the first place, just trying and failing is probably better than compiler checks.
> I've also encountered issues with __builtin_va_list, where libcs define va_list to void * (or even conflicting definitions) without __GNUC__. This also can't be tested for with __has_builtin.
oh that's pretty bad... I expected __has_builtin(__builtin_va_list) would work, but apparently not.
david_chisnall | 4 hours ago
> I've also encountered issues with __builtin_va_list, where libcs define va_list to void * (or even conflicting definitions) without __GNUC__. This also can't be tested for with __has_builtin. __has_builtin(__builtin_va_arg) is probably a good enough test, but I'm not sure how you'd even go about getting this fixed for macOS.
A lot of these things are because people assume 'alternative' means 'old'. When we started trying to get people to seriously use clang, we pretended to be GCC 4.2.1 (the last GPLv2 version) and treated everything that GCC 4.2.1 could do that we couldn't as a bug. That got us fairly quickly (i.e. in a few years) to the point of being able to be a replacement. We did this because ISO C is woefully inadequate for writing complex software and so there are basically three kinds of codebase:
1. Small, stand-alone embedded things that can use ISO C.
2. Things that use GCC extensions.
3. Things that use MSVC extensions.
A lot of embedded software (especially RTOS things) ends up being in the second category. Some exciting things are in both the second and third category. Being a C compiler is a nice abstract thing, but if you want to be a useful compiler then you have to be a GNU C or a Microsoft C compiler.
But a lot of these headers were designed to work with current compilers and some older ones. Old C ABIs passed all arguments on the stack. This meant that va_args could be implemented entirely as a header (relying on some things that later became UB): Take the address of the last argument as a char*, add its size (stacks grow down), and the result is a void* pointing to the next location. The next argument was accessed by casting the void* to T*, dereferencing it, and updating the va_list pointer to point to the next thing by adding 1 as a T* then casting back to void*.
mcf | 39 minutes ago
> A lot of these things are because people assume 'alternative' means 'old'. When we started trying to get people to seriously use clang, we pretended to be GCC 4.2.1 (the last GPLv2 version) and treated everything that GCC 4.2.1 could do that we couldn't as a bug. That got us fairly quickly (i.e. in a few years) to the point of being able to be a replacement.
Yeah, I get why things are the way they are. I'm more concerned with how to move forward. While pretending to be gcc might have been the fastest way to get clang to be a viable replacement and the only practical choice at the time, it's obviously not ideal. It ignores the problem of all the non-standard code entirely. It also leads to a situation like browser user agents. Imagine if another big company came along and wrote a new C compiler that claimed to be GCC 4.2.1 as well as clang 9?
> We did this because ISO C is woefully inadequate for writing complex software
> Being a C compiler is a nice abstract thing, but if you want to be a useful compiler then you have to be a GNU C or a Microsoft C compiler.
I think this is becoming less and less true over time. WG14 is standardizing features widely used in C codebases like alignment specifiers, thread-local data, atomics, attributes, typeof, enums with large values, etc. At the same time, gcc and clang are becoming stricter, enabling -fno-common by default, banning implicit function declarations, and enforcing stricter type checking. While annoying for developers, this has been effective at getting code changed to be more conformant. When I started my compiler, it behaved like gcc -fno-common, but it was difficult to get projects to care about the duplicate definitions. When gcc made the change, people finally treated it as an issue.
Also, with mechanisms introduced by clang like __has_attribute and __has_builtin, compilers can now advertise support for features on a more granular level. A lot of the ISO C compatibility issues I run into these days are trivial things that could have been written as ISO C without sacrificing readability or performance, and I only implement a small handful of GNU C extensions.
There are large, successful projects like nginx, curl, and ffmpeg that manage to stick very closely to ISO C. I don't think the fact that most complex projects use GNU C extensions means that you need GNU C extensions for any complex project.
I've sent a lot of ISO C compatibility patches to various projects. For the most part, people are generally fine with these changes and just didn't realize what they were doing wasn't standard because their compiler didn't complain by default.
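The granular detection mentioned above might look roughly like the following sketch. It asks about each attribute by name instead of checking __GNUC__, and separates attributes that are harmless to drop (format is only a diagnostics hint) from ones that change layout and must not be silently ignored (packed). The MY_* macro names, the struct, and the function are hypothetical and exist only for illustration.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Compilers that lack __has_attribute get a conservative "no" for every query. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* format only improves diagnostics, so an empty fallback is safe. */
#if __has_attribute(format)
#define MY_PRINTF_LIKE(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define MY_PRINTF_LIKE(fmt, args)
#endif

/* packed changes struct layout, so fail loudly rather than miscompile. */
#if __has_attribute(packed)
#define MY_PACKED __attribute__((packed))
#else
#error "this code needs a packed struct layout; refusing to ignore the attribute"
#endif

/* Hypothetical wire-format header: 1-byte tag followed by a 4-byte length. */
struct wire_header {
    uint8_t tag;
    uint32_t length;
} MY_PACKED;

int format_length(const char *fmt, ...) MY_PRINTF_LIKE(1, 2);

/* Returns the number of characters the formatted string would occupy. */
int format_length(const char *fmt, ...) {
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return n;
}
```

The point of the split is that an unknown compiler gets either correct behaviour or a clear build error, never the silent layout mismatch described earlier in the thread.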
> Old C ABIs passed all arguments on the stack.
Yeah, but the libc has to have arch/ABI-specific headers anyway, and the ABI documents very clearly define what va_list should look like. It's fine if the libc wants to offload it to the compiler by using __builtin_va_*, but falling back to void * on architectures like x86_64 SysV and aarch64 is just broken on any compiler.
Cloudef | 8 hours ago
The whole preprocessor / include hell is at the level where it's a miracle that any C/C++ project even compiles at all (and if it does, you're lucky if it behaves correctly at runtime). This matters not just to alternative compilers, but to alternative libc / toolchain implementations as well.
calvin | 22 minutes ago
This is a topic that's been relevant for me recently - I've been extending a C parser to handle a vendor dialect, which is naturally not the GNU or MSVC one.
abnercoimbre | 13 minutes ago
How does a vendor end up evolving C into a dialect? Tell us more if you're at liberty to do so :)