The little bool of doom

31 points by dryya a month ago on lobsters | 7 comments

dzwdz | a month ago

Because of the

Fedora [is] a great place to test GCC pre-releases on a large, varied codebase.

remark at the beginning, I really hoped this article would end with something about this bug being reported to gcc and fixed... Except I've just tried this on gcc 15.2, and I still get the nonsensical output.

I wonder if they would even fix it. After all, the standard allows it! Maybe I'm just old-fashioned for expecting equality to be transitive.

After all, the standard allows it!

No, it’s undefined behaviour to memset a bool to -1.

dzwdz | a month ago

That's what I meant. My point was that I think the compiler should still try to act somewhat reasonably when faced with UB.

But, I thought about this a bit more, and I take that back. The assumptions on the values of bools do let you generate better code, and more importantly, setting a bool to -1 is indeed a pretty silly thing to do, maybe unless you're porting legacy code (as in this case).

I saw gcc doing silly stuff on UB so I immediately complained, instead of actually thinking about the situation for a bit more :/

ibookstein | a month ago

I expect that the memset itself isn't undefined behavior, but the attempt to observe the value afterwards is: it reads a trap representation through an lvalue that is not of character type.
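A minimal sketch of that distinction, assuming a hosted C99-or-later compiler. The helper name `first_byte_after_memset` is made up for illustration. It fills a bool with memset and then looks at the bytes only through `unsigned char`, which is the access the standard always permits; the bool itself is never read as a bool.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill a bool with memset(-1) and inspect its
   object representation without ever reading it as a bool. */
unsigned char first_byte_after_memset(void)
{
    bool flag;
    memset(&flag, -1, sizeof flag);   /* writes bytes only */

    /* Access through an lvalue of character type is always allowed. */
    const unsigned char *bytes = (const unsigned char *)&flag;
    for (size_t i = 0; i < sizeof flag; i++) {
        /* memset converts -1 to unsigned char, i.e. UCHAR_MAX. */
        assert(bytes[i] == UCHAR_MAX);
    }
    return bytes[0];
}
```

Writing `if (flag)` after the memset would be the problematic read; the byte-wise inspection above is not.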

Isn't initializing a bool to -1 the issue here, that the compiler should at least warn about? Assigning anything other than true or false to a bool is clearly UB.

No, the UB happens when the value is read, not when it's written. And assigning any non-true/false value to a bool isn't UB; the value is implicitly converted to bool (non-zero/non-null values convert to true, zero/null to false). The reason the bug happened here was that the field was being memset to -1. That's much more difficult for a compiler to warn about, since the type being memset is an array of structs each containing a bool field. You could do that and still have perfectly valid code if the field is overwritten at some point before being read, or just not read at all.
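To make the two cases concrete, here is a rough sketch. The `struct entry` layout and both function names are invented for illustration and are not taken from the article's code. `to_bool` shows the implicit conversion on assignment, and `reset_entries` shows a memset to -1 that stays valid because every bool field is assigned before anything reads it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type with a bool field. */
struct entry {
    int  id;
    bool active;
};

/* Assignment converts: any non-zero int becomes true, zero becomes false. */
bool to_bool(int v)
{
    bool b = v;
    return b;
}

/* memset writes raw bytes and bypasses that conversion. This is still
   fine here because each .active is overwritten before it is read. */
void reset_entries(struct entry *entries, size_t n)
{
    assert(entries != NULL);
    memset(entries, -1, n * sizeof *entries);
    for (size_t i = 0; i < n; i++) {
        entries[i].id = (int)i;
        entries[i].active = false;
    }
}
```

If the loop skipped `entries[i].active`, the memset would still compile without complaint, and the problem would only surface at whichever later read touches the field.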

memset in general is pretty footgunny. For example, the in-memory representation of null pointers isn't guaranteed to be all zeros, so if you memset a struct or union to 0, you can't portably assume that the pointer values will be null. Compound literals mostly solve that issue.
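A small sketch of the difference, using a made-up `struct node` and hypothetical function names. The memset version zeroes bytes, so `next` is null only where a null pointer happens to be all-zero bytes; the compound-literal version assigns a value, so the members that are not named get initialized by the language's rules and `next` is a null pointer everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct with a pointer member. */
struct node {
    int          value;
    struct node *next;
};

/* Byte-wise clear: next is null only if null is all-zero bytes. */
void clear_with_memset(struct node *n)
{
    assert(n != NULL);
    memset(n, 0, sizeof *n);
}

/* Value-wise clear: unnamed members are initialized as if static,
   so next is a null pointer regardless of its representation. */
void clear_with_literal(struct node *n)
{
    assert(n != NULL);
    *n = (struct node){0};
}
```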

For example, the in-memory representation of null pointers isn't guaranteed to be all zeros, so if you memset a struct or union to 0, you can't portably assume that the pointer values will be null.

POSIX guarantees that NULL pointers have all bytes zero. Even outside POSIX it’s very hard to find a system where the representation of NULL is nonzero (I can’t think of any other than the Deathstation 9000).
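For code that wants to rely on this, one option is to check the assumption once at startup. The sketch below uses hypothetical helper names and only inspects `void *`; other pointer types could in principle differ on an exotic implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: is a null void * stored as all-zero bytes here? */
bool null_is_all_zero_bytes(void)
{
    void *p = NULL;
    unsigned char zeros[sizeof p] = {0};
    return memcmp(&p, zeros, sizeof p) == 0;
}

/* Hypothetical guard to call before relying on memset(..., 0, ...)
   to produce null pointers. */
void require_zero_null(void)
{
    assert(null_is_all_zero_bytes());
}
```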