some C habits I employ for the modern day

51 points by FedericoSchonborn a month ago on lobsters | 17 comments

Having programmed in C for 35+ years, the advice here is ... mostly okay. I'm not a fan of the u8, u16 when we now have uint8_t and uint16_t but not enough to say not to do it. I did something similar before C99, but once C99 came out, I changed my code (as I came across it) to use the <stdint.h> types.

As far as NUL terminated strings go, you'll have to go back to the early 70s, when memory was very scarce compared to today, and try to convince them to waste an extra byte per string (or more, depending on how it's done) when they were already wasting one byte per string [1]. But that was then, this is now. There are several methods of doing the "counted" string. There is:

1. Having the first character contain the length. This limits strings to CHAR_MAX characters, and it makes it hard to refer to a bit of string in the middle.
2. Using multiple characters (two, four, eight) for the length. This raises the limit on the length of the string, but slices are still hard.
3. struct { size_t len; char *p; }---easy to take slices of strings, but there is overhead. Also, you might want some additional fields to mark if the string is heap allocated, and how many references exist to it. That increases the memory overhead, but hey, what's a few gigabytes these days?
4. struct { char *p; size_t len; }---the same as 3, but incompatible, and heaven help you if you mix the two in the same project.
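To illustrate why the pointer-plus-length layout makes slicing easy, here is a minimal sketch. The names str_view, sv_from_cstr, sv_slice and sv_equal are hypothetical, and the pointer is const here on the assumption that the view does not own or modify the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view; it does not own the bytes it points at. */
typedef struct {
    size_t len;
    const char *p;
} str_view;

/* Wrap a NUL-terminated string; the length excludes the terminator. */
static str_view sv_from_cstr(const char *s)
{
    str_view v = { strlen(s), s };
    return v;
}

/* Take the slice [start, end) without copying any bytes. */
static str_view sv_slice(str_view s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    str_view v = { end - start, s.p + start };
    return v;
}

/* Two views are equal when their lengths and contents match. */
static int sv_equal(str_view a, str_view b)
{
    return a.len == b.len && memcmp(a.p, b.p, a.len) == 0;
}
```

A slice is just a new (length, pointer) pair into the same storage, which is exactly what the length-prefix layouts cannot offer without copying.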

Over the years, I've just accepted NUL strings in C; anything more gets tedious to use in C. I've also just accepted the standard C library. I've gone down the path of reimplementing the standard library, and I've come to later regret it. It could have been I didn't have the experience in API design at the time, but I find it mostly, eh.

I also have done the preprocessor abuse (to the point of implementing templates in C!) and ... no. I don't find it worth saving the typing, and it can make finding out how something works ("Where does foo come from? I can't find its declaration! Oh @#@#$@# it's defined in a MACRO! @#@#$@!#$@!#").

[1] The NUL byte. One way to avoid this that I've actually done was to set the high bit on the last character (benefit of being in the US and using US-ASCII). Yes, it complicates the code a bit, but if you have a lot of strings, it saves a lot of memory when it's critical (like a 6809 disassembler in 6809 assembly code, where having NUL bytes terminate the strings would increase the string space enough to require a two-byte offset instead of an 8-bit offset---I was able to fit the code inside of 2K, important when there is, at best, 64K).
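For readers who have not seen the high-bit trick, a rough C sketch of reading such strings might look like the following. It assumes 7-bit US-ASCII data and non-empty strings; hb_len and hb_to_cstr are made-up names, and this is C rather than the 6809 assembly of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a string whose final character has bit 7 set instead of being
   followed by a NUL. Assumes 7-bit ASCII data and at least one character. */
static size_t hb_len(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while ((s[n] & 0x80u) == 0)
        n++;
    return n + 1;   /* count the marked final character too */
}

/* Copy into a conventional NUL-terminated buffer, clearing the marker bit.
   dst must have room for hb_len(src) + 1 bytes. */
static void hb_to_cstr(char *dst, const unsigned char *src)
{
    size_t n = hb_len(src);

    assert(dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)(src[i] & 0x7Fu);
    dst[n] = '\0';
}
```

A three-letter mnemonic such as "LDA" then takes three bytes of table space instead of four.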

: hawski | a month ago
Just a thought. I've never seen anyone use it (though I'm sure someone did), but another method would be a variable length length header, a bit like UTF-8. First bit set to 0 means 7 bit length header. So strings up to 127 characters would have a one byte overhead. Then the first two bits set to 10 would mean 14 bit length header: 16k chars - 2 byte overhead. 110xxxxx, 21 bit length header: 2M chars - 3 byte overhead. 1110xxxx, 28 bit length header: 256M chars - 4 byte overhead. And so forth. It doesn't have to have a linear overhead, otherwise it wouldn't work for full 64 bit lengths. That is obviously a whole different can of worms :)
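A rough sketch of the header scheme described above, covering only the 1- to 4-byte forms. The names vl_encode and vl_decode are hypothetical, and storing the remaining length bits big-endian is an assumption, since the comment does not specify a byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write len as a 1- to 4-byte header. Returns the number of bytes written,
   or 0 if len needs more than 28 bits (longer forms are not sketched). */
static size_t vl_encode(uint32_t len, unsigned char out[4])
{
    assert(out != NULL);
    if (len < (UINT32_C(1) << 7)) {             /* 0xxxxxxx */
        out[0] = (unsigned char)len;
        return 1;
    }
    if (len < (UINT32_C(1) << 14)) {            /* 10xxxxxx + 1 byte */
        out[0] = (unsigned char)(0x80u | (len >> 8));
        out[1] = (unsigned char)(len & 0xFFu);
        return 2;
    }
    if (len < (UINT32_C(1) << 21)) {            /* 110xxxxx + 2 bytes */
        out[0] = (unsigned char)(0xC0u | (len >> 16));
        out[1] = (unsigned char)((len >> 8) & 0xFFu);
        out[2] = (unsigned char)(len & 0xFFu);
        return 3;
    }
    if (len < (UINT32_C(1) << 28)) {            /* 1110xxxx + 3 bytes */
        out[0] = (unsigned char)(0xE0u | (len >> 24));
        out[1] = (unsigned char)((len >> 16) & 0xFFu);
        out[2] = (unsigned char)((len >> 8) & 0xFFu);
        out[3] = (unsigned char)(len & 0xFFu);
        return 4;
    }
    return 0;
}

/* Read a header back. Returns the header size in bytes, or 0 if the first
   byte uses a prefix this sketch does not handle. */
static size_t vl_decode(const unsigned char *in, uint32_t *len)
{
    unsigned char b = in[0];
    size_t extra;
    uint32_t v;

    if ((b & 0x80u) == 0)          { extra = 0; v = b; }
    else if ((b & 0xC0u) == 0x80u) { extra = 1; v = b & 0x3Fu; }
    else if ((b & 0xE0u) == 0xC0u) { extra = 2; v = b & 0x1Fu; }
    else if ((b & 0xF0u) == 0xE0u) { extra = 3; v = b & 0x0Fu; }
    else return 0;

    for (size_t i = 1; i <= extra; i++)
        v = (v << 8) | in[i];
    *len = v;
    return extra + 1;
}
```

The string data would follow the header directly, so a short string pays one byte of overhead, the same as a NUL terminator.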

wink | a month ago

Interesting read and I don't know enough on C to formulate an educated reply...

but it does have some strong "it's a good language in its 7th standards revision if you just use these 15 hacks" energy.

: danso | a month ago
You joke, but this is exactly what modern C programming is.

We know it's got many design mistakes, but there's no other language competing for its niche, and so we all use it with our favourite workarounds, custom headers, linters, long lists of compiler warnings...

Rust and Zig are making good efforts in the systems programming niche, but they have a climb ahead before they start to threaten the mountain that is C.

abnercoimbre | a month ago

Sigh, my brain genuinely can't grok the lack of capitalization in sentences. Lowercasing everything makes me feel lost very quickly ("wait am I reading a new sentence or not?")

I'm a professional C programmer. I'm sure there's helpful material in here. But I had to close the tab.

jitl | a month ago

I didn’t even notice.

: abnercoimbre | a month ago
To each their own!

WilhelmVonWeiner | a month ago

From easiest to hardest:

pipe it through GNU sed: sed 's/\. \([a-z]\)/. \U\1/g';
pipe it through GNU sed: sed 's/\([.?!]\) \([a-z]\)/\1 \U\2/g';
paste it into an LLM and ask it to fix the capitalisation;
get used to it, it's how people online and under 30 type now. In fact, trendsetter Ken Thompson does it.

yoshi | a month ago

I do like to keep my blog a place where I don't have to worry too hard about having proper grammar or style in lieu of letting my thoughts flow out easily, but I suppose I did make this post with the explicit purpose of wanting others to read it... It'd probably be better if I made it a bit more accessible in that regard :P

I added some proper capitalization in there; I hope it helps.

makishimu | a month ago

[...] I don’t care for systems where char isn’t 8 bit, so the distinction between u8 and char doesn’t mean anything to me here.

Strictly speaking, char's signedness is implementation-defined and "char is not compatible with signed char and not compatible with unsigned char". Thus, there is one more distinction besides char's width, as char may also be i8.
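A small C11 sketch of both points: the three character types are distinct as far as _Generic is concerned, and the signedness of plain char can be checked through CHAR_MIN. The macro CHAR_KIND, the enum and plain_char_is_signed are hypothetical names; the static_assert encodes the 8-bit assumption from the quoted post.

```c
#include <assert.h>
#include <limits.h>

/* The quoted post only targets 8-bit char; make that assumption explicit. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit char");

enum char_kind { KIND_CHAR, KIND_SCHAR, KIND_UCHAR };

/* char, signed char and unsigned char are three distinct types, so all
   three associations may appear in one _Generic selection. */
#define CHAR_KIND(x) _Generic((x),   \
    char:          KIND_CHAR,        \
    signed char:   KIND_SCHAR,       \
    unsigned char: KIND_UCHAR)

/* Nonzero when plain char is signed on this implementation. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

If char were merely an alias for one of the other two, the _Generic selection would fail to compile because of a duplicate type association.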

LesleyLai | a month ago

Pretty good advice overall. I don't write C often, but when I do, I follow some of the similar practices.

I can definitely feel the pain of a null-terminated string. All those C libraries that require null-terminated strings severely limit the usability of C++ string_view. That said, I still try to use length+data without caring about the null-terminator in some code, and if you don't need to pass the string to third-party APIs, it should be fine (at least printf supports pointer+length).
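For reference, the printf family handles pointer+length through the %.*s conversion, where the precision is passed as an int argument. A minimal sketch, with format_slice as a made-up helper name:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Format a length+data string that has no NUL terminator of its own.
   The precision consumed by %.*s must be an int, hence the check and cast. */
static int format_slice(char *buf, size_t bufsize, const char *p, size_t len)
{
    assert(len <= INT_MAX);
    return snprintf(buf, bufsize, "[%.*s]", (int)len, p);
}
```

Note that %.*s still stops early if it meets a NUL byte inside the given length, so this only works for data without embedded NULs.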

I like "parse, don't validate" and "Make invalid states not representable," but I am not sure whether C is the language to exercise that. It feels like C doesn't really provide many of the mechanical checks that other type systems perform, and the verbosity is just too annoying.

I just always return a named struct rather than using tuples. Even in languages with tuple support, I often do that. IMO, the extra information increases readability.

Error handling: I often use tagged unions, too.
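As a sketch of what tagged-union error handling can look like in C, here is a hypothetical port parser. The names port_result, port_tag and parse_port are made up for illustration and do not come from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PORT_OK,
    PORT_ERR_EMPTY,
    PORT_ERR_NOT_NUMBER,
    PORT_ERR_RANGE
} port_tag;

/* The tag says which union member is meaningful. */
typedef struct {
    port_tag tag;
    union {
        uint16_t port;        /* valid when tag == PORT_OK */
        const char *bad_at;   /* valid otherwise: where parsing stopped */
    } u;
} port_result;

/* Parse a decimal TCP/UDP port; on success the value is already a uint16_t,
   so later code does not need to re-validate it. */
static port_result parse_port(const char *s)
{
    port_result r;
    unsigned long v = 0;

    assert(s != NULL);
    if (*s == '\0') {
        r.tag = PORT_ERR_EMPTY;
        r.u.bad_at = s;
        return r;
    }
    for (const char *c = s; *c != '\0'; c++) {
        if (*c < '0' || *c > '9') {
            r.tag = PORT_ERR_NOT_NUMBER;
            r.u.bad_at = c;
            return r;
        }
        v = v * 10 + (unsigned long)(*c - '0');
        if (v > 65535) {
            r.tag = PORT_ERR_RANGE;
            r.u.bad_at = c;
            return r;
        }
    }
    r.tag = PORT_OK;
    r.u.port = (uint16_t)v;
    return r;
}
```

Nothing stops a caller from reading the wrong union member, which is the kind of missing mechanical check mentioned above; a switch on the tag is the usual convention.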

Also, the author is right that most of the C standard library has a pretty unergonomic/easy-to-use-wrong API.

spc476 | a month ago

I'd be interested in your list of unergonomic/easy-to-use-wrong APIs in the C standard library. Aside from the global state (stdin, the hidden random seed), the str*() functions, and signal(), what are some other problematic functions?

bediger4000 | a month ago

fread and fwrite have the FILE* argument in a different, arguably wrong place than fprintf and other stdio functions. The stdio argument variability and other problems led Fowler, Korn and Vo to write sfio to replace it. I tried to use sfio for a couple of years, but ended up going back to stdio because it's so universal.

lhearachel | a month ago

Even just having fputs and fprintf invert the position of the FILE * argument is so smelly as to be outright annoying.

For fread, I can see the argument to create a similar-looking interface to memcpy. But, then, it should be fread(void *dest, FILE *f, size_t size, size_t n) rather than what it is now.

For fwrite, I've never been able to piece out why the arguments are structured the way that they are.

: hawski | a month ago
I also don't like how puts writes a new line and fputs doesn't. They shouldn't have such a similar name.
: calvin | a month ago
fread and fwrite have the same function signature so you could presumably have function pointers that work with both.

LesleyLai | a month ago

As you said, almost the entirety of <string.h> except the mem* functions. And you also mentioned rand. Also:

<ctype.h>: misleading signature (taking int/returning int), locale dependent
Many functions depend on errno in a non-obvious way. For example, strtol writes to errno if a range error occurs
qsort/bsearch: hard to use, also poor performance because the comparator can't be inlined.
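To make the errno and <ctype.h> points concrete, here is a sketch of the ceremony needed to use strtol correctly. The wrapper name parse_long and its return codes are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a whole string as a base-10 long. Returns 0 on success, -1 if no
   digits were found or trailing junk remains, -2 on a range error. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;                  /* strtol never resets errno on success */
    v = strtol(s, &end, 10);
    if (end == s)
        return -1;              /* no conversion was performed */
    if (errno == ERANGE)
        return -2;              /* result was clamped to LONG_MIN/LONG_MAX */

    /* The cast matters: passing a negative char value to isspace is
       undefined behavior, since it takes an int holding an unsigned char. */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;              /* trailing non-space characters */

    *out = v;
    return 0;
}
```

Three separate failure signals (the end pointer, errno, and the returned value) have to be checked in the right order, which is the non-obvious part.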