sp.h is the standard library that C deserves


Over the past year, I’ve been working on fixing C by giving it a high quality, ultra portable standard library. It is not a simple wrapper on top of libc; it doesn’t depend on libc except when required to by the platform. To my knowledge, there is nothing like it.

The library is called sp.h¹. It’s a 15,000 line, single header library written in plain C99. You can find the source code on GitHub, which includes the library itself, lots of example programs, and half a dozen baseball libraries² which extend the core. If you prefer to read a few examples and look through the source, head to GitHub first. Otherwise, let’s get on with the pitch!

Principles

Program directly against syscalls

The fundamental idea is that any C standard library must be written directly against the lowest level primitives available³. It is neither useful nor productive to try to emulate, produce, or interface with the decades of cruft that have accumulated between the OS and the code that you yourself write.

Libc is actively harmful

It is tempting to conform to libc, because swaths of code promise to compile and run if you can simply provide an implementation of libc. But more and more, this is untrue.

Libc does not provide a useful interface for any program. Simple programs would rather use a high level language. Sophisticated programs cannot be written with the primitives that it provides. This has been exacerbated over the past decade as asynchronous programming has become more important. A “fast” program is becoming less about solving e.g. register allocation better than the other compiler and more about e.g. using the right kernel primitives to do IO.

Any interface upon which the fundamental unit of IO is FILE* or upon which a substring is a malformed idea is not just annoying. It’s harmful. sp.h casts it aside⁴.

There Is No Heap

These types underpin the entire library:

typedef enum {
  SP_ALLOCATOR_MODE_ALLOC,
  SP_ALLOCATOR_MODE_FREE,
  SP_ALLOCATOR_MODE_RESIZE,
} sp_mem_alloc_mode_t;

SP_TYPEDEF_FN(
  void*,
  sp_allocator_fn_t,
  void* user_data, sp_mem_alloc_mode_t mode, u64 size, void* ptr
);

typedef struct sp_allocator_t {
  sp_allocator_fn_t on_alloc;
  void* user_data;
} sp_mem_t;

In other words, allocators. Allocators underpin the library by forcing programs to accept that “the ability to allocate any amount of memory from the ether” is not a primitive; it is a fiction. The operating system hands out pages. The runtime on top of it, most often called via malloc(), is what implements the often useful fiction that non-page-sized amounts of memory can be allocated.

Memory is not owned by “the runtime” or “the heap”. Memory is owned by your program. If malloc()-shaped heap allocations are what your program wants, then that’s great! There’s nothing wrong with that. But in my experience, that is an unfortunate default rather than something that is true, and this library seeks to make it opt-in rather than opt-out.
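To make the idea concrete, here is a minimal sketch of a bump arena plugged into an allocator callback of the same shape as the types above. Every name in it (alloc_mode_t, mem_t, arena_t, arena_on_alloc, and so on) is a hypothetical stand-in written for this illustration, not sp.h's actual definitions, and the 8-byte size header in front of each block is just one assumed way to let a resize know how much to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins that mirror the shape of the types above.
   These are NOT sp.h's definitions. */
typedef enum { MODE_ALLOC, MODE_FREE, MODE_RESIZE } alloc_mode_t;
typedef void *(*alloc_fn_t)(void *user_data, alloc_mode_t mode,
                            uint64_t size, void *ptr);
typedef struct { alloc_fn_t on_alloc; void *user_data; } mem_t;

/* A bump arena over a buffer that the caller owns. */
typedef struct {
  uint8_t *base;
  uint64_t capacity;
  uint64_t used;
} arena_t;

/* The buffer is assumed to be at least 8-byte aligned. */
static arena_t arena_make(void *buffer, uint64_t capacity) {
  assert(buffer != NULL);
  arena_t arena = { (uint8_t *)buffer, capacity, 0 };
  return arena;
}

static void *arena_on_alloc(void *user_data, alloc_mode_t mode,
                            uint64_t size, void *ptr) {
  arena_t *arena = (arena_t *)user_data;
  if (mode == MODE_FREE) return NULL; /* individual frees are no-ops */

  /* Round the payload up to 8 bytes so the next block stays aligned. */
  uint64_t padded = (size + 7u) & ~(uint64_t)7u;
  if (padded < size) return NULL; /* size overflowed while rounding */

  uint64_t remaining = arena->capacity - arena->used;
  if (remaining < sizeof(uint64_t)) return NULL;
  if (padded > remaining - sizeof(uint64_t)) return NULL;

  /* Layout of one block: [8-byte size header][zeroed payload]. */
  uint8_t *header = arena->base + arena->used;
  memcpy(header, &size, sizeof size);
  uint8_t *out = header + sizeof(uint64_t);
  memset(out, 0, padded);

  if (mode == MODE_RESIZE && ptr != NULL) {
    /* Read the old size back from the header and carry the contents over. */
    uint64_t old_size;
    memcpy(&old_size, (uint8_t *)ptr - sizeof old_size, sizeof old_size);
    memcpy(out, ptr, (size_t)(old_size < size ? old_size : size));
  }

  arena->used += sizeof(uint64_t) + padded;
  return out;
}

/* Convenience wrapper so call sites do not spell out the mode. */
static void *mem_alloc(mem_t mem, uint64_t size) {
  return mem.on_alloc(mem.user_data, MODE_ALLOC, size, NULL);
}
```

The caller decides where the bytes live (a static buffer, a stack array, pages from the OS) and hands the arena to anything that allocates. Releasing everything at once is just setting used back to zero.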

Null-terminated strings are the devil’s work

I have written about this in the past.

Null terminated strings mean you cannot:
Return a non-owning substring
Know the length of a string in O(1)
Write lexers and parsers which return ergonomic views into source
Build strings without invalid intermediate values
Plus, of course, the unfathomable number of bugs and security issues that arise from a missing null terminator. Step one to modernizing C is to completely ditch null terminated strings in favor of the humble sp_str_t.

The only downside, I believed, was that you were forced to make an extra copy to interface with any other C API you might come across. I have come to find that this is completely meaningless.

A C standard library built natively around pointer + length strings is shockingly ergonomic. For example, a snippet from a wc clone:

  sp_str_t content = sp_zero;
  sp_io_read_file(mem, path, &content);

  sp_ht(sp_str_t, u32) counts = sp_zero;
  sp_str_ht_init(mem, counts);
  sp_da(sp_str_t) lines = sp_str_split_c8(mem, content, '\n');
  sp_da_for(lines, i) {
    sp_da(sp_str_t) words = sp_str_split_c8(mem, lines[i], ' ');

    sp_da_for(words, j) {
      u32* count = sp_str_ht_get(counts, words[j]);
      if (count) {
        *count = *count + 1;
      } else {
        sp_str_ht_insert(counts, words[j], 1);
      }
    }
  }

If your first reaction is “so what?”, then, yeah, that’s the point. Here’s a piece of C code which reads roughly like any high level language but also never copies data from the source buffer while parsing. In other words, it’s both the most ergonomic version and the most performant version.

Be a part of your software, not aside from it

The library is meant to be read, modified, tweaked, rewritten, or whatever verb you might need to have it serve your purposes. I’ve worked very hard to this end:

The core of the library is ~40 syscalls which are the only platform specific code⁵
The library ships as a single file which needs no configuration
The file is extremely organized, and tagged with @tags for human or LLM search
Every function is part of a namespace

Where the frustrating parts of C seek to hide the OS your program runs on behind an elaborate fiction, sp.h seeks to unify only those things which are true, as thinly as possible while being useful, and then builds functionality on top of the exact same primitives that it gives you.

Be extremely portable

sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC and MinGW. It works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.

And, best of all, it does all of that because it’s small, not because it’s big.

Be explicit

Every time I’ve picked implicit over explicit, I’ve come to regret it and paid the price to fix it:

Errors are always returned and handled by the caller
Programs do not have mutable global state
Functions which allocate take an allocator
Memory is zero initialized
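The four rules in that list can be sketched in a few lines. This is only an illustration under assumed names: err_t, mem_t, heap_alloc, failing_alloc, and u32_list_make are all hypothetical and are not part of sp.h. The heap allocator here is deliberately backed by libc's calloc to show a malloc-shaped heap as something the caller opts into.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes; the return value carries only the error. */
typedef enum { ERR_OK = 0, ERR_OUT_OF_MEMORY, ERR_INVALID_ARG } err_t;

/* Hypothetical allocator handle: a callback plus its state, no globals. */
typedef void *(*alloc_fn_t)(void *user_data, uint64_t size);
typedef struct { alloc_fn_t alloc; void *user_data; } mem_t;

/* Opt-in heap. calloc hands back zero-initialized memory. */
static void *heap_alloc(void *user_data, uint64_t size) {
  (void)user_data;
  return calloc(1, (size_t)size);
}

/* An allocator that always fails, useful for exercising error paths. */
static void *failing_alloc(void *user_data, uint64_t size) {
  (void)user_data;
  (void)size;
  return NULL;
}

typedef struct {
  uint32_t *items;
  uint32_t count;
} u32_list_t;

/* Allocating functions take the allocator explicitly; the result comes
   back through `out`, which is zeroed first so a failure leaves it empty. */
static err_t u32_list_make(mem_t mem, uint32_t count, u32_list_t *out) {
  assert(out != NULL);
  memset(out, 0, sizeof *out);
  if (count == 0) return ERR_INVALID_ARG;

  uint64_t bytes = (uint64_t)count * sizeof(uint32_t);
  uint32_t *items = (uint32_t *)mem.alloc(mem.user_data, bytes);
  if (items == NULL) return ERR_OUT_OF_MEMORY;

  out->items = items;
  out->count = count;
  return ERR_OK;
}
```

Because the allocator is a parameter, swapping in failing_alloc is enough to test the out-of-memory path without touching any global state.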

Non-goals

Conformance to existing interfaces

This is not libc. When required to, sp.h will respect libc, and it will always work unobtrusively and completely when embedded in a libc-using program. But it is not libc, and you should not expect it to act like it is.

Obscure architectures and OSes

I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.

That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.

Performance

The library’s stance, to put it simply, is that the juice ain’t worth the squeeze when it comes to low level, compute-bound performance.

Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.

Things that are off the table might be:

SIMD
A highly optimized hash table rewrite
Figuring out where inlining or LIKELY causes the compiler to produce better code

Things that are on the table might be:

Providing the correct abstractions to do optimized and/or zero copy IO
Writing APIs that do not require copying data

Of course, doing fine-grained optimization where it’s hurting people is always on the table. Fixing bugs is always on the table. I am not anti optimization; just busy.

A parting thought

The natural question one might have is: Why are you doing this? There have never been more or better languages for systems programming. Why not just use one?

The answer is that C holds a real niche, and not one built wholly on legacy. To my knowledge, it’s the only language which:

Can be directly compiled to any machine code imaginable
Has an ecosystem of state-of-the-art optimizing compilers
Is the language that the OS and most libraries are written in
Is simple enough that you could write a reasonable compiler for it as a personal project

In other words…

C is valuable because it’s simple

Of course, these are all unfair to varying degrees. LLVM exists, so technically everyone has a SOTA compiler. Most languages have FFIs and tooling. The best systems languages are better at C than C is.

And yet, to have something so well-supported, so optimized, so tied to the platforms upon which we write native code, and so approachable is magical.

I want to work with you

I would like nothing more than to make friends and/or help you work on this library, stranger. I’ll help you port it to your weird environment. I’ll explain any of it to you. I’ll listen politely while you tell me I’m terrible at programming. I am certainly no genius at systems programming; everything I have is the product of really bad misunderstandings about how software and computers work, followed by lots of hard work and fun and more software.

I’m on a Discord server or you can find me at #sp on IRC. You can also email me. The domain’s the same as this site, and the handle is my last name⁶.