wolfSSL releases a new product: wolfCOSE, a zero-alloc embedded C COSE stack

103 points by aidangarske 20 hours ago on hackernews | 26 comments

For those of us not in the loop, COSE[1] is CBOR Object Signing and Encryption, with CBOR being a binary JSON alternative. It is patterned on JOSE, the family of JSON standards that includes favorites like JWK.

[1]: https://www.rfc-editor.org/info/rfc9052/

mgaunard | 17 hours ago

so some sort of JWT alternative?

SV_BubbleTime | 14 hours ago

For the most part yes. JWT is a part of JOSE. For most things CBOR, think binary JSON.

I love me some CBOR, but Carl isn’t very adventurous in deviating from JSON (rightfully!) so I don’t expect a lot new in COSE if you have worked with JOSE.

Other than the tagged data types, the main incompatibility between CBOR and JSON is that CBOR map keys can be integers, whereas in JSON they must be strings.
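
A minimal sketch of what integer map keys look like on the wire, assuming a hand-rolled encoder (the function names are made up for illustration and are not any library's API). In COSE, header label 1 is "alg" and the value -7 is ES256, so the map {1: -7} fits in 3 bytes, where the JSON text {"alg":"ES256"} takes 15.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a CBOR head (major type + argument) for arguments below 256.
 * Returns the number of bytes written (1 or 2). */
static size_t cbor_head(uint8_t *out, uint8_t major, uint8_t arg)
{
    if (arg < 24u) {
        out[0] = (uint8_t)((major << 5) | arg);
        return 1;
    }
    out[0] = (uint8_t)((major << 5) | 24u);
    out[1] = arg;
    return 2;
}

/* Encode an integer in [-256, 255]:
 * major type 0 for v >= 0, major type 1 (argument -1 - v) for v < 0. */
static size_t cbor_small_int(uint8_t *out, int v)
{
    if (v >= 0) {
        return cbor_head(out, 0, (uint8_t)v);
    }
    return cbor_head(out, 1, (uint8_t)(-1 - v));
}

/* Encode the one-entry map {1: -7} into a caller-provided buffer
 * (3 bytes are written). Returns the encoded length. */
size_t encode_alg_header(uint8_t *out)
{
    size_t n = 0;
    n += cbor_head(out + n, 5, 1);     /* major type 5: map with 1 pair */
    n += cbor_small_int(out + n, 1);   /* integer key, not a string */
    n += cbor_small_int(out + n, -7);  /* integer value */
    return n;
}
```

Calling encode_alg_header() on a small buffer should yield the bytes a1 01 26.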

: formerly_proven | 6 hours ago
CBOR and COSE are pretty bad formats. The original "rationale" for CBOR was that messagepack didn't distinguish bytes and strings, a distinction that was added around 2013. Afterwards CBOR was changed up a bit from messagepack and became a decidedly worse format. And COSE just goes against every other principle of well-engineered crypto, but that's not particularly surprising given that it is a JOSE derivative.
A good zero-order classifier for "is this signing format a dumpster fire" is whether the spec mentions canonical encodings.

forty | 10 hours ago

Moving to something other than JSON for this kind of thing is reasonable, given the issues with parsing JSON, which can cause two implementations to interpret the same document in two different ways.

https://seriot.ch/security/parsing_json.html

camgunz | 9 hours ago

CBOR has other ways it's unsuitable; the spec has a whole section about it: https://datatracker.ietf.org/doc/html/rfc9052#name-cbor-enco...
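
One illustration of why encoding rules matter once you sign bytes, as a rough sketch with hypothetical helper names: CBOR allows the same integer to be written in more than one well-formed way, so two different byte strings can decode to the same value while hashing differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a CBOR unsigned integer (major type 0) whose argument is either
 * inline (0..23) or held in one following byte (additional info 24).
 * Returns the number of bytes consumed, or 0 for any other input. */
size_t cbor_decode_uint(const uint8_t *in, size_t len, uint64_t *out)
{
    uint8_t ai;

    if (len < 1 || (in[0] >> 5) != 0) {
        return 0;
    }
    ai = (uint8_t)(in[0] & 0x1Fu);
    if (ai < 24u) {
        *out = ai;
        return 1;
    }
    if (ai == 24u && len >= 2) {
        *out = in[1];
        return 2;
    }
    return 0;
}

/* Return 1 if a value below 256 was encoded in its shortest form, else 0. */
int cbor_uint_is_shortest(uint64_t value, size_t consumed)
{
    return (value < 24u) ? (consumed == 1) : (consumed == 2);
}
```

Both 01 and 18 01 decode to the integer 1 here, but only the first is the shortest form, which is the sort of thing a deterministic-encoding rule has to pin down.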

: Asmod4n | 6 hours ago
COSE was invented to solve that gap, wasn’t it?

throwaway2037 | 5 hours ago

This is a great blog and an incredible piece of research. Re-reading it today makes me think of Emil Stenström's recent effort to use an LLM to write an HTML5 parser in pure Python [1] using the official HTML5 spec and their test cases. [2] Later, Simon Willison used an LLM to convert the pure Python source to JavaScript. [3] It seems reasonable to ask an LLM to write a "perfect" JSON parser given the RFC spec and massive test pack from seriot.ch. Regarding the "minefield" of JSON parsing, I used to lean on Google's Gson (Java) a lot in my early days. I thought Jackson FasterXML was "too complex". Later, I realised the mind-boggling number of configuration options was weirdly more sustainable (but more complex!), because I could carefully control each JSON parser/generator edge case.

[1] https://github.com/EmilStenstrom/justhtml

[2] https://simonwillison.net/2025/Dec/14/justhtml/

[3] https://simonwillison.net/2025/Dec/15/porting-justhtml/

Neywiny | 17 hours ago

2 things of note in the readme, as I've recently been on the efficient binary communication hunt:

1. .text size without clarifying the architecture, flags, and compiler is meaningless unless it's all rodata (and it's not)

2. Saying it takes 0 .bss and .data just means it allocates everything elsewhere and that can be helpful to know. Of course in compilation that'll also be dependent on how and for what it's built. To say it's zero alloc is incorrect or at best misleading. Here's a line of code that allocates a ton of stuff on the stack: https://github.com/wolfSSL/wolfCOSE/blob/b90b34abcba90aa7b8a... (previously pointed to another line but it was diluting my thesis). Anyone in embedded who's had to increase stack size to use a fancy function knows what I'm talking about. I'm looking at you, sscanf. Some of this code will allocate hundreds if not low thousands of bytes onto the stack. Which is maybe fine but don't say it's zero alloc just because it's all on the stack.

wmwragg | 17 hours ago

My understanding of zero alloc is that there are no heap allocations i.e. use of a form of malloc. At least that has always been my experience, use of the stack is perfectly fine

dezgeg | 17 hours ago

Some stricter interpretations also require that maximum stack usage can be statically analyzed (ie. no recursion, no function pointers, no VLAs/alloca).

uecker | 10 hours ago

VLA usage can also be analyzed (e.g. when bounded in a simple way by a function argument) and then may allow reduced stack usage compared to fixed-size arrays sized for a worst case.
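
A small sketch of the comparison, with made-up names and a hypothetical MAX_MSG limit: the fixed version always reserves the worst case, while the VLA version reserves only n bytes, and n is checked against the same bound first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSG 256u  /* hypothetical worst-case message size */

static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        s += p[i];
    }
    return s;
}

/* Fixed worst-case buffer: every call reserves MAX_MSG bytes of stack. */
int sum_fixed(const uint8_t *msg, size_t n, uint32_t *out)
{
    uint8_t tmp[MAX_MSG];

    if (n == 0 || n > MAX_MSG) {
        return -1;
    }
    memcpy(tmp, msg, n);
    *out = sum_bytes(tmp, n);
    return 0;
}

/* VLA bounded by a checked argument: reserves n bytes, never more than
 * MAX_MSG. n == 0 is rejected because a zero-length VLA is undefined. */
int sum_vla(const uint8_t *msg, size_t n, uint32_t *out)
{
    if (n == 0 || n > MAX_MSG) {
        return -1;
    }
    {
        uint8_t tmp[n];
        memcpy(tmp, msg, n);
        *out = sum_bytes(tmp, n);
    }
    return 0;
}
```

The worst case is still MAX_MSG for both, so the analysis bound is the same; the difference is only in what typical calls use.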

Asmod4n | 6 hours ago

VLAs aren’t a mandatory part of any C standard and as such there are platforms which haven’t implemented them, such as Windows.

: uecker | 6 hours ago
This does not have much to do with my point, but, anyway, basically any C compiler supports them. MSVC does not, but it also does not support a recent standard so you cannot use MSVC to compile C, just some outdated subset.

Neywiny | 16 hours ago

But it puts sizeable arrays on the stack. That's not really better, since instead of an out of memory exception it'll just corrupt the stack on the majority of embedded implementations that don't have hardware stack protection in use or available.

magicalhippo | 12 hours ago

So I think they're correctly saying this is non-allocating, since it doesn't internally malloc, but I also think you're right to criticize their stack spam.

I think it would be better if they made you pass a struct with those arrays and such as members, then you get to choose if you want to put it in global memory to ensure they're available or if you take your chance on a local stack instance.
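
Roughly what I have in mind, as a sketch with invented names (scratch_t and reverse_with_scratch are not part of wolfCOSE): the routine owns no arrays of its own and works only in a struct the caller hands in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SZ 64u  /* hypothetical buffer size */

/* All working memory the routine needs, owned by the caller. */
typedef struct {
    uint8_t in_copy[SCRATCH_SZ];
    uint8_t out_tmp[SCRATCH_SZ];
} scratch_t;

/* Toy operation: reverse n bytes using only the caller's scratch space.
 * Returns 0 on success, -1 if there is no scratch or the input is too big. */
int reverse_with_scratch(scratch_t *s, const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i;

    if (s == NULL || n > SCRATCH_SZ) {
        return -1;
    }
    memcpy(s->in_copy, in, n);
    for (i = 0; i < n; i++) {
        s->out_tmp[i] = s->in_copy[n - 1 - i];
    }
    memcpy(out, s->out_tmp, n);
    return 0;
}
```

The caller can declare one `static scratch_t` so the linker accounts for it, or a local `scratch_t` if it knows the stack has room; the function does not care which.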

Asmod4n | 6 hours ago

VLAs can’t be used in portable code at all.

Or what do you mean with sizeable arrays?

: Hendrikto | 5 hours ago
I think he means “large arrays”.

adrian_b | 4 hours ago

I have looked into the header "wolfcose.h".

It contains a bunch of configurable parameters, including the 64-byte size of those buffers.

Based on those configurable parameters, which also impose limits on the depth of procedure calls, it is possible to compute the maximum space that will be needed on the stack. Thus in an embedded system it would be possible to guarantee that the stack size is not exceeded.

By changing the values of those configuration parameters, it should be possible to tune the size of the stack, with the price that with a lower space available in the stack it may become impossible to decode certain more complex messages.
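
As a sketch of that computation: the macro names and every number below except the 64-byte buffer size are assumptions made up for illustration, not values read from "wolfcose.h".

```c
#include <stddef.h>

/* Hypothetical configuration; only the 64-byte buffer size comes from
 * the header discussed above. */
#define CFG_BUF_SZ          64u
#define CFG_BUFS_PER_FRAME   2u
#define CFG_MAX_DEPTH        8u
#define CFG_FRAME_OVERHEAD  32u   /* guess: return address, saved registers, small locals */
#define CFG_STACK_BUDGET  2048u

/* Worst case = call depth * (buffer bytes per frame + per-frame overhead). */
size_t worst_case_stack(size_t depth, size_t bufs, size_t buf_sz, size_t overhead)
{
    return depth * (bufs * buf_sz + overhead);
}

/* Fail the build if the configured worst case exceeds the budget. */
_Static_assert(CFG_MAX_DEPTH * (CFG_BUFS_PER_FRAME * CFG_BUF_SZ + CFG_FRAME_OVERHEAD)
                   <= CFG_STACK_BUDGET,
               "configured worst-case stack exceeds budget");
```

With these made-up values the bound is 8 * (2 * 64 + 32) = 1280 bytes; halving the allowed depth halves the bound, at the cost of rejecting more deeply nested messages.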

nine_k | 17 hours ago

I used to think that zero alloc = zero malloc, and all stack allocations are of statically known fixed size (you know the max call depth), so you can preallocate your stack area with some confidence, and will never run out of RAM.

The line you point at creates a single local pointer variable which is used in a tight loop; I don't see why it wouldn't stay entirely in a register.

I'm not a real embedded developer though; last time I worked as one I worked on 8-bit devices. Maybe things changed since then.

Neywiny | 16 hours ago

I think your experience on 8 bit is just fine. Imagine, if you will, that your 8 bit micro has 2 kB of RAM, such as the famous atmega328p of the Arduino UNO. Sure the compiler might put it into a register, but it might not. It most certainly won't do that for the three 66-byte arrays they define on the stack later in the code, but that's maybe ok. The question is: how do you preallocate the stack safely? How do you know exactly what your usage is without overflowing the stack and wreaking havoc? Maybe you profile the code with debug on and it's X bytes, then in release mode it's Y because of register packing. This affects all code, but it's something we need to be cognizant of when we're trying to maximize the 2 kB. It's easy to throw kilobytes of stack around on desktop. Megabytes even. I've done gigabytes before for quick and dirty stuff. But on deeply embedded 8 bits, you don't want to be doing that.

My bigger point was that no malloc should be called "stack allocated" or some other more technically correct term. That tells me "hey if you run this code and something goes haywire, check your stack isn't corrupted" because 9 times out of ten for me that's the problem.
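
On the "how do you know your usage" question, one common trick is stack painting: fill the stack with a pattern, run the code, then see how much of the pattern survived. A rough sketch follows, simulated on a plain array; on real hardware the region would come from linker symbols, and the function names here are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT 0xA5u  /* arbitrary fill pattern */

/* Fill a stack region with a known pattern before the code under test runs. */
void stack_paint(uint8_t *region, size_t size)
{
    memset(region, PAINT, size);
}

/* For a descending stack, usage starts at the highest address and grows
 * toward region[0]. Count the untouched pattern bytes from the low end;
 * everything above them is taken as the high-water mark. */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && region[untouched] == PAINT) {
        untouched++;
    }
    return size - untouched;
}
```

This only measures the paths you actually exercised, and it can under-report if the deepest bytes written happen to equal the pattern, so it complements static analysis rather than replacing it.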

: AlotOfReading | 15 hours ago
I don't know if you work in embedded, but I do and I've always understood zero alloc as "no dynamic allocation".
Most companies buying anything from WolfSSL will already be using a script or toolchain flags to validate stack usage. And if they don't, even embedded toolchains generally support canaries these days.

amiga386 | 16 hours ago

That line allocates nothing. The function is their version of explicit_bzero(). The line casts an existing pointer passed in (e.g. pointing to something on the stack, or allocated by you) to a volatile pointer, which prevents the compiler from optimising away the writes.
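
For anyone unfamiliar with the pattern, here is a generic sketch of it (my own illustration with a made-up name, not wolfCOSE's actual code):

```c
#include <stddef.h>

/* Zero a caller-owned buffer through a volatile pointer so the compiler
 * cannot drop the stores as dead writes. The only locals are a pointer
 * and a counter; no buffer is created here. */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    size_t i;

    for (i = 0; i < len; i++) {
        p[i] = 0;
    }
}
```

A plain memset() on a buffer that is never read again may legally be removed by the optimiser, which is why key-wiping code tends to look like this.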

Their README states "zero dynamic allocation: all operations use caller-provided buffers" and "Full COSE lifecycle in ~<1KB RAM (excluding wolfCrypt internals)", so I assume their stack usage is low too, because you (the caller) will own and have to allocate all buffers yourself

Neywiny | 16 hours ago

p is allocated on the stack

: amiga386 | 16 hours ago
So what? So is every automatic variable, including size_t i on the next line, unless it fits in a register. So are the platform-specific preserved registers across function calls, so are the function parameters if the platform declares they need to be on the stack, so is the return value in some cases. So is the preserved return address / link register for the function call.
Nobody is going to write code with exclusively global variables and GOTO instead of function calls, in order to use zero stack. And if they did, the other claim (no .data or .bss) would be untrue.
I'm not sure how you misinterpreted "Zero dynamic allocation" to mean "not even stack variables", because nobody would read it that way, and I don't think anyone sane would use software that promised it used no stack.

adrian_b | 7 hours ago

Because this is a library, it presumably allocates nothing in the heap or in static storage.

All data must be allocated in the program invoking procedures from the library, and passed as actual parameters.

You are right about the .text size being dependent on architecture, flags and compiler, but these dependencies may at most double or triple the size. They will certainly not make the size ten times greater. So with a maximum size of 25 kB, I expect that the maximum size will be under 100 kB on any combination of architecture, flags and compiler.

I do not understand exactly what you mean about "unless it's all rodata". Depending on the architecture, flags and compiler, the constants may be allocated in separate sections, like ".rodata", or they may also be allocated in the same ".text" section with the executable code. The latter choice is typically superior on the CPUs that have relative addressing, like x86_64.

Is this what you meant: that it is not clear whether the quoted ".text" size also includes the constants?

I do not think that such a library includes a great amount of constants, so it is likely that adding them or not does not change much the size.

At the line you pointed to, the size of each of the 2 allocated buffers is 64 bytes. The buffer sizes and other parameters that determine the maximum amount of stack usage are defined in "wolfcose.h". It appears that it is possible to tune the amount of stack needed, as a tradeoff with the complexity of the messages that can be decoded.

I agree that "Zero dynamic allocation" in the README is not really correct, because they meant "zero allocation in the heap".

Nevertheless, this cannot cause confusion, because any programmer should be aware that a claim of no dynamic allocation of any kind is typically impossible, because almost all functions or procedures must allocate some variables in the stack, with very rare exceptions where there are so few local variables that they may be allocated only inside registers. On x86_64, zero allocation in the stack is completely impossible, because at least the return address must be allocated in the stack.