You can go even further with this in other languages, with things like dependent typing, which can assert (among other interesting properties) that something like
get_elem_at_index(array, index)
can never be called with an index outside the bounds of the array, checked statically at compile time. And this is the key: it works without knowing a priori what the length of the array is.
"In Idris, a length-indexed vector is Vect n a (length n is in the type), and a valid index into length n is Fin n ('a natural number strictly less than n')."
Similar tricks work with division that might result in inf/-inf, to prevent it from typechecking, and there are more subtle implications in e.g. higher-order types and functions.
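As a rough Lean 4 rendering of the same idea (my sketch, with Lean's Array and Fin standing in for Idris's Vect): the index type itself carries the bounds proof, so the lookup is total and needs no runtime check.

```lean
-- i : Fin v.size can only be constructed together with a proof
-- i.val < v.size, so an out-of-bounds call is unrepresentable.
def getElemAtIndex (v : Array α) (i : Fin v.size) : α :=
  v[i.val]'i.isLt  -- the ' syntax supplies the bounds proof explicitly
```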
How does that work? If the length of the array is read from stdin for example, it would be impossible to know it at compile time. Presumably this is limited somehow?
If the length is read from outside the program, it's an IO operation, not a static value, but there are generally runtime checks in addition to the type system. Usually you solve this as in the article, with a constructor that checks the value, so you'd get something like "Invalid option: length = 5 must be within 0-4" when you tried to create the Fin n from the passed-in value.
It doesn’t have to be a compile-time constant. An alternative is to prove that when you are calling the function the index is always less than the size of the vector (a dynamic constraint). You may be able to assert this by having a separate function on the vector that returns a constrained value (e.g. n < v.len()).
One option is dependent pairs, where one value of the pair (in this example) would be the length of the array, and the other value has a type which depends on that first value (such as Vector n T instead of List T).
Type-Driven Development with Idris[1] is a great introduction for dependently typed languages and covers methods such as these if you're interested (and Edwin Brady is a great teacher).
[1] https://www.manning.com/books/type-driven-development-with-i...
If you check that the value is inside the range, and execute some different code if it's not, then congratulations: in the branch where the check passed, you now know at compile time that the number you read from stdin is in the right range.
Not sure about Idris, but in Lean `Fin n` is a struct that contains a value `i` and a proof that `i < n`. You can read in the value `n` from stdin and then you can do `if h : i < n` to have a compile-time proof `h` that you can use to construct a `Fin n` instance.
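Concretely, a minimal Lean 4 sketch of that construction (my example; the function name is made up):

```lean
-- n and i can both arrive at runtime (e.g. parsed from stdin); the
-- dependent `if h : i < n` yields the proof term on the true branch,
-- which is exactly what the Fin n constructor needs.
def parseIndex (n i : Nat) : Option (Fin n) :=
  if h : i < n then
    some ⟨i, h⟩  -- a Fin n is the value i bundled with the proof h : i < n
  else
    none
```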
Dividing a float by zero is usually perfectly valid. It has predictable outputs, and for some algorithms like collision detection this property is used to remove branches.
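For reference, the IEEE 754 behaviour this relies on, in Rust (the branch-removal trick shows up in e.g. ray/AABB slab tests, where 1.0 / direction is allowed to be ±inf):

```rust
fn main() {
    let pos = 1.0_f32 / 0.0;  // +inf: defined, predictable, no panic
    let neg = -1.0_f32 / 0.0; // -inf
    let nan = 0.0_f32 / 0.0;  // NaN is the one genuinely odd case
    assert!(pos.is_infinite() && neg.is_infinite() && nan.is_nan());
    // min/max order infinities sensibly, which is what lets slab-style
    // intersection code divide by a possibly-zero component and still
    // compare entry/exit distances without branching on zero.
    assert_eq!(pos.min(3.0), 3.0);
    assert_eq!(neg.max(3.0), 3.0);
}
```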
I think “has predictable outputs” is less valuable than “has expected outputs” for most workloads. Dividing by zero almost always reflects an unintended state, so proceeding with the operation means compounding the error state.
(This isn’t to say it’s always wrong, but that having it be an error state by default seems very reasonable to me.)
This reminds me a bit of a recent publication by Stroustrup about using concepts... in C++ to validate integer conversions automatically where necessary.
{
Number<unsigned int> ii = 0;
Number<char> cc = '0';
ii = 2; // OK
ii = -2; // throws
cc = i; // OK if i is within cc’s range
cc = -17; // OK if char is signed; otherwise throws
cc = 1234; // throws if a char is 8 bits
}
https://www.stroustrup.com/Concept-based-GP.pdf
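A rough Rust analogue of the same idea, using the standard library's checked integer conversions (my sketch, not from Stroustrup's paper):

```rust
fn main() {
    // TryFrom validates the conversion once, at the boundary,
    // instead of silently truncating or wrapping.
    assert_eq!(u32::try_from(2i64).unwrap(), 2);    // OK
    assert!(u32::try_from(-2i64).is_err());         // negative: rejected
    assert!(i8::try_from(1234i64).is_err());        // doesn't fit in 8 bits
    assert_eq!(i8::try_from(-17i64).unwrap(), -17); // OK, i8 is signed
}
```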
Like how Clojure basically uses maps everywhere and the whole standard library allows you to manipulate them in various ways.
The main problem with the many-types approach is ending up with several similar types, all incompatible.
I don't really get why this is getting flagged; I've found this to be true, but it's more of a trade-off than a pure benefit. It also is sort of beside the point: you always need to parse inputs from external, usually untrusted, sources.
Agree with this. Mismatched types are generally an indicator of an underlying issue with the code, not the language itself. These are areas where AI can be helpful in flagging potential problems.
Yeah, there's something of a tension between the Perlis quote "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures" and Parse, don't validate.
The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.
The real world often changes though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.
Note that the division-by-zero example used in this article is not the best example to demonstrate "Parse, Don't Validate," because it relies on encapsulation. The principle of "Parse, Don't Validate" is best embodied by functions that transform untrusted data into some data type which is correct by construction.
Alexis King, the author of the original "Parse, Don't Validate" article, also published a follow-up, "Names are not type safety" [0] clarifying that the "newtype" pattern (such as hiding a nonzero integer in a wrapper type) provides weaker guarantees than correctness by construction. Her original "Parse, Don't Validate" article also includes the following caveat:
> Use abstract datatypes to make validators “look like” parsers. Sometimes, making an illegal state truly unrepresentable is just plain impractical given the tools Haskell provides, such as ensuring an integer is in a particular range. In that case, use an abstract newtype with a smart constructor to “fake” a parser from a validator.
So, an abstract data type that protects its inner data is really a "validator" that tries to resemble a "parser" in cases where the type system itself cannot encode the invariant.
The article's second example, the non-empty vec, is a better example, because it encodes within the type system the invariant that one element must exist. The crux of Alexis King's article is that programs should be structured so that functions return data types designed to be correct by construction, akin to a parser transforming less-structured data into more-structured data.
[0] https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...
Even the newtype-based "parse, don't validate" is tremendously useful in practice, though. The big thing is that if you have a bare string, you don't know "where it's been". It doesn't carry with it information whether it's already been validated. Even if a newtype can't provide you full correctness by construction, it's vastly easier to be convinced of the validity of an encapsulated value compared to a naked one.
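As a concrete Rust sketch of that (the Username type and its rules are hypothetical): the field is private, so the only way to obtain the wrapper is through the validating constructor, and downstream code can trust any value it holds.

```rust
mod username {
    // Private field: outside this module a Username can only be obtained
    // via parse(), so holding one proves validation already happened.
    pub struct Username(String);

    impl Username {
        pub fn parse(raw: &str) -> Result<Username, String> {
            if raw.is_empty() || raw.len() > 32 {
                Err(format!("invalid username: {raw:?}"))
            } else {
                Ok(Username(raw.to_string()))
            }
        }
        pub fn as_str(&self) -> &str { &self.0 }
    }
}

// This signature cannot be called with a naked, unvalidated string.
fn greet(user: &username::Username) {
    println!("hello, {}", user.as_str());
}

fn main() {
    let user = username::Username::parse("alice").expect("valid");
    greet(&user);
    assert!(username::Username::parse("").is_err());
}
```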
For full-on parse-don't-validate, you essentially need a dependent type system. As a more light-weight partial solution, Rust has been prototyping pattern types, which are types constrained by patterns. For instance a range-restricted integer type could be simply spelled `i8 is 0..100`. Such a feature would certainly make correctness-by-construction easier in many cases.
The non-empty list implemented as a (T, Vec<T>) is, btw, a nice example of the clash between practicality and theoretical purity. It can't offer you a slice (consecutive view) of its elements without storing the first element twice (which requires that T: Clone, unlike normal Vec<T>), which makes it fairly useless as a vector. It's okay if you consider it just an abstract list with a more restricted interface.
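For contrast, a sketch of the encapsulation-based alternative (type name mine): keep a plain Vec inside and enforce non-emptiness in a smart constructor. Contiguous storage and slicing are preserved, at the cost of the invariant living in the API rather than in the shape of the data.

```rust
pub struct NonEmptyVec<T>(Vec<T>); // private field: invariant guarded by the API

impl<T> NonEmptyVec<T> {
    pub fn new(v: Vec<T>) -> Option<Self> {
        if v.is_empty() { None } else { Some(NonEmptyVec(v)) }
    }
    pub fn first(&self) -> &T {
        &self.0[0] // cannot panic: the constructor rejected empty vecs
    }
    pub fn as_slice(&self) -> &[T] {
        &self.0 // the contiguous view that (T, Vec<T>) cannot provide
    }
}

fn main() {
    let v = NonEmptyVec::new(vec![1, 2, 3]).unwrap();
    assert_eq!(*v.first(), 1);
    assert_eq!(v.as_slice(), &[1, 2, 3]);
    assert!(NonEmptyVec::new(Vec::<i32>::new()).is_none());
}
```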
```
impl Add for NonZeroF32 { ... }
impl Add<f32> for NonZeroF32 { ... }
impl Add<NonZeroF32> for f32 { ... }
```
What type would it return though?
Would have to be F32, no?
I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
Generally yes. `NonZeroU32::saturating_add(self, other: u32)` is able to return `NonZeroU32` though! ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... )
> I cannot think of any way to enforce "non-zero-ness" of the result without making it return an optional Result<NonZeroF32>, and at that point we are basically back to square one...
`NonZeroU32::checked_add(self, other: u32)` basically does this, although I'll note it returns an `Option` instead of a `Result` ( https://doc.rust-lang.org/std/num/type.NonZeroU32.html#metho... ), leaving you to `.map_err(...)` or otherwise handle the edge case to your heart's content. Niche, but occasionally what you want.
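A quick usage sketch of both std methods mentioned:

```rust
use std::num::NonZeroU32;

fn main() {
    let n = NonZeroU32::new(7).unwrap();
    // checked_add returns Option<NonZeroU32>: None on overflow, but the
    // sum can never be zero, since self > 0 and other >= 0.
    assert_eq!(n.checked_add(1).map(NonZeroU32::get), Some(8));
    assert_eq!(n.checked_add(u32::MAX), None); // would overflow
    // ok_or converts the Option into a Result for ?-style handling:
    assert!(n.checked_add(1).ok_or("overflowed u32").is_ok());
    // saturating_add stays in NonZeroU32 by clamping at u32::MAX.
    assert_eq!(n.saturating_add(u32::MAX).get(), u32::MAX);
}
```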
The examples in question propagate complexity throughout related code. I think this is a case I see frequently in Rust of using too many abstractions, with all their associated complexity.
I would just (as a default; the situation varies)... validate prior to the division and handle as appropriate.
The analogous situation I encounter frequently is indexing, e.g. checking if the index is out of bounds. Similar idea: check, print or display an error, then fail that computation without crashing the program. It's usually an indication of some bug, which can be tracked down. Or, if it's an array that's frequently indexed, use the (canonical for Rust's core types) `get` method on whatever struct owns the array. It returns an Option (sketch below).
I do think either the article's approach or validating is better than runtime crashes! There are many patterns in programming. Using types in this way is something I see a lot of in OSS Rust, but it is not my cup of tea. Not heinous in this case, but I think not worth it.
This is the key to this article's philosophy, near the bottom:
> I love creating more types. Five million types for everyone please.
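For reference, the Option-returning `get` mentioned earlier in this comment, on a plain Vec (standard API):

```rust
fn main() {
    let v = vec![10, 20, 30];
    // v[7] would panic; get returns an Option instead, so the failed
    // computation can be reported and skipped without crashing.
    match v.get(7) {
        Some(x) => println!("value: {x}"),
        None => eprintln!("index 7 out of bounds (len {}): likely a bug", v.len()),
    }
    assert_eq!(v.get(1), Some(&20));
}
```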
dang | 2 hours ago
also:
Parse, Don’t Validate – Some C Safety Tips - https://news.ycombinator.com/item?id=44507405 - July 2025 (73 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=41031585 - July 2024 (102 comments)
Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)
Parsix: Parse Don't Validate - https://news.ycombinator.com/item?id=27166162 - May 2021 (107 comments)
Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)
Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)
(p.s. these links are just to satisfy extra-curious readers - no criticism is intended! I add this because people sometimes assume otherwise)
esafak | an hour ago
satvikpendem | 33 minutes ago
Which refers to https://docs.rs/anodized/latest/anodized/