Broken proofs and broken provers

24 points by gerikson a day ago on lobsters | 8 comments

Corbin | a day ago

My impression is that the LCF style systems, including the HOL family as well as Isabelle, have an excellent record for soundness, confirming Robin Milner’s conception from half a century ago – and with no countervailing penalty.

It's galling to read this after a listing of kernel bugs in a multitude of LCF-style systems. Meanwhile, Metamath's kernel has been independently implemented like five times and you (yes, you the reader!) could implement a more-or-less working kernel in a week. In Metamath databases, it's possible to have buggy or incorrect axioms, or to have proofs which incorrectly substitute metasyntax, but kernel correctness has already been figured out. Indeed, we invert the responsibility somewhat: your Metamath kernel is not correct if it thinks that set.mm or iset.mm have any invalid proofs.
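
To make "you could implement it in a week" concrete, here is a rough sketch (in Python, over a toy grammar of symbol tuples, ignoring floating hypotheses and disjoint-variable conditions, so not the real .mm format) of the one operation such a kernel performs for every proof step: substitute symbol strings for an assertion's variables, check the substituted hypotheses against the proof stack, and push the substituted conclusion.

```python
# Sketch of the single operation a Metamath-style verifier repeats,
# assuming a toy grammar (tuples of symbols) rather than the real .mm format,
# and skipping floating hypotheses and disjoint-variable checks.

def substitute(expr, subst):
    """Replace each variable in expr (a tuple of symbols) by its assigned
    symbol string; symbols not in the substitution map are left alone."""
    out = []
    for sym in expr:
        out.extend(subst.get(sym, (sym,)))
    return tuple(out)

def apply_assertion(stack, hypotheses, conclusion, subst):
    """Pop the substituted hypotheses off the proof stack, insist they match
    exactly, then push the substituted conclusion."""
    n = len(hypotheses)
    popped = stack[len(stack) - n:]
    expected = [substitute(h, subst) for h in hypotheses]
    if popped != expected:
        raise ValueError("hypothesis mismatch: proof step is invalid")
    del stack[len(stack) - n:]
    stack.append(substitute(conclusion, subst))

# Toy modus ponens step with ph := P and ps := Q.
stack = [("|-", "P"), ("|-", "(", "P", "->", "Q", ")")]
mp_hyps = [("|-", "ph"), ("|-", "(", "ph", "->", "ps", ")")]
mp_conc = ("|-", "ps")
apply_assertion(stack, mp_hyps, mp_conc, {"ph": ("P",), "ps": ("Q",)})
assert stack == [("|-", "Q")]
```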

k749gtnc9l3w | 10 hours ago

The scope of those bugs intersects with what would have to be axioms in Metamath-kernel-based proofs, though, doesn't it?

Corbin | 4 hours ago

In general, yes, bugs in the low-level set-theoretic foundations of a Metamath database are axiomatic bugs rather than kernel bugs. This is because Metamath's kernel has one rule, Meredith's rule D, and the entirety of what we think of as standard predicate logic is built above that rule. Isabelle has a similarly small kernel but it is still committed to a type-theoretic view of the world and that commitment comes with a larger kernel. (Also, Isabelle/Pure and Isabelle/HOL are not the same theory; the latter has an even larger kernel still.)

Frankly, when I put it that way, it almost sounds like an instance of The Bitter Lesson; certainly, adding compute to a Metamath system makes verification faster, scaling linearly. Adding compute to a type-theoretic system helps but it can get wasted by an exponentially-expensive tactic which overwhelms Amdahl's predicted speedup.

k749gtnc9l3w | 3 hours ago

I'm just saying that comparing the track record of Metamath's kernel with Isabelle/HOL's much larger base theory is not exactly apples-to-apples.

I thought expensive tactics could be expanded into stored verifiable traces (which can themselves be large, though). Sure, a tactic can get expensive, but if one reaches for it, doing the same job without it probably takes significant effort anyway. It's like reaching for a SAT solver: not recommended when a well-known simple loop constructs the object in question, yet there are cases where the known alternatives to SAT solvers are even less efficient…
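
To sketch what such a stored trace can look like (assuming a DRUP-style SAT certificate here, not any particular prover's format): the solver emits each clause it learned, and a small independent checker re-justifies every one of them by reverse unit propagation.

```python
# Sketch of checking one step of a DRUP-style trace. Literals are nonzero
# ints, a clause is a list of literals, and -l is the negation of l.

def unit_propagate(clauses, assignment):
    """Extend `assignment` (a set of literals taken as true) by unit
    propagation; return None on conflict, else the final assignment."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return None  # conflict: every literal is falsified
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return assignment

def rup_step(clauses, learned):
    """A learned clause is justified if assuming the negation of each of its
    literals and propagating over `clauses` yields a conflict."""
    return unit_propagate(clauses, {-lit for lit in learned}) is None

# Toy trace check: from (a or b) and (not a or b), the clause (b) follows.
formula = [[1, 2], [-1, 2]]
assert rup_step(formula, [2])
```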

Personally, I like being able to use a first-order prover (E/Z3/Vampire) for predicate logic, because that's the logic in which the theories underlying most of the proofs were actually designed.

Corbin | 3 hours ago

I think that you've misread my original critique, then. I was pointing out that Paulson is heavily biased in favor of LCF-style systems. To me, LCF-style systems have a decent record of correctness and soundness but are nowhere near the simplicity and directness of syntax-driven proof systems like Metamath. I suppose that Lean is the LCF/HOL community's current big hope, but Lean's junk theorems are a showstopper; Lean's proof kernel has the same sort of unsafety as the NBE tactic mentioned in the original post.
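
For anyone who hasn't seen the usual "junk theorem" examples: because Lean's arithmetic functions are total, out-of-domain inputs get arbitrary default values, and statements about those defaults are honest theorems. In Lean 4, for instance:

```lean
-- Division and subtraction on Nat are total, so out-of-domain inputs
-- get default results and "junk" facts about them are provable.
example : (1 : Nat) / 0 = 0 := by decide
example : (5 : Nat) - 7 = 0 := by decide  -- truncated subtraction
```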

k749gtnc9l3w | 3 hours ago

I don't think I misread it? Metamath is simpler, but once you are doing anything with it, the simple part is not the entire story. There are benefits to properly layering the setup, and the author clearly prefers the unlayered version; on the other hand, with more layers there is the question of which parts of the cross-checking effort are shared by the higher layers and which are not.

nrdxp | a day ago

This is an interesting look at "bit rot" in a field (formal verification) that many outsiders assume is immune to it. Paulson’s point about proofs breaking due to changes in underlying libraries really highlights that "complexity is the enemy," even when that complexity is mathematically sound.

It makes me wonder if the formal methods community could benefit from a more "suckless"-inspired approach to library design. If the goal is long-term stability, we often find in systems programming that the best way to keep a project buildable ten years later is to aggressively minimize dependencies. It seems like formal proofs are currently sitting on a very deep, moving stack that makes that kind of longevity nearly impossible.

meithecatte | a day ago

Did we read an entirely different blog post? I don't see anything that sounds like bitrot within the post, and I certainly don't see any discussion of proofs breaking due to changes in underlying libraries...