
Note: I originally wrote this post back in 2008. In December of 2014, I revisited it. I edited it for clarity, but attempted to keep the original meaning intact. My views around correctness proofs and (especially) automated testing have changed a little over the last six years, and I found myself cringing slightly while reading over a couple of sections. However, I think there’s enough good stuff here to justify keeping and sharing it.

My university required senior computer science students to complete a capstone project before graduation. The idea was to have every person perform a research project or internship that brought together all that he or she had learned over the last four years.

My capstone project was of the independent research variety. I’m very interested in formal logic and proof systems, and wanted to examine parallels between generating a formal proof and writing a computer program. What I ended up doing was writing code to transform a formal program specification, written in something called Z notation, into a working Spark-Ada program. As a side effect, the tool would also generate a proof of that program’s conformity to the Z spec.

My strategy for doing this centered on examining pre- and post-conditions at the single-line level. Since each line of code transforms the program from one state to another, it’s possible to automatically generate Z pre- and post-conditions for individual lines of code. The problem then becomes a search for a sequence of statements whose pre- and post-conditions chain together to meet the top-level goal conditions given in the Z spec. The sequence of statements makes up the working code, and the chain of line-by-line pre- and post-conditions makes up the correctness proof.
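To make the chaining idea concrete, here’s a tiny hand-written illustration (my own informal assertion comments, not actual Z and not output from the tool) of per-line conditions linking up inside a trivial procedure:

```ada
--  A minimal sketch: the { ... } comments are informal pre-/post-conditions,
--  and each statement's post-condition is the next statement's pre-condition.
procedure Chain_Demo (X : in out Integer) is
   A : constant Integer := X;  --  name the initial value so the conditions can refer to it
begin
   --  { X = A }
   X := X + 1;
   --  { X = A + 1 }
   X := 2 * X;
   --  { X = 2 * (A + 1) }
   pragma Assert (X = 2 * (A + 1));  --  the chained conditions justify this final claim
end Chain_Demo;
```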

The programs my system could generate were very simple (they could only use a ridiculously small subset of Spark-Ada). Think methods that swap the values of two variables. It worked, though, and it was a lot of fun to build.
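For a flavor of what those programs looked like, here’s a hand-written sketch of a swap procedure in modern SPARK syntax (my tool emitted the older annotation style, so this is illustrative rather than actual output); the Post aspect plays the role the Z post-condition played in my project:

```ada
--  Sketch in SPARK 2014 / Ada 2012 syntax: the contract says the final value
--  of X is the initial value of Y, and vice versa.
procedure Swap (X, Y : in out Integer)
  with Post => X = Y'Old and Y = X'Old
is
   Temp : constant Integer := X;
begin
   X := Y;
   Y := Temp;
end Swap;
```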

One of my big takeaways, however, was a sense of the limited utility of formally proving programs against formal specifications. Something I read on the Haskell mailing list today got me thinking about software proofs (and inspired this post): “[b]asically, if you can prove that the program is correct, then you don’t need to test it.” I disagree. What I learned while creating my senior capstone is that when you prove a program “correct”, it’s important to remember that what you’re actually proving is that the program conforms to some specification. Maybe it’s a specification formally defined in a language like Z, or maybe it’s an informal specification that resides in your head.

In any case, a specification is prone to error just like program code. What’s worse, as your specification language gets detailed enough to say meaningful things about real code, it starts to become another programming language unto itself. With my capstone, all I really did was write a Z-to-Spark compiler. I’d shifted the opportunity for human error up a level – now, instead of making mistakes in Ada code, people were free to make mistakes in their Z spec.

It’s like saying you can write assembly code to provably match a specification written in the C language. Maybe C (or Z) is an easier language in which to reason, which is certainly valuable, but it’s not a magic “correctness” bullet. Your “provably correct” assembly can be completely broken code if the C contains errors.

You might be tempted to argue against this analogy by claiming that C is much more detailed and low-level than a specification language ought to be. However, if you keep the specification language less detailed and higher-level, the opportunity for error shifts back toward the code itself. Fewer constraints in the specification mean more degrees of freedom in the code, which means more opportunity for mistakes that don’t cause the proof to fail.

In other words, an expressive specification language means you have a new programming language to deal with, while a less formal spec means inconsistencies or missing information can bleed into your code base. There is some essential complexity in developing software that correctness proofs cannot eliminate – you’re simply moving the opportunity for error from the “coding” side of the balance to the “specification” side.

All of that being said, if you’re doing something that has to be right (air traffic control, data encryption, etc.) and you’re willing to spend a lot of time thinking about your formal specification, then I think provably “correct” software is a worthwhile goal. However, it’s important to remember that “correctness” here actually means “meshes with some specification a human wrote.” Proofs are just one tool in the toolbox. Even if everything is proved to match the spec, it would be irresponsible not to also follow other software engineering best practices, like writing automated tests for your code!

I think this is one of the reasons that test-driven development is such an appealing idea for “normal” software development; it recognizes that your specifications are as incomplete and subject-to-change as your code. With TDD, your “specification” is your suite of tests, and proving your code meets the “spec” is easy – just run the tests. Tests by no means rigorously demonstrate complete correctness of a program, but for most programming tasks, I think this loose way to specify program behavior is the best thing we have available to us right now.
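To make the contrast concrete, here’s a toy test for the swap sketch above (a hypothetical harness, not a real test framework, and assuming assertions are enabled at build time); it pins down the same behavior the Post contract expressed, but only for one input I happened to think of:

```ada
--  A toy "test as specification": run it and the assertion either holds or it
--  doesn't. It says nothing about inputs we didn't try.
with Swap;

procedure Test_Swap is
   A : Integer := 1;
   B : Integer := 2;
begin
   Swap (A, B);
   pragma Assert (A = 2 and B = 1);
end Test_Swap;
```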