Program Correctness
Computer programs that most people run in their daily lives tend to have bugs, so that occasionally (or more often) they fail to do what they are meant to do, sometimes with other undesirable consequences. It is therefore a cause for concern that computers are allowed to control processes where failure is not an option: nuclear reactors, airplanes, the traffic lights of New York, and so forth. In these situations, programmers rigorously try to prove that their programs are correctly formed, so that they will, hopefully, avoid causing death and destruction.
Asking the Question
The first step is to decide what it means for a program to be 'correct'. The most common way is with 'preconditions' and 'postconditions'. The program is allowed to assume that the preconditions are true, but it has to guarantee that the postconditions are true when it finishes, and that it will, eventually, finish. To prove the program correct, it is then only necessary to show that whenever the program is run and the preconditions are true, the postconditions will be true afterwards.
Here's an example: suppose that we have a plan for the day and we want to check that it will make us happy and fulfilled, provided our computer is working properly. In this case, the precondition would be 'computer is working properly', and the postcondition would be 'I am happy and I am fulfilled'. As shown in the postcondition, you can use Boolean logic in conditions to make them easier to deal with mathematically.
To save on space, this is normally written as follows:
[computer is working]
Perform plan for the day
[I am happy AND I am fulfilled]
Or, if we elaborate on the plan, the following is also correct:
[computer is working]
Turn on computer
Log on to h2g2
Check out the Post
Write an entry
Log off h2g2
Turn off computer
[I am happy AND I am fulfilled]
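In an actual programming language the same idea can be written down directly. Here is a minimal sketch in Python, assuming a made-up Day object whose fields stand in for the conditions above; the names are purely illustrative, not part of any real library.

from dataclasses import dataclass

@dataclass
class Day:
    computer_is_working: bool = True
    happy: bool = False
    fulfilled: bool = False

def perform_plan_for_the_day(day: Day) -> None:
    assert day.computer_is_working       # precondition: [computer is working]
    # ... turn on computer, log on to h2g2, write an entry, and so on ...
    day.happy = True
    day.fulfilled = True
    assert day.happy and day.fulfilled   # postcondition: [I am happy AND I am fulfilled]

perform_plan_for_the_day(Day())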
Partial Correctness
If a program is 'partially correct', it doesn't mean that it is partly correct and partly incorrect. Instead, it is just like normal correctness, except the program no longer promises to finish - it might go off into an endless loop. This is considerably less useful, but it turns out that it is often easiest to prove that a program is correct by first proving that it is partially correct, and then, separately, proving that it will eventually finish (otherwise known as 'termination').
Here is an example of a partially correct program:
[I am penniless]
start: Busk to earn a pound.
Buy a lottery ticket
If ticket is not a jackpot winner, GO TO start
If ticket is a jackpot winner, celebrate
[I am a millionaire]
If this program ever finishes, then you'll be rich - the problem is that there's nothing to stop it going round and round in circles forever!
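For the curious, the plan can be sketched in Python roughly as follows; the jackpot odds are invented, and the loop really may never finish, which is exactly the point.

import random

def win_jackpot() -> bool:
    return random.random() < 1e-7   # invented odds, purely for illustration

def get_rich() -> str:
    pounds = 0                      # precondition: [I am penniless]
    while True:
        pounds += 1                 # busk to earn a pound
        pounds -= 1                 # buy a lottery ticket
        if win_jackpot():
            break                   # celebrate
    return 'millionaire'            # postcondition: [I am a millionaire]

If the loop ever exits, the postcondition holds; proving that it eventually will is a separate job, and in this case a hopeless one.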
Getting the Specifications Wrong
The difficulty in this process is that if you get the specifications wrong, then proving that your program meets the specifications is worse than useless. Consider our original specifications for happiness and fulfilment - they could equally be satisfied by the following program:
[computer is working]
Remember computer deleted your homework
Smash computer with a large axe as revenge
[I am happy AND I am fulfilled]
The chances are that's not what we want - in this case the postconditions weren't strong enough because we didn't specify that we wanted the computer to still be working after following the plan. A better postcondition would be 'I am happy and I am fulfilled and the computer is still working' - which would reject the 'smash the computer' plan, while still allowing the 'use h2g2' plan.
Naturally, when dealing with computer programs the pre- and postconditions are statements about the values of numbers and suchlike, which are clearly mathematically true or false, and it is all much more formal and much less obvious.
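The same trap shows up with those more formal specifications. A classic sketch, in Python: suppose the postcondition for a sorting routine only says 'the result is in ascending order', and forgets to demand that the result contains the same items as the input.

def sort_items(items):
    # An empty list is in ascending order, so this satisfies the weak
    # postcondition perfectly. It is also completely useless.
    return []

A stronger postcondition - 'the result is in ascending order AND is a rearrangement of the input' - rejects this, just as 'the computer is still working' rejected the axe.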
Prove It!
Now we have an idea about what it means for something to be 'correct', but this doesn't help very much if there's no way to tell which programs actually are correct. Unfortunately, it's impossible to take an arbitrary program and automatically determine whether it's correct or not. There will always be some programs which appear to be correct, but which we can't prove definitively to be correct. Programmers have to live with this.
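The standard illustration of why is the 'halting problem'. Here is a sketch, assuming for the sake of argument a hypothetical Python function halts that could always tell whether a given program finishes:

def halts(program, argument) -> bool:
    ...   # hypothetical: reports whether program(argument) would ever finish

def contrary(program):
    if halts(program, program):
        while True:   # do the opposite of whatever halts predicts
            pass
    return 'finished'

Now ask what halts(contrary, contrary) should return. If it says 'finishes', then contrary loops forever; if it says 'loops forever', then contrary finishes immediately. Either way halts gives the wrong answer, so no such function can exist.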
Floyd-Hoare Logic
Floyd-Hoare logic is what most people think of when they think about program proofs. A proof in this system consists of a number of lines, each of which builds on the previous ones according to some set of rules, and there will be at least one rule for each available type of command. There are two types of rules that can be used:
Axioms in logic are the simplest possible true statements - so in arithmetic, it is taken as an axiom that the predecessor of the successor of any number is that number, and that zero is the smallest non-negative number.
When proving programs, typically there is at least one axiom for each of the simple commands. So, there might be an axiom for 'turn on computer', reading:
[computer is working AND computer is off]
Turn on computer
[computer is on]
Or, for the case where the computer is already on:
[computer is on]
Turn on computer
[computer is on]
Rules of inference in logic are ways of obtaining true statements, given other true statements. For example, if you know that 'Fred is tall' and you also know that 'Fred is a man', then you can infer that 'Fred is tall and a man'. The rule of inference here might be: 'If some person has property A, and the same person has property B, then that person has property A and property B.'
In program proving, these rules are normally used to combine simple commands into more complicated commands. Some of them can be used in almost any programming language, while others are specific to the particular commands available (typically, constructs like loops and branches). One of them is the 'sequencing rule', which is:
If the following two statements of correctness are true:
[condition 1]
Command 1
[condition 2]
[condition 2]
Command 2
[condition 3]
Then the following statement of correctness is true:
[condition 1]
Command 1
Command 2
[condition 3]
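To see the sequencing rule at work, here is a sketch in Python, with assert statements standing in for the bracketed conditions; the function names and the computer dictionary are invented for illustration.

def turn_on(computer):
    assert computer['working'] and not computer['on']   # [condition 1]
    computer['on'] = True
    assert computer['on']                                # [condition 2]

def log_on_to_h2g2(computer):
    assert computer['on']                                # [condition 2]
    computer['logged_in'] = True
    assert computer['logged_in']                         # [condition 3]

computer = {'working': True, 'on': False, 'logged_in': False}
turn_on(computer)              # ends with [condition 2] true...
log_on_to_h2g2(computer)       # ...which is exactly what this command assumes
assert computer['logged_in']   # so the sequence satisfies [condition 1] to [condition 3]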
There are many other similar rules, and you can prove very complicated expressions by using them in combination with each other. The difficulty is working out how to use them - in practice this is a very tedious process, especially if you want to be absolutely rigorous.
Verification Conditions
The idea behind 'verification conditions' is that people and computers are pretty lousy at proving whether commands are correct or not, but are a lot better at proving the validity of logical statements. Therefore, it is easier to compile the correctness statements into logical statements, called 'verification conditions'.
For example, consider the following command for getting a full kettle of water:
[kettle=empty]
Put water in kettle
[kettle=full of water]
This might compile down to the logical statement 'a kettle is a container', which is pretty simple, and happens to be true. The logical statements generated are usually this simple to prove, and in practice they can often be proven automatically.
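As a sketch of the idea, here is a toy 'compiler' in Python for the simplest possible case, a single assignment; the naive string substitution is only good enough for a toy, and all of the names are invented.

def verification_condition(pre: str, var: str, expr: str, post: str) -> str:
    # For the triple [pre] var := expr [post], the standard condition is
    # 'pre implies post, with expr substituted for var'.
    substituted = post.replace(var, f'({expr})')
    return f'({pre}) implies ({substituted})'

print(verification_condition('x == 5', 'y', 'x + 1', 'y == 6'))
# prints: (x == 5) implies ((x + 1) == 6)
# which is a plain logical statement that can be checked without thinking
# about the program at all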
Generally, the compiler needs a little help, though, because it is not obvious which logical statements should be generated. To help the compiler along therefore, a human expert needs to insert regular 'assertions' into the program. Assertions are just like pre- and postconditions, except that they must be true every time that they are passed, rather than just at the beginning or end of the program.
Assertions, as well as many other concepts in this field, are now used in a more informal manner by programmers in general. Typically, when a program is run in 'debug' mode, the computer will check that every assertion is true as it is passed - but this is slow. When run in 'non-debug' mode, the assertions will be ignored - but hopefully any bugs will have been picked up on a previous debug execution.
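Python's built-in assert statement is an everyday example of this: the checks run on an ordinary execution, but they are stripped out when the interpreter is started with the -O flag, which is roughly the 'non-debug' mode described above.

def put_water_in_kettle(kettle):
    assert kettle['contents'] == 'empty'   # checked normally, skipped under -O
    kettle['contents'] = 'water'
    assert kettle['contents'] == 'water'
    return kettle

put_water_in_kettle({'contents': 'empty'})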
What Else?
People putting together nuclear reactor control programs typically, and wisely, trust nothing and nobody. The compiler might have bugs in it, for example, or the rules of inference could be wrong, or the specifications could have a hole in them. Because of this it is never enough to prove something correct and then sit back and go home.
Systems such as these are put through rigorous testing, which means that operators try to ensure that all of the code has been 'exercised'. The idea is that any errors will, hopefully, show themselves in the tests. Testing doesn't tend to help much with code that won't be run terribly often, though, and ironically that is often the most critical code, such as disaster recovery routines.
So a combined approach of proof and testing is always used for critical systems, along with a philosophy of 'Keep it simple, Stupid' and 'If it ain't broke, don't fix it'. In especially critical systems, it is usual to run three programs, on three different systems, designed by three separate teams in complete isolation from one another, and use votes to decide which one is correct. And with all this, disasters still happen - Murphy's Law is stronger than any human attempt to restrain it.