Engineering is the manipulation of the natural world for our own ends. It is the construction of a system from components, parts and subsystems to fulfil a certain function. It is a mature discipline, controlled by well-understood and well-documented procedures and staffed by highly qualified personnel. At the same time it is constantly changing and improving, with new fields springing up regularly to service new discoveries or satisfy new requirements. This article concerns only one of these gatecrashers - the field of ‘Software Engineering’ - and discusses the legitimacy of its inclusion in this illustrious fold.
To provide some background, we are all familiar with the great achievements of engineering, such as skyscrapers, bridges, supertankers, the space shuttle, etc. These things are highly visible reminders of human ingenuity. But we tend to take the many smaller but equally important achievements for granted. We no longer marvel at trains, ships and aircraft, we just travel on them. We’re used to satellite navigation, satellite TV and seeing satellite pictures of our houses. Similarly, we rarely contemplate the inner workings of the computer we use day in, day out, let alone try to comprehend the sheer mountain of science and engineering effort that went into its design.
The Miracle of Hardware
Take the modern microprocessor. The Pentium 4 ‘Prescott’ chip contains 125 million transistors, all wired up into systems, subsystems and sub-subsystems to perform a multitude of complex tasks simultaneously. This isn’t simply some kind of massive jigsaw consisting of off-the-shelf components; each transistor is itself an inherently imprecise electronic device, carefully designed by experts in fields as diverse as materials science and quantum mechanics.
The design is so complex that no single human being could possibly understand all of it; hence it was itself largely designed by other computers. Despite the daunting complexity, microprocessors work extremely reliably. They don’t make mistakes. Not only do they work most of the time, they work all of the time - the mean time between failures of a typical microprocessor is around 500 years. Spontaneous CPU failures within the lifetime of a typical computer are virtually unknown. More often than not it’ll be in landfill within five years: just a redundant sliver of stone with some very intricate internal patterns, a mere 1% of the way through its potential operational lifetime.
Standing on the Shoulders of Giants
Despite their complexity, these understated miracles of technology can be mass-produced at very low unit cost. They can be shipped by the million. Once they are incorporated into every PC (or Mac, etc), all that needs to be done is for someone to tell them what to do. Since 99% of the clever stuff is done and dusted, you’d have thought that programming them would be easy. And you’d be right.
The production of computer software is a simple matter of putting together a set of instructions, each one taken from a small list known as the instruction set. The set is collectively known as a programme, and the programme tells the microprocessor how to behave. The programme is analogous to a battlefield general commanding an army. The army itself may be extremely complex, consisting of men and equipment designed to fulfil all sorts of different tasks, but the general can control the whole thing with no more than a few very simple commands.
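To make the general’s job concrete, here is a toy sketch in Python. Every opcode and the miniature ‘processor’ are invented purely for illustration - no real instruction set looks like this - but it shows a programme as nothing more than a list of simple commands fed to a machine someone else built:

```python
# A toy 'processor' with a four-word instruction set. The programme
# is just a list of (opcode, operand) pairs - the general's commands.

def run(programme):
    stack = []
    for op, arg in programme:
        if op == "PUSH":      # place a value on the stack
            stack.append(arg)
        elif op == "ADD":     # replace the top two values with their sum
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":     # replace the top two values with their product
            stack.append(stack.pop() * stack.pop())
        elif op == "PRINT":   # report the top value
            print(stack[-1])
    return stack

# (2 + 3) * 4, expressed as six simple commands:
programme = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
             ("PUSH", 4), ("MUL", None), ("PRINT", None)]
run(programme)  # prints 20
```

Note that all the genuinely hard work - the stack, the arithmetic, the output - is already done inside `run`; the ‘programmer’ merely sequences commands.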
There is no need to instruct the soldiers how to walk, fight or fire a gun; someone else has already trained them to do that. The general doesn’t need to worry about providing them with the resources they need to do their job, such as food, water and ammunition - some other mug has sorted all that out. In many ways he has the easiest, and the best, job in the army; the one that all soldiers aspire to.
In engineering terms, software is one-dimensional - it has to fulfil a certain function. That is all. A piece of hardware has to do this too, but it also has to do it within certain limits of temperature, humidity, shock and vibration; it has to do it safely; it has to do it when subject to interference from other hardware, and without interfering with other hardware; it has parts that can be damaged or wear out; and it has to be manhandlable, transportable, maintainable, replaceable and manufacturable. It will have constraints on its dimensions, weight, power consumption and the waste heat it can produce. It cannot contain certain toxic or environmentally unfriendly materials, and nowadays it must be largely recyclable. Perhaps most importantly, it has to be tested to prove that it does all these things.
There are organisations overseeing the various disciplines, well-defined procedures for managing complex projects and a plethora of standards that engineers have to comply with. This is why microprocessors work, why bridges don’t collapse, why planes don’t fall out of the sky (often), and why your car generally starts in the morning.
Dog Training for Nerds
Writing software, on the other hand, is a simple enough job that geeks and nerds the world over can do it in their bedrooms; no bachelor’s degrees or apprenticeships required. Programming a computer is essentially just talking to it in a language it can understand. The word ‘language’ is important here, as in 99% of software production, typing instructions at the computer in a near-natural language is all that is involved. A compiler then converts the programmer’s words into the binary code the processor needs.
The language is very limited and prescriptive compared to, say, English, as the computer has a very small vocabulary. In fact the word ‘language’ is perhaps unjustifiably generous. The author of a bestselling novel has to command a vocabulary of hundreds of thousands of words, stringing them together in just the right order to convey both meaning and sentiment. A typical programming language may have fewer than a hundred instructions, conveying instruction and nothing else, and nine times out of ten the compiler will tell you if you get them in the wrong order.
As if that isn’t easy enough, we now have software design tools. Some of these are so advanced that the designer only has to draw a pretty diagram showing, for example, the users, inputs and required outputs, and the tool will do the rest. It is not so much building a working model from raw materials as building one from Lego.
Can the activity of issuing a simple set of instructions to a machine that someone else designed and built really be considered engineering? Would you consider a pianist to be an engineer? Probably not. Regardless of your views on the subject, you can probably understand why more traditional engineers sometimes feel that their status is being debased by this influx of poorly-trained geeks.
The Great Successes
But it is not merely the mechanics of the activity itself we need to consider - we have to assess its success record too. And here we find that software’s record is less than exemplary when compared with the other engineering disciplines.
The failure of a multi-million dollar engineering project because of a simple software error is an excruciating irony, but it happens. In 1996, the European rocket Ariane 5, flight 501, veered off course and had to be destroyed when an arithmetic overflow error caused its guidance computers to crash. Four scientific spacecraft and $370 million were lost.
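The class of bug is almost embarrassingly simple to reproduce. The sketch below - Python, with the values and names invented for illustration; this is emphatically not the flight software - emulates the reported failure mode: a quantity that fits comfortably in a 64-bit float is squeezed into a 16-bit signed integer, and the unhandled overflow takes the system down.

```python
# A 16-bit signed integer holds only -32768..32767. Python ints don't
# overflow, so the narrowing conversion is emulated explicitly here.

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(x):
    """Narrow a float to a 16-bit signed integer, raising on overflow."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{n} does not fit in 16 bits")
    return n

print(to_int16(1234.5))   # fine while the value stays small
try:
    to_int16(65536.0)     # a value the 16-bit slot simply cannot hold
except OverflowError as err:
    print("left unhandled in flight:", err)
```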
In the 1980s, a race condition in its control software caused the Therac-25 radiotherapy machine to deliver lethal doses of radiation, resulting in at least three deaths.
In December 1999 the Mars Polar Lander, having survived the stresses of launch and a ten-month journey through the extreme cold and radiation of deep space, crashed into the Martian surface when software shut down its thrusters too early. It joined its sister ship, Mars Climate Orbiter - lost earlier the same year, allegedly the victim of a mix-up between metric and imperial units - as dead junk at Mars.
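The Climate Orbiter’s reported failure mode is just as easy to sketch. In the hypothetical Python below, the numbers and function names are made up and only the conversion factor is real: one component speaks pound-force seconds while another silently assumes newton-seconds, and nothing in between objects.

```python
# 1 pound-force second = 4.44822 newton-seconds (rounded).
LBF_S_TO_N_S = 4.44822

def ground_impulse_lbf_s():
    """Hypothetical ground software: reports impulse in lbf*s."""
    return 100.0

def flight_burn(impulse_n_s):
    """Hypothetical flight software: blindly assumes its input is in N*s."""
    return impulse_n_s

assumed = flight_burn(ground_impulse_lbf_s())                  # no conversion
correct = flight_burn(ground_impulse_lbf_s() * LBF_S_TO_N_S)   # converted
print(assumed, correct)   # the two disagree by a factor of ~4.45
```

No compiler, test or standard forced the units to match; the mismatch only revealed itself at the planet.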
Most recently, Heathrow's £4B Terminal 5 opened with several days of very public chaos as the baggage handling systems refused to work. Flights were cancelled, passengers stranded and holidays ruined. No surprise that it turned out to be largely due to software bugs; but when you consider the sheer effort and complexity involved in running an airport, not to mention the technology of the aircraft themselves, the fact that the whole system was crippled by a few previously undetected bugs in a logistics support system is nothing short of scandalous. And yet no-one was unduly surprised.
The Daily Frustration
But what about more down-to-earth applications? In these enlightened days why on earth does my PC crash when I try to do something as basic as print a single-page letter? I estimate that the software tools I use at work fail about twice per day. By ‘fail’ I mean crash - I am not including the various non-catastrophic inconsistencies, idiosyncrasies and incompatibilities that modern PC software always seems to exhibit.
When we start the car in the morning, it usually just starts. We don’t need to wait 15 minutes for it to download the latest version of ‘brakes’; we don’t need to waste more time trying to find where the upgrade has chosen to put the brake pedal; we don’t need to hurriedly learn to drive everywhere in first gear because the new brakes are incompatible with the existing gearbox. We can drive a car wherever we want to go; we don’t have to coax it there if it predicts we want to go somewhere else based on where we went yesterday, or based on where the designers thought would be a nice place to go at the time. When driving along a motorway, we don’t have to wait for a few nail-biting seconds when all the controls freeze as the car thinks about what it’s got to do next.
If the car broke down twice a day, even if it could simply be restarted, we would rightfully complain. We would be a tad peed off if we had to replace the entire vehicle after three years simply because no one made headlamp bulbs for it any more. Perhaps most importantly, you can jump into any car and find the controls where you would expect to find them. They don’t vary across models and versions, and do not move about on a whim ‘to make your life easier’. In reality of course, any car manufacturer guilty of this would be heading towards bankruptcy very, very quickly.
A car however contains thousands of moving parts. All of these are subject to wear and tear, so it is reasonable to expect some of them to wear out after a few years and need replacement. Generally however (consumables aside) most of them are designed to last the lifetime of the car, which could be 10 years or 100,000 miles. By contrast, an operating system contains NO moving parts. At all. And yet it often takes less than 2 years before a typical PC has slowed to the point of uselessness, requiring a complete rebuild. Why should this happen?
In the world of consumer software, we simply put up with it. In fact we choose to pay so much for it that we made one of the industry’s leaders the richest man in the world for a while. Not only that, but once we finally get things working properly after a few years of fixes, bodges and workarounds, we are often persuaded to fork out again for the next version so that we can repeat the whole sorry exercise. In no other field of engineering would you get away with marketing a product that simply doesn’t do what it says on the tin, and then encouraging consumers to solve the problem by buying another product to do the same job. For a start, to do so would contravene the Sale of Goods Act.
If it's that easy, how come it's such crap?
To sum up: we have a global multi-billion dollar industry doing something that is, by normal engineering standards, absurdly straightforward, but consistently churning out products that are, also by normal standards, extremely shoddy. But enough of the rant. What are the reasons for this state of affairs?
There are many reasons. Software people will tell you that it is impossible to test every possible execution path through a programme. They have a point. There could be billions of possible permutations. Another excuse you might hear is along the lines that every PC is different, with slightly different hardware, disks, drivers and so on, and different programs running in different places in memory. You don’t know exactly what the user is going to do with it. Again, it is impossible to predict, let alone test, every possible permutation. This is also true, but it is not a valid excuse.
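To be fair, the numbers really are on their side. A routine with n independent two-way decisions has up to 2^n distinct execution paths - a growth rate worth seeing written down (illustrative Python, nothing more):

```python
# Path count explodes exponentially with branch count: a programme
# with n independent two-way branches has up to 2**n execution paths.

for branches in (10, 30, 60):
    paths = 2 ** branches
    print(f"{branches} branches -> {paths:,} possible paths")

# Even checking a million paths per second, 2**60 paths would take
# well over 30,000 years to cover exhaustively.
seconds = 2 ** 60 / 1_000_000
print(f"roughly {seconds / (3600 * 24 * 365):,.0f} years")
```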
I’ll try to explain by analogy, and this time not car-related: If you have a working mobile phone with a good signal, you can call any other number in the world and connect within a few seconds. It doesn’t matter what network you’re on. It doesn’t matter which base station you connect to. It doesn’t matter where in the world the other telephone is, whether it is a cellphone, radio phone or landline. It just works.
Do you think, when a phone manufacturer launches a new handset, that they test it with every base station on every network and in every location? Or with every possible transceiver, exchange, microwave link, trunk line, ground station, aerial, satellite? Do you think they test it to make sure it can communicate with every one of the thousands of other handsets on the market? Perhaps they have to test it with every possible phone number to make sure there are no hidden bugs?
Standards? What Standards?
No, of course not. The reason is standards. Communications standards, e.g. GSM, specify how a device should behave, and provided the standards are correct and the devices comply with them, everything will work. So long as everyone uses the same standard, everything is compatible. The standards are generally maintained and published by an independent organisation, or in some cases by a team comprising members from all interested companies. Either way it is in everyone’s interest that the standards are clear, unambiguous and straightforward to comply with.
Standards help to ensure that design errors are not made in the first place, and guarantee compatibility in scenarios that cannot be tested.
The software industry has standards too, but they tend to be woolly compared to proper engineering principles, and compliance is voluntary. For example, the provider of a PC application would have to comply with the application programming interface published by the provider of the PC operating system. Despite claiming ‘full compliance’, new PC applications often exhibit incompatibilities that do not come to light until after consumers have paid good money for them. The difference? The ‘standard’ is published by the operating system vendor, who may be in direct competition with application providers. It is simply not in their interest to make everyone else’s life easy. A more mundane reason for poor quality standards is simply that anyone very familiar with a product is often the very worst person to write the interface documents, simply because he takes so much for granted. Also, there is rarely any independent approval of the standard.
Commercial competition aside, the main reasons for the woefully poor quality levels in the software industry are pretty much down to human nature and market forces.
Oodles of Nerdy Geekness
There’s the ‘fun’ factor. It is immensely rewarding being able to write a piece of code that does something useful. It doesn’t matter that, as we have established above, others have done 99% of the work, as the programmer is the final cog in the machine, the general that makes the final decision to go over the top.
It is very rare that a mechanical engineer could build an aircraft in his garage, or a civil engineer a bridge in his spare time, but it is entirely possible for a teenage programmer to come up with a perfectly functional programme in his bedroom after school.
The other engineering disciplines require years of training, the learning of complex maths and detailed procedures before an individual can join a team and hopefully, after a few years, finally see the fruits of his labour put into production. Programmers expect instant results, and are consequently much less likely to be overly bothered about the less technical aspects of their work, such as testing and documentation. Much more rewarding to simply churn out a programme and then move on to the next project, than to spend months testing and ironing out bugs. Someone else can worry about that.
The ‘instant gratification’ associated with the ability to create things in very little time breeds a very different kind of engineer. They are the ADHD sufferers of the technical world.
If it ain't broke, keep fiddling until it is
Then there is the ‘widget’ factor. Because software production is so easy, there’s a tendency to add functionality, gizmos and gadgets where they are not strictly needed. In some respects this may give a product a slight competitive edge, but in most cases it is likely that the programmer includes them simply ‘because he can’. Proper engineers live by the maxim ‘if it ain’t broke, don’t fix it’. If something is unnecessary, its inclusion can only serve to increase complexity and reduce reliability.
This ‘feature creep’ is largely responsible for the fact that PC performance has not substantially increased in real terms over the last 15 years or so, despite the performance of the hardware increasing some 1,000-fold in line with Moore’s Law. Each new generation of hardware simply gives the programmers more room to play, and the additional processing power or memory is quickly used up by more intelligent user interfaces, real-time assistants such as grammar and spell checkers, and of course dancing paperclips.
Testing is for Wimps
Commercial pressures also have a bearing. For obvious reasons, the testing phase occurs towards the end of a project, just when the overspend has been discovered and the budget becomes tight. If the project is already late, it is all too easy to simply cut back on testing and ship the product in whatever state it happens to be in at the time.
This is true of all projects of course, not just software ones. But with hardware, you can’t simply go to full production and then release a fix at a later date if the product is found to be flawed. You have to recall the product, which is very, very expensive. Companies that do this on a regular basis will not stay in business for very long. Market pressures enforce the implementation of good quality control.
By contrast, modern software is easy to fix even after sale. Distribution of upgrades or fixes is almost zero-cost due to the internet. In fact, most modern consumer applications are now designed to upgrade themselves over the net, with little or no user intervention.
Let them eat Beta
A legitimate tactic is the release of beta versions to a volunteer group of users, who can collectively shake down the product and find many of the bugs in advance of any ‘official’ release. This is sensible: the geeks get their early releases and the company gets lots and lots of free real-world testing. No-one is knowingly selling or buying a shoddy product.
But here’s the problem - there is no well-defined threshold of quality at which it is acceptable to release a product to the market at large. Some companies are at liberty to take the p**s by marketing a beta version as a final product. In the absence of the inherent commercial constraints that inhibit the release of half-baked hardware, the temptation with software is to do the absolute minimum of pre-release testing, ship the product and cash in straight away. Full testing is then completed by all the poor sods that bought the product. Market forces make this almost inevitable.
Companies don’t even bother to disguise the approach any more. You may have noticed your operating system offering to inform its manufacturer every time something goes wrong. This is you doing their testing for them. Everyone is now familiar with the received wisdom that you shouldn’t upgrade to a new operating system as soon as it comes out; you should always wait a year or so for the trail-blazers to find all the problems first.
Accusing the entire industry of these questionable tactics would be unfair. Naturally there are companies capable of producing very reliable software for applications such as air traffic control, avionics or process control in large chemical plants. The reliability is partly down to strict coding standards but mainly due to thorough testing, which is reflected in the cost. Even in these cases, though, the approach is to make a first stab at the problem, test it, fix it, test again and so on. It’s a very elaborate process of trial and error.
It is difficult to categorise software production. It is not science - there are no natural phenomena being investigated. Neither is it applied science, as there is no manipulation of natural phenomena. It is similar to English literature, in that it consists largely of written words strung together in the right order. In some ways it is akin to musicianship, as in the art of controlling a device to do your bidding. In the army analogy it is just the process of issuing instructions to a system pre-designed to accept them, in which case it is just management. But, I think you’ll now accept, it is NOT engineering.