How To Understand Statistics

The world is littered with statistics, and the average person is bombarded with five statistics a day.* Statistics can be misleading, and are sometimes deliberately distorted. There are three kinds of untruth commonly recognised:

"Lies, damn lies and statistics." - Mark Twain

The quote is accurate: statistics are often used to lie to the public, because most people do not understand how statistics work. The aim of this entry is to acquaint the reader with the basics of statistical analysis and to help them spot when someone is trying to pull a fast one.

"Think about how stupid the average person is;

now realise half of them are dumber than that."
- George Carlin

There are many books that teach statistics, but most are big, heavy maths books that cost a lot of money and may require a degree in maths to understand anyway. The Internet is full of guides to understanding statistics, but most are aimed at medical students who need to examine experimental drug studies. For many years there was a need for a "Statistics for Dummies" book; in fact there now is one, written by Deborah Rumsey. A good online starting place is RobertNiles.com, which explains how to examine statistics for errors and how to create your own statistics correctly.

Let us examine a most controversial subject:

Women are better drivers than men

This is not the same thing as saying that all women are better drivers than all men, although many people, and some insurance company advertisements, seem to think that is exactly what it means. In fact it means that, on average, a woman who drove a car between the ages of 20 and 65 will have had fewer accidents than a man of the same age driving the same car. The data is drawn almost exclusively from insurance company statistics, and may not even be accurate, as few people bother to alert their insurers if they clip a wing mirror or scratch the paintwork.

Here is another, rather famous use of distorting statistics:

Toddlers who attend pre-school exhibit aggressive behaviour

The study was of four-year-old toddlers, and compared those who went to pre-school and socialised with other children with those who stayed at home with their mothers. It measured aggressive behaviour such as stealing toys, pushing other children over and starting fights.

The study showed that children who went to pre-school were three times more likely to be aggressive than those who stayed at home with their mothers. The statistics were well documented and were, technically, accurate. The report used them to recommend that parents keep their children at home until they start school, at age five.

What the study failed to mention was that aggressive behaviour is normal in four-year-olds. Parents who keep their children at home but take them to toddler groups also observe their children being aggressive; psychologists say it is the child learning its "pecking order" in society. The children who stayed at home and did not attend pre-school were less aggressive because their behaviour was abnormal. A follow-up survey (done by another group) showed that the children who had stayed at home before attending school were more aggressive than those who had gone to pre-school.

In other words: the children who attended pre-school were normal (for want of a better word); the ones who stayed at home with their mothers were not.

The initial study was funded by a mothers' support group, which used the statistics to promote its own (pre-determined) agenda. This illustrates rule one of statistics: always ask, who is paying for this study?1

First World War head injuries

Another strange statistical anomaly followed the introduction of tin helmets to the front line. The reason for introducing them was that the number of head wounds was very high, and soldiers took a long time to recover; until then they had only cloth caps to wear. Yet after the introduction, the number of head injuries increased dramatically. No-one could explain it, until it was revealed that the earlier records showed only injuries, not fatalities. After the introduction, the number of fatalities had dropped dramatically, but the number of injuries went up because the tin helmet was saving lives: soldiers who would previously have been killed were now merely wounded. This demonstrates the second rule of statistics: what question was asked? A leading or misleading question used to gather statistics can produce misleading statistics.

The above examples demonstrate that statistical conclusions can be misleading, and can even be used to "prove" something that is false. A good eye for spotting irregularities in statistical interpretation is a useful skill to develop in life.

Things To Look Out For

"47.3% of all statistics are made up on the spot" - Steven Wright
  • Where did the data come from? That is, who ran the survey? Do they have an ulterior motive for having the result go one way?
  • How was the data collected? That is, what questions were asked? How were they asked? Who was asked?
  • Be wary of comparisons: two things happening at the same time does not mean that one caused the other. This is used a lot by politicians wanting to show their new policy is working.
  • Be aware of numbers taken out of context. This is called "cherry-picking", where the analysis concentrates only on the data that supports the conclusion and ignores everything else.

A survey on the effects of second-hand smoking paid for by a major tobacco manufacturer is hardly likely to be impartial; but then again, neither is one carried out by a medical firm.

If a survey on road accidents says that cars with brand X tyres were less likely to have an accident, check who took part. Brand X tyres may be new, and only fitted to new cars, which are less likely to be in accidents anyway.

Check the area covered by a survey linking nuclear power plants to cancer. The survey may have excluded sufferers who fall outside a certain area, or excluded perfectly healthy people living inside the area.

Don't be fooled by graphs. The scale can be manipulated to make a perfectly harmless bar chart look worrying, and be wary of the use of colours. A certain chewing gum company wanted to show that chewing gum increases saliva. Its chart coloured the period of increased danger to the gums after eating in red, and the safe period after chewing in blue. However, the chart actually showed that chewing would have to continue for 30 minutes to take the line out of the danger zone; the curve was simply coloured in a clever way to make the effect look faster.
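One way to see how scale manipulation works is to compare the heights of two bars as actually drawn. This is a sketch with hypothetical figures (pass rates of 95% and 92%): an axis starting at zero draws the bars in roughly their true ratio, but truncating the axis at 90 makes one bar look two and a half times taller than the other.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of two bar heights as drawn, with the axis starting at `baseline`."""
    return (a - baseline) / (b - baseline)

# Honest axis (starts at 0): the bars look almost the same height.
print(visual_ratio(95, 92))               # about 1.03

# Truncated axis (starts at 90): the same data looks dramatically different.
print(visual_ratio(95, 92, baseline=90))  # 2.5
```

The underlying numbers never change; only the portion of the axis shown does.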

Perhaps the most important things to check are sample size and margin of error. With small samples, a change in a single data item can completely alter the results. Small samples are sometimes the only practical way to get the analysis done, but generally the bigger the sample, the more accurate the results and the less likely a single sampling error will affect the analysis. For example, people will go on and on about how 95% of children passed their exams at this school, and 92% of children passed their exams at that school, when the sample sizes aren't actually big enough for the difference to be statistically significant. Many people don't understand how important sample size is to interpreting statistics.
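The school example can be made concrete with the standard margin-of-error formula for a sample proportion. This is a sketch; the class size of 40 pupils per school is an assumed figure, not from the original example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n samples."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical schools: 95% of 40 pupils passed at one, 92% of 40 at the other.
me_a = margin_of_error(0.95, 40)
me_b = margin_of_error(0.92, 40)
print(f"School A: 95% +/- {me_a:.1%}")  # roughly +/- 6.8 points
print(f"School B: 92% +/- {me_b:.1%}")  # roughly +/- 8.4 points
```

The two intervals overlap almost completely, so with classes this small the three-point gap tells us nothing about which school teaches better.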

The Problem With Statistics

The main problem with statistics is that people like numbers to back up a decision. For example, when choosing an Internet provider, most people will go with the provider with the most customers. But that statistic doesn't tell you useful things like: what the customer turnover is, how reliable the connection is, what the mean time taken to answer a technical fault call is, and so on. Most people simply assume that if a company has a lot of customers, it must be all right. Generally this is true, but there are the, admittedly rare, companies that work by having a lot of customers, providing bad service (though not deliberately) and making it hard for people to cancel their agreement. Just because a company is the most popular doesn't mean it's the best.*

"Common sense" can cloud statistical results. For example, a technology firm discovered that 40% of all sick days were taken on a Friday or a Monday. It immediately clamped down on sick leave before realising its mistake: Friday and Monday are two of the five working days, so 40% is exactly what would be expected.
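The arithmetic behind the firm's mistake is trivial once spelled out:

```python
# If sick days fall evenly across the five working days, any two named
# days account for 2/5 of them - exactly the 40% that alarmed the firm.
working_days = 5
share_on_friday_and_monday = 2 / working_days
print(f"{share_on_friday_and_monday:.0%}")  # 40%
```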

Fundamental to the mathematics of probability is the requirement that events such as dice rolls or coin flips be independent of each other. If they are not independent, the maths stops working and the answers stop making sense. However, a lot of statistics are worked out at a distance from the core events, so establishing whether the results are valid can be next to impossible. A related mistake is made by the gambler who thinks his luck must change soon because he couldn't possibly continue to have bad luck all night. This is wrong: nothing says the dice should start rolling your way based on their previous behaviour.
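The gambler's intuition can be checked with a quick simulation (a sketch; the million-flip count and the seed are arbitrary choices): a fair coin that has just come up tails three times in a row is no more likely to come up heads on the next flip.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
flips = [random.random() < 0.5 for _ in range(1_000_000)]  # True = heads

# Collect every flip that immediately follows three tails in a row.
# If heads were ever "due", it would appear more than half the time here.
after_three_tails = [
    flips[i]
    for i in range(3, len(flips))
    if not flips[i - 1] and not flips[i - 2] and not flips[i - 3]
]
print(sum(after_three_tails) / len(after_three_tails))  # close to 0.5
```

The observed proportion stays close to 0.5: each flip is independent, and the coin has no memory of the streak.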

Legal History

A more serious problem was highlighted in the court case of an innocent man* who was facing fingerprint evidence that he had been at the crime scene. He denied this. A fingerprint expert was presented in court by the prosecution, who asked:

"Assuming that the defendant did not commit this crime, what is the probability of the defendant and the culprit having identical fingerprints?"

Expert: "1 in several billion."

Prosecution: "Thank you."


Defence lawyer: "Let me ask you a different question. What is the probability that a fingerprint lifted from a crime scene would be wrongly identified as belonging to someone who wasn't there?"

Expert: "Oh, about 1 in 100."

It's all about the question asked. The defendant's fingerprints had been incorrectly identified as being the same as the ones lifted from the scene. Several subsequent expert examinations showed that the fingerprints were not the same. Not even close. But the fingerprint evidence was submitted in court as fact. It is not fact; it is a science, and is governed by probabilities.
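The exchange above is the classic "prosecutor's fallacy": the chance of a match given innocence is not the chance of innocence given a match. A rough sketch using entirely hypothetical numbers (a pool of 100,000 possible culprits and a 1-in-100 misidentification rate, echoing the expert's second answer) shows how different the two questions are:

```python
population = 100_000  # hypothetical pool of people who could have left the print
p_error = 1 / 100     # hypothetical chance of a wrong identification

# One person is the true culprit; everyone else is innocent.
expected_true_matches = 1
expected_false_matches = (population - 1) * p_error

# Probability that a person who matches is actually the culprit.
p_guilty_given_match = expected_true_matches / (
    expected_true_matches + expected_false_matches
)
print(f"{p_guilty_given_match:.4f}")  # about 0.001
```

Under these assumptions a match on its own makes guilt about one chance in a thousand, not the "one in several billion" the jury heard.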

Other cases, involving cot deaths, have raised serious questions about the presentation of statistics by experts in court. All too often these are presented as "fact", for example in the case against Sally Clark, who served three years in prison before having her conviction overturned by the Appeal Court in February 2003. In her case, as in several others in recent years, expert pathologists testified that the chance of multiple cot deaths in a single family was so remote that the deaths must be murder, and this was presented as scientific fact, with the jury unable to analyse the statistics for themselves. In actual fact, multiple cot deaths in a family are not independent events2, so the chance of a second or third death is far higher than simple multiplication suggests; to such an extent that when a third child dies, cot death is the most likely cause even before a post mortem is carried out. Calling mothers of multiple cot deaths serial murderers is analogous to assuming all air crashes are caused by pilot error. Eventually the assumptions must be changed to stop miscarriages of justice.
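The effect of wrongly assuming independence can be sketched with hypothetical figures. The 1-in-8,500 base rate and the 1-in-100 conditional rate below are illustrative assumptions, not the actual case statistics:

```python
p_first = 1 / 8500  # hypothetical chance of one cot death in a family

# Treating a second death as independent simply squares the base rate.
naive_two_deaths = p_first * p_first

# If a shared genetic or environmental factor raises the risk of a
# second death in an affected family to 1 in 100, the picture changes.
dependent_two_deaths = p_first * (1 / 100)

print(f"Naive estimate:     1 in {1 / naive_two_deaths:,.0f}")
print(f"Dependent estimate: 1 in {1 / dependent_two_deaths:,.0f}")
```

Under these assumptions the naive figure overstates the rarity of a second death by a factor of 85: the "almost impossible" event becomes merely uncommon.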

No Average

The main thing statistics show is that there is no such thing as "average". If 50% of your employees are above the median in productivity, then 50% are below it. Changing the definition will not help; half will always be below the middle value, as demonstrated by the bell curve graph, otherwise known as the Normal Distribution Curve.

This demonstrates another problem people have in interpreting statistics. Many people try to make their data fit the normal distribution, but there are non-normal distributions, and the techniques designed for normal distributions are often inappropriate when the distribution is patently non-normal.

Many people think that "mean" means the same thing as "average". It doesn't. Mean is a precise mathematical term; average is often used loosely as a description of a person or data item, but in mathematics it means a number that typifies a set of numbers of which it is a function. In other words, "average" can mean the mean, the median or the mode.

  • Median: the middle value in a distribution, above and below which lie an equal number of values.
  • Mean: a number that typifies a set of numbers, such as a geometric mean or an arithmetic mean (for the arithmetic mean, the sum of the values divided by how many there are).
  • Mode: the value or item occurring most frequently in a series of observations or statistical data.

Example data 1: 2, 5, 5, 6, 9, 12, 15

Analysing the data, we get mean: 7.71, median: 6, mode: 5

Example data 2: 4, 5, 5, 5, 8, 12, 86

Analysing this data, we get mean: 17.857, median: 5, mode: 5
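Both sets of figures can be checked with Python's standard statistics module. Reading the two example data sets as 2, 5, 5, 6, 9, 12, 15 and 4, 5, 5, 5, 8, 12, 86 reproduces the quoted results, and the second set shows how a single large value (86) drags the mean far away from the typical value while leaving the median and mode untouched:

```python
from statistics import mean, median, mode

data_1 = [2, 5, 5, 6, 9, 12, 15]
data_2 = [4, 5, 5, 5, 8, 12, 86]

print(round(mean(data_1), 2), median(data_1), mode(data_1))  # 7.71 6 5
print(round(mean(data_2), 3), median(data_2), mode(data_2))  # 17.857 5 5
```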

Statistics do have a sort of magical appeal. To the untrained eye they appear to be based on complex maths that is difficult to understand. This is rubbish: statistics are easy to create. Accurate statistics, now those are more difficult to calculate.

Statistics are governed by a term used to describe computer problems: GIGO, or Garbage In, Garbage Out. If a survey asked the wrong question, asked the wrong group of people or suffered from any other major problem, no statistical analysis method in the world can create meaningful information from the raw data. Some techniques can correct small errors, but the more small errors are corrected, the less accurate the results become.

Fun With Statistics

Fun? Well yes, statistics can be fun, like the joke about the table with a quarter of its legs missing. Statistics can create some unusual mental games with interesting answers. They can be great conversation starters at parties and fun for baffling your friends. They're a bit like mathematical magic tricks.

More Information



For more information on statistics, check out National Statistics Online or RobertNiles.com, a good source of statistical analysis for the beginner. The Cartoon Guide to Statistics, by Larry Gonick, is also worth a look.

For more information on how strange mathematics can be, check out: The Gameshow (Monty Hall problem).



If you enjoyed reading this entry, you may like to read: Things to Consider when Reading Medical Research

For some fun statistics, check out: Fun Statistic links, including "Play the Monty Hall game!"

1So often a company will collect statistics on hundreds of variables, perhaps calculate a thousand more from those original hundreds, and then present only the two or three most positive to the public.
2There may even be a "cot death gene" that affects child mortality.

Entry

A1054117
