A Conversation for h2g2 Maths Lab
I've got a question about standard deviation.
Clive the flying ostrich: Amateur Polymath | Chief Heretic. Started conversation Jan 22, 2009
I've got a job interview coming up, and there's a statistics test involved. Judging from the job role, it's quite likely I'll be expected to demonstrate an ability to understand and interpret results from standard deviations and regression analyses.
I've just realised that the textbook I thought I had stashed away somewhere is one I returned to the college. So, short of digging around in the loft all evening looking for my class notes, could someone be so kind as to walk me (slowly) back through standard deviation: how you calculate it, and what to look for in the results? I think I've got the gist of the idea, but I'm rusty on the practice.
Alternatively, point me to any outstanding web resources that do much the same.
Cheers.
Clive.
warner - a new era of cooperation Posted Jan 22, 2009
How's about this site, Clive?
>> http://davidmlane.com/hyperstat/index.html <<
It's not that clear, but it might help.
8584330 Posted Jan 23, 2009
Clive,
Is this discussion too basic?
F106558?thread=6084778
We were talking about variance, which is the square of the standard deviation. Do you need to be able to crank out a standard deviation by hand, or are you more likely to just use a calculator? Or just interpret the results?
Basic Concepts:
1) The greater the standard deviation, the more spread out the data.
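2) Regardless of the size of your standard deviation, for a normal distribution roughly 68% of your data will fall within one standard deviation of the mean, roughly 95% within two standard deviations, and roughly 99.7% within three. This is the empirical (68-95-99.7) rule. Chebyshev's theorem is the weaker guarantee that holds for any distribution: at least 75% of the data lies within two standard deviations and at least 89% within three.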
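A minimal Python sketch for checking these percentages, assuming an exactly normal distribution: the share of data within k standard deviations of the mean is erf(k/√2). For contrast it also prints Chebyshev's bound, 1 - 1/k², which holds for any distribution but is much weaker. The function names are illustrative, not from any library.

```python
import math


def normal_within_k(k):
    """Fraction of an exactly normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k):
    """Chebyshev's guarantee for ANY distribution: at least 1 - 1/k**2 lies within k sd (useful for k > 1)."""
    return 1 - 1 / k**2


for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within_k(k):.1%}, Chebyshev at least {chebyshev_lower_bound(k):.1%}")
```

For k = 1, 2 and 3 the normal column comes out at about 68.3%, 95.4% and 99.7%; Chebyshev's bound says nothing useful at k = 1 and gives 75.0% and 88.9% for k = 2 and 3.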
I hope this helps. Good luck at the interview!
HN
Clive the flying ostrich: Amateur Polymath | Chief Heretic. Posted Jan 23, 2009
The concept of distribution about the mean on a normal curve I get, but the practice is a little rusty.
Standard deviation by hand: hmm, I think I've got it, but running through it one more time shouldn't hurt.
As for a calculator, I'll probably be using Excel, given the nature of the job.
Interpreting the results: ah now we're talking!
For example:
A random series of numbers masquerading as data:
12, 14, 15, 14, 17, 18, 14, 15, 12, 14, 14
In Excel, fx > stats > average gives me =AVERAGE(A1:A11) and the answer 14.45454545 (159/11).
The next step is =STDEV(A1:A11) to find the standard deviation, and I get the solution 1.809068067.
Now, what does that mean? At a guess: it's 1.81 standard deviations from the mean, so does one-and-a-bit standard deviations put it in the 68% range or the 95%? What in fact does 1.809068067 imply?
That's where I get confused.
warner - a new era of cooperation Posted Jan 23, 2009
>>What in fact does 1.809068067 imply?<<
There's a short explanation with 'curve' here:
> http://davidmlane.com/hyperstat/A81334.html <
warner - a new era of cooperation Posted Jan 23, 2009
You might find the subject of 'Hypothesis Testing' interesting.
> http://davidmlane.com/hyperstat/B35642.html <
8584330 Posted Jan 23, 2009
Clive, I see 11 data points, even allowing for the just-got-up, haven't-had-coffee thing. I'll repeat your data here:
12, 14, 15, 14, 17, 18, 14, 15, 12, 14, 14.
Add these eleven numbers to get 159.
Divide by the number of data points, 11, to get 159/11, and we agree again:
average = 14.4545455.
stdev = 1.80906807.
With rounding, let's say 14.45 and 1.81
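For the "by hand" part, here is a minimal Python sketch that follows the usual steps (mean, squared deviations, divide by N-1, square root) on Clive's data and cross-checks the result against Python's `statistics` module, whose `mean` and `stdev` correspond to Excel's AVERAGE and STDEV. The helper name `mean_and_sample_sd` is hypothetical.

```python
import math
import statistics

# Clive's eleven data points from the thread
data = [12, 14, 15, 14, 17, 18, 14, 15, 12, 14, 14]


def mean_and_sample_sd(values):
    """Return (mean, sample standard deviation), dividing by n - 1 as Excel's STDEV does."""
    n = len(values)
    mean = sum(values) / n                            # step 1: add up and divide by n
    squared_devs = [(x - mean) ** 2 for x in values]  # step 2: squared distance of each point from the mean
    variance = sum(squared_devs) / (n - 1)            # step 3: divide by n - 1 (sample variance)
    return mean, math.sqrt(variance)                  # step 4: square root gives the standard deviation


mean, sd = mean_and_sample_sd(data)
print(f"n={len(data)} sum={sum(data)} mean={mean:.7f} stdev={sd:.8f}")

# Cross-check with the standard library (statistics.stdev also divides by n - 1)
print(statistics.mean(data), statistics.stdev(data))
```

The first line printed shows a mean of 14.4545455 and a standard deviation of 1.80906807, matching the Excel figures above.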
Now we are going to pretend your data is normally distributed, and forge ahead anyway.
Average 14.45 plus or minus one standard deviation 1.81 gives us everything from 12.64 to 16.26.
Average = 14.45
Average minus one stdev = 12.64
Average plus one stdev = 16.26
68% of 11 is 7.48, so we expect about 7 of our data points to fall between 12.64 and 16.26.
Your original data set, reordered from lowest to highest:
12, 12, 14, 14, 14, 14, 14, 15, 15, 17, 18.
Same numbers with **** marking the average plus and minus one standard deviation
12, 12, **** 14, 14, 14, 14, 14, 15, 15, **** 17, 18.
And sure enough, seven numbers are in there.
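The same check, counting how many points fall within one standard deviation of the mean, can be sketched in Python. It uses the unrounded mean and standard deviation, so the lower edge of the band comes out at 12.65 rather than 12.64; the count of seven is unchanged.

```python
import statistics

data = [12, 14, 15, 14, 17, 18, 14, 15, 12, 14, 14]

mean = statistics.mean(data)
sd = statistics.stdev(data)        # n - 1 version, like Excel's STDEV
low, high = mean - sd, mean + sd   # the one-standard-deviation band around the mean

# Points inside the band, listed lowest to highest
inside = [x for x in sorted(data) if low <= x <= high]
expected = 0.68 * len(data)        # what a normal distribution would suggest

print(f"band {low:.2f} to {high:.2f}")
print(f"{len(inside)} of {len(data)} points inside; about {expected:.2f} expected")
```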
Would you like a little discussion of the two versions of standard deviation, and how one is biased and one is not? Linear regression? Let me know, I'll check back after work.
Happy Nerd
Clive the flying ostrich: Amateur Polymath | Chief Heretic. Posted Jan 23, 2009
8584330 Posted Jan 24, 2009
Okay, biased and unbiased estimators of a population's standard deviation.
You probably remember, standard deviation is the square root of variance, so what we are about to say about one applies to the other. And a sample is a portion of a population.
Did you read post 3 in this thread about finding sample variance? F106558?thread=6084778
When calculating the variance or the standard deviation, there's one step where we must choose between dividing by N, the number of data points in our sample, and dividing by N-1. It turns out that when we divide by N, we get a biased estimator of variance or standard deviation: on average it under-estimates the spread of the data, because the deviations are measured from the sample's own mean rather than the true population mean. Dividing by N-1 removes that bias for the variance (its square root, the standard deviation, is still very slightly biased, but much less so). But when the sample size is sufficiently large, where a common rule of thumb is at least 20 data points, the difference between dividing by N and by N-1 is usually insignificant. So a good rule is: divide by N-1 for small samples, and divide by N for large ones, where large is at least 20 data points.
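To see the bias concretely, here is a small Python sketch using a made-up population of three values, {1, 2, 3}. It lists every possible sample of two (drawn with replacement, each equally likely), computes the variance both ways with exact fractions, and averages the results. The function name `sample_variance` and its `ddof` parameter are hypothetical names chosen for this sketch.

```python
from fractions import Fraction
from itertools import product

# A deliberately tiny, hypothetical population so every possible sample can be listed
population = [1, 2, 3]
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)


def sample_variance(sample, ddof):
    """Variance of a sample, dividing by len(sample) - ddof (ddof=0 -> N, ddof=1 -> N-1)."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)


# Every sample of size 2 drawn with replacement
samples = list(product(population, repeat=2))
avg_div_n = sum(sample_variance(s, 0) for s in samples) / len(samples)
avg_div_n1 = sum(sample_variance(s, 1) for s in samples) / len(samples)

print("population variance:", pop_var)
print("average of divide-by-N estimates:", avg_div_n)
print("average of divide-by-(N-1) estimates:", avg_div_n1)
```

The population variance is 2/3. Averaged over all nine samples, dividing by N gives 1/3 (too low), while dividing by N-1 gives exactly 2/3.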
In Excel, use STDEV to estimate a population's standard deviation based on a sample. STDEV uses the N-1 method to calculate standard deviation.
If you have the entire population to hand, and you are calculating the standard deviation based on the entire population and not a sample of the population, then use Excel STDEVP. STDEVP uses N. Use STDEVP when you have the entire population, regardless of population size.
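In Python's standard library, `statistics.stdev` divides by N-1 (like Excel's STDEV) and `statistics.pstdev` divides by N (like STDEVP). A short sketch on Clive's eleven numbers:

```python
import statistics

data = [12, 14, 15, 14, 17, 18, 14, 15, 12, 14, 14]

# statistics.stdev divides by N - 1, like Excel's STDEV (data treated as a sample)
sample_sd = statistics.stdev(data)
# statistics.pstdev divides by N, like Excel's STDEVP (data treated as the whole population)
population_sd = statistics.pstdev(data)

print(f"STDEV-style:  {sample_sd:.4f}")
print(f"STDEVP-style: {population_sd:.4f}")
```

The two results come out at about 1.8091 and 1.7249. With only 11 points the gap is noticeable; it shrinks as the number of data points grows.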
If you tell me what type of job you are interviewing for, I'll try to make the linear regression example somewhat relevant.
Clive the flying ostrich: Amateur Polymath | Chief Heretic. Posted Jan 24, 2009
Research and Information Assistant, business planning and strategy division,
Collect data, update housing service's I.T systems.
Interrogate I.T and other systems produce reports on information held and data gathered.
Produce high levels of information accurately.
Duties:
1. Prepare reports, graphs and statistical information as required.
7. Assist with planning management, and analysis of various research projects and satisfaction surveys which will inform future strategic planning.
-----------------
This is for a local council housing department, so my guess is that the data will be things like income, age, or social security claimants.
8584330 Posted Jan 24, 2009
Okay, to sum up and make this MS Excel friendly: you are always safe using STDEV to find the standard deviation, unless you happen to have data on every single person living in council housing, that is, your entire population.
I'm sorry I don't quite understand what council housing means, but you can explain if I get too far out of the ballpark. I'm guessing you have a range of people living in council housing from ages newborn to 105, the oldest person. They have some needs in common while some needs are age-dependent. They need nearby food, transportation, jobs, schools, medical care, and the like. I'm guessing there is an entire range of income levels and that council housing is usually in cities. Members of this population are willing to pay a bit more for certain services, and if dissatisfied, express their opinions by leaving for other housing. Being humans, they lie like cheap carpets when interviewed, yet that is the normal way of obtaining their opinions, other than observing their behaviors. Am I close?
Otherwise we forge ahead to linear regression, a handy wrench in the toolbox of the working math modeler.
We have the idea of the independent variable and the dependent variable. The independent variable, normally plotted along the x axis, is the variable we hope will explain the dependent variable, usually plotted on the y axis. For example, reading ability in the 4th grade (x) ordinarily lets us predict how well 5th graders perform scholastically (y). Pre-season games (x) help us determine which football teams to bet on during the season (y). Age (an x), smoking (another x), and obesity (yet another x) are factors helping us predict heart disease (y).
Excel has a Regression option under Data Analysis (on the Tools menu in Excel 2003, or the Data tab in Excel 2007; either way the Analysis ToolPak add-in must be enabled). Put your data in columns. The output includes the coefficients, R-squared values, and so on. Try this and make sure it works for you, then we'll talk about interpreting results.
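Excel's Regression tool does the arithmetic for you, but for a single x variable the core of it is short. Below is a minimal Python sketch of ordinary least squares with R-squared. The x and y values are made-up illustration numbers, not housing data, and `least_squares` is a hypothetical helper name. Excel's tool can also take several x columns at once (multiple regression), which this sketch does not attempt.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept, R-squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                   # spread of x around its mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # how x and y move together
    slope = sxy / sxx
    intercept = my - slope * mx                             # the fitted line passes through (mean x, mean y)
    # R-squared: share of the variation in y explained by the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical illustration values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r2 = least_squares(x, y)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.2f}")
```

This prints `y = 2.20 + 0.60x, R^2 = 0.60`: each one-unit step in x goes with about 0.6 more in y, and the line accounts for 60% of the variation in y.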
Clive the flying ostrich: Amateur Polymath | Chief Heretic. Posted Jan 27, 2009
Thanks for the refresher course.
I didn't need it in the end. Just needed to know how to calculate percentages.
I reckon the interview went quite well.
warner - a new era of cooperation Posted Feb 2, 2009
If I were of above-average intelligence (which I'm not!), perhaps I could show with a mathematical proof that omniscience and omnipotence were mutually compatible...
http://mathworld.wolfram.com/NormalDistribution.html
warner - a new era of cooperation Posted Feb 2, 2009
Then of course, there's this >> A593552
I've got a question about standard deviation.
- 1: Clive the flying ostrich: Amateur Polymath | Chief Heretic. (Jan 22, 2009)
- 2: warner - a new era of cooperation (Jan 22, 2009)
- 3: 8584330 (Jan 23, 2009)
- 4: Clive the flying ostrich: Amateur Polymath | Chief Heretic. (Jan 23, 2009)
- 5: warner - a new era of cooperation (Jan 23, 2009)
- 6: warner - a new era of cooperation (Jan 23, 2009)
- 7: 8584330 (Jan 23, 2009)
- 8: Clive the flying ostrich: Amateur Polymath | Chief Heretic. (Jan 23, 2009)
- 9: 8584330 (Jan 24, 2009)
- 10: Clive the flying ostrich: Amateur Polymath | Chief Heretic. (Jan 24, 2009)
- 11: 8584330 (Jan 24, 2009)
- 12: 8584330 (Jan 27, 2009)
- 13: Clive the flying ostrich: Amateur Polymath | Chief Heretic. (Jan 27, 2009)
- 14: warner - a new era of cooperation (Feb 2, 2009)
- 15: warner - a new era of cooperation (Feb 2, 2009)