More from Statistics Hacks … Hack #5: Go Big to Get Small.
Accuracy is the big bugaboo with statistics. We can crunch numbers and manipulate (er, massage) data to get interesting results, but the proof is in the pudding. And since the whole point of statistics in the first place is to avoid experimentation on massive populations, it takes lots of pudding to get a good enough taste to satisfy. Increasing the size of the sample tends to improve the reliability of the conclusions.
The basis for this insight is Jakob Bernoulli’s Golden Theorem, better known today as the law of large numbers. “It is likely the single most useful discovery in the history of statistics,” according to author Bruce Frey.
In probability, this means that the number of sample points examined determines how far predictions are likely to fall from the actual values out in the world. This standard error, the typical gap between the estimate and the true value, shrinks in proportion to the inverse of the square root of the sample size when the sample is random. The specific value depends on the scale of measurement and variability of the sample, but the gist is: big samples = small error.
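To make the square-root relationship concrete, here is a minimal Python sketch. It is not from the book; it assumes a fair six-sided die as the random system, and the function names `standard_error` and `simulate` are made up for illustration. It estimates the standard error of the mean as the sample standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sample std dev / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def simulate(sizes=(25, 100, 400, 1600), seed=5):
    """Roll a fair die n times for each n and report (mean, standard error).

    The die and the seed are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        # Each roll is an independent draw; the population mean is 3.5.
        sample = [rng.randint(1, 6) for _ in range(n)]
        results[n] = (statistics.mean(sample), standard_error(sample))
    return results


if __name__ == "__main__":
    for n, (mean, se) in simulate().items():
        print(f"n={n:5d}  mean={mean:.3f}  standard error={se:.3f}")
```

Each step in the sketch quadruples the sample size, so the standard error should roughly halve each time. That is the catch behind "go big": cutting the error in half costs four times as much data.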
To illustrate this further, imagine yourself counting cards with your big, exploitive brother in the movie “Rain Man.” As soon as the first cards are revealed, Raymond can make a guess about what cards are still in the deck. But with so many cards left in the deck, it is not likely to be an accurate guess. The only certainty is that the chance of getting cards like the ones he has seen has gone down. If he sees an ace of hearts on the table, then there is at least one fewer ace hidden in the deck. The more cards played, the more certain Raymond can be that he knows what remains in the shrinking deck. Fortunately, statisticians don’t get kicked out of academia for counting cards; they publish papers.
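The card-counting idea can be sketched in a few lines of Python. This is only an illustration, assuming a single standard 52-card deck and tracking ranks rather than suits; the helper names `fresh_deck_counts`, `reveal`, and `chance_of_rank` are hypothetical.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def fresh_deck_counts():
    """Four cards of each rank in an unseen single deck (an assumption)."""
    return {rank: 4 for rank in RANKS}


def reveal(counts, rank):
    """Return new counts after one card of the given rank has been seen."""
    if counts[rank] == 0:
        raise ValueError(f"no {rank} left in the deck")
    updated = dict(counts)
    updated[rank] -= 1
    return updated


def chance_of_rank(counts, rank):
    """Exact chance that the next card drawn has the given rank."""
    remaining = sum(counts.values())
    return Fraction(counts[rank], remaining)


if __name__ == "__main__":
    deck = fresh_deck_counts()
    print("Before any cards:", chance_of_rank(deck, "A"))   # 4 of 52
    deck = reveal(deck, "A")
    print("After one ace:   ", chance_of_rank(deck, "A"))   # 3 of 51
```

With a full deck the chance of an ace is 4 in 52; after one ace is seen it drops to 3 in 51. Every revealed card shrinks the denominator, so each later observation moves the estimate more.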
This all works, though, only if the sample is considered statistically random. Otherwise (sorry) … all bets are off.
Some definitions:
- Law of large numbers: as the size of the sample increases, the mean of the sample grows closer to the mean of the whole population
- Standard error: the gap between the expected outcome and the actual observed proportions
2 replies on “Stats Hacks #5”
For more info about Statistics Hacks, by Bruce Frey, visit http://www.oreilly.com/catalog/statisticshks/
Copyright © 2006 O’Reilly Media, Inc. All rights reserved.
Normally, I wouldn’t have approved a comment like that (preferring that you use this link instead). But even though grad school got in the way of my quest to do all of the Hacks by now, I am still very much liking this book.