Stats Hack #1 – Blogschmog

We skipped three hours of Pirates this past weekend to make a Borders run. I picked up Statistics Hacks by O’Reilly. The “Hacks” series is like a higher-brow, geekier version of Statistics for Dummies, which sat nearby on the shelf. What I liked most about it is the bite-sized concepts it presents. There are 75 hacks in this book. At one a day, I’ll be a stats whiz by October 6.

Which brings us to Hack #1 … Know the Big Secret

According to author Bruce Frey, the primary purpose of statistics is to “make probability statements about samples of scores.” Probability is a pretty straightforward concept: What is the chance of having something do a particular thing? A sample is a portion of a greater population, which also makes sense since we usually can’t study an entire population, or know every property of every individual within a population. We need smaller sets of data values (scores) to allow us to do anything constructive, and then use our probability tricks to come up with a good projection. Some of these tricks lead to descriptive statistics (what does the population look like?) and others to inferential statistics (what can we expect this population to do?). Statisticians take known information and express it as probable information.
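To make the sample-versus-population idea concrete for myself, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the population is 10,000 made-up scores, the sample size of 200 is arbitrary, and `sample_mean_estimate` is a hypothetical helper, not anything from Frey's book.

```python
import random
import statistics


def sample_mean_estimate(population, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and return its mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Hypothetical population: 10,000 made-up scores centered near 70.
population_rng = random.Random(42)
population = [population_rng.gauss(70, 10) for _ in range(10_000)]

# In real life we would not know this number; here we can peek.
population_mean = statistics.mean(population)

# Estimate it from only 200 randomly chosen scores.
estimate = sample_mean_estimate(population, 200, seed=1)

print(f"population mean:  {population_mean:.2f}")
print(f"estimate from 200: {estimate:.2f}")
```

The estimate will not match the population mean exactly, but with a random sample it should land close, and statistics is what lets us say how close we should expect it to be.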

The thing that always seems so intimidating about statistics is the SPS effect, where there are fifty zillion different things one might use to get to that probability. Which thing is the right thing to use? All of these options represent centuries of mathematicians thinking logically and coming up with theorems, formulas, assumptions, etc. about the way the known data can relate to each other to produce probable outcomes. I believe **that** is my hangup about statistics: too many people with too many ideas and too many proofs for me to trust my own decision on which to use. It feels like in order to use statistics to its fullest one has to know all of that history.

Getting that sample is also intimidating. Having heard before that random isn’t really random, and God Does Not Play Dice With the Universe, creates an initial mindset that it is impossible to ever have a truly random sample. Yet, most of the procedures in that statistician bag-o-tricks require it. “We can’t possibly know how wrong” the probabilities are, according to Frey, when the sampling is not random. He further claims that most of the psychological and educational research going on today draws from samples that are not randomly drawn from a population … Students are the convenient guinea pigs. Somehow, I don’t take comfort in the fact that it’s OK because everyone else is doing it.
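A small simulation helps show why the convenient-guinea-pig problem matters. This is only a sketch under invented assumptions: a made-up population in which 10% are students whose scores happen to run higher than everyone else's. The numbers are not from Frey or from any real study.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 9,000 non-students and 1,000 students.
# Assumption for illustration: students score higher on average.
non_students = [rng.gauss(50, 10) for _ in range(9_000)]
students = [rng.gauss(65, 10) for _ in range(1_000)]
population = non_students + students
population_mean = statistics.mean(population)

# Convenience sample: 200 people, all students, because they are handy.
convenience_sample = rng.sample(students, 200)
convenience_mean = statistics.mean(convenience_sample)

# Random sample: 200 people drawn from the whole population.
random_sample = rng.sample(population, 200)
random_mean = statistics.mean(random_sample)

print(f"population mean:         {population_mean:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
print(f"random sample mean:      {random_mean:.2f}")
```

In this toy setup the students-only sample misses the population mean by a wide margin, and drawing more students would not fix it. In real research we usually cannot peek at the population to see how far off we are, which is the point of the quote above.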

Some definitions:

probability

the chance of having something do a particular thing

sample

a portion of a larger population

score

data values collected when studying a population

distribution

a list of scores, both value and frequency

descriptive statistics

a probable projection of the inherent properties of a population

inferential statistics

a probable projection of the expected behaviors or influences on a population
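As a quick illustration of the "score" and "distribution" definitions, here is a minimal sketch using a handful of invented scores. The values are arbitrary; the point is only that a distribution pairs each value with how often it occurs.

```python
from collections import Counter
import statistics

# Hypothetical scores collected from a small sample.
scores = [3, 5, 5, 7, 7, 7, 9]

# The distribution: each score value paired with its frequency.
distribution = Counter(scores)
for value in sorted(distribution):
    print(f"score {value}: appears {distribution[value]} time(s)")

# A few simple summaries of the same scores.
median = statistics.median(scores)  # middle value of the sorted scores: 7
mode = statistics.mode(scores)      # most frequent value: 7
print(f"median={median}, mode={mode}")
```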

By Kevin Makice