Statistics in plain English: July 2014

In a quantitative research study, the researcher proposes a research hypothesis about something that usually represents a new idea. For example, someone may propose that playing a certain brain game can improve memory performance. Then the researcher collects data to see if s/he can show that this new idea is likely to be true. This hypothesis testing process actually resembles the legal process of trying a case in court.

In a court case, the prosecutor proposes a hypothesis that someone committed a crime. Then s/he gathers evidence to see if s/he can demonstrate beyond a reasonable doubt that the hypothesis is true.

In both the research study and the court case, the fundamental basis of the process is the same: The default is that the hypothesized event or phenomenon does not exist. A person is presumed innocent until proven guilty (with convincing evidence); the brain game is presumed to do nothing for memory performance until proven effective.

In the court case, the default (of presumed innocence) is clear and known to every citizen. In a research study, we need to spell out the default for the reader. This default is called the "null hypothesis" because it states that the proposed hypothesis is not valid ("null"). In contrast to the "null hypothesis," the proposed hypothesis is called the "alternative hypothesis" or "research hypothesis."

In the research study, the researcher has to provide evidence that is convincing enough to reject the null hypothesis, just as the prosecutor needs to provide evidence that is convincing enough to reject the presumed innocence of the accused individual. But when is the evidence "convincing" enough? In the court case, it's a judgment call on the part of the jury, while in the research study we rely on statistics.

The convention in social science research is to consider the evidence "convincing" when a result at least as large as the one observed would turn up less than 5% of the time if the null hypothesis were true. In other words, there is at most a 5% chance of seeing such a result purely by chance when there is really no effect. For example, suppose we show that people who have played the brain game for 6 months perform better on average than those (with similar baseline memory function) who have not played it. A statistical comparison of the memory scores between the two groups can tell us how likely a difference this large is to happen by chance alone when the brain game really has no effect of any kind, that is, when the default (null hypothesis or presumed innocence) stands.

The probability of getting a difference in memory scores at least this large by chance alone, assuming the null hypothesis is true, is the p value obtained in the statistical test. If p is lower than the 5% threshold, then we are willing to reject the default or null hypothesis. What we are saying is the following:
1. We see that the two groups of people have different scores on average.
2. We perform a statistical test to see how likely a difference this large would be as a result of random chance if the brain game really didn't have any effect.
3. We see that the probability in #2 is lower than 5%, which is the threshold that we are willing to accept. In other words, memory scores like the ones we obtained would be unlikely if the brain game had no effect.
4. Since the data would be this unlikely under the null hypothesis, we reject the null hypothesis that the brain game has no effect. Our evidence therefore "suggests" that the brain game is likely to have an effect on memory performance.
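
The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the memory scores are invented, and the test used here is a simple permutation (shuffling) test, which is one of several tests a researcher might choose. The function name `permutation_p_value` is made up for this example.

```python
import random
from statistics import fmean

# Hypothetical memory scores, invented for this illustration only.
played = [78, 85, 82, 90, 88, 84, 91, 79, 86, 87]
not_played = [72, 75, 80, 70, 77, 74, 81, 69, 76, 73]


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone produces a gap in group means
    at least as large as the one observed (two-sided)."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))  # step 1: the observed gap
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Step 2: if the game has no effect, group labels are arbitrary,
        # so we reshuffle the scores into two random groups.
        rng.shuffle(pooled)
        gap = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_large += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


observed_gap = fmean(played) - fmean(not_played)
p_value = permutation_p_value(played, not_played)
alpha = 0.05  # step 3: the threshold we are willing to accept

print(f"observed gap in average score: {observed_gap:.1f}")
print(f"estimated p value: {p_value:.4f}")
# Step 4: reject the null hypothesis only if p falls below the threshold.
print("reject the null hypothesis" if p_value < alpha else "keep the null hypothesis")
```

Shuffling mimics a world in which the brain game does nothing: if playing made no difference, it would not matter which score belonged to which group.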

The chosen threshold, called the "alpha level," determines how stringent your statistical test is. Setting it lower than .05 (or 5%) makes it harder to reject the null hypothesis and support your research hypothesis. But if you are successful in rejecting the null hypothesis with a lower alpha level, the research result would of course be more convincing. Setting the alpha level above .05 is usually not recommended.
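
To see what "more stringent" means in practice, here is a small simulation sketch. It pretends to run many studies in which the brain game truly has no effect, then counts how often each alpha level would still lead us to reject the null hypothesis. Everything in it is an assumption made for illustration: the scores are random numbers drawn from one normal distribution (mean 100, standard deviation 10), the standard deviation is treated as known so that a simple z test can be used, and the function names are invented.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_p_values(n_studies=2000, n_per_group=30, sd=10.0, seed=1):
    """Run many imaginary studies in which both groups come from the
    same population, and return the two-sided p value of each study."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference between two group means (sd assumed known).
    standard_error = sd * (2 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_studies):
        players = [rng.gauss(100, sd) for _ in range(n_per_group)]
        non_players = [rng.gauss(100, sd) for _ in range(n_per_group)]
        z = (fmean(players) - fmean(non_players)) / standard_error
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


def rejection_rate(p_values, alpha):
    """Share of studies whose p value falls below the alpha level."""
    return sum(p < alpha for p in p_values) / len(p_values)


p_values = simulate_null_p_values()
for alpha in (0.05, 0.01):
    rate = rejection_rate(p_values, alpha)
    print(f"alpha = {alpha}: null hypothesis rejected in {rate:.1%} of no-effect studies")
```

Because the same set of p values is checked against both thresholds, the lower alpha level can never reject the null hypothesis more often than the higher one. That is the sense in which a lower alpha level is harder to pass.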

The name of a variable is like the last name of a family because it applies to all family members who are distinguishable by their first names.

Example 1:
The family of "gender" has two members, "male" gender and "female" gender.

Example 2:
The Likert scale family of "degree of agreement" has five members: "strongly disagree," "disagree," "neutral," "agree," and "strongly agree."

Example 3:
The family of "SAT score" has many possible members, each represented by a specific test score.
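
For readers who work with data in code, the family analogy maps loosely onto Python's `Enum`: the class plays the role of the variable (the last name), and its members are the levels (the first names). The class names below are invented for this sketch, and a numeric variable such as SAT score would normally be stored as a plain number instead of an enumeration.

```python
from enum import Enum


class Gender(Enum):
    """The variable (family) "gender" from Example 1."""
    MALE = "male"
    FEMALE = "female"


class DegreeOfAgreement(Enum):
    """The variable (family) "degree of agreement" from Example 2."""
    STRONGLY_DISAGREE = "strongly disagree"
    DISAGREE = "disagree"
    NEUTRAL = "neutral"
    AGREE = "agree"
    STRONGLY_AGREE = "strongly agree"


# Each family is one variable; its members are the levels of that variable.
for variable in (Gender, DegreeOfAgreement):
    levels = [member.value for member in variable]
    print(f"{variable.__name__}: {len(levels)} levels -> {levels}")
```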

But sometimes it's not very obvious what the "last name" of the variable family is. For example, if we want to see if retaining a child in a grade enables her to make more progress at the end of a year, we may compare "children retained in a grade" with "children with the same level of test scores but promoted to the next grade anyway". What would be the variable to represent the two conditions of retention and no-retention?

Here's another example. If we would like to see whether playing violent video games affects children's playground behavior, we might compare the behavior of "kids playing violent video games" and "kids playing non-violent video games". What would be the variable name to capture these two types of video games?

To name the variable (family), we want a label that is neutral so that each family member (variable level) can fit the label. For the grade retention example, we can name the variable "grade retention" or "grade retention status", and the two groups of kids vary along this factor. For the video game example, we can name the variable "violence in video game" so that one group experiences the presence of violence and the other group experiences the absence of violence in video games.

If you think of variables and their specific levels or values as families and family members, you can avoid the common mistake of describing the specific levels or values as "variables," such as calling violent games and non-violent games two variables when they belong to the same variable (family).
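
The same idea shows up when the data are laid out in a table. In the sketch below, each child is one row, and "violence in video game" is a single column whose entries are the two levels. The records, the column names, and the helper `levels_of` are all invented for illustration.

```python
# Hypothetical records: one row per child, one column per variable.
children = [
    {"child": "A", "violence_in_video_game": "present", "grade_retention": "retained"},
    {"child": "B", "violence_in_video_game": "absent", "grade_retention": "promoted"},
    {"child": "C", "violence_in_video_game": "present", "grade_retention": "promoted"},
    {"child": "D", "violence_in_video_game": "absent", "grade_retention": "retained"},
]


def levels_of(records, variable):
    """Return the distinct levels (family members) of one variable (family)."""
    return sorted({record[variable] for record in records})


# One variable with two levels, not two separate variables.
print(levels_of(children, "violence_in_video_game"))
print(levels_of(children, "grade_retention"))
```

If "violent games" and "non-violent games" were treated as two variables, the table would need two columns that always mirror each other, which is a sign that they are really two levels of one variable.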

Tuesday, July 1, 2014

Hypothesis testing is like trying a case in court

Naming a variable