Tuesday, July 1, 2014

Hypothesis testing is like trying a case in court

In a quantitative research study, the researcher proposes a research hypothesis about something that usually represents a new idea. For example, someone may propose that playing a certain brain game can improve memory performance. Then the researcher collects data to see if s/he can show that this new idea is likely to be true. This hypothesis testing process actually resembles the legal process of trying a case in court.

In a court case, the prosecutor proposes a hypothesis that someone committed a crime. Then s/he gathers evidence to see if s/he can demonstrate beyond a reasonable doubt that the hypothesis is true.

In both the research study and the court case, the fundamental basis of the process is the same: The default is that the hypothesized event or phenomenon does not exist. A person is presumed innocent until proven guilty (with convincing evidence); the brain game is presumed to do nothing for memory performance until proven effective.

In the court case, the default (of presumed innocence) is clear and known to every citizen. In a research study, we need to spell out the default for the reader. This default is called the "null hypothesis" because it states that the proposed effect does not exist (is "null"). In contrast to the "null hypothesis," the proposed hypothesis is called the "alternative hypothesis" or "research hypothesis."

In the research study, the researcher has to provide evidence that is convincing enough to reject the null hypothesis, just as the prosecutor needs to provide evidence that is convincing enough to reject the presumed innocence of the accused individual. But when is the evidence "convincing" enough? In the court case, it's a judgment call on the part of the jury; in the research study, we rely on statistics.

The convention in social science research is to consider the evidence "convincing" when a result at least as large as the one observed would occur less than 5% of the time if the null hypothesis were true. In other words, if there were really no effect, chance alone would produce a finding this strong in at most 5 out of 100 such studies. Note that this is not the same as a 95% chance that the research hypothesis is true: the calculation assumes the null hypothesis and asks how surprising the data would be under it. For example, suppose we show that people who have played the brain game for 6 months perform on average better than those (with similar baseline memory functions) who have not played the brain game. A statistical comparison of the memory scores between the two groups can tell us how likely it is that a difference this large would arise by chance alone when the brain game really has no effect of any kind, which means the default (null hypothesis or presumed innocence) stands.
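One way to make "by chance alone" concrete is a permutation test: if the brain game does nothing, the group labels are arbitrary, so we can shuffle them many times and count how often a shuffled difference is at least as large as the observed one. The sketch below is only an illustration. The memory scores are made up, the function name `permutation_p_value` is hypothetical, and the test is one-sided because the example asks whether players score better.

```python
import random
from statistics import mean

def permutation_p_value(players, non_players, n_shuffles=10_000, seed=0):
    """One-sided permutation test for a difference in group means.

    Returns the observed difference and the estimated probability of a
    difference at least that large when group labels are assigned at random
    (that is, when the null hypothesis is true).
    """
    rng = random.Random(seed)
    observed = mean(players) - mean(non_players)
    pooled = list(players) + list(non_players)
    n = len(players)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel the participants at random
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (at_least_as_large + 1) / (n_shuffles + 1)

# Made-up memory scores, for illustration only
players = [78, 85, 90, 72, 88, 95, 81, 84, 79, 91]
non_players = [70, 75, 82, 68, 77, 80, 73, 71, 76, 74]

observed_diff, p_value = permutation_p_value(players, non_players)
print(f"observed difference in means: {observed_diff:.1f}")
print(f"estimated p value: {p_value:.4f}")
```

A real study would more likely use a t test from a statistics package; the permutation version is shown here because it mirrors the "chance alone" reasoning most directly.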

The p value obtained in the statistical test is the probability of seeing a difference in memory scores at least as large as the one observed, by chance alone, if the null hypothesis is true. If p is lower than the 5% threshold, then we are willing to reject the default or null hypothesis. What we are saying is the following:
1. We see that the two groups of people have different scores on average.
2. We perform a statistical test to see how likely a difference at least this large would be as a result of random chance if the brain game really didn't have any effect.
3. We see that the probability from #2 is lower than 5%, which is the threshold that we are willing to accept. In other words, the memory scores we obtained would be unlikely enough if the brain game had no effect.
4. Since our data would be this unlikely under the null hypothesis, we reject the null hypothesis that the brain game has no effect. Our evidence therefore "suggests" that the brain game is likely to have an effect on memory performance.
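The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis: the scores are made up, the function name `two_sample_test` is hypothetical, and the p value uses a normal (z) approximation that is only rough for samples this small. A t test (for example `scipy.stats.ttest_ind`) would be the usual choice in practice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_test(group_a, group_b, alpha=0.05):
    """Follow the four steps using a two-sided z approximation."""
    # Step 1: observe the difference in average scores
    diff = mean(group_a) - mean(group_b)

    # Step 2: ask how likely a difference at least this large would be
    # if the null hypothesis (no effect) were true
    std_error = sqrt(variance(group_a) / len(group_a)
                     + variance(group_b) / len(group_b))
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 3: compare that probability with the chosen threshold
    # Step 4: reject the null hypothesis only if p falls below it
    reject_null = p_value < alpha
    return diff, p_value, reject_null

# Made-up memory scores, for illustration only
players = [78, 85, 90, 72, 88, 95, 81, 84, 79, 91]
non_players = [70, 75, 82, 68, 77, 80, 73, 71, 76, 74]

diff, p_value, reject_null = two_sample_test(players, non_players)
print(f"difference in means: {diff:.1f}")
print(f"approximate p value: {p_value:.4f}")
print("reject the null hypothesis" if reject_null
      else "fail to reject the null hypothesis")
```

Note that failing to reject the null hypothesis is not proof that the game has no effect, just as an acquittal is not proof of innocence.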

The chosen threshold, called the "alpha level," determines how stringent your statistical test is. Setting it lower than .05 (or 5%) makes it harder to reject the null hypothesis and support your research hypothesis. But if you are successful in rejecting the null hypothesis with a lower alpha level, the research result would of course be more convincing. Setting the alpha level above .05 is usually not recommended.
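To see what the alpha level controls, we can simulate many studies in which the null hypothesis is true (both groups are drawn from the same distribution) and count how often each threshold would reject it anyway. The sketch below is illustrative only: the population mean of 75, standard deviation of 8, and group size of 50 are arbitrary assumptions, the helper names are hypothetical, and the p value uses a normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def two_sided_p_value(group_a, group_b):
    """Approximate two-sided p value for a difference in means (z approximation)."""
    mean_a, mean_b = fmean(group_a), fmean(group_b)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (len(group_a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (len(group_b) - 1)
    std_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    z = (mean_a - mean_b) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(alpha, p_values):
    """Fraction of simulated studies whose p value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

rng = random.Random(42)

# Simulate 2,000 studies in which the brain game truly has NO effect:
# both groups come from the same (assumed) population of memory scores.
p_values = []
for _ in range(2000):
    group_a = [rng.gauss(75, 8) for _ in range(50)]
    group_b = [rng.gauss(75, 8) for _ in range(50)]
    p_values.append(two_sided_p_value(group_a, group_b))

for alpha in (0.10, 0.05, 0.01):
    rate = rejection_rate(alpha, p_values)
    print(f"alpha = {alpha:.2f}: null wrongly rejected in {rate:.1%} of studies")
```

The rejection rate should land close to the alpha level itself, which is the sense in which a lower alpha is more stringent: it lowers the share of studies that reject a true null hypothesis.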

3 comments:

  1. Would you please blog about justifying a linear regression (for example, what it means that the independent/explanatory variables are continuous or categorical and the dependent/criterion variable is continuous, plus homoscedasticity, no multicollinearity, normally distributed errors, etc., or just linear regression in general)? I am trying to compare parental deployment length and number of parental deployments to student achievement (SAT10 reading and SAT10 math) to determine if one is a more significant indicator of achievement. I need to justify within the text of the dissertation why I would conduct linear regression, and I am having problems. Thanks a bunch.

    Replies
    1. Hi, Lienne,
      Thanks for your comment! At some point I may write about the concept of regression in general. But in terms of actually determining whether regression works for a particular study, what you really need is a good statistics resource like a textbook or specialized website. A good resource will guide you step by step to check your data and see if the assumptions for the analysis are met. In the end, however, selecting the right analysis for your data is not a black-and-white decision, so I would recommend studying linear regression and working closely with your dissertation committee as well.
      Here's a good textbook that provides a very detailed explanation and the procedure for running an analysis with SPSS:
      Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and interpretation. Los Angeles, CA: Sage.
