In the previous lecture, we talked about the intuition and the basic idea of hypothesis testing. There are still many things that need to be formalised, and I'm sure many of you have some burning questions. We deal with both in this lecture.

Fisher vs. Neyman-Pearson

Historically, there are two different approaches to hypothesis testing. The method developed by Ronald Fisher allows us to compute the probability of observing the data, or something more extreme, under the null hypothesis, which is the default stance where we assume there is no difference, no effect, no association, no relationship, etc. Based on that probability, we decide whether or not to reject the null hypothesis. The framework proposed by Jerzy Neyman and Egon Pearson formulates both a null hypothesis and an alternative hypothesis with a specified effect size. In addition, it considers controlling the error rates in the long run.

We are not going to dig into the differences between the Fisher and Neyman-Pearson styles of hypothesis testing. What people usually use nowadays is a combination of the two, which we call Null Hypothesis Significance Testing (NHST).

Null Hypothesis Significance Testing (NHST)

The examples and practice we did in the previous lecture were essentially Null Hypothesis Significance Testing (NHST), which plays a central role in many areas of science. As you can see, we were able to draw some conclusions about our original belief (the null hypothesis) based on the probability of observing the sample data or something more extreme.

Formulate Hypotheses

In NHST, we still come up with a set of two hypotheses:

$$ \begin{cases} H_0: & \textmd{ the null hypothesis}\\ H_1 \textmd{ or } H_a: & \textmd{ the alternative hypothesis} \end{cases} $$

NHST helps us establish a discrepancy between $H_0$ and $H_1$. As said before, the null hypothesis $H_0$ is always the default position: it states there is no difference, no effect, no association, no relationship, etc. The alternative is usually the statement we want to investigate. Very often, they are mutually exclusive and collectively exhaustive1.

Using the examples from the previous lecture, we designed experiments and collected some data. Based on the data, we have some hypotheses in mind ($H_1$) that we want to test. We then take the opposite statement as the null ($H_0$) to formulate a pair of hypotheses:

| $\boldsymbol{H_0}$ | $\boldsymbol{H_1}$ |
|---|---|
| $\mu = 37$ | $\mu \neq 37$ |
| $\mu_A \leqslant \mu_B$ | $\mu_A > \mu_B$ |
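
To make the two kinds of hypothesis pairs concrete, here is a minimal sketch of a two-sided and a one-sided test. The temperature readings are made up for illustration, and we assume the population standard deviation is known so that a simple z-test suffices (in practice you would more likely use a t-test):

```python
import math
import random

random.seed(1)

# Hypothetical body-temperature readings (assumed data, for illustration only)
temps = [random.gauss(36.8, 0.4) for _ in range(30)]

n = len(temps)
xbar = sum(temps) / n
sigma = 0.4                               # assumed known population sd -> z-test
z = (xbar - 37) / (sigma / math.sqrt(n))  # test statistic under H0: mu = 37

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Two-sided: H0: mu = 37 vs H1: mu != 37 -> extreme in BOTH directions counts
p_two_sided = 2 * (1 - norm_cdf(abs(z)))

# One-sided: H0: mu <= 37 vs H1: mu > 37 -> only large z counts as extreme
p_one_sided = 1 - norm_cdf(z)

print(z, p_two_sided, p_one_sided)
```

The same directional logic applies to the two-sample pair $\mu_A \leqslant \mu_B$ vs $\mu_A > \mu_B$: the alternative determines which tail(s) of the null distribution count as "more extreme".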

Calculate The p-value

After formulating the hypotheses, we assume $\boldsymbol{H_0}$ is true. Then we use what we learnt in the previous sections to calculate the probability of observing the data, or something more extreme, namely the p-value. We make our claim based on the p-value. As you can see, the p-value is:

$$\mathbb{P}(\textmd{observing \textbf{the data} we have or more extreme }|\ H_0 \textmd{ is true})$$

What does the above probability mean? It is straightforward to see that the p-value is a conditional probability. The probability is about the data, which I put in bold. It is NOT the probability of $H_0$ being correct. You would be surprised how often people interpret p-values the wrong way.
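
The definition above can be turned directly into a simulation: we pretend $H_0$ is true, generate many datasets from that null world, and count how often the simulated data are at least as extreme as what we observed. The coin-flip numbers below are made up for illustration:

```python
import random

random.seed(0)

# Observed data: 8 heads in 10 flips.  H0: the coin is fair (p = 0.5).
observed_heads = 8
n_flips = 10

# Simulate the world where H0 is true, and count how often we see data
# at least as extreme as ours (two-sided: >= 8 heads or <= 2 heads).
n_sim = 100_000
extreme = 0
for _ in range(n_sim):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if heads >= observed_heads or heads <= n_flips - observed_heads:
        extreme += 1

p_value = extreme / n_sim
print(p_value)   # close to the exact binomial value, roughly 0.11
```

Note what the simulation estimates: $\mathbb{P}(\text{data or more extreme} \mid H_0)$. At no point did we compute a probability that $H_0$ itself is true; the randomness is entirely in the data.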

Why Test The Null

One burning question is: why test the null hypothesis? Why not test the alternative hypothesis directly? There are at least three reasons:

  1. To avoid confirmation bias
  2. It is practically easier to disprove something
  3. One feature of the scientific method is that things can be falsified

You can relate hypothesis testing to proof by contradiction in mathematics.


  1. Well, strictly speaking, they do not have to be collectively exhaustive. This point will be elaborated in Lecture 25↩︎