We have been using the properties of the sampling distributions of the sample mean, sample variance and sample proportion to make interval estimates of the corresponding population parameters. They are very useful, but they are not the whole story of inferential statistics. Otherwise, the course would be much shorter.

Today, we introduced a very powerful technique: hypothesis testing. As the name suggests, it allows us to test hypotheses. You see, we rarely know the properties of the population. By drawing samples and studying the data we have, we may come up with some hypotheses. For example, when I was very little, I was told that the normal body temperature was 37 $^\circ$C, but the data I have collected now suggest otherwise. Therefore, I hypothesise that the normal body temperature may not be 37 $^\circ$C. How do we figure out whether the hypothesis is correct or not? Or can we at least calculate the probability of our hypothesis being true based on the sample? That is, a probability of this sort:

$$\mathbb{P}(\textmd{our hypothesis is true}\,|\,\textmd{sample data})$$

Well … there are methods (Bayesian statistics) for that purpose. Unfortunately, with the techniques covered in this introductory statistics course, we cannot say whether a hypothesis is correct or not, and we cannot calculate the probability of a hypothesis being correct or wrong, either. However, we can still do something useful: we can approach the problem indirectly through hypothesis testing. In this lecture, we simply provided an intuition for the concept of hypothesis testing, talking about the ideas through specific examples. We left all the formality and technical details of hypothesis testing for the next lecture.

In practice, we often want to claim that some measurement, such as body temperature or test score, differs from a specific value or differs between conditions. For example, we would like to claim that the mean body temperature of normal people is lower than, the same as, or higher than 37 $^\circ$C. Or we would like to claim that the average test score of students who take notes during lectures (condition A) is higher than, the same as, or lower than that of students who do not take notes (condition B). Then we design and perform experiments and collect data. We see that the sample mean body temperature is 36.8 $^\circ$C, which is not 37 $^\circ$C, and that the average test score of condition A is 7.5 points higher than that of condition B. Based on these data, we have some hypotheses in mind:

  • The mean body temperature of normal people might not be 37 $^\circ$C.
  • The average test score is higher for those students who take notes during lectures compared to those who do not.

We want to test them. To do so, we first assume that the opposite is true. Then it is relatively straightforward for us to use the knowledge that we learnt in the probability section to calculate the probability of observing the data. If the mean body temperature of normal people is indeed 37 $^\circ$C, we can calculate the probability of observing our sample mean to be 36.8 $^\circ$C. If the mean test score of condition A is not higher than that of condition B, we can calculate the probability of observing the difference of our sample means, which is 7.5 points. Since we are dealing with continuous data, the probability of getting any specific value is $0$. Therefore, most of the time we calculate the probability of observing our sample data or something more extreme¹.
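As a rough illustration of the body temperature calculation, here is a minimal Python sketch. The lecture gives only the assumed mean (37 $^\circ$C) and the sample mean (36.8 $^\circ$C); the sample size `n` and the standard deviation `sd` below are made-up values, chosen purely so the arithmetic can run. The sketch also assumes that the sample mean is approximately normally distributed and that "more extreme" means at least as far from 37 $^\circ$C in either direction, which is only one possible reading.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the text
assumed_mean = 37.0   # the mean we assume to be true
sample_mean = 36.8    # the mean we observed

# Hypothetical values, NOT from the lecture
n = 100               # assumed sample size
sd = 0.7              # assumed standard deviation of body temperatures

# Standard error of the sample mean under the assumption
standard_error = sd / sqrt(n)

# How many standard errors the observed mean lies from the assumed mean
z = (sample_mean - assumed_mean) / standard_error

# Probability of a sample mean at least this far from 37 in either direction,
# using the normal approximation to the sampling distribution
prob_as_or_more_extreme = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}")
print(f"P(data or more extreme | mean is 37) = {prob_as_or_more_extreme:.4f}")
```

With these made-up inputs the probability comes out small. Different values of `n` and `sd` would give a different answer, which is why the actual data matter.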

In reality, we tend to think and behave as if small-probability events generally do not occur. Based on this thinking, we ask: is the probability we get small or not? If it is small, it means:

  • If the mean body temperature of normal people is indeed 37 $^\circ$C, it is unlikely for us to observe our sample mean to be 36.8 $^\circ$C or more extreme.
  • If the average test score is not higher for those students who take notes during lectures compared to those who do not, it is unlikely for us to observe the difference of our sample means to be 7.5 points or more extreme.

Yet it happened, which suggests that there might be something wrong with our original assumption. On the other hand, if the probability is not small, then our sample data are roughly what we would expect under the original assumption. Therefore, we do not question our original belief.
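The same reasoning can be sketched for the test score example by simulation. Everything except the 7.5-point difference is hypothetical here: the group size, the standard deviation of the scores, the common mean of 70 and the cut-off of 0.05 for "small" are all assumptions made for illustration, not values from the lecture. The sketch assumes both groups share the same mean (the boundary case of "condition A is not higher than condition B") and counts how often a difference of 7.5 points or more shows up by chance.

```python
import random
from statistics import mean

def simulate_tail_probability(observed_diff, n_per_group, sd,
                              n_sims=5000, seed=0):
    """Estimate P(mean_A - mean_B >= observed_diff) when both groups
    are drawn from the same normal distribution."""
    rng = random.Random(seed)
    common_mean = 70.0  # hypothetical; cancels out in the difference
    hits = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(common_mean, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(common_mean, sd) for _ in range(n_per_group)]
        if mean(group_a) - mean(group_b) >= observed_diff:
            hits += 1
    return hits / n_sims

observed_diff = 7.5   # from the text
n_per_group = 30      # hypothetical group size
sd = 15.0             # hypothetical standard deviation of test scores
threshold = 0.05      # hypothetical cut-off for calling a probability "small"

prob = simulate_tail_probability(observed_diff, n_per_group, sd)

if prob < threshold:
    print(f"{prob:.4f} is small: the original assumption looks doubtful")
else:
    print(f"{prob:.4f} is not small: no reason to question the assumption")
```

Here "more extreme" is taken to mean a difference of 7.5 points or larger in favour of condition A, since the hypothesis is about condition A being higher.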

The above practice is the basic idea of hypothesis testing.


  1. The meaning of "more extreme" depends on the context, which we will elaborate on during the lecture.