Lecture 14 marks the start of the statistics section, or more precisely, the inferential statistics section. In inferential statistics, we would like to use the information from the samples we get to make inferences and draw some generalised conclusions about populations.
First, we need to clarify the concepts of population and sample. We have already talked about them in previous lectures, even though we have not formally defined them. The two concepts are actually difficult to describe. When we use the words, we roughly know what we are talking about, because their everyday meaning immediately gives us an idea. However, we do not have formal definitions of them, and, as the articles in the References section show, defining them is not that straightforward. Anyway, we will try to define them and tell the difference between them during the lecture.
I think it is helpful to keep the following things in mind:
- Don’t think of a population as a collection of all people/subjects/items etc., even though it can be interpreted in this way in some cases. Instead, think of a population as an abstract thing that we are interested in. We cannot access it directly, so we use a sample to make guesses about it.
- A sample is some sort of representation of the population. A good sample is a miniature version of the population. When we have a sample, the exact statistics of the sample, such as the mean, the variance etc., are not of primary interest. Why? Because we know that if we draw another sample from the same population, we are highly unlikely to get exactly the same statistics as in the previous sample. Actually, it is almost guaranteed that the sample statistics are going to change. Therefore, the exact values in the sample are NOT what we are interested in. What we really care about is what the sample actually represents, that is, the population.
By getting to know some properties of the population, we gain some predictive power: we can roughly tell what to expect if we draw a sample from it. We can use that information to help us make decisions.
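To make the point about changing sample statistics concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that the population can be modelled as a normal distribution with mean 170 and standard deviation 10; these numbers, the sample size of 30, and the function name `draw_sample_means` are all hypothetical choices, not something taken from the lecture.

```python
import random
import statistics


def draw_sample_means(n_samples=5, sample_size=30, mu=170.0, sigma=10.0, seed=0):
    """Draw several samples from a Normal(mu, sigma) population
    and return the mean of each sample.

    mu and sigma are hypothetical population parameters chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_samples):
        # One sample: sample_size independent draws from the population
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = draw_sample_means()
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Each sample should give a slightly different mean, yet all of them should land reasonably close to the population mean of 170. That is the sense in which the individual sample values are not the goal, while knowing the population lets us roughly predict what a new sample will look like.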