This static website holds all course material for the **BIO210 Biostatistics** course delivered by the **School of Life Sciences** at **SUSTech**, Shenzhen. It is an entry-level statistics course for undergraduates with no prior knowledge of statistics. We do assume that you are familiar with high-school mathematics and the mathematics covered in the first year of undergraduate study.

## About This Website

You can find all material listed on this **Content Index** page, typically a few days before or after the actual lesson. In general, the course material for each lesson includes the following:

- **Lecture slides** (in `PDF` format)
- **Homework assignment** (due date included)
- **Extra reading material** (not required for the exam but will help you understand the content from the lecture)

You will find that the lecture slides are not very detailed. We have done this on purpose to force you to take notes during class. The slides are not meant to contain everything; they are reminders of the key points of each lecture. You should be able to recall what happened during the lecture just by looking at them.

To help you understand the content and remember what was mentioned during the lectures, some details about the key points from the lecture slides will be posted on the **Posts** page, one article per lecture. The purpose of the articles is to provide an intuitive explanation of the key concepts and some of the mathematical proofs in the lectures. Many students think math should be laconic and purely based on symbols and formulae. I do not think this course should be like that. Instead, it is often the intuition behind the statistical concepts that matters. Therefore, we have tried to write the posts in an expository, narrative style. We hope to explain, in an intuitive way, why certain numerical measures are chosen over others, how the equations are derived and how to use them in real life. Of course, you still need to come to the lectures for the posts to make sense.

If you have any questions, you can drop me an email, or ask the teaching assistants via WeChat.

## About The Course

- Introduce the basic concepts of statistics to those with no prior knowledge
- Help you feel justifiably confident in your ability to interpret data and information from research articles and daily life
- Help you select appropriate statistical methods for your problem
- Enable you to read other textbooks about statistics
- **Help you formulate a statistical problem from a real-life situation and use numerical techniques to solve it and extract information from it**

Some of you are also taking the **MA212 Probability and Statistics** course offered by the Department of Mathematics. BIO210 is very different from MA212. In particular, BIO210 is:

- Focused on data from basic biology and medicine
- Focused on intuitive explanation
- Focused on application
- Focused on inferential statistics

Very often, it is important to know what the course is **NOT** about.

- Bayesian statistics
- Mathematical proof (important proofs will be provided, though)
- Implementation
- How and where to find data

Here are the main topics and sections from the course:

**1. Course Overview (1 hour)**

- What is Statistics
- Why should biologists care about statistics
- Real-life examples
- Main difference from MA212: focused on statistics and applications

**2. Descriptive Statistics - Data Presentation (2 hours)**

- Types of numerical data
- Summarising numerical data: tables and graphs
- Measures of central tendency
  - Arithmetic mean
  - Geometric mean
  - Another interpretation of mean: weighted average
- Measures of dispersion
  - Variance
  - Standard deviation

**3. Introduction To Probability (1 hour)**

- Probability and set notations
- Intuition/interpretation of probability
- Sample space, outcomes and events
- Probability Axioms (Kolmogorov’s Axioms)
- Probability calculation
- Discrete/Continuous Uniform Law
- The frequentist definition of probability

**4. Conditional Probability (4 hours)**

- Interpretation of conditional probability
- The multiplication rule
- The total probability theorem
- Bayes’ theorem
- The Tversky and Kahneman experiment
- The concept and interpretation of independence

**5. Probability Distributions (5 hours)**

- Random variables
- Discrete random variables and probability mass functions
- Basic counting principles
- Expectation and variance of a random variable
- The Bernoulli, Binomial and Poisson distributions
- Continuous random variables and probability density functions
- The normal (Gaussian) distribution
  - A little history
  - Derivation of the normal PDF
  - Properties and applications of the normal PDF

**6. Population And Sample (1 hour)**

- Introduction to various concepts: population, sample, sampling
- Population parameters and sample statistics
- Random sample
- Different sampling strategies
- Biases

**7. Sampling Distributions And The Central Limit Theorem (2 hours)**

- The idea and interpretation of sampling distributions
- Sampling distribution of the sample mean
- The central limit theorem
- Sampling distribution of the sample variance
  - The *Chi*-squared ($\chi^2$) distribution

**8. Parameter Estimation - Point Estimation (2 hours)**

- Estimator and estimate
- The idea and intuition of maximum likelihood estimation (MLE)
- Examples of MLE
- Why $n-1$: MLE of variance is biased

**9. Parameter Estimation - Interval Estimation (4 hours)**

- Introduction to confidence intervals
- Interpretation of confidence intervals
- Construct confidence intervals for the mean
  - Use the properties of the sampling distribution of the sample mean
  - The Student’s *t* distribution
- Construct confidence intervals for the variance
  - How to use the *Chi*-squared ($\chi^2$) distribution
- Confidence intervals for the proportion
  - Sampling distribution of the sample proportion
  - Normal approximation to the binomial distribution

**10. Mid-term exam (2 hours)**

**11. Hypothesis Testing (4 hours)**

- Introduction and intuition of hypothesis testing
- The null and alternative hypotheses
- Why null hypothesis
- Introduction to p-values
- Interpretation of p-values
- One-sample hypothesis testing
  - Test for mean
  - Test for proportion
  - Test for variance
- Types of errors
- Power and sample size estimation

**12. Compare Two Populations (3 hours)**

- Compare two proportions
- Compare two means
  - Paired samples
  - Independent samples
- Compare two variances and the *F* distribution

**13. The behaviour of the p-value (1 hour)**

- Why the p-value is so successful in science
- The distribution of p-values when $H_1$ is true
- The distribution of p-values when $H_0$ is true
- Lindley’s paradox

**14. Analysis of Variance (3 hours)**

- Compare more than two populations
- The problem of multiple testing
- Source of variation and the *F*-test
- *Post hoc* multiple comparisons
- ANOVA examples

**15. Categorical Data Hypothesis Testing (3 hours)**

- Introduction to goodness-of-fit test
- Why use *Chi*-squared tests
- Relationship between two categorical variables
- Contingency table
- *Chi*-squared tests for association
- Relative risk and odds ratio

**16. Linear Regression (7 hours)**

- Exploration of bivariate data
- Pearson correlation coefficient (r)
  - Covariance
  - Different ways of calculating and interpreting r
  - Sampling distribution of r
- Simple Linear Regression
  - Introduction to Ordinary Least Squares (OLS) regression
  - Derivation of the equations for slope and intercept of the regression line
  - The model of OLS
  - The ANOVA of OLS
  - The sampling distribution of the slope and intercept

**17. Nonparametric Methods (1 hour)**

- The Sign Test
- The Wilcoxon Signed-Rank Test
- The Wilcoxon Rank Sum Test
- Advantages and disadvantages of nonparametric methods

**18. Practical Data Analysis Techniques (1 hour)**

- Monte Carlo Simulation
- Bootstrapping
- Permutation test

**19. Course Review (1 hour)**

- Use descriptive statistics
- Anscombe’s quartet
- Simpson’s paradox
- Learn a programming language
- Apply the knowledge in real life