Course Description

A continuation of STT 2860 with an emphasis on statistical modeling and reproducible reporting using professional tools. Hypothesis testing will be introduced via resampling, and estimation will be introduced via bootstrapping. Cross-validation will be used to evaluate and select models while taking the bias-variance trade-off into account. Supervised learning techniques will include linear regression, regression trees, classification trees, and random forests. Unsupervised learning techniques will include hierarchical clustering, k-means, and, if time permits, an introduction to principal components.
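As a rough illustration of the resampling ideas named above, the following minimal sketch computes a percentile bootstrap confidence interval for a mean and a permutation-test p-value for a difference in group means. It is written in Python using only the standard library, whereas the course materials appear to work in R. The function names `bootstrap_ci` and `permutation_p_value`, the number of resamples, and the data values are all assumptions made up for this example and are not taken from the course.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled data simulates the null hypothesis
        # that group labels are unrelated to the values.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(x)]) - statistics.mean(pooled[len(x):])
        )
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Made-up illustrative numbers, not course data.
group_a = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5, 15.0]
group_b = [16.4, 17.1, 15.9, 18.2, 16.8, 17.5, 16.1, 17.9]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_p_value(group_a, group_b)
print(f"95% bootstrap CI for mean of group_a: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Permutation p-value: {p_value:.4f}")
```

Both functions take a fixed seed so that repeated runs give the same result, in keeping with the emphasis on reproducible reporting.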

Labs

The labs are a series of assignments that combine several modeling techniques to predict body fat.

Lab 1

Lab 2

Lab 3

Bookdown Reproductions

The following are bookdown reproductions of various DataCamp assignments.

Correlation and Regression

Multiple and Logistic Regression

Tree-Based Models in R

Machine Learning with caret