# HHS CoLab Spring 2019

This course is part of the in-person / live-streaming delivery of the HHS CoLab program, and **is not meant as a standalone, asynchronous course**.

### Course Objectives

**Build your skills in R**

**Learn how to implement powerful data science methods**

**Create a capstone project to move your team forward**

#### Week 1: Course introduction, introduction to Git and fundamentals of R

**1. Introduction to Git and fundamentals of R (6/5)**

a) Introduction to Git

b) Basic shell commands

c) Introduction to R and RStudio

d) An overview of data science

e) Performing basic calculations in R

f) Loading data into R

**2. Fundamentals of R (6/8)**

a) Understanding data types, how and when to use them

b) Read/write data

c) Evaluate and address missing values in data

d) Manipulate data types and structures using flow control structures

#### Week 2: Fundamentals of R II and static and interactive visualization in R

**3. Fundamentals of R II (6/12)**

a) Transforming and cleaning data using tidyverse’s dplyr package

b) Selecting and subsetting data using dplyr

c) Summarizing and aggregating data using tidyverse’s tidyr package

**4. Introduction to visualization and base R (6/14)**

a) Basic plotting in R

b) Introduction to the ‘grammar of graphics’ structure

c) Basic plotting in ggplot2

d) Customizing graphs and adjusting formats

#### Week 3: Introduction to foundational statistics and regression

**5. Advanced ggplot2 and interactive visualization with highcharts (6/19)**

a) Advanced plotting in ggplot2, incorporating many variables

b) Working with other libraries (e.g., highcharts) for interactive visualization

c) Telling a story through data and visualizations

**6. Introduction to foundational statistics (6/21)**

a) Basic statistics

b) Expected value/standard deviation/variance/covariance

c) Statistical tests and significance

d) Linear regression

e) Single variable regression

f) Multiple regression

#### Week 4: Best practices for model building and clustering

**7. Best practices for model building (6/26)**

a) Introduction to the model building process

b) Splitting data into train/test/validation sets

c) Multiple regression – dealing with correlated predictors

d) Predicting

**8. Clustering (6/28)**

a) Unsupervised vs. supervised learning

b) Introduction to clustering

c) K-means

d) Pitfalls of clustering

**— Break – week of 7/2 – Holiday week (Fourth of July) —**

#### Week 5: Principal Component Analysis (PCA) and Capstone presentations

**9. Principal Component Analysis (7/10)**

a) Introduction to feature selection and engineering

b) Curse of dimensionality

c) Introduction to PCA and other related techniques

**10. Midterm capstone outlines and ideas presented (7/13)**

a) Capstone ideas presented

b) Introduction to README.md files

c) Findings so far

d) Data being used

e) Cleansing steps

f) Steps to be completed in the next 4 weeks

#### Week 6: Introduction to working with text in R and text mining

**11. Processing and working with text in R (7/17)**

a) Introduction and first steps

b) Working with word counts

c) Text cleaning and pre-processing

**12. Text mining in R (7/19)**

a) Applications of text mining at scale

b) Word distribution in a corpus and its applications

#### Week 7: Advanced text mining and introduction to classification

**13. Text mining in R – continued (7/26)**

a) Summary metrics of corpora

b) Visualizing text data

**14. Introduction to classification and k-nearest neighbors (7/27)**

a) Understanding the difference between classification and regression

b) Introduction to kNN

c) Overview of classification performance metrics

#### Week 8: Supervised learning methods – Classification

**15. Logistic regression (7/31)**

a) Introduction to logistic regression

b) Data transformation for logistic regression

c) Prediction and measuring error using logistic regression

**16. Decision trees / Random Forests (8/2)**

a) Introduction to decision trees

b) Performance metrics for decision trees

c) Introduction to ensemble methods

d) Visualizing and presenting outcomes of a random forest

#### Week 9: Presentation

**17. Final presentations – capstones (8/7)**