This course is currently closed

HHS CoLab Spring 2019

This course is part of the in-person / live-streaming delivery of the HHS CoLab program, and is not meant as a standalone, asynchronous course.

Course Objectives

Build your skills in R
Learn how to implement powerful data science methods
Create a capstone project to move your team forward

Full Syllabus

Week 1: Course introduction, introduction to Git and fundamentals of R

1. Introduction to Git and fundamentals of R (6/5)

a) Introduction to Git
b) Basic shell commands
c) Introduction to R and RStudio
d) An overview of data science
e) Performing basic calculations in R
f) Loading data into R

2. Fundamentals of R (6/8)

a) Understanding data types: how and when to use them
b) Reading and writing data
c) Evaluating and addressing missing values in data
d) Manipulating data types and structures using flow control structures

Week 2: Fundamentals of R II and Static and interactive visualization in R

3. Fundamentals of R II (6/12)

a) Transforming and cleaning data using tidyverse’s dplyr package
b) Selecting and subsetting data using dplyr
c) Summarizing and aggregating data using tidyverse’s tidyr package

4. Introduction to visualization and base R (6/14)

a) Basic plotting in R
b) Introduction to the ‘grammar of graphics’ structure
c) Basic plotting in ggplot2
d) Customizing graphs and adjusting formats

Week 3: Introduction to foundational statistics and regression

5. Advanced ggplot2 and interactive visualization with Highcharts (6/19)

a) Advanced plotting in ggplot2, incorporating many variables
b) Working with other libraries (e.g., Highcharts) for interactive visualization
c) Telling a story through data and visualizations

6. Introduction to foundational statistics (6/21)

a) Basic statistics
b) Expected value/standard deviation/variance/covariance
c) Statistical tests and significance
d) Linear regression
e) Single variable regression
f) Multiple regression

Week 4: Best practices for model building and clustering

7. Best practices for model building (6/26)

a) Introduction to the model building process
b) Splitting data into train/test/validation sets
c) Multiple regression – dealing with correlated predictors
d) Predicting

8. Clustering (6/28)

a) Unsupervised vs. supervised learning
b) Introduction to clustering
c) K-means
d) Pitfalls of clustering

— Break – week of 7/2 – Holiday week (Fourth of July) —

Week 5: Principal Component Analysis (PCA) and Capstone presentations

9. Principal Component Analysis (7/10)

a) Introduction to feature selection and engineering
b) Curse of dimensionality
c) Introduction to PCA and other related techniques

10. Midterm capstone outlines and ideas presented (7/13)

a) Capstone ideas presented
b) Introduction to README.md files
c) Findings so far
d) Data being used
e) Cleansing steps
f) Steps to be completed in the next 4 weeks

Week 6: Introduction to working with text in R and text mining

11. Processing and working with text in R (7/17)

a) Introduction and first steps
b) Working with word counts
c) Text cleaning and pre-processing

12. Text mining in R (7/19)

a) Applications of text mining at scale
b) Word distribution in a corpus and its applications

Week 7: Advanced text mining and introduction to classification

13. Text mining in R – continued (7/26)

a) Summary metrics of corpora
b) Visualizing text data

14. Introduction to classification and k-nearest neighbors (7/27)

a) Understanding the difference between classification and regression
b) Introduction to kNN
c) Overview of classification performance metrics

Week 8: Supervised learning methods – Classification

15. Logistic regression (7/31)

a) Introduction to logistic regression
b) Data transformation for logistic regression
c) Prediction and measuring error using logistic regression

16. Decision trees / Random Forests (8/2)

a) Introduction to decision trees
b) Performance metrics for decision trees
c) Introduction to ensemble methods
d) Visualizing and presenting outcomes of a random forest

Week 9: Presentation

Final Presentations – Capstones (8/7)