HHS CoLab Spring 2019

This course is currently closed

This course is part of the in-person and live-streamed delivery of the HHS CoLab program; it is not meant to be taken as a standalone, asynchronous course.

Course Objectives

  • Build your skills in R
  • Learn how to implement powerful data science methods
  • Create a capstone project to move your team forward

Week 1: Course introduction, introduction to Git and fundamentals of R

1. Introduction to Git and fundamentals of R (6/5)

a)  Introduction to Git
b)  Basic shell commands
c)  Introduction to R and RStudio
d)  An overview of data science
e)  Performing basic calculations in R
f)  Loading data into R

2. Fundamentals of R (6/8)

a)  Understanding data types and how and when to use them
b)  Read/write data
c)  Evaluate and address missing values in data
d)  Manipulate data types and structures using flow control structures

Week 2: Fundamentals of R II; static and interactive visualization in R

3. Fundamentals of R II (6/12)

a)  Transforming and cleaning data using tidyverse’s dplyr package
b)  Selecting and subsetting data using dplyr
c)  Summarizing and aggregating data using tidyverse’s tidyr package

4. Introduction to visualization and base R (6/14)

a)  Basic plotting in R
b)  Introduction to the ‘grammar of graphics’ structure
c)  Basic plotting in ggplot2
d)  Customizing graphs and adjusting formats

Week 3: Introduction to foundational statistics and regression

5. Advanced ggplot2 and interactive visualization with Highcharts (6/19)

a) Advanced plotting in ggplot2, incorporating many variables
b) Working with other libraries (e.g., Highcharts) for interactive visualization
c) Telling a story through data and visualizations

6. Introduction to foundational statistics (6/21)

a) Basic statistics
b) Expected value, standard deviation, variance, and covariance
c) Statistical tests and significance
d) Linear regression
e) Single variable regression
f) Multiple regression

Week 4: Model building and clustering

7. Best practices for model building (6/26)

a) Introduction to the model building process
b) Splitting data into train/test/validation sets
c) Multiple regression – dealing with correlated predictors
d) Predicting

8. Clustering (6/28)

a) Unsupervised vs. supervised learning
b) Introduction to clustering
c) K-means
d) Pitfalls of clustering

Break: week of 7/2 (Fourth of July holiday week)

Week 5: Principal Component Analysis (PCA) and midterm capstone presentations

9. Principal Component Analysis (7/10)

a)  Introduction to feature selection and engineering
b)  Curse of dimensionality
c)  Introduction to PCA and other related techniques

10. Midterm capstone presentations: outlines and ideas (7/13)

a)  Capstone ideas presented
b)  Introduction to README.md files
c)  Findings so far
d)  Data being used
e)  Cleansing steps
f)  Steps to be completed in the next 4 weeks

Week 6: Introduction to working with text in R and text mining

11. Processing and working with text in R (7/17)

a)  Introduction and first steps
b)  Working with word counts
c)  Text cleaning and pre-processing

12. Text mining in R (7/19)

a)  Applications of text mining at scale
b)  Word distribution in a corpus and its applications

Week 7: Advanced text mining and introduction to classification

13. Text mining in R – continued (7/26)

a)  Summary metrics of corpora
b)  Visualizing text data

14. Introduction to classification and k-nearest neighbors (7/27)

a)  Understanding the difference between classification and regression
b)  Introduction to kNN
c)  Overview of classification performance metrics

Week 8: Supervised learning methods – classification

15. Logistic regression (7/31)

a)  Introduction to logistic regression
b)  Data transformation for logistic regression
c)  Prediction and measuring error using logistic regression

16. Decision trees and random forests (8/2)

a)  Introduction to decision trees
b)  Performance metrics for decision trees
c)  Introduction to ensemble methods
d)  Visualizing and presenting outcomes of a random forest

Week 9: Final presentations

Final capstone presentations (8/7)