Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
What does the Akaike Information Criterion (AIC) do?
Select all that apply
Which standard delimiters can Excel identify to split text into multiple columns?
What is heteroscedasticity?
Where can you download the Excel Solver from?
What is one of the most effective methods for tuning an algorithm?
Which of the below statements are TRUE about n-fold cross validation?
What happened to the AUC when we ran a boosted random forest model after the initial random forest model?
Which of these is not an assumption of time series analysis?
Which of these statements is true?
Why do we see mostly zeroes in the term-document matrix that we created from the NYT articles?
1. In order to create interactive data visualizations, we will use:
1. Fill in the blank.
A ______ connects points (nodes) by lines that represent relationships. By studying the interactions between people, places and events, you can determine how messages, ideas, and diseases spread and how a change in one thing can cause a cascading set of effects.
1. Fill in the blank below.
______ is the smoothing parameter you add to the base formula because it tells you how much of the error from a past time period to incorporate into your forecast for a future time period.
2. Identify the three things necessary to make a graph in ggplot2.
Select all that apply
1. Which function tells Shiny to visualize a graph?
1. Which two variables showed the strongest correlations with two clusters?
Select all that apply
1. Why might it be more challenging to do a sentiment analysis on communications between Millennials?
1. When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Practical challenge


Epistemological challenge


Ethical challenge


Grand challenge


2. How does density affect a network?
Select all that apply
2. Betweenness tells you which nodes can be
Select all that apply
1. Which questions would seasonality analysis help answer?
Select all that apply
1. Which two functions do you need to save your image output as a PDF?
Select all that apply
2. Identify the three things necessary to make a graph in ggplot2.
2. Fill in the blank below.
2. The ______ function searches for additional function types, such as geom_line and geom_point.
2. Which function do we use for each tab in our app?
2. Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
3. Based on this word cloud, generated from hotel reviews, order the words by frequency. Put the most frequently used words at the top.
3. What is Big Data?
Select all that apply
2. What happened when Target predicted pregnancy in their customers?
3. When building a forecasting model, it is better to have more variables in the model.
How often is this statement true?
2. Which function can you use in the igraph package to eliminate duplicate data?
2. Match the functions to the actions they perform in R.
var()


data.frame()


View()


sd()


3. What is heteroscedasticity?
3. What is TRUE about adding more variables to your regression model?
Select all that apply
2. Put the steps for calculating undirected eigenvectors in the correct order.
2. What is an example of how the direction of trust does not always go both ways?
2. Why does it make sense to save a graph to PDF format?
Select all that apply
3. Fill in the blank below.
3. It’s important to look at the raw data before you ______ it so you can see what it looks like and find initial patterns and insights without doing too much analysis.
3. Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
3. Why might bar graphs be misleading?
4. Which Shiny functions integrate Leaflet so it can be displayed as an application?
Select all that apply
3. Why is clustering more powerful than visualizing?
Select all that apply
4. Order these data formats from most to least structured.
4. Sort the network from the broadest community to the most niche community.
5. Identify whether the activity would need a data science team member with a low level or high level of expertise.
Generate a new algorithm


Use tools (e.g., Tableau and Excel) to visualize data


Wrangle data


Manage data


Interpret results of analyses


Ask meaningful questions


Identify the weaknesses within a model


3. Which of the following are ethical questions you may face when using data?
Select all that apply
4. Which of the below were top indicators for Target that a customer was pregnant?
Select all that apply
D


C


A


B


4. Fill in the blanks.
______ will set the horizontal limits of your map and ______ will set the vertical limits of your map.
4. Put the 5 similar functions of an organization in the correct order.
4. Sort the variables as either continuous or discrete.
Continuous variables


Discrete variables


Discrete variables without a defined sequence


5. Match the methods below to their respective purpose.
Polynomial regression


Variable transformation


Moving average


Autocorrelation


Seasonality (additive/multiplicative)


Holt-Winters: trend-corrected, seasonally adjusted, exponential smoothing


5. Fill in the blank below.
______ centrality measures a node’s (person’s) importance by giving consideration to the importance of the nodes (people) connected to it.
4. Match the arguments to what they do.
navigator


ani.height and ani.width


verbose


4. What does Louvain Modularity assume?
5. The aes layer contains:
5. What aspects of a graph should you keep in mind while creating it?
Select all that apply
5. What does missing data prevent?
5. What are some important things to remember when working with outliers in your data?
Select all that apply
5. What is TRUE about followers in a directed network?
Select all that apply
5. What is a takeaway we learned from analyzing Congressional donation data?
5. Which function eliminates duplicate rows?
Fill in the blanks below.
The ______ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ______ method is part of supervised machine learning and assigns data points to known groups or categories.
Return the total number of Employees in the table


Return the smallest number of Employees in the table


Return the number of characters in the Company field


Return the first 2 characters of the Company field


Return the number of records in the table


Return the second character of the Company field


Match the models to the correlation they are displaying.



How was Target able to identify their pregnant customers?
Which of the following SQL functions can be used on date fields?
Select all that apply
Which operator pulls rows that contain specified terms you’re searching for to create a new dataset with only those rows?
Fill in the blank below.
The 'gg' in ggplot2 stands for ______.
Fill in the blank below.
To speed up our work and avoid running calculations on each data point manually, we can use the ______ loop function, which performs a set of operations as many times as you tell it to. The advantage to this loop is that you can run different types of data through the same operations, and it does it automatically.
How do we know we still need to refine our model further?
Match the terms to the descriptions.
Sum of all the squared distances between data points in different clusters


Sum of all the squared distances between points within the same cluster


Total sum of squares


Match each data quality with its description.
Accuracy


Completeness


Uniqueness


Timeliness


Consistency


In which scenario below would we want to minimize the false negatives of our model?
Please select all that apply
What are some of the fields that text mining methods employ?
Please select all that apply
How does clustering help when there are more than 3 attributes in the data?
Please fill in the blanks below:
______ is the most popular hierarchical clustering method; it’s a bottom-up approach, while ______ does the opposite and is a top-down approach.
5. Which of these is an example of exploratory data analysis?
Which piece of code will retrieve only the third column of a matrix called 'm'?
What types of data are difficult to cluster?
Select all that apply
Which questions can datafication help us answer?
Select all that apply
Which function converts wide data to long data?
Which of these is an example of exploratory data analysis?
How can we address the limitations of our analysis to see the data differently?
Select all that apply
What is heteroscedasticity?
Which function would I use to transform the text below to all capital letters?
coUpE -> COUPE
What is R Squared?
What is the objective of solving an optimization problem?
What type of data analysis gives you an overview of your data quickly by visualizing it?
Which R function makes sure that we can reproduce the exact same results when we use the createFolds() function?
What is a moving average?
Why do we use the silhouette coefficient to determine the number of clusters for skmeans?
Why is it important to convert all words to lower case before removing 'stop words'?
1. Fill in the blank below.
______ is the process of extracting information from large quantities of data to find insights, patterns and other latent information.
1. Fill in the blank below.
The goal of an ______ rule is to extract correlation relationships in large datasets of items. Ideally, these relationships will be causative.
1. Fill in the blank below.
If you ignore ______ factors, then the most recent trend is used in conjunction with the last data point to make a static projection.
1. Why does R put an 'X' in front of numerical column names?
Select all that apply
1. Fill in the blank below.
The ______ function allows us to wrap multiple Shiny apps into one and click between them.
1. Why did NAs appear when we initially read in the data?
Based on this comparison cloud, order the customer comments from those that are most consistently complimentary (at the top) to those that are most negative and pervasive (at the bottom).
1. What can we conclude about the statistic that there are 88 guns for every 100 Americans?
1. Which of the following sources would need to be datafied or converted into numbers, so that you can run analyses and gain greater insight?
Select all that apply
1. Which function should I use if I want to make sure my results are reproducible?
1. Fill in the blanks below.
Sometimes, in order to analyze relationships between variables, we need to ______ the data in order to isolate certain effects, make the scales similar, etc. The ______ package is very helpful at transforming a lot of data quickly.
1. What does the ".SD" notation stand for?
1. Fill in the blank below.
The ______ Index will calculate the similarity between politicians and their donors by comparing the people that they are connected to.
2. Drag the term to the box next to the correct description.
Collection of elements of the same type


Multiple rows and columns of the same data type


Collection of elements of different types


Multiple rows and columns of different data types


3. Which function generates a vector of numbers with a specified range that can count by another number?
2. Which package can we use to scrape websites for information?
3. What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
3. When summarizing a long document via text analysis, the most "important" words will be determined by:
2. Which aspect of Big Data is highlighted in these examples?
Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
3. Naïve Bayes is a probabilistic classification method commonly used for text classification. Most spam filters are based on a variant of Naïve Bayes.
Order the steps that a spam filter takes when deciding whether or not a new email should be placed in the spam folder. Place the first step at the top.
2. Order the steps needed to build a multivariate regression model.
3. Use the edge.attr.comb argument to
$stats


$n


$conf


$out


2. What does the Akaike Information Criterion (AIC) do?
Select all that apply
3. Put the steps of the data science control cycle in the correct order.
3. What is TRUE about an identity matrix?
Select all that apply
2. Fill in the blank below.
The ______ Index is a way of measuring the extent of similarity between two people or objects.
3. Please fill in the blanks below:
______ identifies people who have similar connections, not necessarily people who are connected. ______ identifies communities that are indeed connected.
3. Match the function to its purpose.
grep()


length()


table()


order()


4. Fill in the blank below.
In graphics, transparency is called ______, where 0 is entirely transparent, and the default of 1 is entirely opaque.
4. Visualization is an iterative process. Put the steps in order starting with "Analyze"
4. What is the output of this R code?
"matrix"[,c(1:4)]
4. In order for us to determine how much variation our clusters account for, we need to:
Select all that apply
3. Which of the following are epistemological challenges?
5. Networks can contain a wealth of information. Which of the following questions are best answered by measuring an aspect of the network (assuming the necessary data are available)?
4. Data analysis can provide:
Select all that apply
5. Fill in the missing words.
Knowing the task at hand will help you choose the best tool for the job. When processing videos and images, manipulating files quickly, analyzing data, or creating dynamic or interactive visualizations, ______ is a better tool to use than ______, which is better for when you have received minimal training or are viewing data.
4. Which assumption does Naïve Bayes make about its attributes?
5. If most of the data points cluster around a regression line, it may be the case that:
3. Which function would you use to pull the longitude and latitude figures from Google?
5. Match the questions with the correct step in the Data Science Control Cycle.
Ask


Research


Model


Validate


Test


Interpret


5. How do we know we still need to refine our model further?
4. Put the steps of time series data methodology in correct order.
4. What is TRUE about top observers or collectors of information in a network?
Select all that apply
4. Put the steps of the Data Science Control Cycle in order.
4. Which method allows us to build a recommendation engine?
3. Why is it important to understand different data types?
Select all that apply
5. Why do we build visualizations?
Select all that apply
5. When Hewlett Packard started tracking a range of employee factors, what were the results?
5. You are creating a feedback survey to send your customers. You already know their zip code, education level, and age. Which additional survey item captures a different type of information and may add explanatory power to your model?
5. What is R Squared?
5. What is important to keep in mind when calculating eigenvector centrality in R?
Select all that apply
5. What is a limitation of calculating edge betweenness in large networks?
5. Why do we build visualizations?
Fill in the blank below.
The main goal of clustering is to ______ intra-cluster distance (the distance between points in a cluster) and ______ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
Table: Company_Employees
Which SQL code will result in a frequency distribution of the Company field?
How do you measure the explanatory power of your predictive model?
Put the four steps of building a classification tree in order.
Put the following SQL clauses in the correct standard query template order:
Put the six Data Science control cycle steps in order, starting with “Ask”
Identify the three things necessary to make a graph in ggplot2.
Select all that apply
Match the functions to the actions they perform in R.
var()


data.frame()


View()


sd()


What are some things you should always check for in your model?
Select all that apply
Match the function names to their descriptions.
Returns the length of a cell, or number of characters of text in a cell


Returns the number of the character at which a specific character or text string starts


Replaces part of a text string with another text string


In order for us to determine how much variation our clusters account for, we need to:
The datasets graphed all have similar summary statistics (including means and variances). What valuable lesson(s) can be learned from comparing the graphs?
What are some of the strengths of Naive Bayes classifier?
Please select all that apply
Match the definition to the term by dragging and dropping it into the chart.
Retrieving data from an online source, usually a web page


Collection of documents


A process that focuses on obtaining insights from text data


Hierarchical clustering assumes that points with the shortest distance between them are:
Why do we use ‘==‘ instead of ‘=‘ to pull the day shift data?
What does it mean when there is a high positive correlation between two attributes?
What is a good way to test for multicollinearity?
Why does R put an 'X' in front of numerical column names?
Select all that apply
What happened when Telenor started contacting its customers?
Why do we look at correlations between players' salaries and player statistics?
Which statement is not true if you receive a positive result from a cancer test that is 95% accurate with a base rate of 1 out of 5,000 people a month?
Please fill in the blank below.
These two functions allow you to look up a value in a table, and return the desired value from another column in that table. The ______ is for vertical tables, and the ______ is for horizontal tables.
What does it mean if you have a small p-value?
Which function creates data for the ROC curve?
Which package do we use to implement boosting in R?
Which function applies linear filtering to time series?
Why wouldn't a silhouette value be computed with one cluster?
What is a Term Document Matrix?
1. Fill in the blank below.
Clustering assumes that ______ is a measure for similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
1. Fill in the blank.
There are many methods for analyzing data. When forecasting or predicting future events, the two most common methods are classification and ______.
1. Match the functions to the actions that they perform in R.
graph.data.frame()


str()


V()


E()


1. Which function converts wide data to long data?
1. Fill in the blank below
Many organizations make their data publicly available through APIs, which stands for ______.
1. List the six steps of the Clustering control cycle, starting with "Load Data".
2. Fill in the blank.
After text mining, it is often helpful to ______ your data. Word clouds and histograms can help you communicate your findings to others.
2. Identify five basic tools you will need to do a data science project
Select all that apply
1. Which of these methods extracts latent topics from text?
2. Match the arguments you might use when plotting a base map.
col


fill


bg


lwd


xlim


ylim


1. Fill in the blank below.
______ is a pattern that occurs at a regular interval over time.
1. Which function can you use to calculate betweenness centrality?
1. Which package has the gather function?
1. Drag the description into the box next to the appropriate term.
Integer


Double


String/characters


Boolean/logicals


Factor


2. Why has creating 3D visualizations become easier in R?
2. What are the three main columns we need to specify in the network data frame to describe relationships in the network?
2. Why do we need to transpose our data set?
3. Sets of variables are provided. Identify whether the set is most likely an example of correlation or causation.
Number of school hours missed and student achievement


Number of fire trucks dispatched and amount of property damage


Amount of money spent each week on vegetables and changes in weight


Number of purchases of rain boots and percent of delayed flights


3. How was Capital One able to provide customized credit products?
2. You are testing a new classification algorithm. Which of the following results may suggest that your algorithm is performing accurately?
2. What does R squared measure?
2. Fill in the blank below.
When column titles contain dashes, hyphens, or spaces, you need to enclose the title with the ______ accent.
2. What are some ways you can identify outliers in your data?
Select all that apply
3. What are some key things you should always check for in your model?
Select all that apply
2. What data CAN'T we get from Twitter?
2. Match the symbols to what they represent when calculating an eigenvector.
A


I


λ


ν


2. What are some things we can figure out using The Sunlight Foundation API?
Select all that apply
3. Put the steps of the cluster_edge_betweenness() function in order.
2. Fill in the blank below.
R is a powerful tool for ______ because the graphics tie in with the functions used to analyze data.
5. Match the function names to their descriptions.
Sets labels for axes and title


Flips the axes of a graph


Splits up data by category to give smaller individual graphs


Creates an area plot


4. What are some common languages to create websites?
Select all that apply
5. What R code would give me the information in the 3rd-6th rows of the crime data set?
4. How do you fix the following error?
Error in plot.new() : figure margins too large
4. Which of the following are ethical questions you may face when using data?
4. Your target demographic is educated, single women who are 25-40 years old. If you use text mining to analyze their comments on your website, what might you find?
5. In his experiment, Philip Tetlock found that:
Select all that apply
4. Which programming languages cannot be used for data analysis?
4. Increasing the number of variables in a predictive model may not be beneficial because...
4. If two factors are strongly correlated, such as "temperature" and "what the temperature feels like" in our Bikeshare example, then we are...
Select all that apply
4. Networks can be measured and visualized by
Select all that apply
3. Which of these use cases are examples of how regression is applied? Select all that apply.
Select all that apply
4. Which package do we use to run the vif() function?
4. What is important to know about span?
Select all that apply
4. What is a scalar?
4. What is TRUE about the Jaccard Index?
4. What is the given modularity score to stop iterations at?
4. What is the output of this R code?
"matrix"[,c(1:4)]
5. Why is it easier to create interactive experiences in R today?
5. Which aspect of Big Data is highlighted in these examples?
Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
5. Why is S more intuitive than R squared?
5. What is TRUE about outliers?
Select all that apply
5. What do networks represent?
Select all that apply
5. What happened when we ran label propagation 100 times on the congressional data?
5. Which package should you install to plot an interactive network visualization?
Fill in the blank below.
Match the table type to the statement that would create it:
Global Temporary Table


Local Temporary Table


Permanent table


Match the terms to their definitions.
Variance


Standard deviation


Distribution and "normality"


Covariance


Correlation


Slope


R squared


p-values


Match the attributes to the decision tree calculation.
Categorical attributes Finds the largest class in the data Uses algorithms 

Continuous variables Finds groups of classes that make up over 50% of their data Minimizes classification 

What are the three types of relationships between tables?
Select all that apply
Good coding habits include:
Select all that apply
Fill in the blank below.
______ can have a very negative impact on linear regressions if they are not identified and handled properly because they can skew the algorithm. It's important to identify them early and determine why they do not conform to the majority of the data points in case you need to adjust your model. You can identify them with Cook's distance or boxplots.
Match the key terms below to their descriptions.
Q-Q plot / distribution of errors


p-values


VIF


Breusch-Pagan test


AIC


What are some ways you can adjust text in the 'Alignment' tab?
What are some conclusions from the visualization of Congress?
Select all that apply
What are some important features of visual interactivity?
Select all that apply
Please fill in the blank below.
Logistic regression can also be described as ______ regression, which means that there are only two categories for classification.
Please match the text mining function to the descriptions.
Creates a volatile corpus object that is fully kept in memory


Creates a corpus with metadata from an object


Creates a simple corpus


Stores documents outside of R in a database


Creates a distributed corpus, a corpus that resides in a certain distributed file system


Please fill in the blank below.
______ distance measures the distance between points by taking the cosine of the angle between them, which measures the similarity between those points both based on the attributes they have and the difference between the attributes they don't have.
Please fill in the blanks below:
If two nodes are in the same community, then their delta is equal to ______, otherwise it’s equal to ______.
3. What does it mean to “practice” coding?
Why does it make sense to separate the code in multiple steps?
Select all that apply
Why do we look at correlations between players' salaries and player statistics?
Which package do we use to run the vif() function?
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
Which of these are examples of clustering?
Select all that apply
Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
What function do you need to use to perform the k Nearest Neighbors algorithm?
Which function looks up a particular value in a table and produces the row in which the value is located?
What do you need to run an F-test?
Select all that apply
Which language is HighCharts originally written in?
What is the implication if you have an AUC of 1?
Please select all that apply
What package can you use to transform data quickly?
What is one of the differences between additive and multiplicative seasonality?
Why do we look at 4 and 9 clusters when they only have an explained variance of 13.4%?
What is the output of Luhn's method?
1. Clustering is a form of what type of comparison?
1. Fill in the blank.
______ regression is just like univariate or linear regression, but instead of using just two variables to build a model, it can factor in more variables when building the forecasting model.
1. Approximately how many needlesticks are there annually in the United States?
1. Which function creates a heatmap?
1. Fill in the blank below
Networks are composed of two main concepts: ______, which represent the entities we're interested in, and ______, which are the relationships between these entities.
1. Why do we use the silhouette coefficient to determine the number of clusters for skmeans?