Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
What does the Akaike Information Criterion (AIC) do?
Select all that apply
Which standard delimiters can Excel identify to split text into multiple columns?
What is heteroscedasticity?
Where can you download the Excel Solver from?
What is one of the most effective methods for tuning an algorithm?
Which of the below statements are TRUE about n-fold cross validation?
What happened to the AUC when we ran a boosted random forest model after the initial random forest model?
Which of these is not an assumption of time series analysis?
Which of these statements is true?
Why do we see mostly zeroes in the term-document matrix that we created from the NYT articles?
1. In order to create interactive data visualizations, we will use:
1. Fill in the blank.
A ____ connects points (nodes) by lines that represent relationships. By studying the interactions between people, places and events, you can determine how messages, ideas, and diseases spread and how a change in one thing can cause a cascading set of effects.
1. Fill in the blank below.
____ is the smoothing parameter you add to the base formula because it tells you how much of the error from a past time period to incorporate into your forecast for a future time period.
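As a rough illustration of the idea in this item, the sketch below applies simple exponential smoothing: each new forecast is the previous forecast plus a fraction of the previous forecast error. The function name `smooth_forecasts`, the parameter name `weight`, and the demand figures are hypothetical and chosen only for this example; they are not taken from the course material.

```python
def smooth_forecasts(actuals, weight, initial_forecast):
    """Return one-step-ahead forecasts.

    weight is the smoothing parameter (between 0 and 1): the share of the
    previous period's forecast error carried into the next forecast.
    The last entry is the forecast for the next, not yet observed, period.
    """
    forecasts = [initial_forecast]
    for actual in actuals:
        error = actual - forecasts[-1]            # how far off the last forecast was
        forecasts.append(forecasts[-1] + weight * error)
    return forecasts


demand = [100, 110, 105, 120]                     # made-up observations
print(smooth_forecasts(demand, weight=0.5, initial_forecast=100))
```

With a weight of 0 the forecast never moves from its starting value, and with a weight of 1 each forecast simply repeats the most recent observation.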
2. Identify the three things necessary to make a graph in ggplot2.
Select all that apply
1. Which function tells Shiny to visualize a graph?
1. Which two variables showed the strongest correlations with two clusters?
Select all that apply
1. Why might it be more challenging to do a sentiment analysis on communications between Millennials?
1. When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Practical challenge
Epistemological challenge
Ethical challenge
Grand challenge
2. How does density affect a network?
Select all that apply
2. Betweenness tells you which nodes can be
Select all that apply
1. Which questions would seasonality analysis help answer?
Select all that apply
1. Which two functions do you need to save your image output as a PDF?
Select all that apply
2. Identify the three things necessary to make a graph in ggplot2.
2. Fill in the blank below.
2. The ____ function searches for additional function types, such as geom_line and geom_point.
2. Which function do we use for each tab in our app?
2. Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
3. Based on this word cloud, generated from hotel reviews, order the words by frequency. Put the most frequently used words at the top.
3. What is Big Data?
Select all that apply
2. What happened when Target predicted pregnancy in their customers?
3. When building a forecasting model, it is better to have more variables in the model.
How often is this statement true?
2. Which function can you use in the igraph package to eliminate duplicate data?
2. Match the functions to the actions they perform in R.
var()
data.frame()
View()
sd()
3. What is heteroscedasticity?
3. What is TRUE about adding more variables to your regression model?
Select all that apply
2. Put the steps for calculating undirected eigenvectors in the correct order.
2. What is an example of how the direction of trust does not always go both ways?
2. Why does it make sense to save a graph to PDF format?
Select all that apply
3. Fill in the blank below.
3. It’s important to look at the raw data before you ____ it so you can see what it looks like and find initial patterns and insights without doing too much analysis.
3. Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
3. Why might bar graphs be misleading?
4. Which Shiny functions integrate Leaflet so it can be displayed as an application?
Select all that apply
3. Why is clustering more powerful than visualizing?
Select all that apply
4. Order these data formats from most to least structured.
4. Sort the network from the broadest community to the most niche community.
5. Identify whether the activity would need a data science team member with a low level or high level of expertise.
Generate a new algorithm
Use tools (e.g., Tableau and Excel) to visualize data
Wrangle data
Manage data
Interpret results of analyses
Ask meaningful questions
Identify the weaknesses within a model
3. Which of the following are ethical questions you may face when using data?
Select all that apply
4. Which of the below were top indicators for Target that a customer was pregnant?
Select all that apply
D
C
A
B
4. Fill in the blanks.
____ will set the horizontal limits of your map and ____ will set the vertical limits of your map.
4. Put the 5 similar functions of an organization in the correct order.
4. Sort the variables as either continuous or discrete.
Continuous variables
Discrete variables
Discrete variables without a defined sequence
5. Match the methods below to their respective purpose.
Polynomial regression
Variable transformation
Moving average
Autocorrelation
Seasonality (additive/multiplicative)
Holt-Winters: trend-corrected, seasonally adjusted, exponential smoothing
5. Fill in the blank below.
____ centrality measures a node’s (person’s) importance by giving consideration to the importance of the nodes (people) connected to it.
4. Match the arguments to what they do.
navigator
ani.height and ani.width
verbose
4. What does Louvain Modularity assume?
5. The aes layer contains:
5. What aspects of a graph should you keep in mind while creating it?
Select all that apply
What did you think of the material in this section?
5. What does missing data prevent?
5. What are some important things to remember when working with outliers in your data?
Select all that apply
5. What is TRUE about followers in a directed network?
Select all that apply
5. What is a takeaway we learned from analyzing Congressional donation data?
5. Which function eliminates duplicate rows?
Fill in the blanks below.
The ____ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ____ method is part of supervised machine learning and assigns data points to known groups or categories.
Return the total number of Employees in the table
Return the smallest number of Employees in the table
Return the number of characters in the Company field
Return the first 2 characters of the Company field
Return the number of records in the table
Return the second character of the Company field
Match the models to the correlation they are displaying.
How was Target able to identify their pregnant customers?
Which of the following SQL functions can be used on date fields?
Select all that apply
Which operator pulls rows that contain specified terms you’re searching for to create a new dataset with only those rows?
Fill in the blank below.
The 'gg' in ggplot2 stands for ____.
Fill in the blank below.
To speed up our work and avoid running calculations on each data point manually, we can use the ____ loop function, which performs a set of operations as many times as you tell it to. The advantage to this loop is that you can run different types of data through the same operations, and it does it automatically.
How do we know we still need to refine our model further?
Match the terms to the descriptions.
Sum of all the squared distances between data points in different clusters
Sum of all the squared distances between points within the same cluster
Total sum of squares
Match each data quality with its description.
Accuracy
Completeness
Uniqueness
Timeliness
Consistency
In which scenario below would we want to minimize the false negatives of our model?
Please select all that apply
What are some of the fields that text mining methods employ?
Please select all that apply
How does clustering help when there are more than 3 attributes in the data?
Please fill in the blanks below:
____ is the most popular hierarchical clustering method and takes a bottom-up approach, while ____ does the opposite and takes a top-down approach.
5. Which of these is an example of exploratory data analysis?
Which piece of code will retrieve only the third column of a matrix called 'm'?
What types of data are difficult to cluster?
Select all that apply
Which questions can datafication help us answer?
Select all that apply
Which function converts wide data to long data?
Which of these is an example of exploratory data analysis?
How can we address the limitations of our analysis to see the data differently?
Select all that apply
What is heteroscedasticity?
Which function would I use to transform the text below to all capital letters?
coUpE --> COUPE
What is R Squared?
What is the objective of solving an optimization problem?
What type of data analysis gives you an overview of your data quickly by visualizing it?
Which R function makes sure that we can reproduce the exact same results when we use the createFolds() function?
What is a moving average?
Why do we use the silhouette coefficient to determine the number of clusters for sk-means?
Why is it important to convert all words to lower case before removing 'stop words'?
1. Fill in the blank below.
____ is the process of extracting information from large quantities of data to find insights, patterns and other latent information.
1. Fill in the blank below.
The goal of an ____ rule is to extract correlation relationships in the large datasets of items. Ideally, these relationships will be causative.
1. Fill in the blank below.
If you ignore ____ factors, then the most recent trend is used in conjunction with the last data point to make a static projection.
1. Why does R put an 'X' in front of numerical column names?
Select all that apply
1. Fill in the blank below.
The ____ function allows us to wrap multiple Shiny apps into one and click between them.
1. Why did NAs appear when we initially read in the data?
Based on this comparison cloud, order the customer comments according to those that are most consistently complimentary (at the top) and those most negative and pervasive (at the bottom).
1. What can we conclude about the statistic that there are 88 guns for every 100 Americans?
1. Which of the following sources would need to be datafied or converted into numbers, so that you can run analyses and gain greater insight?
Select all that apply
1. Which function should I use if I want to make sure my results are reproducible?
1. Fill in the blanks below.
Sometimes, in order to analyze relationships between variables we need to ____ the data in order to isolate certain effects, make the scales similar, etc. The ____ package is very helpful for transforming a lot of data quickly.
1. What does the ".SD" notation stand for?
1. Fill in the blank below.
The ____ Index will calculate the similarity between politicians and their donors by comparing the people that they are connected to.
2. Drag the term to the box next to the correct description.
Collection of elements of the same type
Multiple rows and columns of the same data type
Collection of elements of different types
Multiple rows and columns of different data types
3. Which function generates a vector of numbers with a specified range that can count by another number?
2. Which package can we use to scrape websites for information?
3. What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
3. When summarizing a long document via text analysis, the most "important" words will be determined by:
2. Which aspect of Big Data is highlighted in these examples?
-Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
-Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
3. Naïve Bayes is a probabilistic classification method commonly used for text classification. Most spam filters are based on a variant of Naïve Bayes.
Order the steps that a spam filter takes when deciding whether or not a new email should be placed in the spam folder. Place the first step at the top.
2. Order the steps needed to build a multivariate regression model.
3. Use the edge.attr.comb argument to
$stats
$n
$conf
$out
2. What does the Akaike Information Criterion (AIC) do?
Select all that apply
3. Put the steps of the data science control cycle in the correct order.
3. What is TRUE about an identity matrix?
Select all that apply
2. Fill in the blank below.
The ____ Index is a way of measuring the extent of similarity between two people or objects.
3. Please fill in the blanks below:
____ identifies people who have similar connections, not necessarily people who are connected. ____ identifies communities that are indeed connected.
3. Match the function to its purpose.
grep()
length()
table()
order()
4. Fill in the blank below.
In graphics, transparency is called ____, where 0 is entirely transparent, and the default of 1 is entirely opaque.
4. Visualization is an iterative process. Put the steps in order starting with "Analyze"
4. What is the output of this R code?
"matrix"[,c(1:4)]
4. In order for us to determine how much variation our clusters account for, we need to:
Select all that apply
3. Which of the following are epistemological challenges?
5. Networks can contain a wealth of information. Which of the following questions are best answered by measuring an aspect of the network (assuming the necessary data are available)?
4. Data analysis can provide:
Select all that apply
5. Fill in the missing words.
Knowing the task at hand will help you choose the best tool for the job. When processing videos and images, manipulating files quickly, analyzing data, or creating dynamic or interactive visualizations, ____ is a better tool to use than ____, which is better for when you have received minimal training or are viewing data.
4. Which assumption does Naïve Bayes make about its attributes?
5. If most of the data points cluster around a regression line, it may be the case that:
3. Which function would you use to pull the longitude and latitude figures from Google?
5. Match the questions with the correct step in the Data Science Control Cycle.
Ask
Research
Model
Validate
Test
Interpret
5. How do we know we still need to refine our model further?
4. Put the steps of time series data methodology in correct order.
4. What is TRUE about top observers or collectors of information in a network?
Select all that apply
4. Put the steps of the Data Science Control Cycle in order.
4. Which method allows us to build a recommendation engine?
3. Why is it important to understand different data types?
Select all that apply
5. Why do we build visualizations?
Select all that apply
5. When Hewlett Packard started tracking a range of employee factors, what were the results?
5. You are creating a feedback survey to send your customers. You already know their zip code, education level, and age. Which additional survey item captures a different type of information and may add explanatory power to your model?
5. What is R Squared?
5. What is important to keep in mind when calculating eigenvector centrality in R?
Select all that apply
5. What is a limitation of calculating edge betweenness in large networks?
5. Why do we build visualizations?
Fill in the blank below.
The main goal of clustering is to ____ intra-cluster distance (the distance between points in a cluster) and ____ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
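To make the two distances in this item concrete, here is a minimal sketch that scores a given grouping of points by its within-cluster, between-cluster, and total sums of squares. It assumes the usual k-means convention in which the between-cluster term is the size-weighted squared distance from each cluster centre to the overall centre; the helper names and the sample points are hypothetical.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_sums_of_squares(clusters):
    """Return (within, between, total) sums of squares for a list of clusters."""
    all_points = [p for cluster in clusters for p in cluster]
    grand = mean(all_points)
    # Spread of points around their own cluster centre.
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    # Spread of cluster centres around the overall centre, weighted by size.
    between = sum(len(c) * sq_dist(mean(c), grand) for c in clusters)
    total = sum(sq_dist(p, grand) for p in all_points)
    return within, between, total


clusters = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 1.0), (9.0, 3.0)]]  # made-up data
within, between, total = cluster_sums_of_squares(clusters)
print(within, between, total, between / total)
```

Under this convention the within and between terms add up to the total, so the ratio `between / total` can be read as the share of variation that the clusters account for.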
Table: Company_Employees
Which SQL code will result in a frequency distribution of the Company field?
How do you measure the explanatory power of your predictive model?
Put the four steps of building a classification tree in order.
Put the following SQL clauses in the correct standard query template order:
Put the six Data Science control cycle steps in order, starting with “Ask”
Identify the three things necessary to make a graph in ggplot2.
Select all that apply
Match the functions to the actions they perform in R.
var()
data.frame()
View()
sd()
What are some things you should always check for in your model?
Select all that apply
Match the function names to their descriptions.
Returns the length of a cell, or number of characters of text in a cell
Returns the number of the character at which a specific character or text string starts
Replaces part of a text string with another text string
In order for us to determine how much variation our clusters account for, we need to:
The datasets graphed all have similar summary statistics (including means and variances). What valuable lesson(s) can be learned from comparing the graphs?
What are some of the strengths of Naive Bayes classifier?
Please select all that apply
Match the definition to the term by dragging and dropping it into the chart.
Retrieving data from an online source, usually a web page
Collection of documents
A process that focuses on obtaining insights from text data
Hierarchical clustering assumes that points with the shortest distance between them are:
Was this hard?
Why do we use '==' instead of '=' to pull the day shift data?
What does it mean when there is a high positive correlation between two attributes?
What is a good way to test for multicollinearity?
Why does R put an 'X' in front of numerical column names?
Select all that apply
What happened when Telenor started contacting its customers?
Why do we look at correlations between players' salaries and player statistics?
Which statement is not true if you receive a positive result from a cancer test that is 95% accurate with a base rate of 1 out of 5,000 people a month?
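The arithmetic behind this question can be sketched with Bayes' rule. The example assumes that "95% accurate" means the test detects 95% of true cases and also correctly clears 95% of healthy people; the question does not spell this out, so treat both figures as assumptions. The function name is hypothetical.

```python
def posterior_given_positive(base_rate, sensitivity, specificity):
    """Probability of having the condition after one positive test result."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# Base rate of 1 in 5,000; both accuracy figures assumed to be 0.95.
p = posterior_given_positive(base_rate=1 / 5000, sensitivity=0.95, specificity=0.95)
print(f"{p:.4f}")
```

Because the condition is so rare, false positives from the large healthy group far outnumber true positives, so the probability after a positive result stays well below 1% under these assumptions.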
Please fill in the blank below.
These two functions allow you to look up a value in a table, and return the desired value from another column in that table. The ____ is for vertical tables, and the ____ is for horizontal tables.
What does it mean if you have a small p-value?
What happened when Telenor started contacting its customers?
Which function creates data for the ROC curve?
Which package do we use to implement boosting in R?
Which function applies linear filtering to time series?
Why wouldn't a silhouette value be computed with one cluster?
What is a Term Document Matrix?
1. Fill in the blank below.
Clustering assumes that ____ is a measure for similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
1. Fill in the blank.
There are many methods for analyzing data. When forecasting or predicting future events, the two most common methods are classification and ____.
1. Match the functions to the actions that they perform in R.
graph.data.frame()
str()
V()
E()
1. Which function converts wide data to long data?
1. Fill in the blank below
Many organizations make their data publicly available through APIs, which stands for ____.
1. List the six steps of the Clustering control cycle, starting with "Load Data".
2. Fill in the blank.
After text mining, it is often helpful to ____ your data. Word clouds and histograms can help you communicate your findings to others.
2. Identify five basic tools you will need to do a data science project
Select all that apply
1. Which of these methods extracts latent topics from text?
2. Match the arguments you might use when plotting a base map.
col
fill
bg
lwd
xlim
ylim
1. Fill in the blank below.
____ is a pattern that occurs at a regular interval over time.
1. Which function can you use to calculate betweenness centrality?
1. Which package has the gather function?
1. Drag the description into the box next to the appropriate term.
Integer
Double
String/characters
Boolean/logicals
Factor
2. Why has creating 3D visualizations become easier in R?
2. What are the three main columns we need to specify in the network data frame to describe relationships in the network?
2. Why do we need to transpose our data set?
3. Sets of variables are provided. Identify whether the set is most likely an example of correlation or causation.
Number of school hours missed and student achievement
Number of fire trucks dispatched and amount of property damage
Amount of money spent each week on vegetables and changes in weight
Number of purchases of rain boots and percent of delayed flights
3. How was Capital One able to provide customized credit products?
2. You are testing a new classification algorithm. Which of the following results may suggest that your algorithm is performing accurately?
2. What does R squared measure?
2. Fill in the blank below.
When column titles contain dashes, hyphens, or spaces, you need to enclose the title with the ____ accent.
2. What are some ways you can identify outliers in your data?
Select all that apply
3. What are some key things you should always check for in your model?
Select all that apply
2. What data CAN'T we get from Twitter?
2. Match the symbols to what they represent when calculating an eigenvector.
A
I
λ
ν
2. What are some things we can figure out using The Sunlight Foundation API?
Select all that apply
3. Put the steps of the cluster_edge_betweenness() function in order.
2. Fill in the blank below.
R is a powerful tool for ____ because the graphics tie in with the functions used to analyze data.
5. Match the function names to their descriptions.
Sets labels for axes and title
Flips the axes of a graph
Splits up data by category to give smaller individual graphs
Creates an area plot
4. What are some common languages to create websites?
Select all that apply
5. What R code would give me the information in the 3rd-6th rows of the crime data set?
4. How do you fix the following error?
Error in plot.new() : figure margins too large
4. Which of the following are ethical questions you may face when using data?
4. Your target demographic is educated, single women who are 25-40 years old. If you use text mining to analyze their comments on your website, what might you find?
5. In his experiment, Philip Tetlock found that:
Select all that apply
4. Which programming languages cannot be used for data analysis?
4. Increasing the number of variables in a predictive model may not be beneficial because...
4. If two factors are strongly correlated, such as "temperature" and "what the temperature feels like" in our Bikeshare example, then we are...
Select all that apply
4. Networks can be measured and visualized by
Select all that apply
3. Which of these use cases are examples of how regression is applied? Select all that apply.
Select all that apply
4. Which package do we use to run the vif() function?
4. What is important to know about span?
Select all that apply
4. What is a scalar?
4. What is TRUE about the Jaccard Index?
4. What is the given modularity score to stop iterations at?
4. What is the output of this R code?
"matrix"[,c(1:4)]
5. Why is it easier to create interactive experiences in R today?
5. Which aspect of Big Data is highlighted in these examples?
-Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
-Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
5. Why is S more intuitive than R squared?
5. What is TRUE about outliers?
Select all that apply
5. What do networks represent?
Select all that apply
5. What happened when we ran label propagation 100 times on the congressional data?
5. Which package should you install to plot an interactive network visualization?
Fill in the blank below.
Match the table type to the statement that would create it:
Global Temporary Table
Local Temporary Table
Permanent table
Match the terms to their definitions.
Variance
Standard deviation
Distribution and "normality"
Covariance
Correlation
Slope
R squared
p-values
Match the attributes to the decision tree calculation.
Categorical attributes Finds the largest class in the data Uses algorithms
Continuous variables Finds groups of classes that make up over 50% of their data Minimizes classification
What are the three types of relationships between tables?
Select all that apply
Good coding habits include:
Select all that apply
Fill in the blank below.
____ can have a very negative impact on linear regressions if they are not identified and handled properly because they can skew the algorithm. It's important to identify them early and determine why they do not conform to the majority of the data points in case you need to adjust your model. You can identify them with Cook's distance or boxplots.
Match the key terms below to their descriptions.
Q-Q plot / distribution of errors
p-values
VIF
Breusch-Pagan test
AIC
What are some ways you can adjust text in the 'Alignment' tab?
What are some conclusions from the visualization of Congress?
Select all that apply
What are some important features of visual interactivity?
Select all that apply
Please fill in the blank below.
Logistic regression can also be described as ____ regression, which means that there are only two categories for classification.
Please match the text mining function to the descriptions.
Creates a volatile corpus object that is fully kept in memory
Creates a corpus with metadata from an object
Creates a simple corpus
Stores documents outside of R in a database
Creates a distributed corpus, a corpus that resides in a certain distributed file system
Please fill in the blank below.
____ distance measures the distance between points by taking the cosine of the angle between them, which measures the similarity between those points both based on the attributes they have and the difference between the attributes they don't have.
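The measure described in this item can be sketched in a few lines. The example computes the cosine of the angle between two attribute vectors and, as one common convention that the quiz itself does not state, turns it into a distance by subtracting it from 1. The function names and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Undefined for an all-zero vector; this sketch does not handle that case.
    return dot / (norm_a * norm_b)


def distance_from_angle(a, b):
    """Assumed convention: distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


print(distance_from_angle((1, 2, 3), (2, 4, 6)))  # same direction, different length
print(distance_from_angle((1, 0), (0, 1)))        # perpendicular vectors
```

Only the direction of the vectors matters here, not their length, which is why this measure is often applied to word-count vectors for documents of different sizes.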
Please fill in the blanks below:
If two nodes are in the same community, then their delta is equal to ____; otherwise it’s equal to ____.
3. What does it mean to “practice” coding?
Why does it make sense to separate the code in multiple steps?
Select all that apply
Why do we look at correlations between players' salaries and player statistics?
Which package do we use to run the vif() function?
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
Which of these are examples of clustering?
Select all that apply
Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
What function do you need to use to perform the k Nearest Neighbors algorithm?
Which function looks up a particular value in a table and produces the row in which the value is located?
What do you need to run an F-test?
Select all that apply
Which of these are examples of clustering?
Select all that apply
Which language is HighCharts originally written in?
What is the implication if you have an AUC of 1?
Please select all that apply
What package can you use to transform data quickly?
What is one of the differences between additive and multiplicative seasonality?
Why do we look at 4 and 9 clusters when they only have an explained variance of 13.4%?
What is the output of Luhn's method?
1. Clustering is a form of what type of comparison?
1. Fill in the blank.
____ regression is just like univariate or linear regression, but instead of using just two variables to build a model, it can factor in more variables when building the forecasting model.
1. Approximately how many needlesticks are there annually in the United States?
1. Which function creates a heatmap?
1. Fill in the blank below
Networks are composed of two main concepts: ____, which represent the entities we're interested in, and ____, which are the relationships between these entities.
1. Why do we use the silhouette coefficient to determine the number of clusters for sk-means?
2. Forecasts of consumer demand can inform decision making by...
1. When should you identify how much and what type of data you have?
2. Forecasts of consumer demand can inform decision making by...
Select all that apply
1. What data did we remove because it was creating "noise" in our visualization?
Select all that apply
2. Match the common seasonal patterns to the correct unit of time.
Hourly
Daily
Weekly
Monthly
Yearly
1. Fill in the blank below.
You can quantify a network via an ____ matrix where the rows and columns represent the nodes and the values in the matrix represent the strength of the connections.
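As a small illustration of the matrix described in this item, the sketch below turns a weighted edge list into a square matrix whose rows and columns are the nodes and whose cells hold connection strengths. The node names, weights, and the `build_matrix` helper are all hypothetical.

```python
def build_matrix(nodes, weighted_edges, directed=False):
    """Return a square list-of-lists matrix of connection strengths."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in weighted_edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # An undirected tie is recorded in both directions.
            matrix[index[target]][index[source]] = weight
    return matrix


nodes = ["Ann", "Bob", "Cal"]                      # made-up network
edges = [("Ann", "Bob", 3), ("Bob", "Cal", 1)]
for row in build_matrix(nodes, edges):
    print(row)
```

For an undirected network the matrix is symmetric; passing `directed=True` records each connection only from source to target.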
1. What is TRUE if more people know each other in a community?
Select all that apply
2. Which type of data contain sets of categories?
3. How do we add color by category within the 'threejs' package?
3. What are some insights we discovered when we visualized the Saturday Night Live data?
Select all that apply
2. What is one limitation of the k-means analysis we performed in the previous video?
3. When building a forecasting model, it is better to have more variables in the model.
How often is this statement true?
2. Which of these social media sites share data with 3rd parties?
Select all that apply
3. If your classification model has a perfect, 100% accuracy, which of the following questions should you ask?
Select all that apply
2. When would bike demand be the greatest in Washington DC?
3. Match the terms to their definitions.
Closeness centrality
head( )
length( )
3. Match the models to the correlation they are displaying.
3. Fill in the blank below.
In a ____ relationship, the value of the dependent variable changes in a non-linear fashion.
3. Put the steps in order for accessing the Twitter API for free.
2. Fill in the blank below.
Use the ____ function when combining two data sets into one.
2. What should you keep in mind as you run a label propagation algorithm?
Select all that apply
2. Fill in the blank below.
To make sure your images are reproducible, you can use the ____ function.
3. Which of these functions are the "bare bones" of ggplot2?
Select all that apply
3. Why do we invert the y-scale in the rCharts plot?
3. Why is it important to understand different data types?
Select all that apply
4. Why do we look at correlations between players' salaries and player statistics?
5. Fill in the blank.
To avoid falsely concluding that one event caused another, a possible method to use is ____.
5. A company is planning to re-brand one of its products and the product director wants to know how customers feel about some potential product names. Text mining can be most beneficial in which of these situations?
4. Which of these are examples of product recommendations?
Select all that apply
4. Match each data quality with its description.
Accuracy
Completeness
Uniqueness
Timeliness
Consistency
5. Using this network visualization, order the relationships based upon the strength of their connection. Put the strongest connection on top.
3. Which of these conditions would result in a high R-squared?
4. Fill in the blank below.
The ___ package is great for reformatting data in order to visualize the data the way we want to.
4. Using the Capital Bikeshare data, what questions can we answer through regression analysis?
Select all that apply
5. Match the methods of variable selection to the correct descriptions.
Forward selection
Backward selection
Step-wise selection
4. Match the regression analyses we have learned about to the correct visualization.
5. Match the symbols to what they represent when calculating eigenvalues.
A
I
λ
3. Which package can you use to find the political data from the Sunlight Foundation?
4. How can you determine if the communities that were detected by label propagation are stable?
5. What R code would give me the information in the 3rd-6th rows of the crime data set?
5. Why is it advantageous to create interactive graphs for this data set?
Select all that apply
Based on this decision tree, order the students based on the likelihood that they will be accepted into a graduate program.
5. You finished building a predictive model. Which questions may people have about it?
Select all that apply
5. When does the adjusted R squared increase?
5. Why does the space after the caret below matter?
"[^ [:graph:]]"
5. Which node has the highest PageRank value?
Good coding habits include:
Select all that apply
Clustering is a form of what type of comparison?
What are views in SQL valuable for?
Select all that apply
Put the steps of running a t-test in the correct order.
Put the steps of k-NN in order.
Remove the entire table
Add a row for company ABC for 2005 to the table
Change the value 1,500 to 15,000
Return a conditional value (e.g. large company or small company) based on the number of employees
Remove the row with company ABC
Specify that a query return the employees field as an integer
Match the method to the description (note: there are more methods listed than necessary).
Measures similarity between data points to group them and identify key similarities that you can use to find trends
Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively
Digests large amounts of text quickly and finds common themes, messages and patterns.
Fill in the blank below.
A big strength of the ___ package is the ability to customize graphs by adding layers and adjusting the data so it doesn't look generic. This is a package that brings a lot of flexibility in visualizing your data beyond bar charts.
Match the models to the correlation they are displaying.
Fill in the blank below:
In entropy, the number ___ indicates 100% of the data is the same, and the number ___ indicates a 50-50 split.
Match the function names to their descriptions.
Counts the frequency of occurrence of data points that meet a specific condition
Sums up data across ranges that meet a specific condition
Takes an average of values across ranges that meet a specific condition
Helps you pick out records that meet a variety of conditions you set
Fill in the blank below.
Clustering and data mining are types of data analysis ___, which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
Match the Plotly function to its symbol:
Match the confusion matrix term with the question that corresponds to it
Overall, how often is the classifier correct?
Overall, how often is the classifier wrong?
How many actual negative outcomes were correctly predicted as negative?
How many actual positive outcomes were correctly predicted as positive?
What are some quick and easy-to-use visualizations that express frequency of words?
Please select all that apply
Please fill in the blank below.
In perfect clustering, the silhouette value for each point will approach ___.
What are some use cases for the Jaccard Index?
Select all that apply
Why do we need to validate the model?
Which of the answers below is a function?
Select all that apply
Which two variables showed the strongest correlations with two clusters?
Select all that apply
What does SQL stand for?
Which of the following statements is not true about k-means clustering?
Select all that apply
What does it mean if your model has a smaller standard deviation of residuals?
What is one of the most effective methods for tuning an algorithm?
Why is it useful to audit your formulas?
What is the range of outputs you can expect from a logistic regression model?
What happens when you have a higher degree polynomial model?
Which of these are common smoothing techniques?
Please select all that apply
Which of these are examples of clustering?
Select all that apply
Which package can you use to convert HTML into a readable list?
Fill in the blank.
Classification is a method that ___ the behavior of an object or individual.
Y
X
M
B
1. The first component ggplot2 starts with is the:
1. Which function pulls geographical information from Google?
2. Fill in the blank below.
In perfect clustering, the silhouette value for each point will approach ___.
1. Which of these questions may be most impacted by seasons or cycles?
2. Match the data science team structure with its pros and cons. In each situation, the client may represent either an internal or external team.
Pros: Client goals met
Pros: Standardized processes + strategic goals/vision met
2. Choose the correct pair of words to complete the statement.
The objective of a good model is not to fit the data perfectly, it's to have the lowest ___ when applied to new, _____ data.
1. Remember to always use this function to ensure that R is reading the data as characters before using the grep() function.
1. Which visualization below shows a trend with multiplicative seasonality?
1. Fill in the blank below.
The eigenvalue is essentially a number that can scale a matrix up and still maintain the ___ of it.
1. What are two ways to identify clusters from the hclust function?
Select all that apply
1. Match the names to the graphs below.
2. Match the map pictures to the type
2. Match the picture type to the tile type
3. Fill in the blank below.
___ distance measures the distance between points by taking the cosine of the angle between them, which measures the similarity between those points both based on the attributes they have and the difference between the attributes they don't have.
2. Order the steps needed to build a multivariate regression model.
2. Which of the following is a practical challenge that companies face when using data?
2. Match the network to the possible connections.
Twitter
Facebook
Netflix catalog
Past presidents
Your company
City
3. Fill in the blank below.
Since seasonality and cycles affect forecasts and it is easier to predict ___ outcomes than long-term outcomes - timing always matters when building predictive models.
3. Put the steps in order for testing the resilience of a network.
2. What is TRUE about correlation?
Select all that apply
2. Match the seasonality analysis use case questions to the business function they relate to.
Research & Development
Buy
Make
Ship
Sell
2. Match the functions below to the actions they perform.
paste0()
parse()
eval()
2. Match the questions to the 3 or 4 states that someone can have when considering disease spread.
Susceptible to infection
Infected
Recovered
Susceptible to infection – again
3. Which function allows you to remove duplicate data from your data set?
3. What are some additional parameters you can use for label propagation?
Select all that apply
3. How can you check the warnings to see if there is anything you should be concerned about?
4. Fill in the blank below.
The ___ function allows you to save graphs in pdf or png formats.
4. Fill in the blank below
The term ___ refers to data in quotation marks - the gsub() function manipulates and replaces patterns in this term.
4. What can you type into RStudio to find more information about a function?
Select all that apply
5. How can you ensure that your analysis is reproducible?
4. You analyzed your customer data and found it made sense to cluster your customers into three distinct categories. When visualizing the data you see that there are a couple of data points that look to be between clusters. What could you conclude?
3. Why might it be advantageous to analyze word combinations or phrases instead of single words when doing a sentiment analysis?
4. When Hewlett Packard started tracking a range of employee factors, what were the results?
Select all that apply
5. The datasets graphed all have similar summary statistics (including means and variances). What valuable lesson(s) can be learned from comparing the graphs?
4. Match the common relationships or patterns with the type of network they represent.
Manager and employee
Emails between co-workers
Citizens in the same tax bracket or state
Members of the same gym
4. What does S represent?
Select all that apply
5. Complete the matrix below.
high out-degrees but low in-degrees
high in-degrees but low out-degrees
4. Which function adds a linear regression line to your model?
4. After running a Breusch-Pagan test, how would I know that there is no heteroscedasticity?
Select all that apply
5. Match the Twitter terminology to the correct description.
# (Hashtag)
Twitter handle
@ (at)
Follower
3. Why do we need to take the largest positive eigenvalue?
4. Which function allows you to read zip files?
4. Why is calculating PageRank so much faster than calculating eigenvector centrality?
4. What type of data does the grep() function work with?
5. What are some ways to save and display your rCharts plot?
Select all that apply
5. Which question(s) can you answer with text analysis?
5. When detecting outliers, the chief goal is to:
5. Why is it important to remove multicollinearity?
5. Which function allows you to check the structure of a data set that you create?
5. Which function can remove "<>" from the data?
Match the function to its purpose.
grep()
length()
table()
order()
Fill in the blank below.
___ is a measure of the extent to which an increase in one variable corresponds to the increase in another variable. This does not imply causation, which determines whether or not a variable causes the effect on another variable - rather, it determines whether or not there is any connection between the variables that we can quantify.
Can a table in SQL be joined to itself? (True/False)
How do you calculate R squared?
The aes layer contains:
Match the following terms to the correct definition:
INNER JOIN
(LEFT or RIGHT) OUTER JOIN
FULL OUTER JOIN
CROSS JOIN
Fill in the blank below.
Clustering assumes that ___ is a measure for similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.
Fill in the blank below.
The ___ function shows us the structure of the data.
How do you measure the explanatory power of your predictive model?
How was Target able to identify their pregnant customers?
Please fill in the blank below.
The ___ function can help you summarize your data in different ways with just a few clicks - this can be faster and more efficient than using VLOOKUP() and HLOOKUP().
Fill in the blanks below.
The ___ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ___ method is part of supervised machine learning and assigns data points to known groups or categories.
Match the Leaflet functions to the descriptions below:
This base function generates the relevant map objects
Creates the visual that defines the look and feel of the map. Many kinds of tiles are available
Pulls geographical information from Google Maps. Google limits the number of queries to 2500 per day, so use geocodeQueryCheck() to track how many you have used
Plots the latitude and longitude positions of all points in our data frame
What are some of the negative effects of adding as many variables as possible to our model?
Select all that apply
Please put the steps in order for scraping text data from a webpage.
In order to ensure that you don't mistake randomness for patterns, what do you need to do?
What are some reasons to use APIs?
Please select all that apply
What is supervised machine learning?
Fill in the blank below.
You can create a ___ by wrapping code in curly braces. This can help you streamline your code to perform multiple steps in one line, similar to a for() loop.
Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
When running a variable selection model, how does the computer know when it has found the right variables?
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
Which function makes sure that the k-means analysis is reproducible?
Which function can you use to see the list of contents that your linear regression model produces?
What is the complexity parameter?
Which error appears for an invalid cell reference?
What happens as your model becomes more precise?
Why is clustering more powerful than visualizing?
Select all that apply
Please fill in the blanks below:
___ adjustments affect what data is displayed, while ___ adjustments affect how the data is displayed.
Which function can you use to run a logistic regression model?
Why is it important to remove multicollinearity?
Please fill in the blank below.
The ___ interval of a regression line, or the standard error, tells you with a certain percentage certainty where the best fit line can be.
When should you use density-based clustering?
Please select all that apply
1. Fill in the blank.
A ___ connects points (nodes) by lines that represent relationships. By studying the interactions between people, places and events, you can determine how messages, ideas, and diseases spread and how a change in one thing can cause a cascading set of effects.
1. How can you differentiate between categorical variables in a regression?
2. Fill in the blanks below.
Data can be ___, like temperature which increases incrementally and can be fluid, or ___, like country names that cannot be divided.
2. Drag the term to the box next to the correct description.
Collection of elements of the same type
Multiple rows and columns of the same data type
Collection of elements of different types
Multiple rows and columns of different data types
1. Which industries can benefit from using predictive analytics?
2. Choose the correct pair of words to complete the statement.
The objective of a good model is not to fit the data perfectly, it's to have the lowest ___ when applied to new, _____ data.
1. Which of the following data science skill areas are necessary, if an organization wants to use data to drive decision making?
Select all that apply
1. Which of these questions may be most impacted by seasons or cycles?
Select all that apply
1. Which questions will you be able to answer by the end of this course?
Select all that apply
2. Fill in the blank below.
A ___ interval is a range of possible values based on the standard deviation of the data.
1. Why do we define eigenvector centrality based on the number of followers that an account has?
1. What does modularity measure?
1. Which two functions do you need to save your image output as a PDF?
Select all that apply
2. Which of the answers below is a function?
Select all that apply
3. Fill in the blank below
___ are a common format for storing and manipulating geospatial data such as complex borders, polygons, and so on.
3. Match the function to the question.
Research and Development
Procurement
Production
Shipment
Sales
3. Fill in the blank.
Since seasonality and cycles affect forecasts and it is easier to predict short-term outcomes than long-term outcomes - ___ always matters when building predictive models.
2. What questions should you ask about your data?
Select all that apply
3. Put the events in order to describe how an economic network may expand. Place the first event on top.
If you add the ___ row and the ___ row, then you get the observed data row.
2. Fill in the blank below.
The ___ function tells R that you'll be working with the vertices or nodes of the graph, and encloses the graph data frame we created earlier.
3. Put the steps of running a t-test in the correct order.
3. When do you need to do a logarithmic transformation?
3. Fill in the blank below.
Use the ___ function to rename the columns of the files that you create.
3. What type of model is the example below?
Common cold: someone can catch a cold, recover, enjoy a period of immunity and then be susceptible to another cold.
3. Match these three formatting functions to the correct actions.
col2rgb()
rgb2hsv()
hsv()
2. What was Google's main ingredient of its successful algorithm?
5. Which function calls up all the files in a folder for an overview?
5. If your data are not compelling at first,
Select all that apply
4. What is the main reason to set the 'standalone' parameter to TRUE for saving an HTML file?
4. What are some qualities that make a good visualization?
Select all that apply
5. How does industry knowledge help us understand our analysis?
Fill in the blank.
One of the benefits of ___ is that you can evaluate many different factors or dimensions (more than humans can see) when looking for patterns.
4. Fill in the blank.
Ideas, such as sentiments about gun control or abortion, can be evaluated using sentiment analysis, a branch of text mining. To do this analysis you will need to use a scale that positions people on a ___ according to the intensity of their feelings.
5. The example below is an example of which aspect of the 3 V's?
"1 flight of a Boeing 737 across the continental United States generates as much data as is stored in the U.S. Library of Congress."
3. You are looking to fill a data analyst position. The following descriptions of previous work experience were found in different resumes. Which resume should you continue reading?
5. There is a new employee starting at the firm. Using the graph of current employees, which employee do you predict the new employee will have the strongest ties with?
The new employee (CTO) majored in Literature and previously worked at Microsoft. They have one child and vacation every year in Jamaica. They will be located at the San Francisco office.
4. Fill in the blank below.
___ stands for locally weighted scatterplot smoothing and uses linear regression giving more weight to points closer to each point being fitted.
___ stands for local regression and fits a polynomial function to small segments of the data.
4. What conclusions were we able to draw from visualizing the network?
Select all that apply
5. True or False. Once you have the results of your model you have conclusively determined the trend of the data and/or have an accurate representation of what is happening in the world.
4. You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function it was used for.
4. What is TRUE about Twitter?
Select all that apply
4. Match the functions to the correct actions.
write.csv()
str()
ifelse()
sum()
4. What are some of the conclusions from the analysis we did on the political data set?
Select all that apply
3. Why might bar graphs be misleading?
5. Which function results in an interactive slider?
5. You are creating a feedback survey to send your customers. You already know their zip code, education level, and age. Which additional survey item captures a different type of information and may add explanatory power to your model?
5. What should you keep in mind as you're developing and running your model?
Select all that apply
5. Which function makes the output randomized?
6. Which of these safety tips are correct? Select all that apply.
Which type of data contain sets of categories?
How can you ensure that your analysis is reproducible?
Which of the following SQL functions can be used on text fields?
Select all that apply
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
Match the function names to their descriptions.
Sets labels for axes and title
Flips the axes of a graph
Splits up data by category to give smaller individual graphs
Creates an area plot
Return the total number of Employees in the table
Return the smallest number of Employees in the table
Return the number of characters in the Company field
Return the first 2 characters of the Company field
Return the number of records in the table
Return the second character of the Company field
How does clustering help when there are more than 3 attributes in the data?
Fill in the blank below.
In graphics, the transparency argument is called ___, where 0 is entirely transparent, and the default of 1 is entirely opaque.
Match the terms to their definitions.
Variance
Standard deviation
Distribution and "normality"
Covariance
Correlation
Slope
R squared
p-values
Put the four steps of building a classification tree in order.
What are descriptive statistics?
Please fill in the blank below.
In order to freeze a reference, you can use the ___.
Fill in the blank below.
The main goal of clustering is to ___ intra-cluster distance (the distance between points in a cluster) and ___ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
Please fill in the blank below:
___, the data visualization expert, argues that the visualization presented initially to NASA did not make the dangers clear for the Challenger shuttle launch, which exploded shortly after launch.
What are some of the advantages of using LASSO over Ridge?
Please select all that apply
Identify the seasonality time frames for each example.
Patterns of TV commercials
Electricity use
Typical office hours
Cable bills
How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
Put the basics of making API calls in order.
What is unsupervised machine learning?
Why do we need to validate the model?
How can we address the limitations of our analysis to see the data differently?
Select all that apply
What does the Akaike Information Criterion (AIC) do?
Select all that apply
What type of data does the grep() function work with?
What is an important step so that R can read numbers as categories?
Which picture below displays the standard error of a best fit line?
Which of these is not an attribute of classification?
Select all that apply
Why should you avoid using pie charts when visualizing data?
Which standard delimiters can Excel identify to split text into multiple columns?
What is one of the dangers of increasing the number of clusters?
Which function can you use to check the number of queries you have left for geocode()?
Note: Google limits your queries to 2500 per day, so make sure to check this if you are going to be geocoding a lot of data points.
Why do we want to minimize false negatives in the bank marketing example?
What is multicollinearity?
What does it mean if you have a small p-value?
What inputs do you need to give to dbscan()?
Please select all that apply
1. Classify each document into one of these three categories: Talent, Research and development, Budget.
Meeting minutes on recruitment
Meeting minutes on expenditures
Meeting minutes on A/B testing
Meeting minutes on new vendors
1. Fill in the blank below.
Highly unusual data and other anomalies in a data set are called ___.
1. How much experience do you have in Excel?
Scale from "Never used it before" to "I'm an expert"
1. Why is it important to format the axes correctly?
Select all that apply
1. Drag the description into the box next to the appropriate term.
Integer
Double
String/characters
Boolean/logicals
Factor
1. Match the reason data are valuable with its description.
Compliance
Automation
Dashboards
Predictive analytics
2. If the average high temperature in January is 40-50 degrees Fahrenheit, order the temperatures with the temperatures that are most likely outliers at the top and the temperatures that are least likely outliers at the bottom.
1. What does every row in the customer data set represent?
1. Fill in the blank below.
Regression is a model that captures the ___ between two or more variables.
2. Fill in the blank below.
___ stands for LOcal regrESSion, which is a black box method that fits a polynomial function to small segments of the data.
1. What are some examples that prove what happens to one node can happen to other nodes in a network?
Select all that apply
2. Please fill in the blanks below:
If two nodes are in the same community, then their delta is equal to ___, otherwise it’s equal to ___.
2. What are two ways to import data from your computer into RStudio?
Select all that apply
2. Fill in the blank below.
R is a powerful tool for ___ because the graphics tie in with the functions used to analyze data.
3. Fill in the blank below.
The merge() function or join() function can be used to combine two ___.
2. Please fill in the blank below.
When recognized and credentialed experts incorrectly predict outcomes, they may claim that despite poor results, their reasoning was sound or there were poorly timed serendipitous events. Greater ___ can help ensure incorrect predictions have real consequences.
3. Put the steps of the data chain in order.
3. What is an adjacency matrix?
3. To do outlier detection, you need:
Select all that apply
2. Which symbol do you use to tell R that you are ending the use of your function()?
2. What does it mean if you have a small p-value?
2. Put the steps for standardizing scales in order.
2. Fill in the blank below.
You can remove punctuation in your data with the ___ function.
2. Match the steps in identifying message diffusion to the correct image.
2. Which code below can be used to remove the record in the 312th row of data?
3. Please fill in the blank below:
The sum of all PageRank values for a network is ___.
4. Fill in the blank below.
___ is a programming term that means "named"; we use this term to indicate that data has been input into the R environment.
4. Why did the linear model line break into 5 different ribbons with the "fill = Continent" code?
4. Which function enables the display size to adjust to the size of the browser window?
5. Interactive visualization allows users to:
Select all that apply
4. How can we address the limitations of our analysis to see the data differently?
Select all that apply
3. You already have a successful business in Philadelphia. You want to expand into another, similar city. Use the visualization to help determine what city you should launch in next.
5. If you wanted to replicate the case study on Newton and extract political dispositions among Twitter users, order the steps you would take.
4. The 3 V's of data are:
Select all that apply
5. Fill in the missing words.
5. Accountability is important to consider prior to launching a data science project.
Accountability: a clear project owner and the key stakeholders of the project need to be clearly defined – and this depends on the project. It may be the ___ or another C-suite executive such as the Chief Data Scientist.
5. Identify the seasonality time frames for each example.
Patterns of TV commercials
Electricity use
Typical office hours
Cable bills
4. Which calculation does the "closeness" function in the igraph package perform?
4. What does it mean if your model has a smaller standard deviation of residuals?
5. Match the key terms below to their descriptions.
Q-Q plot / distribution of errors
p-values
VIF
Breusch-Pagan test
AIC
4. Which function lets you see how many calls you have left from the Twitter API?
3. Why is sorting your data by betweenness and eigenvector centrality based on the number of followers helpful?
4. Fill in the blank below.
___ stands for Hue, Saturation, and Value.
5. How can a politician use these metrics to help with their fundraising?
4. Visualization is an iterative process. Put the steps in order starting with "Analyze"
5. Which part of the "Simple App" application is the user input?
5. You finished building a predictive model. Which questions may people have about it?
5. Why is it important to sanity check yourself before you move on to the next step of your analysis?
Select all that apply
5. Which visualizations below show time-series analysis?
Select all that apply
5. What conclusions can be drawn from the visualization below showing cumulative dispersion of Tweets over time?
Select all that apply
7. When should you report a needlestick injury to your regional manager?
Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
Order these logical operators from fastest to slowest in terms of query performance:
After running a Breusch-Pagan test, how would you know that there is no heteroscedasticity?
Select all that apply
Fill in the blank below.
The 'gg' in ggplot2 stands for ___.
Table: Company_Employees
Which SQL code will result in a frequency distribution of the Company field?
Match the terms to the descriptions.
Sum of all the squared distances between data points in different clusters
Sum of all the squared distances between points within the same cluster
Total sum of squares
Fill in the blanks below.
What do you need to run an F-test?
Select all that apply
Match the attributes to the decision tree calculation.
Categorical attributes Finds the largest class in the data Uses algorithms
Continuous variables Finds groups of classes that make up over 50% of their data Minimizes classification
Match the description to the statistics term
The average value you should expect to get out of a set of numbers
The number that occurs most frequently in a data set
The middle value when a set of numbers is arranged in either a decreasing or increasing order
Measures the dispersion of the data
What do What-If analysis tools do?
Fill in the blank below.
What are some aspects of a good visualization?
Please select all that apply
Please fill in the blank below
Group learning is called ___ learning, and is made up of many 'weak learners', which is the term for a classification algorithm that performs better than random chance. This type of approach to classification trees is called a random forest.
Match the types of time series analyses to their descriptions.
Analyzes how time-series data can decompose into different frequencies that constitute underlying patterns
Studies wave patterns used to extract information from audio signals and images
The delayed correlation of a signal with itself (the past predicts the future)
Measures similarity between 2 different series and understands dependency between variables in the context of time
Please fill in the blank below.
DBSCAN uses a center-based approach to estimate for a particular point by counting the number of points present in a specified radius (eps) of that point.
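The neighbor-counting step described above can be sketched in base R. This is only an illustration of the counting idea, not a full DBSCAN implementation: the helper name `count_within_eps`, the toy points, and the use of Euclidean distance are assumptions made for this example.

```r
# Hypothetical helper: for each point, count how many points
# (including the point itself) lie within radius eps of it.
count_within_eps <- function(points, eps) {
  d <- as.matrix(dist(points))  # pairwise Euclidean distances
  rowSums(d <= eps)             # neighbours within eps, per point
}

# Toy data (assumed): three points close together and one far away
pts <- matrix(c(0, 0,
                0, 1,
                1, 0,
                10, 10), ncol = 2, byrow = TRUE)

counts <- count_within_eps(pts, eps = 1.5)
print(counts)
```

Points with many neighbors inside the radius sit in crowded regions, while an isolated point counts only itself.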
What are some quick and easy-to-use visualizations that express frequency of words?
Please select all that apply
Which of these is an example of exploratory data analysis?
What is supervised machine learning?
What does SQL stand for?
What is heteroscedasticity?
Which function eliminates duplicate rows?
Why is clustering more powerful than visualizing?
Select all that apply
What is TRUE about correlation?
Select all that apply
Which attribute does not belong to clustering?
Select all that apply
What tab can you find the charting functionality in?
Which function would I use to transform the text below to all capital letters?
coUpE --> COUPE
What is one of the most effective methods for tuning an algorithm?
Which piece of code will pass along the output from one function to the input of another function?
What type of regression offsets the inclusion of too many or irrelevant variables?
4. Which image below shows a polynomial regression?
What type of approach is support vector machines?
Which function in R can help you scale your data?
1. Fill in the blank.
There are many methods for analyzing data. When forecasting or predicting future events, the two most common methods are classification and .
1. Fill in the blank below.
A is a web of connections.
1. How can you search for different types of visualizations?
4. Fill in the blank below.
is a programming term that means "named"; we use this term to indicate that data has been input into the R environment.
1. Why might three dimensional graphs be a good visualization option?
Select all that apply
2. According to Edward Tufte, what is one of the main reasons why the Challenger Space Shuttle was allowed to take off?
1. When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Practical challenge
Epistemological challenge
Ethical challenge
Grand challenge
1. Match the opportunity for using data with its business function.
R&D
Buy
Make
Ship
Sell
2. Fill in the blank below.
Classification is a method that the behavior of an object or individual so we can forecast future events.
1. Match the method to the questions below.
How do people group together based on preferences?
How can you anticipate what people will like?
How can you anticipate how much someone will buy?
How can you reach the maximum number of people most efficiently?
How can you predict overall sales?
1. Fill in the blank below.
To speed up our work and avoid running calculations on each data point manually, we can use the loop function, which performs a set of operations as many times as you tell it to.
1. Which questions can we use local regression to answer?
Select all that apply
1. Put the steps in order that you go through in R to understand message diffusion.
3. What are two ways to load data from the Internet into RStudio?
Select all that apply
3. Fill in the blank below
Web pages today are - they constantly update as the relevant information changes.
2. Which of these functions can be used for applying functions to data frames?
2. What is Big Data?
3. Fill in the missing word.
Some of the most interesting data generated and publicly available today can be accessed through something called an . Instead of downloading finite-sized Excel spreadsheets or data sets, it allows you direct access to a database. You don’t have to download all of it; you can query for the sections and types of data you want or syphon data at a much higher speed than a regular download.
2. Which task can take up a considerable amount of a data scientist's time?
3. Match each person in the network with their strength.
A person with a lot of connections
A person whose average path to all others is shortest
A person with the most shortest paths
2. Match the networks to their network analysis use cases.
Healthcare organizations
Websites
Businesses
Individuals
3. Fill in the blank below.
vertex_connectivity() can tell you how many need to be removed from a graph in order for any 2 nodes to become disconnected.
2. Fill in the blank below.
You can run the correlation analysis using the function .
3. What is the moving average?
2. Fill in the blank below.
The package contains nice, pre-set color schemes that are useful when creating an interactive network visualization.
3. What does the "ego" function do?
2. Which statement below is TRUE?
2. Why might it be better to use the PageRank metric instead of the eigenvector metric for a large network?
Select all that apply
3. Why do we use '==' instead of '=' to pull the day shift data?
4. Fill in the blank below.
The function adds a layer to a graph without having to create a data.frame and map it to the scales. This allows us to easily add text or other information right onto our visualization.
4. How do you stop the application from running?
3. While you may have access to big data, it does not mean that you have
3. Why can't we use View function to view the k-means results?
4. Ensemble learning, a concept in machine learning, happens when a group of learners are used together to arrive at a more accurate decision. Based on this concept, which Yelp! review would you consider when deciding whether or not to dine at a restaurant?
4. If you wanted to use text mining to know the road conditions, which source would you try using first?
3. Which of these is not a type of data?
4. When evaluating a work sample provided by a candidate for your data science team, which of these questions might you ask?
Select all that apply
4. Which of the following is MOST characteristic of an opinion leader?
4. Match the situation with your next step.
You want to see how two variables interact.
You want to see how five variables interact.
You want to see if you can get a better fit with your five variables.
You want to see if your five variables are being influenced by seasonal changes.
4. ignore.case = TRUE is useful when
4. Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
4. Which image below shows a polynomial regression?
5. Match the searchTwitter() arguments to the correct functions.
searchString
n
lang
since
until
geocode
sinceID
maxID
4. Why is it important to save dynamic plots as html files?
Select all that apply
5. Match the functions to what they do.
colnames()
rownames()
view()
write.csv()
4. In order for the str_detect() function to work, what format does the data need to be in?
5. Why is the 'if(!require)' function used in the code below?
if (!require("devtools"))
install.packages("devtools")
devtools::install_github("rstudio/shinyapps")
5. When detecting outliers, the chief goal is to:
5. What can we learn from the IBM case study?
5. Which function can help you identify periodicity quickly?
5. Which outputs do you get from the dispersion simulation?
Select all that apply
1. Data science is at the intersection of which three domains?
Put the six Data Science control cycle steps in order, starting with “Ask”
How does industry knowledge help us understand our analysis?
Match the following SQL Server components to their definition
Server
Database
Table
SQL Server Management Studio
Match the methods of variable selection to the correct descriptions.
Forward selection
Backward selection
Step-wise selection
Identify the three things necessary to make a graph in ggplot2.
Select all that apply
Match the table type to the statement that would create it:
Global Temporary Table
Local Temporary Table
Permanent table
In order for us to determine how much variation our clusters account for, we need to:
Clustering is a form of what type of comparison?
Put the steps of running a t-test in the correct order.
Put the steps of k-NN in order.
Please fill in the blank below.
Match the What-If analysis tool to its description:
Determines how to get a desired result for a dependent variable within the analysis
Allows you to consider multiple combinations of values for independent input variables in an analysis
Sees the effects of varying one or two variables in an analysis
Fill in the blank below:
In entropy, the number indicates 100% of the data is the same, and the number indicates a 50-50 split.
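The two situations described above can be checked with a small base-R sketch. The function name `class_entropy` and the toy label vectors are assumptions made for this example, and it uses the base-2 (Shannon) form of entropy, which is the usual choice for a two-class split.

```r
# Hypothetical helper: Shannon entropy (base 2) of a vector of class labels
class_entropy <- function(labels) {
  p <- table(labels) / length(labels)  # proportion of each class
  p <- p[p > 0]                        # guard against log2(0)
  -sum(p * log2(p))
}

pure  <- c("yes", "yes", "yes", "yes")  # every observation is the same class
mixed <- c("yes", "yes", "no", "no")    # a 50-50 split

print(class_entropy(pure))
print(class_entropy(mixed))
```

Running it on the two vectors gives the value for a pure group and for an evenly split group, the two cases named in the question.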
Please fill in the blanks below
Shiny applications have two basic components - the script, which defines the appearance of the app, and the script, which contains actions to perform based on user input.
Please put the flow of AdaBoost.M1 in order below.
Please fill in the blank below.
The process of dividing the time series into its components is called - it divides it up into four components called level, trend, seasonality, and random error.
Please match the point names to the descriptions.
The points which are present in the interior of the dense region
The points which are present on the edge of a dense region
The points which are in a sparsely occupied region
Put the steps of simplified Luhn's method in order below:
1. Why did we choose the R programming language over other languages?
What is unsupervised machine learning?
Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
Which statement is not true if you receive a positive result from a cancer test that is 95% accurate with a base rate of 1 out of 5,000 people a month?
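The arithmetic behind this kind of base-rate question can be sketched with Bayes' rule in R. The question does not say what "95% accurate" means, so this sketch assumes it applies both to people with cancer (sensitivity) and to people without it (specificity); the variable names are illustrative.

```r
# Assumption: "95% accurate" means 95% sensitivity AND 95% specificity
base_rate   <- 1 / 5000  # prior probability of having cancer
sensitivity <- 0.95      # P(positive | cancer)
specificity <- 0.95      # P(negative | no cancer)

# Overall probability of a positive result (true positives + false positives)
p_positive <- sensitivity * base_rate +
  (1 - specificity) * (1 - base_rate)

# Bayes' rule: probability of cancer given a positive result
p_cancer_given_positive <- sensitivity * base_rate / p_positive
print(p_cancer_given_positive)
```

Under these assumptions the result is well under 1%, because false positives from the large healthy group far outnumber true positives from the small affected group.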
What would be the output of the following code for vector 'v':
v[2:7]
What is one of the dangers of increasing the number of clusters?
What is R Squared?
Which of these is not a strength of kNN?
Select all that apply
Which of these is regression not designed to do?
Select all that apply
Please fill in the blank below.
These two functions allow you to look up a value in a table, and return the desired value from another column in that table. The is for vertical tables, and the is for horizontal tables.
What is the complexity parameter?
Which analogy best describes the UI and server script?
What should you do before applying Ridge regressions?
What is a powerful R package you can use for text mining?
Which SVM classifier plots a 'worst-fit line'?
Which of the following are applications of principal component analysis?
Please select all that apply
1. Fill in the blank.
regression is just like univariate or linear regression, but instead of using just two variables to build a model, it can factor in more variables when building the forecasting model.
1. Fill in the blank below.
plots intermediate points between two locations.
1. How does R determine whether two nodes belong in the same community?
1. Fill in the blank below.
is a term that means only looking at a portion of the data. It is denoted in R by a '$' symbol.
1. Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
2. When Hewlett Packard started tracking a range of employee factors, what were the results?
Select all that apply
1. Put the steps of a data science project in order.
1. Why did the Oakland A's 'Moneyball' strategy succeed?
1. What are some pitfalls of clustering?
Select all that apply
2. How can you determine what people are interested in?
Select all that apply
1. Fill in the blank below.
can have a very negative impact on linear regressions if they are not identified and handled properly.
1. What are some sanity check questions you should ask yourself when building regression models?
Select all that apply
1. Which function allows you to select rows that you will keep in your data set?
2. What is the underlying concept of edge betweenness?
2. Which piece of code will retrieve only the third column of a matrix called 'm'?
2. Why is faceting the baseball plot not a good option?
Select all that apply
3. Fill in the blank below.
data is also valuable for the initial exploratory data analysis.
2. Which of the following is a practical challenge that companies face when using data?
2. Which three words are used to describe Big Data?
Select all that apply
3. Identify where in the data chain the issues below could be addressed.
NOTE: Preparing and visualizing data are often iterative processes, and you should remember to run "Sanity Checks" to ensure that you're continuing to move in the right direction.
When cleaning data, consider...
When visualizing data, consider...
When both cleaning and visualizing data, consider...
3. Fill in the blank below.
The Index measures similarity between people or places in a network. It’s a simple calculation that takes the number of things in common divided by the total unique number of things or connections that each node in the network has.
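The calculation described above (things in common divided by the total number of unique things) can be sketched in base R. The function name `similarity_index` and the two interest lists are hypothetical and exist only for this example.

```r
# Hypothetical helper: shared items divided by total unique items
similarity_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Toy data (assumed): the interests of two people in a network
alice <- c("hiking", "jazz", "chess")
bob   <- c("jazz", "chess", "cooking", "running")

print(similarity_index(alice, bob))
```

Here the two people share two interests out of five unique interests overall, so the index is 2 / 5.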
3. The best way to gain new insights about people and places is to
2. What is NOT an example of a regression analysis output you will create in this course?
2. Which package can you use to create a 3D plot?
2. Match the functions below to the correct actions.
ts()
decompose()
diff()
acf()
lm()
filter()
3. Match the arguments used to plot an interactive graph to what they do.
Links =
Nodes =
Source =
Target =
Value =
NodeID =
Group =
fontSize =
opacity =
charge =
2. Fill in the blank below.
Use the function to ensure there are no duplicate records in your data.
3. How do you down-weight famous people and boost less famous ones?
3. Which function did we use to calculate the total number of donors for each legislator?
5. Fill in the blank below
In order to select specific values from our crime data, we first need to tell R that the data we want to use is in a format.
4. When should you set the 'flip.y' argument equal to FALSE?
3. Which UI component adds a checkbox to the application?
4. The real value of more accurate information is in the
5. If you want to know how accurate your classification model is, what information do you need?
4. Label each aspect of the graph.
D
C
A
B
4. What's new about Big Data?
4. Data science is at the intersection of which three domains?
5. Fill in the blank below.
centrality is a metric that evaluates a node’s or a person’s importance by giving consideration to the importance of the nodes or people connected to it.
4. Which of the following are common causes of outliers?
Select all that apply
5. The "-", or minus, symbol before the "grep" function tells R to
4. How does Cook's distance help to identify outliers that can skew your analysis?
5. How did we know our polynomial regression model was stronger than our linear regression model?
Select all that apply
4. Which package do you need to load to use the setnames() function?
4. What are some things that infection can depend on?
Select all that apply
4. What can a basic link prediction algorithm be used for?
Select all that apply
5. Put the steps to extract an email address from an email in order.
1. Data science is at the intersection of which three domains?
5. What should you keep in mind when scraping information?
Select all that apply
Was this easy?
5. Which argument will determine the color of each region of the map?
5. What term do we use to refer to the growth factor (slope of the best fit line) that adjusts the level (y-intercept of the best fit line)?
5. What are some ways that we can measure the relationship between 2 individuals?
Select all that apply
2. Data scientists’ responsibilities may include:
Match the method to the description (note: there are more methods listed than necessary).
Measures similarity between data points to group them and identify key similarities that you can use to find trends
Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively
Digests large amounts of text quickly and finds common themes, messages and patterns.
Which of these is not a common data problem?
Match the table combination types with their definitions:
JOIN
UNION
What are some key things you should always check for in your model?
Select all that apply
Fill in the blank below.
What are views in SQL valuable for?
Select all that apply
What are some conclusions from the visualization of Congress?
Select all that apply
Fill in the blank below.
is a measure of the extent to which an increase in one variable corresponds to the increase in another variable. This does not imply causation, which determines whether or not a variable causes the effect on another variable - rather, it determines whether or not there is any connection between the variables that we can quantify.
How do you calculate R squared?
Put the six Data Science control cycle steps in order, starting with “Ask”
Match the correlation values to their meaning.
The variables move perfectly in tandem
The variables move in a perfectly inverse fashion
There is no linear relationship between the variance of the variables
Put the steps of Goal Seek in order, starting with Click on the "What-If Analysis Group" button.
How was Target able to identify their pregnant customers?
Put the steps for creating a Shiny application in order by dragging them below.
What are some advantages of boosting?
Please select all that apply
What is the name of the plot below that plots the autocorrelation function for different values of time lag?
Please put the steps of n-fold cross-validation in order
3. What does it mean to “practice” coding?
Select all that apply
3. Fill in the blank below.
Excel's visualization capabilities can be limited by its menus. R does not have such a constraint. As a result, you can create beautiful, varied visualizations and make more nuanced changes with R than with Excel.
Which of these is an example of exploratory data analysis?
What does it mean if your model has a smaller standard deviation of residuals?
What function do you need to use to perform the k Nearest Neighbors algorithm?
Which piece of code will retrieve only the third column of a matrix called 'm'?
What types of data are difficult to cluster?
Select all that apply
What does it mean if you have a small p-value?
Which one of these diagrams shows an entropy of 1?
What is the Analysis ToolPak?
Which function looks up a particular value in a table and produces the row in which the value is located?
Which of these is not an attribute of classification?
Select all that apply
Which function enables the display size to adjust to the size of the browser window?
What is the only difference between the Ridge penalty and LASSO penalty?
Please fill in the blank below.
The term means "data about data". It contains context ("resource descriptions") for objects of interest, such as MP3 files, library books, or satellite images.
Please fill in the blank below.
The product is the sum of the products of all the same dimensions, while the product is the sum of the products of all the different dimensions.
Please fill in the blank below.
is used for dimensionality reduction of the data by decomposing a matrix into three different matrices.
1. Fill in the blank.
Highly unusual data and other anomalies in a data set are called .
1. Before combining data from two separate data sets using the "join" command it is important to
1. How can you make sure that R doesn't take the weights of a network graph into account?
1. Match the functions to the actions that they perform in R.
graph.data.frame()
str()
V()
E()
1. Match the names to the graphs below.
2. Fill in the blank below.
Clustering and data mining are types of data analysis.
2. Match each question with the best method for answering it.
Based on their shopping history, what commonalities are there among our customers?
Based on a customer's shopping patterns, is it likely that this customer is pregnant?
What do people think about our brand?
Is it likely that this shopper will purchase our product?
When a disease spreads, are there any patterns in its spreading?
With the symptoms exhibited, what diagnosis might a doctor propose?
1. Put the 5 functions of an organization in order.
2. How do businesses use probabilistic algorithms?
Select all that apply
1. Fill in the blank below.
Networks are not always obvious; they are in the vastness of increasing amounts of data collected today.
1. Which package helps us to visualize all the correlations in the data at once so we can get a better sense for the variables that may have the greatest predictive power?
2. Fill in the blank below.
Data is extracting information from large quantities of data to find insights, patterns and latent connections.
1. Which function counts the number of nodes (vertices) in a graph?
1. Put the five steps of label propagation in order.
2. Which of the answers below is a function?
Select all that apply
2. Which of these functions is similar to the 'search and replace' function in other programs?
3. Match the method to the description (note: there are more methods listed than necessary).
Measures similarity between data points to group them and identify key similarities that you can use to find trends
Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively
Digests large amounts of text quickly and finds common themes, messages and patterns.
2. When categorizing observations, you should...
3. Data science teams produce two types of products - functional and one-off products. Match the type of product with the description.
An interactive tool that is used repeatedly to gain new information
A tool that your client interacts with to get updated information
An analysis used to deliver information in a presentation
A visualization that supports a story
2. Put the steps of the functions of a business in order, starting with "Research and Development"
2. What happened when Wal-Mart decided to combine the data from its loyalty card system with that from its point of sale systems?
2. Put the 5 similar functions of an organization in the correct order.
3. Fill in the blank below.
The key to staying current with your data analysis skills is to continue reading, and staying up to date with the latest tools and applications.
3. Match the arguments in 3D plotting to their actions.
xlim
ylim
zlim
pch
box
grid
highlight.3d
type = "h"
lty.hplot
angle
3. Which function do you need to run an exponential smoothing model?
2. Why is it important to keep scale in mind when creating visualizations?
3. Which function denotes when you are looking for vertices?
3. Which function can you use to merge two files into one?
2. Which two packages do you need to load to parse email data?
Select all that apply
4. Why does it make sense to separate the code into multiple steps?
Select all that apply
5. Fill in the blank below.
In order to plot points of different sizes, you need to set the renderer argument to .
4. Why is the second line of code below (italicized for emphasis) important for the pull-down menu function?
selectInput('pTypeSubset', 'Provider Type Subset',
c("All", sort(as.vector(unique(medicare$provider_type)))))
5. What are some predictions that have been made based on social media, such as LinkedIn, blogs, Facebook, etc.?
Select all that apply
5. How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
3. You are testing a new classification algorithm. Which of the following results may suggest that your algorithm is performing accurately?
5. If most of the data points cluster around a regression line, it may be the case that:
5. Match the analytics methods to the companies.
Uses a variety of data sources about the business and analyzes data from customer ratings and reviews to sales trends in real time to provide loans.
Used predictive analytics algorithms to provide customized credit offers to its customers.
Used an analytics team to build a predictive model that was deployed among millions of swing voters to persuade them to vote for the President.
3. Why is clustering more powerful than visualizing?
Select all that apply
4. Sort the network from the broadest community to the most niche community.
4. Which method can you use to find opinion leaders?
3. Which package is useful to parse text?
4. How do you measure the explanatory power of your predictive model?
4. Which statement below is TRUE?
3. Which notation below would correctly identify punctuation?
4. Which symbol below means "not" in R?
4. How can you remove individuals from your data that do not have connections?
4. Please fill in the blank below:
The () function finds the contents of a variable and the () function applies that first function to the list of variable names in a data set.
4. Big data is like
5. Which plot would best help us visualize nested data (lists within lists)?
5. Which industries can benefit from using predictive analytics?
Select all that apply
5. What is something we CAN'T measure?
5. Which statement below is TRUE?
5. What are some use cases for the Jaccard Index?
Select all that apply
4. Big data is like
Fill in the blank below.
Clustering assumes that is a measure for similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.
Which of the following are methods for importing data into SQL server?
How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
What are the two ways you can use categorical variables in regression models?
Select all that apply
Fill in the blank below.
A big strength of the package is the ability to customize graphs by adding layers and adjusting the data so it doesn't look generic. This is a package that brings a lot of flexibility in visualizing your data beyond bar charts.
Can a table in SQL be joined to itself? (True/False)
Which function runs thirty different tests and aggregates the results from each one?
How can you ensure that your analysis is reproducible?
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
How did GE use predictive analytics to offer tailored products to their customers?
Please fill in the blank below.
Please fill in the blank below:
A analysis quickly analyzes the outcome under a range of scenarios. It can be performed with one-way or two-way data tables.
Put the four steps of building a classification tree in order.
Please fill in the blank below.
The two variables, and , are how the UI and server script communicate with each other. One of them is passed as the user puts in a command, and the other is passed back out with the results.
Please fill in the blank below.
Forecasting means to something after looking at the available information.
What are the three parameters that the HoltWinters() function requires?
Please select all that apply
Please put the general steps of principal component analysis in order.
3. Fill in the blank below.
Excel's visualization capabilities can be limited by its menus. R does not have such a constraint. As a result, you can create beautiful, varied visualizations and make more nuanced changes with R than with Excel.
4. Fill in the blank below.
It is best practice to annotate your code with , which you can do by putting a hashmark at the beginning of a line.
What happened when Telenor started contacting its customers?
Which function can you use to see the list of contents that your linear regression model produces?
What is one of the most effective methods for tuning an algorithm?
Why do we use '==' instead of '=' to pull the day shift data?
Which of these functions are the "bare bones" of ggplot2?
Select all that apply
You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function it was used for.
What is supervised machine learning?
Which term represents the degree of slant in the data?
Why is it useful to audit your formulas?
Which attribute does not belong to clustering?
Select all that apply
Which function results in an interactive slider?
What type of penalty shrinks coefficients towards zero to control variance?
Which text mining approach focuses on word counts without regarding the words' positions in sentences, part of speech, or meaning?
Which special operator does R have to calculate the dot product?
Which plot (pictured below) is an exploratory graph used for generalization of the simple two-variable scatterplot?
1. Fill in the missing word.
Algorithms and are data science tools. They go beyond running an analysis. When they are customized they take in your questions and data and yield specific, and often actionable, answers and outputs.
1. Fill in the blank below.
Forecasting means to something after looking at the available information.
1. Please fill in the blank below:
The centrality of a node is the percentage of shortest paths in a network that include a given node. This metric allows you to assess which nodes are prominent connectors in a network, indicating that this individual can be a vital connector, or this node can be a critical liability in a computer network or supply chain.
1. Why is it hard to visualize the baseball data with ggplot2?
Select all that apply
1. Put the six Data Science control cycle steps in order, starting with “Ask”
1. Fill in the blank.
Some classification algorithms can go beyond determining whether someone will buy your product or not. The benefit of these algorithms is that they can tell you the that someone will buy your product.
1. What happened when Telenor started contacting its customers?
1. Match the description to the correct data set.
Training data set
Validation data set
Test data set
1. What are some of the practical applications of network analysis?
Select all that apply
1. When is adjusted R squared significantly different from regular R squared?
1. What are the goals of this course?
Select all that apply
1. Which functions do you need to use to plot a network where the nodes change color as they become affected or receive the message?
Select all that apply
1. What is an issue that Google ran into with its search algorithm?
3. Fill in the blank below.
It's important to look at the raw data before you it so you can see what it looks like and find initial patterns and insights without doing too much analysis.
3. How does the 'expanded' layout change how the information is presented?
3. What happened when Telenor started contacting its customers?
2. Naïve Bayes is a probabilistic classification method commonly used for text classification. Most spam filters are based on a variant of Naïve Bayes.
Order the steps that a spam filter takes when deciding whether or not a new email should be placed in the spam folder. Place the first step at the top.
2. Which of the following are questions that should be asked when building a model?
Select all that apply
3. What is the Ark of the Covenant similar to in your data strategy?
3. What outcome can an association rule predict?
Select all that apply
3. Fill in the blank.
At each step of the process important decisions need to be made, and you should be using to make the right choices.
2. What can regression do?
3. What is multicollinearity?
2. What is TRUE about finding alpha?
Select all that apply
3. Match the function to the action it takes.
visNetwork()
visOptions()
visInteraction()
visPhysics()
2. What does running a "while" loop do?
2. What does the gather function do?
Select all that apply
3. What type of data does metadata contain?
Select all that apply
3. Fill in the blank below.
You can create a _____ by wrapping code in curly braces.
5. How does the regression feature help us understand this data set?
4. Why is it important to understand the purpose behind various data science methods?
4. Which function implements k-means clustering with cosine distance?
4. If your classification model has a perfect, 100% accuracy, which of the following questions should you ask?
4. If two factors are strongly correlated, such as "temperature" and "what the temperature feels like" in our Bikeshare example, then we are...
4. Which of these questions can you answer with data mining?
Select all that apply
4. Which of these are examples of clustering?
Select all that apply
5. Networks can contain a wealth of information. Which of the following questions are best answered by measuring an aspect of the network (assuming the necessary data are available)?
Select all that apply
3. Which of these are types of classification?
Select all that apply
4. Which function is the correct way to tell R to read the data as characters?
5. Match the terms to their definitions.
Variance
Standard deviation
Distribution and "normality"
Covariance
Correlation
Slope
R squared
p-values
4. Which visualizations below helped us to identify the periodicity of errors in our model?
Select all that apply
4. Which function will make the data set horizontal?
4. What does the "select_nodes" function help you do?
Select all that apply
4. Please fill in the blanks below:
_____ is the most popular hierarchical clustering method and is a bottom-up approach, while _____ does the opposite and is a top-down approach.
3. Which function will convert a website response to an R-readable format?
5. Match the job with the description.
Data scientist
Data modeler
Data analyst
Data wrangler
5. What are some of the advantages of interactive visualizations?
Select all that apply
5. Which one of these is not a way for Kabbage to gather information on lending decisions?
Select all that apply
5. Which function can you use to eliminate specific pieces of data?
5. Which function can you use to create a forecast using the LOESS model?
5. Which function allows you to combine two separate data sets?
How does clustering help when there are more than 3 attributes in the data?
Which of the following SQL functions can be used on date fields?
Select all that apply
$stats
$n
$conf
$out
Which approach allows for the inclusion of categorical variables with multiple levels in regression models?
Fill in the blank below.
The _____ function shows us the structure of the data.
Which of the following SQL functions can be used on text fields?
Select all that apply
Fill in the blank below.
Clustering and data mining are types of _____ data analysis, which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
After running a Breusch-Pagan test, how would you know that there is no heteroscedasticity?
Select all that apply
The example below is an example of which aspect of the 3 V's?
"1 flight of a Boeing 737 across the continental United States generates as much data as is stored in the U.S. Library of Congress."
What are some of the benefits of Scenario Manager?
Select all that apply
Match the attributes to the decision tree calculation.
Categorical attributes Finds the largest class in the data Uses algorithms
Continuous variables Finds groups of classes that make up over 50% of their data Minimizes classification
Put the steps in order for how the UI script communicates with the server script.
What can regression do?
Match the terms to their definitions.
Variance
Standard deviation
Distribution and "normality"
Covariance
Correlation
Slope
R squared
p-values
Match the types of trust to their definitions.
Calculation-based trust
Personal-based trust
Similarity-based trust
Institution-based trust
4. Fill in the blank below.
It is best practice to annotate your code with _____, which you can do by putting a hashmark at the beginning of a line.
5. Why will the following code give you an error?
a <- "Hello"
A
Which of these are examples of clustering?
Select all that apply
Which picture below displays the standard error of a best fit line?
What is the complexity parameter?
Why does it make sense to separate the code in multiple steps?
Select all that apply
Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
Which questions can datafication help us answer?
Select all that apply
What is unsupervised machine learning?
Which function measures the kurtosis of your data?
Which error appears for an invalid cell reference?
Which part of the "SimpleApp" application is the user input?
Why is bagging useful when you are working with a small dataset?
What does the syntax below mean in R?
"[^A-z]"
Which R package contains the SVM model functions?
What is an example of how the direction of trust does not always go both ways?
1. Internal data may be some of a company's most valuable data. Which of the following may be valuable sources of internal data?
Select all that apply
1. Which function can you use to see the list of contents that your linear regression model produces?
1. Please fill in the blank below:
_____ communication played a key role in the infamous bankruptcy of energy trader Enron.
1. Which of the answers below passes the output of one function into the input of another function?
2. How does clustering help when there are more than 3 attributes in the data?
1. With network analysis, which of the following is the least common metric to be determined?
2. How did GE use predictive analytics to offer tailored products to their customers?
2. Fill in the blank below.
Each _____ represents an individual or an object in the network. Each _____ represents a relationship between two people, places or objects.
2. Match the function to its purpose.
setwd()
read.csv()
install.packages("ggmap")
geocodeQueryCheck()
1. Which questions can datafication help us answer?
Select all that apply
1. What is TRUE about API?
Select all that apply
1. Put the steps in order for creating a function that will render an animation for the dispersion simulation.
1. What does it mean when R says a graph is acyclic?
2. Fill in the blanks below.
_____ data displays multiple occurrences per row and is easier to read in tables, while _____ data displays one observation per row and is easier to plot in ggplot2.
2. Put the steps for creating a Shiny application in order
2. Which of the following statements is not true about k-means clustering?
2. Match the network to the possible connections.
Twitter
Facebook
Netflix catalog
Past presidents
Your company
City
2. Please fill in the blank below:
The opposite of big data is _____.
3. Based on this word cloud, generated from hotel reviews, order the words by frequency. Put the most frequently used words at the top.
2. Networks can be comprised of
Select all that apply
2. Match the variables in the equation "y = mx + b" to what they represent in the equation.
y
x
b
m
2. Fill in the blank below.
You can create a _____ plot to check if the residuals in your model are normally distributed.
2. Which function allows you to create a forecast for your model?
3. What does it mean when someone communicates a lot or has a lot of followers but has no incoming messages and follows few others?
3. Match the functions to their actions.
readLines()
strsplit()
sapply()
plot()
2. What can identification of communities help uncover?
Select all that apply
3. What's the appropriate syntax for calling up sequential file names in a loop?
4. Why is it useful to create functions?
Select all that apply
4. Fill in the blank below.
As it turns out, a _____ map of DC looks somewhat similar to the crime map of DC.
3. Which two functions can pass rCharts objects to Shiny?
Select all that apply
4. Fill in the blanks below.
The _____ method discovers new groups or categories of data, while the _____ method assigns data points to known groups or categories.
3. Why wouldn't a silhouette value be computed with one cluster?
5. Increasing the number of variables in a predictive model may not be beneficial because...
4. Match the situation with your next step.
You want to see how two variables interact.
You want to see how five variables interact.
You want to see if you can get a better fit with your five variables.
You want to see if your five variables are being influenced by seasonal changes.
4. Why did the Literary Digest incorrectly forecast a presidential win?
5. How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
4. How can you use association rules in different industries?
Retail
Site developers
Bioinformatics
Social Media marketing
4. The process of extracting information from large quantities of data to find insights, patterns and other latent information is referred to as
5. Match the functions to their purpose.
geocodeQueryCheck()
warnings()
cbind()
as.data.frame()
write.csv()
4. What do you need to run an F-test?
Select all that apply
3. Which visualization below is showing the periodicity of the data?
4. Which package allows you to save your output as an HTML file?
4. Which function allows you to create a data frame?
5. Hierarchical clustering assumes that points with the shortest distance between them are:
4. What does it mean if someone on Twitter has a high in-degree and low out-degree?
Select all that apply
2. Data scientists’ responsibilities may include:
Select all that apply
5. Which of these features are available in RStudio? (Go into Global Options)
Select all that apply
5. Which of these are examples of dark data?
Select all that apply
5. Why is calculating closeness centrality useful?
Select all that apply
5. What can you do if you have missing data?
Select all that apply
5. Which package has the "ddply" function?
Put the six Data Science control cycle steps in order, starting with “Ask”
Match the terms to the descriptions.
Sum of all the squared distances between data points in different clusters
Sum of all the squared distances between points within the same cluster
Total sum of squares
Which of the following SQL functions can be used on date fields?
Select all that apply
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
Sort the variables as either continuous or discrete.
Continuous variables
Discrete variables
Discrete variables without a defined sequence
Fill in the blank below.
In graphics, the transparency argument is called _____, where 0 is entirely transparent, and the default of 1 is entirely opaque.
Order these logical operators from fastest to slowest in terms of query performance:
Fill in the blanks below.
The _____ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the _____ method is part of supervised machine learning and assigns data points to known groups or categories.
How does industry knowledge help us understand our analysis?
Match the methods of variable selection to the correct descriptions.
Forward selection
Backward selection
Step-wise selection
The 3 V's of data are:
Select all that apply
How do you measure the explanatory power of your predictive model?
What are the three parts of an optimization model?
Put the four steps of building a classification tree in order.
What are two ways to run a Shiny application?
Select all that apply
Using the Capital Bikeshare data, what questions can we answer through regression analysis?
Select all that apply
If you add the _____ row and the _____ row, then you get the observed data row.
Please fill in the blank below.
The _____ Index is a way of measuring the extent of similarity between two people or objects.
5. Why will the following code give you an error?
a <- "Hello"
A
1. Why is it useful to use the script window for writing code?
Select all that apply
Which of the following statements is not true about k-means clustering?
Select all that apply
What is TRUE about correlation?
Select all that apply
Which of these is not an attribute of classification?
Select all that apply
Which of the answers below is a function?
Select all that apply
Which function converts wide data to long data?
What is a good way to test for multicollinearity?
Which of these are examples of dark data?
Select all that apply
What does covariance measure?
Select all that apply
What tab can you find the charting functionality in?
Which of these is not an attribute of classification?
Select all that apply
Why can't we calculate the denominator of the Naive Bayes formula?
Which statement is NOT true about random forest?
Why is it important to convert all words to lower case before removing 'stop words'?
What should you do to a non-linearly separable data set?
What is TRUE about the Jaccard Index?
1. Fill in the blank below.
William Deming, who is known for training hundreds of engineers, managers, and scholars in statistical process control in Japan after World War II, has a saying: “In God we trust. All others must bring _____.”
1. Fill in the blank below.
_____ measures how changes in one variable affect another variable.
1. How do you automate the process for reading and formatting multiple files?
1. Why is it useful to use the script window for writing code?
Select all that apply
1. What is the main problem of visualizing the ebola data set in ggplot?
1. Fill in the blank below.
The main goal of clustering is to _____ intra-cluster distance (the distance between points in a cluster) and _____ inter-cluster distance (the distance between clusters).
2. Fill in the blank.
You can use _____ to help answer questions, such as "Who do your customers trust?" and "How does information spread within your company?"
1. Match the reason data are valuable with its description.
Compliance
Automation
Dashboards
Predictive analytics
1. Which of these relationship types could be in a network?
Select all that apply
1. Why is it important to think about the purpose of the analysis before you manipulate your data?
Select all that apply
1. Fill in the blanks below.
You always have to _____ your models and check for potential _____ even before you can test them on new data!
1. What is the "for" loop used for in R?
1. What problems will you solve in this course?
Select all that apply
2. How can you search for multiple terms in one row?
2. Fill in the blank below.
The str() function shows us the _____ of the data.
3. Which analogy best describes the UI and server script?
2. Match the ggplot function to its purpose.
Specifies to map the data as points.
Specifies the title of the graph.
Labels the x axis.
Labels the y axis.
Specifies how the legend and data points should be displayed.
3. Put the events in order to describe how an economic network may expand. Place the first event on top.
3. When you have data,
2. Why are the coordinates of the cheese data set centroids in decimals?
2. Fill in the blank below.
The _____ fallacy involves making a decision based solely on easy quantitative observations and ignoring all other latent observations.
3. Select the statement below that is TRUE.
3. What is the first thing you need to do when you begin working in R?
3. Fill in the blank below.
_____ coding is an approach that allows for the inclusion of categorical variables with multiple levels in regression models.
3. What do you need to do after every forecast period?
Select all that apply
2. Fill in the blank below.
You can use the _____ function from the "data.table" package to make sure the columns have the same name so we can join the two data sets.
2. Which code below correctly shows how to take all the points that are reached by the 20th iteration of the simulation and make them blue?
3. Match the definition of the type of linkage with the correct term.
Single linkage
Complete linkage
Average linkage
Centroid linkage
2. What are some properties of data in JSON format?
Select all that apply
5. Good coding habits include:
Select all that apply
5. Fill in the blank below.
The _____ package makes it easier to work with dates in R.
4. Why is it useful to host apps on external sites?
Select all that apply
3. Why do we cluster only Democrat-introduced bills instead of both Democrat- and Republican-introduced bills?
4. Why do we look at 4 and 9 clusters when they only have an explained variance of 13.4%?
4. Match the common relationships or patterns with the type of network they represent.
Parent and child
Emails between co-workers
Citizens in the same tax bracket or state
Members of the same gym
4. Which of the following are common causes of outliers?
5. Fill in the blank below:
Remember, like all resources, _____ (data that cannot fit on a single computer or server) has to be cost effective.
4. Fill in the blank below.
You need to understand how much your algorithm can explain the results in your data and how much of it is _____ that is always present in large data sets and cannot be accounted for by your model.
5. A company is planning to re-brand one of its products and the product director wants to know how customers feel about some potential product names. Text mining can be most beneficial in which of these situations?
5. Besides network analysis, what are some other ways to approach data mining?
Select all that apply
3. Fill in the blank below.
You can automate a customized analysis by creating a _____!
3. Which function should you use to save your ggpairs analysis in R?
5. Exponential smoothing assumes data is made up of which 2 components?
Select all that apply
4. What are some examples of the functionality you can have in your interactive visualization?
Select all that apply
3. Which function do you need to use to finish the saving process?
3. Why do you need to include two data sets in the graph.data.frame() function?
5. Match the method with the description.
Measures how quickly someone can spread a message that reaches every other point in the network.
Measures how significant a node is as a connector of the ecosystem.
Measures how important someone or something is based on what they’re connected to.
Measures how important someone is based on their relative role in the network.
Measures how similar or redundant 2 elements of a network are and helps detect fraud.
Takes the Jaccard Index to the next level of application and identifies communities by measuring the similarities of nodes.
2. R can read all these different file types except
5. Which method helps you model and understand how a disease spreads?
5. Which of the examples below demonstrates the "Shipping" step of business functions?
Select all that apply
5. Which symbol below do you use to end the for( ) loop?
5. Why is it important to make sure your data looks clean?
5. Why do you use the "set seed" function?
Select all that apply
In order for us to determine how much variation our clusters account for, we need to:
Put the following SQL clauses in the correct standard query template order:
In a non-biased model errors will be random. If errors are not random it means...
Select all that apply
How do we know we still need to refine our model further?
Fill in the blanks below.
Match the following SQL Server components to their definition
Server
Database
Table
SQL Server Management Studio
Fill in the blank below.
The main goal of clustering is to _____ intra-cluster distance (the distance between points in a cluster) and _____ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
What are some key things you should always check for in your model?
Select all that apply
What are some advantages of Excel?
Select all that apply
How do you calculate R squared?
Match the algorithm to the situation below:
A problem that is smooth and nonlinear
A problem that is linear
A problem that is non-smooth
Match the attributes to the decision tree calculation.
Categorical attributes Finds the largest class in the data Uses algorithms
Continuous variables Finds groups of classes that make up over 50% of their data Minimizes classification
Match the R script to its corresponding description.
Contains all the data and packages needed to run the application
Contains the computational logic needed to display results that depend on user input
Contains the graphical user interface, i.e., what the app looks like and the controls that the user interacts with
Put the steps in order for regression using random forest:
Match the kernel type to the description below.
Tells the SVM function that the data can be separated by a straight line
A function in the form of a polynomial
A function that maps / projects data that is non-linearly separable
A function that maps / projects non-linearly separable data, but doesn't exist under certain circumstances
Please fill in the blank below.
The _____ Index will calculate the similarity between politicians and their donors by comparing the people that they are connected to.
1. Why did we choose R programming language over other languages?
Select all that apply
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
Which function makes sure that the k-means analysis is reproducible?
What is R Squared?
Which attribute does not belong to clustering?
Select all that apply
Fill in the blank below.
You can create a _____ by wrapping code in curly braces. This can help you streamline your code to perform multiple steps in one line, similar to a for() loop.
Why does R put an 'X' in front of numerical column names?
Select all that apply
Which package do we use to run the vif() function?
Why should you be careful when using summary statistics?
What keys should you press to freeze panes?
Which of the following is a practical challenge that companies face when using data?
Which of these is NOT an idea behind the Naive Bayes classifier?
What is true of the boosting approach?
Please select all that apply
What is a Term Document Matrix?
What is the implication of overfitting?
Please select all that apply
Which statement below is TRUE?
1. Fill in the blank below.
Clustering assumes that _____ is a measure of similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
1. How do you calculate R squared?
1. What would be the output of the following code for vector 'v':
v[2:7]
2. Match each plot to the layout
1. What is an important step so that R can read numbers as categories?
1. The strength of a relationship or ties can vary. Match the description to either weak or strong ties.
People we really trust and rely on
Help a company learn and expand its reach
People we're connected to with different perspectives
Help a company through difficult times and gain a reputation
2. Fill in the blank below:
Data is not a burdensome workload; rather, it is a _____ that companies can tap to discover new insights and drive their business forward.
1. With network analysis, which of the following is the least common metric to be determined?
1. Match the measures to their descriptions.
Degree centrality
Closeness centrality
Betweenness centrality
2. Fill in the blank below.
A good explanatory model will have residuals whose variance does not depend on the _____ (predictor) variables.
1. Match the functions to the actions they perform in R.
lapply()
get()
do.call()
write.csv()
1. Match the types of trust to their definitions.
Calculation-based trust
Personal-based trust
Similarity-based trust
Institution-based trust
1. Please fill in the blank below:
An _____ address consists of 4 numbers between 0 and 255, separated by periods, and is assigned to an individual accessing a website. The site can often identify which specific computer is being used.
2. Which piece of code will change the heatmap colors so they have a low of green and a high of red?
2. Put the steps in order for how the UI script communicates with the server script
2. Which function runs thirty different tests and aggregates the results from each one?
3. Match each person in the network with their strength.
A person with a lot of connections
A person whose average path to all others is shortest
A person with the most shortest paths
2. Match the function to the question.
Research and Development
Procurement
Production
Shipment
Sales
3. What are some limitations of the k-means analysis on cheese customers?
Select all that apply
2. What can we assume when there were no more spikes in tweets of sympathy after the Newtown shooting occurred?
2. Using the for loop function, or for(), is useful because
Select all that apply
2. Fill in the blank below.
_____ is, on average, how widely actual data is dispersed (around the predicted values, the mean, etc.).
2. What are the two ways you can use categorical variables in regression models?
Select all that apply
3. What is TRUE about measuring for seasonality?
Select all that apply
3. Match the measures of centrality to the correct description.
Betweenness Centrality
In-degree Centrality
Out-degree Centrality
2. Fill in the blanks below.
par() stands for _____ and mar() stands for _____.
2. Which function creates a vector of sequential colors in a hexadecimal format?
2. Fill in the blank below.
To make sure your images are reproducible, you can use the _____ function.
4. Fill in the blank below.
While it may look like a scatter plot, a _____ maps a third variable to the size of its points.
3. Fill in the blank below
You can create a _____ by wrapping code in curly braces.
3. Why is SelectorGadget useful for web scraping?
5. Match the terms to the descriptions.
Sum of all the squared distances between data points in different clusters
Sum of all the squared distances between points within the same cluster
Total sum of squares
5. It's very important that you don't mistake randomness for...
5. Using this network visualization, order the relationships based upon the strength of their connection. Put the strongest connection on top.
4. Which statement about APIs is true?
3. Which of the following are epistemological challenges?
Select all that apply
5. How else can you apply clustering?
Select all that apply
4. Your target demographic is educated, single women who are 25-40 years old. If you use text mining to analyze their comments on your website, what might you find?
Select all that apply
4. Match the questions with the correct step in the Data Science Control Cycle.
Step 1: Ask
Step 2: Research
Step 3: Model
Step 4: Validate
Step 5: Test
Step 6: Interpret
4. Which code below shows the correct way of using a function you have created?
Select all that apply
4. Match the function to the visualization it helps to create.
4. Match the network analysis application to the correct part of the data control cycle.
R&D
Buy
Make
Ship
Sell
4. Match the parts of the code below to their purposes.
matrix(c(1, 1, 2, 2,
1, 1, 3, 3),
nrow = 2,
ncol = 4,
byrow = TRUE)
m()
c()
nrow =
ncol =
byrow =
4. What negative effect does the image below illustrate?
1. Put the six data science control cycle steps in order, starting with “Ask”.
5. Which of these are examples of clustering?
Select all that apply
5. Why would you use a confusion matrix?
5. What is the first relationship we are going to look at using the Capital Bikeshare data?
5. Which function helps you to remove duplicate data from your data set?
5. Why is it helpful to sort the data in decreasing order by the weighted Jaccard similarity?
Select all that apply
2. What are two ways to import data from your computer into RStudio?
Select all that apply
What are some conclusions from the visualization of Congress?
Select all that apply
What are the three types of relationships between tables?
Select all that apply
Fill in the blank below.
To speed up our work and avoid running calculations on each data point manually, we can use the _____ loop function, which performs a set of operations as many times as you tell it to. The advantage of this loop is that you can run different types of data through the same operations, and it does so automatically.
What are some things you should always check for in your model?
Select all that apply
Which of these is not a common data problem?
Match the table combination types with their definitions:
JOIN
UNION
Fill in the blank below.
$stats
$n
$conf
$out
What are the two ways you can use categorical variables in regression models?
Select all that apply
Match the Excel features to their function
Alphabetically labeled vertical cells
Numerically labeled horizontal cells
Tells you what cell you are in
Shows you the formula for the highlighted cell
A sheet with individual rows and columns
What did the polling practices of the 1936 U.S. election illustrate?
Select all that apply
Match the method to the description (note: there are more methods listed than necessary).
Measures similarity between data points to group them and identify key similarities that you can use to find trends
Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively
Digests large amounts of text quickly and finds common themes, messages and patterns.
Order these data formats from most to least structured.
What are some advantages of using Shiny?
Please select all that apply
What are some ways that you can fix multicollinearity?
Please select all that apply
What are some of the pros of SVM?
How do you down-weight famous people and boost less famous ones?
3. What is supervised machine learning?
What type of data does the grep() function work with?
What is an important step so that R can read numbers as categories?
What do you need to run an F-test?
Select all that apply
Which of these is not a strength of kNN?
Select all that apply
Why do we need to validate the model?
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
What keys should you press to freeze panes?
What is the objective of a good model?
Why do you need to change the NA values to '#N/A'?
Which programming languages cannot be used for data analysis?
What does it mean if an observation has more weight in a decision tree?
When would bike demand be the greatest in Washington DC?
Why did NAs appear when we initially read in the data?
What is TRUE if more people know each other in a community?
Select all that apply
1. Google's API (application program interface) allows users to:
1. How can you apply k-Nearest Neighbors for general functions of an organization?
Research and development
Procurement
Production and manufacturing
Sales and marketing
1. Fill in the blank below.
A regression model with several variables is called ____ regression.
1. Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
1. Fill in the blanks below
Shiny applications have two basic components - the ____ script and the ____ script.
1. Fill in the blank below.
The ____ method plots the percentage of variance explained by clustering for different numbers of clusters, which allows us to see how the variance differs with the number of clusters that you choose.
1. Which of the following sources would need to be datafied or converted into numbers, so that you can run analyses and gain greater insight?
1. Fill in the blank below.
While Big Data is a resource, ____ is the domain that will draw insights from the data.
2. Fill in the blank below.
You can use ____ to help answer questions, such as "Who do your customers trust?" and "How does information spread within your company?"
1. Fill in the blank below.
You have to use the ____ function for randomized algorithms to ensure consistency of outputs.
1. What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
1. Which package has the "join" command that allows you to combine datasets?
1. Match the two sections of Congress to their correct descriptions.
House of Representatives
Senate
10. After a needlestick injury, how soon should you have your blood drawn?
3. How can you display the plot once it's assigned to a variable?
Select all that apply
3. What are two ways to run a Shiny application?
Select all that apply
3. What is one of the dangers of increasing the number of clusters?
2. There is a new employee starting at the firm. Using the graph of current employees, which employee do you predict the new employee will have the strongest ties with?
The new employee (CTO) majored in Literature and previously worked at Microsoft. They have one child and vacation every year in Jamaica. They will be located at the San Francisco office.
3. Data science will allow you to:
Select all that apply
3. Match each question with the best method for answering it.
Based on their shopping history, what commonalities are there among our customers?
Based on a customer's shopping patterns, is it likely that this customer is pregnant?
What do people think about our brand?
Is it likely that this shopper will purchase our product?
When a disease spreads, are there any patterns in its spreading?
With the symptoms exhibited, what diagnosis might a doctor propose?
3. Why might Item Response Theory have trouble analyzing sentiment from tweets?
Select all that apply
2. Which package can we use to plot a network graph and measure centrality?
3. In a non-biased model errors will be random. If errors are not random it means...
Select all that apply
3. What is a good way to test for multicollinearity?
3. What is TRUE about LOESS?
Select all that apply
2. What does directed betweenness help you understand?
2. Match the networks to their network analysis use cases.
Marketing
Healthcare
Finance
Politics
3. What is the name of the chart below?
2. What do you insert the needle through when drawing up medication?
5. The aes layer contains:
4. Why is it useful to create functions?
Select all that apply
4. Match the HTML tags to their description (note: there are more answer choices than necessary)
<p>
<a>
<tb>
4. Which function makes sure that the k-means analysis is reproducible?
4. What happened when Telenor started contacting its customers?
4. Which of the following is MOST characteristic of an opinion leader?
5. Twitter is an example of a company that has an API. Which of the following data could you access via its API?
Select all that apply
5. Fill in the blank.
To avoid falsely concluding that one event caused another, a possible method to use is ____ testing.
5. Increasing the number of variables in a predictive model may not be beneficial because...
5. Fill in the blank below.
____ is the conversion of seemingly immeasurable information into something that we can measure.
3. Which theory posits that people are more interconnected than we may realize?
4. What are some use case questions that can be answered by analyzing networks?
Select all that apply
4. Which function below allows you to add a best-fit plane to your 3D plot?
4. What is the seasonality factor?
5. How can you avoid variance in the Twitter user data?
Select all that apply
5. Fill in the blanks below.
The ____ argument defines the size of the axis markers and the ____ argument determines the size of the axis labels.
5. Match the graphs to the modularity scores.
5. Which package should you install to plot an interactive network visualization?
5. Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
5. What types of data are difficult to cluster?
Select all that apply
5. Which of these situations below can you apply Naïve Bayes to?
Select all that apply
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
5. Why is creating a visNetwork useful?
Select all that apply
5. Which of these are limitations of clustering that you should consider as the data set increases in size?
Select all that apply
3. What are two ways to load data from the Internet into RStudio?
Select all that apply
Which function runs thirty different tests and aggregates the results from each one?
Remove the entire table
Add a row for company ABC for 2005 to the table
Change the value 1,500 to 15,000
Return a conditional value (e.g. large company or small company) based on the number of employees
Remove the row with company ABC
Specify that a query return the employees field as an integer
Match the functions to the actions they perform in R.
var()
data.frame()
View()
sd()
Match the key terms below to their descriptions.
Q-Q plot / distribution of errors
p-values
VIF
Breusch-Pagan test
AIC
Which of the following are methods for importing data into SQL server?
Match the function to its purpose.
grep()
length()
table()
order()
The aes layer contains:
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
Which approach allows for the inclusion of categorical variables with multiple levels in regression models?
Please fill in the blank below.
In order to freeze a reference, you can use the ____.
Match the terms to their definitions.
Variance
Standard deviation
Covariance
Correlation
Slope
p-values
Fill in the blank below.
Clustering assumes that ____ is a measure for similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.
When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Practical challenge
Epistemological challenge
Ethical challenge
Grand challenge
How do you stop the Shiny application from running?
What are some ways to standardize different scales?
In the SVM sample, why do we only compare the x coordinates of each data point against the lines?
What can identification of communities help uncover?
Select all that apply
2. Why do we need to validate the model?
Which function eliminates duplicate rows?
Why is clustering more powerful than visualizing?
Select all that apply
What does it mean if you have a small p-value?
Which of these functions are the "bare bones" of ggplot2?
Select all that apply
What is supervised machine learning?
What does it mean when there is a high positive correlation between two attributes?
When running a variable selection model, how does the computer know when it has found the right variables?
Why do you need to change the NA values to '#N/A'?
Fill in the blank below.
Which of these data types can you select in the 'validation criteria'?
Please select all that apply
What questions should you ask about your data?
Select all that apply
What happens when the ROC curve is closer to a right angle?
Which feature was the most important feature identified by the boosted tree algorithm?
Which of these questions may be most impacted by seasons or cycles?
Select all that apply
Which function implements k-means clustering with cosine distance?
What type of clean text data do you need for annotation and parts of speech tagging?
1. Conditional statements are useful when:
1. If an algorithm has 25% probability of correct classification and then is increased to 50%, what is the percent increase in accuracy?
1. Fill in the blank below.
____ seasonality is a seasonal pattern that is repeated plus a certain value.
1. Fill in the blank below.
The 'gg' in ggplot2 stands for ____.
1. Fill in the blank below
The two variables, ____ and ____, are how the UI and server script communicate with each other.
2. Fill in the blank below.
____ is a measure of the extent to which an increase in one variable corresponds to the increase in another variable.
2. Fill in the blank.
Through ____, the Securities and Exchange Commission was able to identify what Enron leadership was communicating about via email.
1. Order these data formats from most to least structured.
1. The strength of a relationship or ties can vary. Match the description to either weak or strong ties.
People we really trust and rely on
Help a company learn and expand its reach
People we're connected to with different perspectives
Help a company through difficult times and gain a reputation
1. Fill in the blank below.
The number of shortest paths going through a given node is called ____ centrality.
1. When running a variable selection model, how does the computer know when it has found the right variables?
2. In a communication network, what does it mean when someone "listens" or "follows" others but has no followers or outgoing communication?
1. Which function can you use to replace NAs from your data set?
1. Fill in the blank below.
The 'gg' in ggplot2 stands for ____.
3. Which function creates a new layer with statistical smoothing of the plot?
2. Why do we change the scale of the axes to a logarithmic scale?
3. What does it mean when there is a high positive correlation between two attributes?
3. What could result from people in your network having more ties?
3. When a volcano erupted in Europe in 2010, where did IBM foresee a bottleneck in shipping?
3. Ensemble learning, a concept in machine learning, happens when a group of learners is used together to arrive at a more accurate decision. Based on this concept, which Yelp review would you consider when deciding whether or not to dine at a restaurant?
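As a rough illustration of the ensemble idea described in this question, the Python sketch below combines three simple learners by majority vote. The rule functions, the thresholds, and the restaurant fields (`avg_rating`, `review_count`, `recent_avg_rating`) are all hypothetical and are not taken from the course material.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often by the individual learners."""
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical "learners", each a simple rule with an assumed threshold.
def rating_rule(restaurant):
    return "dine" if restaurant["avg_rating"] >= 4.0 else "skip"


def volume_rule(restaurant):
    return "dine" if restaurant["review_count"] >= 50 else "skip"


def recency_rule(restaurant):
    return "dine" if restaurant["recent_avg_rating"] >= 3.5 else "skip"


learners = [rating_rule, volume_rule, recency_rule]

# Hypothetical restaurant summary used only for this example.
restaurant = {"avg_rating": 4.2, "review_count": 12, "recent_avg_rating": 4.0}

# Each learner votes independently; the ensemble takes the most common vote.
votes = [learner(restaurant) for learner in learners]
decision = majority_vote(votes)
print(votes, decision)
```

Majority voting is only one way to combine learners; averaging predicted probabilities or weighting learners by past accuracy are common alternatives.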
3. Sets of variables are provided. Identify whether the set is most likely an example of correlation or causation.
Number of school hours missed and student achievement
Number of fire trucks dispatched and amount of property damage
Amount of money spent each week on vegetables and changes in weight
Number of purchases of rain boots and percent of delayed flights
3. Put the steps in order that we took to visualize flights.
3. How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
2. What are some things you should always check for in your model?
Select all that apply
2. Fill in the blank below.
The objective of a model is not to perfectly fit data you already have; it is to have the highest predictive rate on new data.
3. Which connectors spread the most information to the rest of a network?
3. Put the functions of a business in the correct order.
Li
Lo
Lv
LC
Ls
3. Please fill in the blank below:
After injecting the patient, remove the ____ from the syringe.
4. Fill in the blank below.
A big strength of ggplot is the ability to ____ graphs by adding layers and adjusting the data so it doesn't look generic.
5. Good coding habits include:
Select all that apply
4. Which type of data would the Sankey layout visualize best?
4. What are some conclusions from the visualization of Congress?
Fill in the blank below.
3. While Big Data is a resource, ____ is the analysis that will draw insights from the data.
5. Google's search function is based on a scoring algorithm called PageRank. PageRank determines a website's importance by the number of important pages linked to it. Use the website's links to rank the websites according to their importance. Put the most important website at top.
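The scoring idea described in this question can be sketched in Python as a simplified power iteration. The four-page link graph, the damping factor of 0.85, and the iteration count are assumptions chosen for illustration; they are not the links from the quiz, and this is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its current score evenly across its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web used only for this example.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

A page scores highly when pages that themselves score highly link to it, which is why a link from an important page counts for more than a link from an obscure one.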
4. Order the steps a data scientist takes when working on a project. Put the first step on top.
4. Analysis techniques need to provide a measurable benefit greater than the cost of data storage and management.
4. Fill in the blank below.
Some classification algorithms can go beyond determining whether someone will buy your product or not; they can quantify it by telling you the ____ that someone will buy your product.
4. What methods can you use for datafication?
Select all that apply
4. What do networks represent?
Select all that apply
5. What are some caveats to keep in mind when working with network metrics?
Select all that apply
4. What are some ways to fix multicollinearity?
Select all that apply
4. Put the equations in order that you need to use for calculating multiplicative seasonality forecasts
4. Why is calculating betweenness centrality important?
Select all that apply
3. Why is it important to set the x-axis and y-axis limits?
4. Which of the statements below are true for calculating modularity?
Select all that apply
4. Fill in the blank below.
While it may look like a scatter plot, a ____ maps a third variable to the size of its points.
5. What types of data are mapped in the aes() function?
Select all that apply
5. Which of these statements is true?
5. Which of these questions should you ask for classification?
Select all that apply
5. Which picture below displays the standard error of a best fit line?
5. What is important to keep in mind when working with large amounts of network data?
Select all that apply
5. Which function calculates and visualizes the communities in a network?
5. Which function calls up all the files in a folder for an overview?
Fill in the blank below.
Clustering and data mining are types of data analysis , which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
Match the following terms to the correct definition:
INNER JOIN
(LEFT or RIGHT) OUTER JOIN
FULL OUTER JOIN
CROSS JOIN
Fill in the blank below.
____ can have a very negative impact on linear regressions if they are not identified and handled properly because they can skew the algorithm. It's important to identify them early and determine why they do not conform to the majority of the data points in case you need to adjust your model. You can identify them with Cook's distance or boxplots.
Fill in the blank below:
In entropy, the number ____ indicates that 100% of the data is the same, and the number ____ indicates a 50-50 split.
Which of the following SQL functions can be used on date fields?
Select all that apply
Which type of data contains sets of categories?
Match the function names to their descriptions.
Sets labels for axes and title
Flips the axes of a graph
Splits up data by category to give smaller individual graphs
Creates an area plot
In a non-biased model errors will be random. If errors are not random it means...
Select all that apply
Sort the variables as either continuous or discrete.
Continuous variables
Discrete variables
Discrete variables without a defined sequence
Put the 5 functions of an organization in order.
Fill in the blank below.
____ can have a very negative impact on linear regressions if they are not identified and handled properly, as they can skew the data because they lie outside the majority of data points.
How does clustering help when there are more than 3 attributes in the data?
Please fill in the blank below:
Remember, like all resources, ____ - data that cannot fit on a single computer or server - has to be cost effective. This term does not refer to data analytics, although it is sometimes conflated with it.
Please fill in the blank below:
Naive Bayes makes the assumption that the variables are ____, which means that the presence of one variable does not affect the presence of another variable.
Please fill in the blank below.
In a ____ relationship, the value of the dependent variable changes in a non-linear fashion, so we may need more than one coefficient to predict trends.
Please fill in the blank below.
Clustering assumes that ____ is a measure for similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
Match the definition of the type of linkage with the correct term.
Single linkage
Complete linkage
Average linkage
Centroid linkage
4. What is unsupervised machine learning?
What would be the output of the following code for vector 'v':
v[2:7]
What is one of the dangers of increasing the number of clusters?
You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function it was used for.
Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
What is unsupervised machine learning?
Which two variables showed the strongest correlations with two clusters?
Select all that apply
Data science is at the intersection of three domains:
1. Fill in the blank below.
____ is a term that means only looking at a portion of the data. It is denoted in R by a '$' symbol.
1. Match the functions to the actions that they perform in R.
graph.data.frame()
str()
V()
E()
1. When running a variable selection model, how does the computer know when it has found the right variables?
1. Fill in the blanks below.
You always have to ____ your models and check for potential ____ even before you can test them on new data!
2. Which type of data contains sets of categories?
2. Fill in the blank below.
To make sure your images are reproducible, you can use the ____ function.
2. What are some things you should always check for in your model?
2. What does the Akaike Information Criterion (AIC) do?
3. How can you check the warnings to see if there is anything you should be concerned about?
3. Match the function to its purpose.
grep()
length()
table()
order()
3. What are some key things you should always check for in your model?
3. What is a good way to test for multicollinearity?
3. What is heteroscedasticity?
4. After running a Breusch-Pagan test, how would I know that there is no heteroscedasticity?
4. What type of data does the grep() function work with?
4. Which package do we use to run the vif() function?
4. You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function it was used for.
5. Match the key terms below to their descriptions.
Q-Q plot / distribution of errors
p-values
VIF
Breusch-Pagan test
AIC
5. Match the methods of variable selection to the correct descriptions.
Forward selection
Backward selection
Step-wise selection
5. Which function eliminates duplicate rows?
5. Which one of these is not a component of a needle?
5. Which package should you install to plot an interactive network visualization?
Fill in the blank below.
A good explanatory model will have residuals whose variance does not depend on the ____ (predictor) variables.
Fill in the blank below.
____ can have a very negative impact on linear regressions if they are not identified and handled properly, as they can skew the data because they lie outside the majority of data points.
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
What are some important things to remember when working with outliers in your data?
Select all that apply
What are some ways you can identify outliers in your data?
Select all that apply
What does it mean to “practice” coding?
What is supervised machine learning?
What is unsupervised machine learning?
Which of these is an example of exploratory data analysis?
Why do we need to validate the model?
Put the six Data Science control cycle steps in order, starting with “Ask”
4. What method do you use to recap a needle?