All questions

Question 1 of 1443

1. Question
Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
- Three clusters may give us more information about patterns in the data that two clusters didn't show.
- More clusters are more accurate, so we'll be able to generalize the data better.
- Two clusters are not complex enough to get insights.
- Three clusters are the default recommendation for k-means clustering.

2. Question
What does the Akaike Information Criterion (AIC) do?
Select all that apply
- Measures the "quality" of several statistical models in comparison to each other.
- Provides an estimate of the information lost when the variables in the model are adjusted.
- Explains if heteroscedasticity is likely present in the regression model.
- Measures how much the variance of a regression coefficient is increased due to collinearity.

3. Question
Which standard delimiters can Excel identify to split text into multiple columns?
- Tab
- Semicolon
- Comma
- Space

4. Question
What is heteroscedasticity?
- Bias in the residuals as a function of predicted value.
- Bias as a result of outliers in the data.
- No bias evident in the residuals as a function of predicted value.
- No bias because there are no outliers in the data.

5. Question
Where can you download the Excel Solver from?
- It's in the Analysis Toolpak
- It's in the Scenario Manager
- It's in the base Excel
- It's in the XLMiner add-in

6. Question
What is one of the most effective methods for tuning an algorithm?
- n-fold cross validation
- Information gain
- Testing the data
- Validating the base rate

7. Question
Which of the below statements are TRUE about n-fold cross validation?
- It is used when there's not enough data to partition it into separate training and validation sets
- It assesses how the model will generalize to an independent data set
- None of the training and test data sets overlap with each other.
- It randomly partitions the original dataset into n equal sized datasets
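The partitioning that the options above refer to can be pictured with a small base-R sketch. This is an illustration under assumptions, not code from the course: the data frame `toy`, the fold count `n_folds`, and the choice of a linear model are all hypothetical.

```r
# Hypothetical toy data; any data frame with a numeric outcome would do.
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y <- 2 * toy$x + rnorm(100)

n_folds <- 5  # assumed number of folds

# Randomly assign each row to one of n_folds equal-sized folds.
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(toy)))

# Hold out each fold once, train on the remaining rows, record the test error.
rmse <- numeric(n_folds)
for (k in seq_len(n_folds)) {
  train <- toy[fold_id != k, ]
  test  <- toy[fold_id == k, ]
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[k] <- sqrt(mean((test$y - pred)^2))
}

mean(rmse)  # average held-out error across the folds
```

Each row is used for testing exactly once and for training in the other folds, which is why the averaged error is usually read as an estimate of performance on unseen data.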

8. Question
What happened to the AUC when we ran a boosted random forest model after the initial random forest model?
- The AUC increased.
- The AUC decreased.
- The AUC remained the same.
- We didn't have enough information to calculate the AUC.

9. Question
Which of these is not an assumption of time series analysis?
- There is information about the past
- Information can be quantified
- Some aspects of past pattern will continue into the future
- All of the variables are automatically independent of each other

10. Question
Which of these statements is true?
- You have to use set.seed() with sk-means.
- sk-means() calculates both Euclidean and cosine distances.
- Cosine distances are based on the angle between a point and the other points in the cluster.
- Spherical k-means uses Euclidean distances to measure similarity.

11. Question
Why do we see mostly zeroes in the term-document matrix that we created from the NYT articles?
- Because the sparsity is at 99%
- Because the sparsity is 1%
- Because the sparsity is at 50%
- Because there was not enough information in the articles

12. Question
1. In order to create interactive data visualizations, we will use:
- R and JavaScript
- JavaScript and HTML
- Only R
- Only JavaScript

13. Question
1. Fill in the blank.
- A ____ connects points (nodes) by lines that represent relationships. By studying the interactions between people, places and events, you can determine how messages, ideas, and diseases spread and how a change in one thing can cause a cascading set of effects.

14. Question
1. Fill in the blank below.
- ____ is the smoothing parameter you add to the base formula because it tells you how much of the error from a past time period to incorporate into your forecast for a future time period.

15. Question
2. Identify the three things necessary to make a graph in ggplot2.
Select all that apply
- The data
- The shapes on the screen (such as bars or points)
- A way to map the data to the shapes
- Titles and labeled axes

16. Question
1. Which function tells Shiny to visualize a graph?
- ```
 renderPlot() 
```
- ```
 distPlot() 
```
- ```
 shinyUI() 
```
- ```
 plotOutput() 
```

17. Question
1. Which two variables showed the strongest correlations with two clusters?
Select all that apply
- Points per game
- Minutes per game
- Rebounds per game
- Free throws per game

18. Question
1. Why might it be more challenging to do a sentiment analysis on communications between Millennials?
- Reference dictionaries may not include the newest slang
- Reference dictionaries may not include symbols such as :)
- Unconventional grammar cannot be datafied
- There may not be enough text to analyze

19. Question
1. When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Sort elements
- Having the right staff with the right skills
- Getting the right data, the right sample size, and statistical significance
- Using data that may not have been collected with your intended use in mind
- Putting all the pieces together to extract meaningful insights from your data and use them in a responsible way
- Practical challenge
- Epistemological challenge
- Ethical challenge
- Grand challenge

20. Question
2. How does density affect a network?
Select all that apply
- Faster transmission of a message throughout a network.
- Increased reliability and resilience.
- More space in between each node.
- Higher likelihood of a breakdown.

21. Question
2. Betweenness tells you:
Select all that apply
- which nodes can be bottlenecks.
- which nodes can be weak points.
- which nodes have the closest connections.
- which nodes can reach others most efficiently.

22. Question
1. Which questions would seasonality analysis help answer?
Select all that apply
- What is the pattern of behavior across time?
- How do different seasons of the year, and days of the week affect demand for bike rentals?
- How can you factor in seasonality and cyclicality when forecasting demand?
- How does any single given factor (air temperature, humidity, wind speed) affect demand for bikes?

23. Question
1. What is true about this directed network image?

Select all that apply
- The number of edges going into the node is the in-degree.
- The number of edges going out of a node is the out-degree.
- Allie has an in-degree of 1 and an out-degree of 2.
- Allie has an in-degree of 2 and an out-degree of 1.

24. Question
1. Which two functions do you need to save your image output as a PDF?
Select all that apply
- pdf()
- dev.off()
- save()
- off()

25. Question
2. Identify the three things necessary to make a graph in ggplot2.
- The data
- The shapes on the screen (such as bars or points)
- A way to map the data to the shapes
- Titles and labeled axes

26. Question
2. Fill in the blank below.
- The ____ function searches for additional function types, such as geom_line and geom_point.

27. Question
2. Which function do we use for each tab in our app?
- ```
 tabPanel() 
```
- ```
 navbarPage() 
```
- ```
 newPanel() 
```
- ```
 newTab() 
```

28. Question
2. Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
- Three clusters may give us more information about patterns in the data that two clusters didn't show.
- More clusters are more accurate, so we'll be able to generalize the data better.
- Two clusters are not complex enough to get insights.
- Three clusters are the default recommendation for k-means clustering.

29. Question
3. Based on this word cloud, generated from hotel reviews, order the words by frequency. Put the most frequently used words at the top.
- Room
- Golf
- Staff
- Beach

30. Question
3. What is Big Data?
Select all that apply
- The accumulation of large volumes of a variety of data at an unprecedented velocity
- A revolution in measurement
- A point of view which guides how decisions should be made
- A resource

31. Question
2. What happened when Target predicted pregnancy in their customers?
- Target's advertisements inadvertently informed parents that their daughter was pregnant.
- Target was able to help infertile couples have children.
- Target optimized its resupply of prenatal vitamins.
- Target released a new line of maternity clothes.

32. Question
3. When building a forecasting model, it is better to have more variables in the model.

How often is this statement true?
- always
- sometimes
- never

33. Question
2. Which function can you use in the igraph package to eliminate duplicate data?
- simplify( )
- plot( )
- graph.data.frame( )
- str( )

34. Question
2. Match the functions to the actions they perform in R.
Sort elements
- produces the variance of a data set
- creates a data frame
- view the output
- calculates the standard deviation
- var()
- data.frame()
- View()
- sd()

35. Question
3. What is heteroscedasticity?
- Bias in the residuals as a function of predicted value.
- Bias as a result of outliers in the data.
- No bias evident in the residuals as a function of predicted value.
- No bias because there are no outliers in the data.

36. Question
3. What is TRUE about adding more variables to your regression model?
Select all that apply
- Additional variables make your model more accurate.
- Additional variables make your model less generalizable.
- Additional variables make your model less accurate.
- Additional variables make your model more generalizable.

37. Question
2. Put the steps for calculating undirected eigenvectors in the correct order.
- Each value in the eigenvector is divided by the top value to determine its relative eigenvector centrality.
- Calculate degree centrality by summing the rows and assigning those new values to the nodes.
- Sum the adjacent weights with the newly calculated values.
- Repeat until the weights eventually converge.
- When the values stabilize, they can be represented by a vector of values called an eigenvector.
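The iteration these steps describe can be sketched in base R. This is a minimal illustration under assumptions: the four-node adjacency matrix `adj` is hypothetical, and the sketch rescales by the largest value on every pass (rather than only at the end) so the numbers stay bounded.

```r
# Hypothetical undirected, unweighted network as a symmetric adjacency matrix.
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Start from degree centrality (the row sums).
score <- rowSums(adj)

# Replace each node's score with the sum of its neighbours' scores,
# rescale by the largest value, and stop once the scores no longer change.
for (i in 1:100) {
  new_score <- as.vector(adj %*% score)
  new_score <- new_score / max(new_score)
  if (max(abs(new_score - score)) < 1e-9) break
  score <- new_score
}
names(score) <- rownames(adj)
score  # relative eigenvector centrality; the top node has value 1
```

For real networks, a package function such as igraph's `eigen_centrality()` would normally be used instead of a hand-written loop.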

38. Question
2. What is an example of how the direction of trust does not always go both ways?
- Young children trust their parents, but parents may not trust their young children.
- A patient trusts her doctor and the doctor always trusts the patient.
- A website developer trusts a web hosting service and the service trusts its users.
- A customer trusts a bank and a bank always trusts its customers to be honest.

39. Question
2. Why does it make sense to save a graph to PDF format?
Select all that apply
- To share the graph with others.
- To inspect a large graph more closely.
- To create a graph with different colors.
- To create more R files.

40. Question
3. Fill in the blank below.
- It’s important to look at the raw data before you ____ it so you can see what it looks like and find initial patterns and insights without doing too much analysis.

41. Question
3. Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
- These packages are easier to learn
- This allows you more time to think about the data instead of complex programming
- So we can directly chat to the R community
- These packages are updated regularly, while R is not

42. Question
3. Why might bar graphs be misleading?
- Because the x axis is usually categorical even if it looks numerical
- Because the x and y axes are flipped
- Because the bars are not indicative of categorical data
- Because the bar graph is usually not labeled properly

43. Question
4. Which Shiny functions integrate Leaflet so it can be displayed as an application?
Select all that apply
- ```
 leafletOutput() 
```
- ```
 renderLeaflet() 
```
- ```
 displayLeaflet() 
```
- ```
 deployApp() 
```

44. Question
3. Why is clustering more powerful than visualizing?
Select all that apply
- Clustering mathematically defines similarity between all the data points, even the ones on the periphery.
- Clustering can work with many more dimensions than we can visualize.
- Clustering is easier than visualizing data.
- Clustering can group data into pre-defined groups.

45. Question
4. Order these data formats from most to least structured.
- Data that has identifiable patterns in its presentation but lacks clear labels and organization
- Data that has labels and organization but is not in a table
- Data without a pre-defined format, such as sound or video data
- Data presented in a table

46. Question
4. Sort the network from the broadest community to the most niche community.
- Rock and Pop concert goers
- Casual listeners of music and podcasts
- Music lovers
- Alternative and Top 40 bloggers and columnists

47. Question
5. Identify whether the activity would need a data science team member with a low level or high level of expertise.
Sort elements
- high level
- low level
- low level
- low level
- high level
- high level
- high level
- low level
- Generate a new algorithm
- Use tools (e.g., Tableau and Excel) to visualize data
- Wrangle data
- Manage data
- Interpret results of analyses
- Ask meaningful questions
- Identify the weaknesses within a model

48. Question
3. Which of the following are ethical questions you may face when using data?
Select all that apply
- Which data are private and which data are subject to unlimited use by anyone?
- How is confidentiality supposed to be maintained, when should the data be anonymized?
- How long may you store the data?
- Were the data collected knowingly or were people tricked into supplying information?

49. Question
4. Which of the below were top indicators for Target that a customer was pregnant?
Select all that apply
- Ginger Ale
- Stopped buying wine
- Folic acid
- Gender

50. Question
4. Label each aspect of the graph.
Sort elements
- regression line
- Actual data points
- independent variable
- dependent variable
- D
- C
- A
- B

51. Question
4. Fill in the blanks.
- ____ will set the horizontal limits of your map and ____ will set the vertical limits of your map.

52. Question
4. Put the 5 similar functions of an organization in the correct order.
- Research and development
- Sell
- Ship
- Buy
- Make

53. Question
4. Sort the variables as either continuous or discrete.
Sort elements
- Number of cars, number of buildings, temperature
- Days of the week, months of the year
- Colors, types of weather, names
- Continuous variables
- Discrete variables
- Discrete variables without a defined sequence

54. Question

5. Match the methods below to their respective purpose.
Sort elements
- Create a model where the relationship or trend changes over time
- Make scales of variables comparable so you can run a regression model
- Predict time-series results based on a historical average
- Identify periodicity of patterns in historical data
- Predict time-series results based on historical trend and regular fluctuations
- Predict time-series results based on a historical trend, changing level of activity across seasonal cycles and regular fluctuations
- Polynomial regression
- Variable transformation
- Moving average
- Autocorrelation
- Seasonality (additive/multiplicative)
- Holt-Winters: trend-corrected, seasonally adjusted, exponential smoothing


55. Question
5. Fill in the blank below.
- ____ centrality measures a node’s (person’s) importance by giving consideration to the importance of the nodes (people) connected to it.

56. Question
4. Match the arguments to what they do.
Sort elements
- tells R whether to include a navigation bar
- set the dimensions of the graph, interpreted as pixels; this changes if you output PDF files instead of PNG files
- tells R whether to include the code at the bottom of the animation
- navigator
- ani.height and ani.width
- verbose

57. Question
4. What does Louvain Modularity assume?
- That all nodes start out in their own community.
- That all of the nodes' modularities sum up to 1.
- That more interconnected networks have higher modularity.
- That smaller networks have higher modularity.

58. Question
5. The aes layer contains:
- The mappings between the data and the graph
- The geom layer
- The titles and the axes labels
- The initial data analysis

59. Question
5. What aspects of a graph should you keep in mind while creating it?
Select all that apply
- The purpose of the graph
- The accurate display of the data
- The intuitiveness of the visualization
- The type of data and information you want to convey

60. Question
What did you think of the material in this section?
- Too easy
- Somewhat easy
- Appropriately challenging
- Somewhat hard
- Too hard

61. Question
5. What does missing data prevent?
- It prevents the discovery of groups.
- It prevents cleaning the existing data.
- It prevents research on a topic of interest.
- It prevents normalized distribution of the data.

62. Question
5. What are some important things to remember when working with outliers in your data?
Select all that apply
- Never remove outliers without understanding why they are present in the data.
- Make sure you understand why removing outliers is the correct course of action for your analysis.
- Never leave outliers in your data because they will always erroneously skew your model.
- It is unimportant to have a common sense justification for why you remove outliers from your data.

63. Question
5. What is TRUE about followers in a directed network?
Select all that apply
- Someone can be heavily followed by people and serve as an opinion leader.
- Someone can have few followers but follow many other people; this is a reader and potentially an automated bot.
- Not all followers are equally valuable!
- All followers have the same amount of value in a directed network.

64. Question
5. What is a takeaway we learned from analyzing Congressional donation data?
- Organizations in the same cluster donate to both Democrats and Republicans.
- Democrats and Republicans vote more similarly for key laws.
- Some organizations don't donate a lot of money to politicians.
- Most organizations donate more to one political party than to the other.

65. Question
5. Which function eliminates duplicate rows?
- ```
 unique() 
```
- ```
 order() 
```
- ```
 duplicate() 
```
- ```
 grep() 
```

66. Question
Fill in the blanks below.
- The ____ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ____ method is part of supervised machine learning and assigns data points to known groups or categories.

67. Question
Given the table above, what SQL function would you use to perform the following tasks?
Sort elements
- SUM(Employees)
- MIN(Employees)
- LEN(Company)
- LEFT(Company,2)
- COUNT(*)
- SUBSTRING(Company,2,1)
- Return the total number of Employees in the table
- Return the smallest number of Employees in the table
- Return the number of characters in the Company field
- Return the first 2 characters of the Company field
- Return the number of records in the table
- Return the second character of the Company field

68. Question
Match the models to the correlation they are displaying.
Sort elements
- Correlation = 1
- Correlation = 0
- Correlation = -1

69. Question
How was Target able to identify their pregnant customers?
- They used a decision tree to classify their shoppers
- They used k-means clustering to classify their shoppers
- They used a multiple regression to classify their shoppers
- They conducted customer surveys

70. Question
Which of the following SQL functions can be used on date fields?
Select all that apply
- GETDATE()
- DATEDIFF()
- DATEADD()
- MIN()
- PATINDEX()
- RTRIM()

71. Question
Which operator pulls rows that contain specified terms you’re searching for to create a new dataset with only those rows?
- ```
 %in% 
```
- ```
 in 
```
- ```
 %% 
```
- ```
 <- 
```
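As a rough illustration of the kind of row filtering the question describes, the base-R sketch below builds a new data frame from only the rows whose value matches a list of search terms. The data frame `rentals` and the vector `wanted` are hypothetical and exist only for this example.

```r
# Hypothetical data frame of bike rentals by city.
rentals <- data.frame(
  city  = c("Boston", "Chicago", "Denver", "Boston", "Austin"),
  bikes = c(12, 7, 9, 15, 4)
)

wanted <- c("Boston", "Denver")  # terms to search for

# The membership test yields TRUE for each row whose city appears in `wanted`;
# using that logical vector as a row index keeps only those rows.
subset_rentals <- rentals[rentals$city %in% wanted, ]
subset_rentals
```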

72. Question
Fill in the blank below.
- The 'gg' in ggplot2 stands for ____.

73. Question
Fill in the blank below.
- To speed up our work and avoid running calculations on each data point manually, we can use the ____ loop function, which performs a set of operations as many times as you tell it to. The advantage of this loop is that you can run different types of data through the same operations, and it does so automatically.

74. Question
How do we know we still need to refine our model further?
- The standard error of the residuals is still very large.
- The p-value is almost zero.
- There is a small difference between R squared and adjusted R squared.
- Each categorical variable had been split into its components.

75. Question
What result would we get if we used the formula RIGHT(G2, 3)?
- dan
- Sed
- eda
- 4D

76. Question
What result would we get if we used the formula RIGHT(G2, 3)?
- dan
- Sed
- eda
- 4D

77. Question

Match the terms to the descriptions.
Sort elements
- Between sum of squares
- Within sum of squares
- Total sum of squares
- Sum of all the squared distances between data points in different clusters
- Sum of all the squared distances between points within the same cluster
- Total sum of squares
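To see how the three quantities relate in practice, the sketch below runs base R's `kmeans()` on made-up data and reads the corresponding fields from the result. The data `pts` and the choice of two clusters are assumptions made only for illustration.

```r
# Hypothetical two-dimensional data with two loose groups.
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 4), ncol = 2))

fit <- kmeans(pts, centers = 2)

fit$betweenss     # between-cluster sum of squares
fit$tot.withinss  # within-cluster sum of squares, summed over all clusters
fit$totss         # total sum of squares

# The total splits into the between part plus the within part.
all.equal(fit$totss, fit$betweenss + fit$tot.withinss)
```

A larger share of the total falling into the between part generally indicates better-separated clusters.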


78. Question
Match each data quality with its description.
Sort elements
- The data was recorded correctly
- All relevant data was recorded
- Entities are recorded once
- The data is kept up to date
- The data agrees with itself
- Accuracy
- Completeness
- Uniqueness
- Timeliness
- Consistency

79. Question
In which scenario below would we want to minimize the false negatives of our model?
Please select all that apply
- Predicting whether or not an airplane will crash if it goes a certain period without a maintenance check.
- Predicting whether or not someone is pregnant based on shopping habits.
- Predicting whether or not a disease will spread based on certain population characteristics.
- Predicting whether or not a bank will be robbed based on its location and surrounding crime.

80. Question
What are some of the fields that text mining methods employ?
Please select all that apply
- Mathematics and statistics
- Computational linguistics
- Programming
- Deep learning

81. Question
How does clustering help when there are more than 3 attributes in the data?
- Clustering helps identify groups with many attributes that you can't easily visualize.
- Clustering can only cluster when there are more than three attributes.
- Clustering is the best method to visualize more than 4 attributes at a time.
- Clustering can gather data more accurately when there are many attributes.

82. Question
Please fill in the blanks below:
- ____ is the most popular hierarchical clustering method and is a bottom-up approach, while ____ does the opposite and is a top-down approach.

83. Question
5. Which of these is an example of exploratory data analysis?
- Using customer attributes to group customers and find new patterns
- Grouping customers into groups that have been created already
- Identifying outliers in the data based on the model
- Analyzing the data to match expected outcomes

84. Question
Which piece of code will retrieve only the third column of a matrix called 'm'?
- ```
 m[, 3] 
```
- ```
 m[3, ] 
```
- ```
 m[3, 3] 
```
- ```
 m[[, 3]] 
```

85. Question
What types of data are difficult to cluster?
Select all that apply
- Circular/elliptical data
- Data that are unequally distributed
- Data that don't have similar density
- Data that has an uneven concentration of points in a cluster

86. Question
Which questions can datafication help us answer?
Select all that apply
- How do categorical variables affect a numerical outcome?
- How do different types of weather (rain, hail, etc.) affect demand for bike rentals?
- How does bias in the data affect the validity of our model?
- How does the correlation between variables affect the significance of the outcome?

87. Question
Which function converts wide data to long data?
- ```
 gather() 
```
- ```
 ddply() 
```
- ```
 summarize() 
```
- ```
 convert() 
```

88. Question
Which of these is an example of exploratory data analysis?
- Using customer attributes to group customers and find new patterns
- Grouping customers into groups that have been created already
- Identifying outliers in the data based on the model
- Analyzing the data to match expected outcomes

89. Question
How can we address the limitations of our analysis to see the data differently?
Select all that apply
- We could change the starting point or set.seed.
- We could look at different variables.
- We could normalize the data.
- We could add data to the data set.

90. Question
What is heteroscedasticity?
- Bias in the residuals as a function of predicted value.
- Bias as a result of outliers in the data.
- No bias evident in the residuals as a function of predicted value.
- No bias because there are no outliers in the data.

91. Question
Which function would I use to transform the text below to all capital letters?
```
 coUpE --> COUPE
```
- ```
 UPPER() 
```
- ```
 LOWER() 
```
- ```
 CAPS() 
```
- ```
 PROPER() 
```

92. Question
What is R Squared?
- A number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable.
- A number that indicates the confidence we can have in the results of our data.
- A number that indicates the difference between the dependent variable and the independent variable.
- A number that indicates the length of the y-axis in comparison with the x-axis of our model.

93. Question
What is the objective of solving an optimization problem?
- To find the combination of variables that yields you the greatest output
- To figure out which variable causes the most problems
- To determine how to balance out the variables in your formula
- To adjust the combination of variables for the smallest returns

94. Question
What type of data analysis gives you an overview of your data quickly by visualizing it?
- Exploratory data analysis
- Time-series analysis
- k-Nearest Neighbors
- k-means clustering

95. Question
Which R function makes sure that we can reproduce the exact same results when we use the createFolds() function?
- set.seed()
- createFolds()
- predict()
- prediction()

96. Question
What is the output of the formula below?
- Misclassification error
- Adjusted term
- Expanded weight
- Polynomial coefficients

97. Question
What is a moving average?
- It's an average of a set number of points
- It's the average deviation of all data points
- It's the mean of the full data set
- It's the average standard deviance over a set number of points
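The idea of a moving average can be illustrated with a short R sketch. The series `x` and the window size of 3 are made-up values for illustration, and `stats::filter()` is used here as one possible way to compute it, not necessarily the approach taken in the course.

```r
# Hypothetical series, invented for illustration only
x <- c(2, 4, 6, 8, 10, 12)

# Centered moving average over a window of 3 points:
# each output value is the mean of a point and its two neighbours
window <- 3
ma <- stats::filter(x, rep(1 / window, window), sides = 2)

# The first and last positions are NA because the window is incomplete there
print(as.numeric(ma))
```

A wider window smooths the series more but leaves more NA values at the edges.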

98. Question
Why do we use the silhouette coefficient to determine the number of clusters for sk-means?
- We don't have the outputs necessary for either of those methods.
- The silhouette coefficient is more accurate than NbClust or the elbow method.
- The silhouette coefficient is faster than NbClust or the elbow method.
- The silhouette coefficient is part of the k-means package.

99. Question
Why is it important to convert all words to lower case before removing 'stop words'?
- Because R reads lowercase and uppercase letters as two different letters
- Because R can't process punctuation
- Because R cannot process proper nouns
- Because R reads lowercase and uppercase letters as the same letters

100. Question
1. Fill in the blank below.
- ____ is the process of extracting information from large quantities of data to find insights, patterns, and other latent information.

101. Question
1. Fill in the blank below.
- The goal of an ____ rule is to extract correlation relationships in large datasets of items. Ideally, these relationships will be causative.

102. Question
1. Fill in the blank below.
- If you ignore ____ factors, then the most recent trend is used in conjunction with the last data point to make a static projection.

103. Question
1. Why does R put an 'X' in front of numerical column names?
Select all that apply
- Because variables can't start with numbers
- Because the '$' syntax won't work with numerals
- So as to always differentiate between numbers and years
- Because R is reading it in as strings

104. Question
1. Fill in the blank below.
- The ____ function allows us to wrap multiple Shiny apps into one and click between them.

105. Question
1. Why did NAs appear when we initially read in the data?
- R read the zeroes as missing data.
- That's how R displays zeroes in data.
- The NAs are a category of data.
- R read it in as individuals not buying products.

106. Question
Based on this comparison cloud, order the customer comments from those that are most consistently complimentary (at the top) to those that are most negative and pervasive (at the bottom).
- "Great staff!"
- "I had a hard time making a reservation."
- "The variety of activities available were a big plus."
- "Did this place get more expensive?!!!!!!"

107. Question
1. What can we conclude about the statistic that there are 88 guns for every 100 Americans?
- We can't definitively conclude anything without additional data.
- 88% of Americans own guns.
- 43% of American households have guns.
- 12% of Americans cannot afford guns.

108. Question
1. Which of the following sources would need to be datafied or converted into numbers, so that you can run analyses and gain greater insight?
Select all that apply
- Text in last year's emails
- The viral videos from this week
- Predictions for next quarter's revenue
- Growth rate of someone's Instagram following

109. Question
1. Which function should I use if I want to make sure my results are reproducible?
- set.seed()
- plot()
- library()
- simplify()

110. Question
1. Fill in the blanks below.
- Sometimes, in order to analyze relationships between variables, we need to ____ the data in order to isolate certain effects, make the scales similar, etc. The ____ package is very helpful for transforming a lot of data quickly.

111. Question
1. What does the ".SD" notation stand for?
- subset data table
- standard deviation
- data table
- standard data

112. Question
1. Fill in the blank below.
- The ____ Index will calculate the similarity between politicians and their donors by comparing the people that they are connected to.

113. Question
2. Drag the term to the box next to the correct description.
Sort elements
- Vector
- Matrix
- List
- Data frame
- Collection of elements of the same type
- Multiple rows and columns of the same data type
- Collection of elements of different types
- Multiple rows and columns of different data types

114. Question
3. Which function generates a vector of numbers with a specified range that can count by another number?
- ```
 seq() 
```
- ```
 count() 
```
- ```
 breaks() 
```
- ```
 aes() 
```

115. Question
2. Which package can we use to scrape websites for information?
- rvest
- html
- scraper
- devtools

116. Question
3. What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
- Some players with good statistics are paid much less than other players with similar statistics.
- There are players who are paid a lot, but don't have good statistics.
- There's no correlation between minutes per game and points per game.
- The most talented players are the lowest paid.

117. Question
3. When summarizing a long document via text analysis, the most "important" words will be determined by:
- the frequency in which key topics and words occur
- whether the reference library is broad enough to include all jargon
- scaling the words by intensity
- sentiment analysis

118. Question
2. Which aspect of Big Data is highlighted in these examples?

-Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
-Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
- Velocity
- Volume
- Variety
- Predictive analytics

119. Question
3. Naïve Bayes is a probabilistic classification method commonly used for text classification. Most spam filters are based on a variant of Naïve Bayes.

Order the steps that a spam filter takes when deciding whether or not a new email should be placed in the spam folder. Place the first step at the top.
- Based on the results of the search, the probability of the email being spam is determined
- The spam filter searches for keywords or word combinations that are frequently found in spam
- The probability is evaluated against its threshold and decides whether to sort the email as spam or not
- The text of the new email is analyzed
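The kind of scoring a Naïve Bayes spam filter performs can be sketched in a few lines of base R. Everything below is an assumption made for illustration: the keyword probabilities, the prior, the threshold, and the helper name `classify_email` are invented, and real filters learn such values from labelled email.

```r
# Hypothetical per-word probabilities, invented for illustration:
# P(word | spam) and P(word | not spam)
p_word_spam <- c(free = 0.30, winner = 0.20, meeting = 0.02)
p_word_ham  <- c(free = 0.03, winner = 0.01, meeting = 0.25)
prior_spam  <- 0.4   # assumed share of spam among all email
threshold   <- 0.5   # assumed decision threshold

classify_email <- function(text) {
  # Lower-case the text and split it into words
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Keep only the words that appear in the keyword table
  hits <- intersect(words, names(p_word_spam))
  # Combine the evidence, treating words as independent (the "naive" part)
  spam_score <- prior_spam * prod(p_word_spam[hits])
  ham_score  <- (1 - prior_spam) * prod(p_word_ham[hits])
  p_spam <- spam_score / (spam_score + ham_score)
  # Compare the resulting probability with the threshold
  list(p_spam = p_spam, is_spam = p_spam > threshold)
}

print(classify_email("You are a WINNER, claim your free prize"))
print(classify_email("Agenda for the meeting"))
```

When no keyword is found, `prod()` of an empty vector is 1, so the result falls back to the assumed prior.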

120. Question
2. Order the steps needed to build a multivariate regression model.
- Decide what you want to predict
- Check your test statistics to see measures of accuracy, correlation, and error
- Run a regression analysis
- Identify variables and data you think will influence the prediction
- Refine your model

121. Question
3. Use the edge.attr.comb argument to
- specify whether or not to ignore the weights of the connections in a network
- find out the total number of connections in a network
- tell R which kind of degree centrality to calculate
- create a pdf version of the network visualization

122. Question
3. Match the output elements shown in the console below to what information they are providing.
Sort elements
- includes the values of the boxplot levels. The five rows include the bottom whisker, the 25th percentile, the 50th percentile, the 75th percentile and the top whisker
- includes the number of values in each variable
- includes something called notches or the median plus and minus roughly one point five times the inter-quartile range
- includes the values of the outliers, which you can also see in the boxplot
- $stats
- $n
- $conf
- $out

123. Question
2. What does the Akaike Information Criterion (AIC) do?
Select all that apply
- Measures the "quality" of several statistical models in comparison to each other.
- Provides an estimate of the information lost when the variables in the model are adjusted.
- Explains if heteroscedasticity is likely present in the regression model.
- Measures how much the variance of a regression coefficient is increased due to collinearity.

124. Question
3. Put the steps of the data science control cycle in the correct order.
- Validate: Do the model and assumptions work as expected?
- Ask: What is the problem(s) we need to solve?
- Research: What data do we need and how do we get it?
- Interpret: How can we use the conclusions in the real world?
- Model: Which method(s) is appropriate to use?
- Test: How does the model generalize to real world data?

125. Question
3. What is TRUE about an identity matrix?
Select all that apply
- An identity matrix has a diagonal with 1's and the rest of the numbers are 0's.
- When a matrix is multiplied by an identity matrix, it's left unchanged.
- An identity matrix has all 1's.
- When a matrix is multiplied by an identity matrix, it changes.
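For readers who want to experiment with these statements, a small R sketch follows. The matrix `A` is a made-up example; `diag()` and `%*%` are base R.

```r
# Hypothetical 2 x 2 matrix, invented for illustration
A <- matrix(c(2, 5, 7, 1), nrow = 2)

# diag(2) builds the 2 x 2 identity matrix
I <- diag(2)
print(I)

# Matrix multiplication uses %*%; compare each product with A
print(A %*% I)
print(I %*% A)
print(A)
```

Note that `*` would multiply element by element instead, which is a different operation from `%*%`.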

126. Question
2. Fill in the blank below.
- The ____ Index is a way of measuring the extent of similarity between two people or objects.

127. Question
3. Please fill in the blanks below:
- ____ identifies people who have similar connections, not necessarily people who are connected. ____ identifies communities that are indeed connected.

128. Question
3. Match the function to its purpose.
Sort elements
- searches for text in data
- gives you the length of any vector
- tabulates the number of entries for categorical data
- sorts data by a particular column
- grep()
- length()
- table()
- order()

129. Question
4. Fill in the blank below.
- In graphics, transparency is called ____, where 0 is entirely transparent, and the default of 1 is entirely opaque.

130. Question
4. Visualization is an iterative process. Put the steps in order starting with "Analyze"
- Repeat
- Manipulate
- Analyze
- Graph

131. Question
4. What is the output of this R code?
```
 "matrix"[,c(1:4)] 
```
- The first through fourth columns of a matrix
- The first through fourth rows of a matrix
- Only the first and fourth column of a matrix
- Only the first and fourth row of a matrix

132. Question
4. In order for us to determine how much variation our clusters account for, we need to:
Select all that apply
- divide the inter-cluster variance by the total variance.
- divide the total variance by the inter-cluster variance.
- divide the intra-cluster variance by the inter-cluster variance.
- divide the inter-cluster variance by the intra-cluster variance.

133. Question
3. Which of the following are epistemological challenges?
- Determining causality from correlation
- Data collection processes
- Identifying sample biases
- Developing proprietary algorithms

134. Question
5. Networks can contain a wealth of information. Which of the following questions are best answered by measuring an aspect of the network (assuming the necessary data are available)?
- How many other people/objects can someone/something reach?
- How important is someone/something as a connector to the structure of the network?
- How long does it take for information to travel through a network?
- How tight knit is a network or community?

135. Question
4. Data analysis can provide:
Select all that apply
- Accountability
- New insights
- Freedom
- A new business strategy

136. Question
5. Fill in the missing words.
- Knowing the task at hand will help you choose the best tool for the job. When processing videos and images, manipulating files quickly, analyzing data, or creating dynamic or interactive visualizations, ____ is a better tool to use than ____, which is better when you have received minimal training or are only viewing data.

137. Question
4. Which assumption does Naïve Bayes make about its attributes?
- It assumes its attributes are independent.
- It assumes its attributes are dependent.
- It assumes its attributes are co-linear.
- It assumes its attributes are qualitative.

138. Question
5. If most of the data points cluster around a regression line, it may be the case that:
- The variables are highly correlated
- Not a lot of randomness can be explained
- The dependent variable and independent variable need to be switched
- There is a large distance between the average expected value and actual value

139. Question
3. Which function would you use to pull the longitude and latitude figures from Google?
- geocode()
- ggmap()
- c("google")
- geosphere()

140. Question
5. Match the questions with the correct step in the Data Science Control Cycle.
Sort elements
- What is the problem we need to solve?
- What data do you need for your analysis and how can you get it?
- Which method(s) is appropriate to use?
- Do the model and assumptions work as expected?
- How does the model generalize to real world data?
- How can we use the conclusions in the real world?
- Ask
- Research
- Model
- Validate
- Test
- Interpret

141. Question
5. How do we know we still need to refine our model further?
- The standard error of the residuals is still very large.
- The p-value is almost zero.
- There is a small difference between R squared and adjusted R squared.
- Each categorical variable had been split into its components.

142. Question
4. Put the steps of time series data methodology in correct order.
- Check for seasonality
- Determine if the pattern is additive or multiplicative
- Create a confidence interval for your forecast
- Forecast using trend-corrected seasonally-adjusted exponential smoothing
- Identify the periodicity of the seasonal pattern

143. Question
4. What is TRUE about top observers or collectors of information in a network?
Select all that apply
- Top observers have low betweenness centrality.
- Top observers have high in-degree centrality.
- Top observers might be "spies" or fraudulent accounts.
- Top observers have high out-degree centrality.

144. Question
4. Put the steps of the Data Science Control Cycle in order.
- Ask - What is the problem(s) we need to solve?
- Research - What data do we need and how do we get it?
- Validate - Do the model and assumptions work as expected?
- Model - Which method(s) is appropriate to use?
- Test - How does the model generalize to real world data?

145. Question
4. Which method allows us to build a recommendation engine?
- The Jaccard index
- Louvain modularity
- Hierarchical clustering
- PageRank

146. Question
3. Why is it important to understand different data types?
Select all that apply
- Data type determines the functions and commands we can use in R
- Data type determines the type of data you have
- Different data types have different data limits
- Data types help us determine how we should format our data

147. Question
5. Why do we build visualizations?
Select all that apply
- To communicate ideas
- To better understand data
- To discover new insights
- To gather new data

148. Question
5. When Hewlett Packard started tracking a range of employee factors, what were the results?
- HP increased employee retention
- HP decreased recruiting costs
- HP employees felt more scrutinized
- HP increased professional development costs

149. Question
5. You are creating a feedback survey to send your customers. You already know their zip code, education level, and age. Which additional survey item captures a different type of information and may add explanatory power to your model?
- What is your address?
- What is the highest degree or level of school you have completed?
- What generation are you in?
- What is your gender?

150. Question
5. What is R Squared?
- A number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable.
- A number that indicates the confidence we can have in the results of our data.
- A number that indicates the difference between the dependent variable and the independent variable.
- A number that indicates the length of the y-axis in comparison with the x-axis of our model.

151. Question
5. What is important to keep in mind when calculating eigenvector centrality in R?
Select all that apply
- If you don't specify weights, then degree or edge centrality is used
- If edges are weighted, then the weights are used by default unless you set weights = NULL
- You can give R a vector that matches node names to use for calculating eigenvector centrality
- There is nothing important to keep in mind when calculating eigenvector centrality in R

152. Question
5. What is a limitation of calculating edge betweenness in large networks?
- It is incredibly computationally expensive and takes a really long time.
- It can only handle a certain number of nodes per calculation.
- It can't handle multiple connections to the same nodes.
- It requires weighted edges to determine which edges to remove.

153. Question
5. Why do we build visualizations?
- To communicate ideas
- To better understand data
- To discover new insights
- To gather new data

154. Question
Fill in the blank below.
- The main goal of clustering is to ____ intra-cluster distance (the distance between points in a cluster) and ____ inter-cluster distance (the distance between clusters). This ensures that the clusters are as well defined and separated as possible.
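The two quantities named in this statement can be inspected directly in R. The sketch below uses made-up data and base R's `kmeans()`; the data, the seed value, and the choice of two clusters are all assumptions for illustration.

```r
set.seed(42)  # fix the random starting centers so the run is reproducible

# Hypothetical data: two well-separated groups of 2-D points
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

fit <- kmeans(pts, centers = 2)

# tot.withinss: squared distances of points to their own cluster center
# betweenss:    squared distances of cluster centers to the overall mean,
#               weighted by cluster size
# totss:        total sum of squares, equal to tot.withinss + betweenss
cat("within :", fit$tot.withinss, "\n")
cat("between:", fit$betweenss, "\n")
cat("between / total:", fit$betweenss / fit$totss, "\n")
```

Rerunning with a different number of centers shows how the within and between sums of squares trade off against each other.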

155. Question
Table: Company_Employees

Which SQL code will result in a frequency distribution of the Company field?
- SELECT Company, COUNT(*) FROM Company_Employees GROUP BY Company
- SELECT * FROM Company_Employees
- SELECT COUNT(*) FROM Company_Employees
- SELECT Year, COUNT(*) FROM Company_Employees GROUP BY Company

156. Question
How do you measure the explanatory power of your predictive model?
- R squared
- Q squared
- B squared
- C squared

157. Question
Put the four steps of building a classification tree in order.
- Stop growing the tree when there is no more information gain
- Conditional on the previous answer, select the next question
- Ask the question that provides the most information
- Create a new question branch after the previous one

158. Question
Put the following SQL clauses in the correct standard query template order:
- SELECT
- GROUP BY
- HAVING
- ORDER BY
- INTO
- FROM
- WHERE

159. Question
Put the six Data Science control cycle steps in order, starting with “Ask”
- Validate
- Test
- Model
- Interpret
- Research
- Ask

160. Question
Identify the three things necessary to make a graph in ggplot2.
Select all that apply
- The data
- The shapes on the screen (such as bars or points)
- A way to map the data to the shapes
- Titles and labeled axes

161. Question
Match the functions to the actions they perform in R.
Sort elements
- produces the variance of a data set
- creates a data frame
- view the output
- calculates the standard deviation
- var()
- data.frame()
- View()
- sd()

162. Question
What are some things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity
- Bias in the residuals
- Inliers

163. Question

Match the function names to their descriptions.

Sort elements

LEN()
SEARCH()
REPLACE()

Returns the length of a cell, or number of characters of text in a cell

Returns the number of the character at which a specific character or text string starts

Replaces part of a text string with another text string


164. Question

Match the function names to their descriptions.

Sort elements

LEN()
SEARCH()
REPLACE()

Returns the length of a cell, or number of characters of text in a cell

Returns the number of the character at which a specific character or text string starts

Replaces part of a text string with another text string


165. Question
In order for us to determine how much variation our clusters account for, we need to:
- divide the inter-cluster variance by the total variance.
- divide the total variance by the inter-cluster variance.
- divide the intra-cluster variance by the inter-cluster variance.
- divide the inter-cluster variance by the intra-cluster variance.

166. Question
The datasets graphed all have similar summary statistics (including means and variances). What valuable lesson(s) can be learned from comparing the graphs?

Select all that apply
- Exploratory visualizations done before analyzing data can be insightful
- Outliers can really affect statistical properties
- Cleaning can prevent poor visualization
- Summary statistics can be misleading

167. Question
What are some of the strengths of Naive Bayes classifier?
Please select all that apply
- It's good for high dimensionality of data.
- It has high accuracy and speed with larger data sets.
- It assumes that features are dependent.
- It finds new categories of data.

168. Question
Match the definition to the term by dragging and dropping it into the chart.
Sort elements
- Web scraping
- Corpus
- Text mining
- Retrieving data from an online source, usually a web page
- Collection of documents
- A process that focuses on obtaining insights from text data

169. Question
What do these four columns represent when the 'centers' are called up from the k-means analysis?
- Each column represents a centroid from the data set.
- Each column represents a customer's purchases.
- Each column represents the types of cheese in the data.
- Each column represents the customer purchases on different days.

170. Question
Hierarchical clustering assumes that points with the shortest distance between them are:
- most similar
- most different
- connected by multiple paths
- disconnected

171. Question
Was this hard?
- Yes
- No

172. Question
Why do we use '==' instead of '=' to pull the day shift data?
- We are not defining a variable, we are telling R what data to select
- We need to define the data as a variable
- We need to indicate that the data is in column form, not row form
- We need to reformat the data as a vector

173. Question
What does it mean when there is a high positive correlation between two attributes?
- It shows a strong relationship where as one attribute increases, the other attribute increases as well.
- It shows a strong relationship where as one attribute increases, the other attribute decreases.
- It shows a strong relationship where one attribute's increase causes the other attribute's increase.
- It shows a strong relationship where one attribute's increase causes the other attribute's decrease.

174. Question
What is a good way to test for multicollinearity?
- Variance-inflation factors
- Cook's Distance
- Box plots
- Q-Q plot

175. Question
Why does R put an 'X' in front of numerical column names?
Select all that apply
- Because variables can't start with numbers
- Because the '$' syntax won't work with numerals
- So as to always differentiate between numbers and years
- Because R is reading it in as strings

176. Question
What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.

177. Question
Why do we look at correlations between players' salaries and player statistics?
- So we can see if salary and performance are connected for players.
- So we can determine a player's salary.
- So we can look at how much higher professional athletes' salaries are compared to college athletes.
- So we can see which attribute(s) causes a player's salary.

178. Question
Which statement is not true if you receive a positive result from a cancer test that is 95% accurate with a base rate of 1 out of 5,000 people a month?
- There is still a small chance that you have cancer
- There is a 95% chance that you have cancer
- You should get more tests before starting any treatments
- A higher base rate would lead to a smaller chance that the result is a true positive
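The base-rate reasoning behind this question can be worked through numerically. This is a minimal sketch that assumes "95% accurate" means both a 95% true-positive rate and a 95% true-negative rate, which the question does not state explicitly; the variable names are illustrative.

```r
base_rate   <- 1 / 5000  # from the question: 1 out of 5,000 people
sensitivity <- 0.95      # assumption: P(positive test | cancer)
specificity <- 0.95      # assumption: P(negative test | no cancer)

# Overall probability of a positive test: true positives plus false positives
p_positive <- sensitivity * base_rate + (1 - specificity) * (1 - base_rate)

# Bayes' rule: probability of cancer given a positive test
p_cancer_given_positive <- sensitivity * base_rate / p_positive
print(p_cancer_given_positive)
```

Changing `base_rate` and rerunning shows how strongly the result depends on how common the condition is.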

179. Question
Please fill in the blank below.
- These two functions allow you to look up a value in a table and return the desired value from another column in that table. The ____ is for vertical tables, and the ____ is for horizontal tables.

180. Question
What does it mean if you have a small p-value?
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
Question 181 of 1443

181. Question
What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.
Question 182 of 1443

182. Question
What is the inter-quartile range of this boxplot?
- 2
- 4
- 6
- 10
Question 183 of 1443

183. Question
Which function creates data for the ROC curve?
- performance()
- prediction()
- predict()
- naiveBayes()
Question 184 of 1443

184. Question
Which package do we use to implement boosting in R?
- Adabag
- RandomForest
- Boosting
- BagTree
Question 185 of 1443

185. Question
Which function applies linear filtering to time series?
- filter()
- linearts()
- ts()
- convolution()
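As a minimal sketch of what linear filtering of a time series looks like, base R's `stats::filter()` can compute a centered moving average. The toy series and the three-point window are assumptions chosen for illustration; the `stats::` prefix is written out because other packages (dplyr, for example) export a different function with the same name.

```r
# Toy series: the integers 1 to 10 as a time series object.
x <- ts(1:10)

# Equal weights over a 3-point window give a simple moving average.
weights <- rep(1 / 3, 3)

# sides = 2 centers the window on each observation; the first and
# last values have no complete window and come back as NA.
smoothed <- stats::filter(x, filter = weights, method = "convolution", sides = 2)

print(smoothed)
```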
Question 186 of 1443

186. Question
Why wouldn't a silhouette value be computed with one cluster?
- Because there wouldn't be an inter-cluster distance.
- Because there wouldn't be an intra-cluster distance.
- Because there need to be at least three clusters for the silhouette value to be calculated.
- Because the silhouette value relies on k-means, which needs at least two clusters.
Question 187 of 1443

187. Question
What is a Term Document Matrix?
- It is a matrix that shows the frequency of words in a corpus.
- It is a matrix that shows the syntax of words in a corpus.
- It is a matrix that shows the parts of speech of words in a corpus.
- It is a matrix that shows the grammatical usage of words in a corpus.
Question 188 of 1443

188. Question
1. Fill in the blank below.
- Clustering assumes that ____ is a measure for similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
Question 189 of 1443

189. Question
1. Fill in the blank.
- There are many methods for analyzing data. When forecasting or predicting future events, the two most common methods are classification and ____.
Question 190 of 1443

190. Question
1. Match the functions to the actions that they perform in R.
Sort elements
- sets up the network data
- checks the structure of the output
- pulls attributes of graph vertices
- pulls attributes of graph edges
- graph.data.frame()
- str()
- V()
- E()
Question 191 of 1443

191. Question
1. Which function converts wide data to long data?
- ```
 gather() 
```
- ```
 ddply() 
```
- ```
 summarize() 
```
- ```
 convert() 
```
Question 192 of 1443

192. Question
1. Fill in the blank below
- Many organizations make their data publicly available through APIs, which stands for ____.
Question 193 of 1443

193. Question
1. List the six steps of the Clustering control cycle, starting with "Load Data".
- Cluster
- Interpret results
- Visualize data
- Predict clusters
- Check variance
- Load data
Question 194 of 1443

194. Question
2. Fill in the blank.
- After text mining, it is often helpful to ____ your data. Word clouds and histograms can help you communicate your findings to others.
Question 195 of 1443

195. Question
2. Identify five basic tools you will need to do a data science project
Select all that apply
- Someone with statistical programming skills (fluent in R or Python)
- Sufficient hardware (RAM)
- A collaborative work environment (e.g., GitHub)
- A Customer Relationship Management system (e.g., Salesforce)
- If launching a data science product, someone adept in visualization languages and data integration tools (e.g., APIs)
- A dedicated information technology specialist
- Someone to work with data storage technology (SQL)
Question 196 of 1443

196. Question
1. Which of these methods extracts latent topics from text?
- Topic modeling
- Sentiment analysis
- Naïve Bayes
- k-means clustering
Question 197 of 1443

197. Question
2. Match the arguments you might use when plotting a base map.
Sort elements
- determines the color of each region of the map
- determines whether or not to apply the fill colors
- sets the background color
- thickness of separating lines
- horizontal limits
- vertical limits
- col
- fill
- bg
- lwd
- xlim
- ylim
Question 198 of 1443

198. Question
1. Fill in the blank below.
- ____ is a pattern that occurs at a regular interval over time.
Question 199 of 1443

199. Question
1. Which function can you use to calculate betweenness centrality?
- betweenness()
- b()
- between()
- betweennesscentrality()
Question 200 of 1443

200. Question
1. Which package has the gather function?
- tidyr
- plyr
- ggplot
- igraph
Question 201 of 1443

201. Question
1. Drag the description into the box next to the appropriate term.
Sort elements
- whole number
- number with decimals
- data type written out in words
- either TRUE or FALSE
- variable that can be assigned value
- Integer
- Double
- String/characters
- Boolean/logicals
- Factor
Question 202 of 1443

202. Question
2. Why has creating 3D visualizations become easier in R?
- Data visualization experts and R community users have leveraged JavaScript and modern web browsers
- 3D graphing has become more ubiquitous
- Existing packages were already equipped to graph in 3D and individuals are now using it
- R users have segmented 3D plotting away from other visualizations
Question 203 of 1443

203. Question
2. What are the three main columns we need to specify in the network data frame to describe relationships in the network?
- Source, target, and value
- Source, target, and frequency
- Frequency, target, and value
- Frequency, source, and value
Question 204 of 1443

204. Question
2. Why do we need to transpose our data set?
- So that the data can be read correctly, with data points as rows and variables as columns.
- So that we can find new insights in the data.
- So that we can transform the data into categories.
- So we can transform the data into factors.

Question 205 of 1443

205. Question

3. Sets of variables are provided. Identify whether the set is most likely an example of correlation or causation.

Sort elements

Causation
Correlation
Correlation
Correlation
Causation
Causation

Number of school hours missed and student achievement

Number of fire trucks dispatched and amount of property damage

Amount of money spent each week on vegetables and changes in weight

Number of purchases of rain boots and percent of delayed flights

Question 206 of 1443

206. Question
3. How was Capital One able to provide customized credit products?
- Capital One invested in data to create predictive models.
- Capital One created diverse offerings and saw which ones were most popular.
- Capital One conducted phone surveys to determine customers' interests.
- Capital One segmented their demographic randomly and made different products.
Question 207 of 1443

207. Question
2. You are testing a new classification algorithm. Which of the following results may suggest that your algorithm is performing accurately?

Select all that apply
- True positives
- False negatives
- False positives
- True negatives
Question 208 of 1443

208. Question
2. What does R squared measure?
- R squared measures how much variability your algorithm explains.
- R squared measures how much your dependent variable fluctuates.
- R squared measures the slope of linear regression.
- R squared measures the number of variables in your model.
Question 209 of 1443

209. Question
2. Fill in the blank below.
- When column titles contain dashes, hyphens, or spaces, you need to enclose the title with the ____ accent.
Question 210 of 1443

210. Question
2. What are some ways you can identify outliers in your data?
Select all that apply
- Scatterplots
- Box-and-whisker plots
- Cook's distance
- Other methods covered in other courses
Question 211 of 1443

211. Question
3. What are some key things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity and correlation among the variables
- Adjusted R squared
- Model bias and distribution of residuals (Q-Q plot)
- Standard deviation of residuals to assess model fit
- Heteroscedasticity / pattern of residuals vs. fitted values
Question 212 of 1443

212. Question
2. What data CAN'T we get from Twitter?
- Personal credit card information
- Text, links, author and blurb about author
- Author's followers and author's friends (people followed)
- Location of author at time of tweet and time the tweet was made
- Retweets from other users and trending topics
Question 213 of 1443

213. Question
2. Match the symbols to what they represent when calculating an eigenvector.
Sort elements
- a matrix, such as an adjacency matrix
- identity matrix of A
- an eigenvalue of matrix A
- eigenvector of matrix A
- A
- I
- λ
- ν
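The four symbols in this question are tied together by the eigenvector equation: A times the eigenvector equals λ times the eigenvector, or equivalently (A − λI) applied to the eigenvector gives zero. The sketch below checks both forms numerically with base R's `eigen()`; the small triangle network is an assumed example, not data from the course.

```r
# A: adjacency matrix of a hypothetical 3-node network where every
# node is connected to the other two.
A <- matrix(c(0, 1, 1,
              1, 0, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)

# I: identity matrix with the same dimensions as A.
identity_mat <- diag(nrow(A))

eig <- eigen(A)
lambda <- eig$values[1]    # largest eigenvalue of A
v <- eig$vectors[, 1]      # eigenvector paired with that eigenvalue

# Form 1: A %*% v should equal lambda * v.
lhs <- as.numeric(A %*% v)
rhs <- lambda * v
print(all.equal(lhs, rhs))

# Form 2: (A - lambda * I) %*% v should be numerically zero.
residual <- as.numeric((A - lambda * identity_mat) %*% v)
print(max(abs(residual)))
```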
Question 214 of 1443

214. Question
2. What are some things we can figure out using The Sunlight Foundation API?
Select all that apply
- Which donors wield influence?
- Which politicians have similar funding sources?
- How do politicians and donors cluster into communities?
- Predict which politicians are likely to receive contributions from which donors.
Question 215 of 1443

215. Question
3. Put the steps of the cluster_edge_betweenness() function in order.
- Remove the edge or edges with the highest edge betweenness score.
- Again remove the edge or edges with the highest edge betweenness score.
- Recalculate the edge betweenness score for each edge.
- Calculate the edge betweenness of all the edges in the graph.
- Stop when either the modularity reaches 0.75 or when there are no more superlative edges left, whichever comes first.
Question 216 of 1443

216. Question
2. Fill in the blank below.
- R is a powerful tool for ____ because the graphics tie in with the functions used to analyze data
Question 217 of 1443

217. Question
5. Match the function names to their descriptions.
Sort elements
- labs()
- coord_flip()
- facet_wrap()
- geom_area()
- Sets labels for axes and title
- Flips the axes of a graph
- Splits up data by category to give smaller individual graphs
- Creates an area plot
Question 218 of 1443

218. Question
4. What are some common languages used to create websites?
Select all that apply
- CSS
- JavaScript
- HTML
- Swift

Question 219 of 1443

219. Question

5. What R code would give me the information in the 3rd-6th rows of the crime data set?

```
 crime_incidents_2013[c(3:6), ] 
```
```
 crime_incidents_2013[ ,c(3:6)] 
```
```
 crime_incidents_2013[c(3, 6), ] 
```
```
 crime_incidents_2013[ ,c(3,6) ] 
```
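The four indexing forms above differ only in which side of the comma the index sits on and whether `:` or `,` joins the numbers. A quick way to see the difference is to run all four on a small data frame and compare dimensions; `toy` below is a hypothetical stand-in for the crime data set, with made-up column names.

```r
# Hypothetical stand-in for the crime data: 8 rows, 6 columns.
toy <- data.frame(
  id      = 1:8,
  offense = letters[1:8],
  ward    = 1:8,
  shift   = rep(c("day", "night"), 4),
  x       = 11:18,
  y       = 21:28
)

rows_3_to_6  <- toy[c(3:6), ]   # index before the comma selects rows 3, 4, 5, 6
cols_3_to_6  <- toy[, c(3:6)]   # index after the comma selects columns 3 to 6
rows_3_and_6 <- toy[c(3, 6), ]  # c(3, 6) is just the two values 3 and 6
cols_3_and_6 <- toy[, c(3, 6)]  # columns 3 and 6 only

print(dim(rows_3_to_6))
print(dim(cols_3_to_6))
print(dim(rows_3_and_6))
print(dim(cols_3_and_6))
```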

Question 220 of 1443

220. Question
4. How do you fix the following error?
```
Error in plot.new() : figure margins too large
```
- Resize the plot window to make it bigger and re-run the code.
- Restart RStudio and run the code again.
- Reload NbClust and run the code again.
- Restructure your data into categories with the as.factor() function and re-run the code.
Question 221 of 1443

221. Question
4. Which of the following are ethical questions you may face when using data?
- Which data are private and which data are subject to unlimited use by anyone?
- How is confidentiality supposed to be maintained, when should the data be anonymized?
- How long may you store the data?
- Were the data collected knowingly or were people tricked into supplying information?
Question 222 of 1443

222. Question
4. Your target demographic is educated, single women who are 25-40 years old. If you use text mining to analyze their comments on your website, what might you find?
- The most common words they use
- Key topics of discussion
- An estimate of their income level
- The best time of day to update your website
Question 223 of 1443

223. Question
5. In his experiment, Philip Tetlock found that:
Select all that apply
- Experts performed better than random chance.
- Experts underperformed against a statistical model.
- Experts performed better outside their area of expertise.
- Experts outperformed a minimally sophisticated statistical model.
Question 224 of 1443

224. Question
4. Which of these languages cannot be used for data analysis?
- Python
- R
- SQL
- HTML
Question 225 of 1443

225. Question
4. Increasing the number of variables in a predictive model may not be beneficial because...
- it could decrease its generalizability
- it could lessen its accuracy
- it would reduce the amount of data available for validation tests
- N/A. It is always beneficial to add variables to your model.
Question 226 of 1443

226. Question
4. If two factors are strongly correlated, such as "temperature" and "what the temperature feels like" in our Bikeshare example, then we are...
Select all that apply
- double counting
- skewing our results
- reducing the interpretability of the model
- not finished refining our model
Question 227 of 1443

227. Question
4. Networks can be measured and visualized by
Select all that apply
- Importance of jobs
- Location of employees
- Number of emails sent or received
- Type of relationship
Question 228 of 1443

228. Question
3. Which of these use cases are examples of how regression is applied? Select all that apply.
Select all that apply
- GlaxoSmithKline predicts the supply of participants for clinical trials of new drugs in order to allocate drug trial resources.
- Amazon uses forecasts of consumer demand to manage inventory levels.
- Ford uses forecasts of seasonally adjusted annual sales rates to manufacture cars.
- New South Wales, Australia, predicts travel time in Sydney based on events in the city and weather forecasts.
- The Heritage Provider Network predicts the number of days a patient will spend in the hospital over the next year.
Question 229 of 1443

229. Question
4. Which package do we use to run the vif() function?
- car
- ggplot
- plyr
- tidyr
Question 230 of 1443

230. Question
4. What is important to know about span?
Select all that apply
- Span is a smoothing parameter used to determine how many neighbors should be included in each mini-model.
- Span specifies the proportion of the data that are used to fit each point.
- The larger the span is, the smoother the plot will be, and the less closely the model will fit the data.
- The smaller the span is, the smoother the plot will be, and the model will not fit the data.
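The effect of span on fit can be checked empirically. The sketch below assumes the span in question is the `span` argument of base R's `loess()` (the question does not name the function) and uses simulated data; the two span values are arbitrary choices for contrast.

```r
# Simulated noisy curve; the seed only makes the run repeatable.
set.seed(42)
x <- 1:100
y <- sin(x / 10) + rnorm(100, sd = 0.3)

# Each local fit uses roughly span * n neighboring points.
fit_small_span <- loess(y ~ x, span = 0.2)  # about 20 neighbors per fit
fit_large_span <- loess(y ~ x, span = 0.9)  # about 90 neighbors per fit

# Residual sum of squares: lower means the curve follows the data more closely.
rss_small <- sum(residuals(fit_small_span)^2)
rss_large <- sum(residuals(fit_large_span)^2)

print(c(small_span = rss_small, large_span = rss_large))
```

In this setup the small-span curve tracks the points more tightly, while the large-span curve is smoother and leaves larger residuals.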
Question 231 of 1443

231. Question
4. What is a scalar?
- A regular number that is multiplied by a matrix.
- A regular number that is multiplied by another regular number.
- A decimal that is multiplied by a matrix.
- A decimal that is multiplied by a regular number.
Question 232 of 1443

232. Question
4. What is TRUE about the Jaccard Index?
- The higher the Jaccard Index, the more the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The lower the Jaccard Index, the more the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The higher the Jaccard Index, the less the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The Jaccard Index cannot help you to predict the behavior of another person or entity.
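For reference, the Jaccard Index of two sets is the size of their intersection divided by the size of their union. The helper below is a minimal sketch of that definition; the function name `jaccard_index` and the shopping baskets are invented for illustration.

```r
# Jaccard Index: shared items divided by all distinct items.
jaccard_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical purchase histories for three customers.
alice <- c("milk", "bread", "eggs")
bob   <- c("bread", "eggs", "jam")
carol <- c("tea", "rice")

print(jaccard_index(alice, bob))    # 2 shared items out of 4 distinct: 0.5
print(jaccard_index(alice, carol))  # no shared items out of 5 distinct: 0
```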
Question 233 of 1443

233. Question
4. At what modularity score are the iterations set to stop?
- 0.75
- 0.5
- 0.9
- 0.8
Question 234 of 1443

234. Question
4. What is the output of this R code?
```
 "matrix"[,c(1:4)] 
```
- The first through fourth columns of a matrix
- The first through fourth rows of a matrix
- Only the first and fourth column of a matrix
- Only the first and fourth row of a matrix
Question 235 of 1443

235. Question
5. Why is it easier to create interactive experiences in R today?
- Because there's been an explosion of R packages with these capabilities
- Because more R users are using JavaScript
- Because coding in HTML has become easier
- Because websites have become more interactive
Question 236 of 1443

236. Question
5. Which aspect of Big Data is highlighted in these examples?

-Location data from mobile phones can infer how many people were in Macy’s on Black Friday, estimating sales before Macy's aggregates the numbers.
-Amazon's product catalogue receives more than 50 million updates a week, and deliveries and inventories are tracked in real time.
- Velocity
- Volume
- Variety
- Predictive analytics
Question 237 of 1443

237. Question
5. Why is S more intuitive than R squared?
- S uses the units of the dependent variable.
- S is measured more simply than R-squared.
- S is a more accurate measure than R-squared.
- S is used more commonly than R-squared.
Question 238 of 1443

238. Question
5. What is TRUE about outliers?
Select all that apply
- Outliers can skew results.
- Outliers can throw off our analysis.
- Outliers should be closely examined and considered.
- Outliers should always be removed from the data.
Question 239 of 1443

239. Question
5. What do networks represent?
Select all that apply
- Organizational relationships
- Communications patterns
- Economic, environmental, and geographic relationships
- Connections based on interests, preferences, and similarities
Question 240 of 1443

240. Question
5. What happened when we ran label propagation 100 times on the congressional data?
- The number of clusters increased from 33 to 34.
- There were no changes after 100 iterations.
- New nodes were discovered that weren't included previously.
- R had a higher incidence of crashing.
Question 241 of 1443

241. Question
5. Which package should you install to plot an interactive network visualization?
- networkD3
- ggplot
- scatterplot3D
- plyr
Question 242 of 1443

242. Question
Fill in the blank below.
- The ____ method plots the percentage of variance explained by clustering for different numbers of clusters, which allows us to see how the variance differs with the number of clusters that you choose. It can usually be visualized with the graph below:
Question 243 of 1443

243. Question
Match the table type to the statement that would create it:
Sort elements
- INTO ##TableName
- INTO #TableName
- INTO TableName
- Global Temporary Table
- Local Temporary Table
- Permanent table
Question 244 of 1443

244. Question
Match the terms to their definitions.
Sort elements
- Measure of how dispersed the data is
- Standardized measure of how dispersed the data is
- Check if there is bias in the data or the model
- Measure of linear relationship between variables (positive/negative)
- Measure of strength of linear relationship between variables (positive/negative)
- How a change in variable x will affect variable y
- % of variation in y that can be explained by the variation in x
- The probability that the pattern exists through random chance, in the absence of a relationship between variables
- Variance
- Standard deviation
- Distribution and "normality"
- Covariance
- Correlation
- Slope
- R squared
- p-values
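Most of the terms in this list correspond to a single base R call, and several are linked by simple identities. The sketch below computes them on a made-up five-point data set; the numbers are invented purely for illustration.

```r
# Made-up data with a strong positive linear relationship.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

variance_x     <- var(x)     # dispersion, in squared units of x
std_dev_x      <- sd(x)      # dispersion, in the units of x
covariance_xy  <- cov(x, y)  # direction of the linear relationship
correlation_xy <- cor(x, y)  # strength and direction, scaled to [-1, 1]

# Simple linear regression of y on x.
fit <- lm(y ~ x)
slope     <- unname(coef(fit)["x"])                        # change in y per unit of x
r_squared <- summary(fit)$r.squared                        # share of variation in y explained
p_value   <- summary(fit)$coefficients["x", "Pr(>|t|)"]    # p-value for the slope

print(c(variance = variance_x, sd = std_dev_x,
        covariance = covariance_xy, correlation = correlation_xy))
print(c(slope = slope, r_squared = r_squared, p_value = p_value))
```

For a one-predictor model like this, the slope equals the covariance divided by the variance of x, and R squared equals the squared correlation.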

Question 245 of 1443

245. Question

Match the attributes to the decision tree calculation.

Sort elements

Entropy
Gini impurity

Categorical attributes

Finds the largest class in the data

Uses algorithms

Continuous variables

Finds groups of classes that make up over 50% of their data

Minimizes classification

Question 246 of 1443

246. Question
What are the three types of relationships between tables?
Select all that apply
- One-to-Many
- One-to-One
- Many-to-Many
- multi relational
- joined
Question 247 of 1443

247. Question
Good coding habits include:
Select all that apply
- Adding frequent comments to your code
- Separating out your code in different lines to make it easier to read
- Saving your R scripts for re-use
- Putting multiple functions on one line
Question 248 of 1443

248. Question
Fill in the blank below.
- While it may look like a scatter plot, this ____ plot maps a third variable to the size of its points so that it can give more information in one graph about the variables in the data.
Question 249 of 1443

249. Question
Fill in the blank below.
- ____ can have a very negative impact on linear regressions if they are not identified and handled properly because they can skew the algorithm. It's important to identify them early and determine why they do not conform to the majority of the data points in case you need to adjust your model. You can identify them with Cook's distance or boxplots.
Question 250 of 1443

250. Question
Match the key terms below to their descriptions.
Sort elements
- Check if there is bias in the data or the model
- The probability that the pattern exists through random chance
- Test for multicollinearity and independent variable interaction
- Check the residuals for heteroscedasticity (pattern contingent on fitted values)
- Check for information loss when selecting the right model for your data
- Q-Q plot/ distribution of errors
- p-values
- VIF
- Breusch-Pagan test
- AIC
Question 251 of 1443

251. Question
What are some ways you can adjust text in the 'Alignment' tab?
- Merge cells
- Indent the text
- Change the direction of the text (vertical vs. horizontal)
- Change the color of the text
Question 252 of 1443

252. Question
What are some ways you can adjust text in the 'Alignment' tab?
- Merge cells
- Indent the text
- Change the direction of the text (vertical vs. horizontal)
- Change the color of the text
Question 253 of 1443

253. Question
What are some conclusions from the visualization of Congress?
Select all that apply
- The Democrats are not as tightly clustered as the Republicans.
- There are some members of Congress who don't follow their parties' voting patterns.
- Democrats and Republicans vote similarly.
- Democrats are more tightly clustered than Republicans.
Question 254 of 1443

254. Question
What are some important features of visual interactivity?
Select all that apply
- It expands the physical limit of what you can show in a given space
- It increases the quality and broadens the variety of angles of analysis
- It facilitates manipulation of the data
- It increases the overall control and customization options
Question 255 of 1443

255. Question
Please fill in the blank below.
- Logistic regression can also be described as ____ regression, which means that there are only two categories for classification.

Question 256 of 1443

256. Question

Please match the text mining function to the descriptions.

Sort elements

VCorpus
Corpus
SimpleCorpus
PCorpus
DCorpus

Creates a volatile corpus object that is fully kept in memory
Creates a corpus with metadata from an object
Creates a simple corpus
Stores documents outside of R in a database

Creates a distributed corpus, a corpus that resides in a certain distributed file system

Question 257 of 1443

257. Question
Please fill in the blank below.
- ____ distance measures the distance between points by taking the cosine of the angle between them, which measures the similarity between those points both based on the attributes they have and the difference between the attributes they don't have.
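The angle-based measure described here can be written in a few lines of base R. The sketch below assumes the common convention that the distance is one minus the cosine of the angle between the two vectors; the function names and the example vectors are illustrative.

```r
# Cosine of the angle between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Assumed convention: distance = 1 - similarity.
cosine_distance <- function(a, b) {
  1 - cosine_similarity(a, b)
}

u <- c(1, 0, 1)
v <- c(1, 1, 0)  # shares one attribute with u
w <- c(2, 0, 2)  # same direction as u, twice the magnitude

print(cosine_distance(u, v))  # approximately 0.5
print(cosine_distance(u, w))  # approximately 0: direction matters, magnitude does not
```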
Question 258 of 1443

258. Question
Please fill in the blanks below:
- If two nodes are in the same community, then their delta is equal to ____, otherwise it’s equal to ____.
Question 259 of 1443

259. Question
3. What does it mean to “practice” coding?
- Figure out how to fix bugs
- Type out the code
- Study the material
- Play the scales
Question 260 of 1443

260. Question
Why does it make sense to separate the code in multiple steps?
Select all that apply
- To follow the code more easily
- To decrease the chance of making mistakes
- To make the code shorter
- To make the code longer and more complex
Question 261 of 1443

261. Question
Why do we look at correlations between players' salaries and player statistics?
- So we can see if salary and performance are connected for players.
- So we can determine a player's salary.
- So we can look at how much higher professional athletes' salaries are compared to college athletes.
- So we can see which attribute(s) can determine a player's salary.
Question 262 of 1443

262. Question
Which package do we use to run the vif() function?
- car
- ggplot
- plyr
- tidyr
Question 263 of 1443

263. Question
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
- These packages are easier to learn
- This allows you more time to think about the data instead of complex programming
- So we can directly chat to the R community
- These packages are updated regularly, while R is not
Question 264 of 1443

264. Question
Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Question 265 of 1443

265. Question
Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
- qqnorm()
- qqline()
- qqplot()
- qqlwd()
Question 266 of 1443

266. Question
What function do you need to use to perform the k Nearest Neighbors algorithm?
- ```
 kNN() 
```
- ```
 nearest() 
```
- ```
 nfld() 
```
- ```
 k() 
```
Question 267 of 1443

267. Question
Which function looks up a particular value in a table and produces the row in which the value is located?
- MATCH()
- VLOOKUP()
- INDEX()
- FIND()
Question 268 of 1443

268. Question
What do you need to run an F-test?
Select all that apply
- The number of coefficients in the model excluding the y-intercept
- The degrees of freedom
- The F-statistic
- The Cook's distance
Question 269 of 1443

269. Question
Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Question 270 of 1443

270. Question
Which language is HighCharts originally written in?
- JavaScript
- C++
- Python
- R
Question 271 of 1443

271. Question
What is the implication if you have an AUC of 1?
Please select all that apply
- You have overfit the data.
- You have found the optimal model.
- The model won't generalize well to new datasets.
- The model has minimized the number of false positives.
Question 272 of 1443

272. Question
What package can you use to transform data quickly?
- Scales
- Transform
- AdaBoost
- RandomForest
Question 273 of 1443

273. Question
What is one of the differences between additive and multiplicative seasonality?
- Additive models only use addition in their seasonality factors, while multiplicative models use multiplication for their seasonality factors.
- Seasonality factors don't change over time for additive models, but do change over time for multiplicative models.
- Seasonality factors trend downward in additive models, and trend upwards in multiplicative models.
- Seasonality factors change over time for additive models, but do not change over time for multiplicative models.
Question 274 of 1443

274. Question
Why do we look at 4 and 9 clusters when they only have an explained variance of 13.4%?
- It's the highest number compared to other numbers of clusters.
- We don't actually need a high explained variance in sk-means analysis.
- We want a low number for sk-means and a high number for k-means.
- The explained variance doesn't matter as much as the number of clusters.
Question 275 of 1443

275. Question
What is the output of Luhn's method?
- Luhn's method
- Bag of words
- Zipf's law
- Term document matrix
Question 276 of 1443

276. Question
1. Clustering is a form of what type of comparison?
- Quantitative
- Qualitative
- Realistic
- Contrasting
Question 277 of 1443

277. Question
1. Fill in the blank.
- ____ regression is just like univariate or linear regression, but instead of using just two variables to build a model, it can factor in more variables when building the forecasting model.
Question 278 of 1443

278. Question
1. Approximately how many needlesticks are there annually in the United States?
- 800,000
- 16,000
- 100
- 50,000
Question 279 of 1443

279. Question
1. Which function creates a heatmap?
- ```
 geom_tile() 
```
- ```
 heatmap() 
```
- ```
 geom_area() 
```
- ```
 geom_point() 
```
Question 280 of 1443

280. Question
1. Fill in the blank below
- Networks are composed of two main concepts - ____, which represent the entities we're interested in, and ____, which are the relationships between these entities.
Question 281 of 1443

281. Question
1. Why do we use the silhouette coefficient to determine the number of clusters for sk-means?
- We don't have the outputs necessary for either of those methods.
- The silhouette coefficient is more accurate than NbClust or the elbow method.
- The silhouette coefficient is faster than NbClust or the elbow method.
- The silhouette coefficient is part of the k-means package.
Question 282 of 1443

282. Question
2. Forecasts of consumer demand can inform decision making by...
- providing insights into how much inventory is needed
- providing estimates for resource allocation
- building greater trust within the network
- summarizing customer reviews
Question 283 of 1443

283. Question
1. When should you identify how much and what type of data you have?
- Before analyzing data
- While analyzing data (you can't know this until you start analyzing)
- After analyzing data (when sharing findings and insights)
Question 284 of 1443

284. Question
2. Forecasts of consumer demand can inform decision making by...
Select all that apply
- providing insights into how much inventory is needed
- providing estimates for resource allocation
- building greater trust within the network
- summarizing customer reviews
Question 285 of 1443

285. Question
1. What data did we remove because it was creating "noise" in our visualization?
Select all that apply
- Cities that have the fewest flights going in and out.
- Cities that have the lowest degree centrality.
- Cities that have the highest degree centrality.
- Cities that have the most flights going in and out.
Question 286 of 1443

286. Question
2. Match the common seasonal patterns to the correct unit of time.
Sort elements
- Commercials on TV captured every minute
- Electricity use captured every hour
- Hours worked captured every day
- Checks cashed captured every month
- Demand and sales of everything from ice cream to electricity fluctuate in annual patterns
- Hourly
- Daily
- Weekly
- Monthly
- Yearly
Correct

Incorrect
Question 287 of 1443

287. Question
1. Fill in the blank below.
- You can quantify a network via an ____ matrix where the rows and columns represent the nodes and the values in the matrix represent the strength of the connections.
Correct

Incorrect
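
As a minimal sketch of the matrix described in this question, the base R snippet below encodes a made-up three-node network. The node names and connection strengths are invented for illustration and do not come from the course data.

```r
# Hypothetical network: three nodes with made-up connection strengths
nodes <- c("A", "B", "C")
adj <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))

# Undirected edges: fill both [i, j] and [j, i]
adj["A", "B"] <- adj["B", "A"] <- 2
adj["B", "C"] <- adj["C", "B"] <- 1

print(adj)

# Total connection strength of each node is its row sum
strength <- rowSums(adj)
print(strength)
```

Rows and columns share the same node labels, so a zero entry simply means the two nodes are not connected.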
Question 288 of 1443

288. Question
1. What is TRUE if more people know each other in a community?
Select all that apply
- The more they dislike each other
- The faster a message spreads
- The more resilient the network
- The easier it is to make product recommendations to "similar" people
Correct

Incorrect
Question 289 of 1443

289. Question
2. Which type of data contain sets of categories?
- Factor
- Character
- Boolean
- String
Correct

Incorrect
Question 290 of 1443

290. Question
3. How do we add color by category within the 'threejs' package?
- Create a new column with color names
- Set the fill argument equal to the category you want
- Use the color argument in the aes layer
- Set the renderer argument equal to 'canvas'
Correct

Incorrect
Question 291 of 1443

291. Question
3. What are some insights we discovered when we visualized the Saturday Night Live data?
Select all that apply
- There are distinct eras of cast members
- Al Franken is the most highly connected node
- Early on, there was a big turnover between casts
- There is a high level of interaction between earlier and later cast members
Correct

Incorrect
Question 292 of 1443

292. Question
2. What is one limitation of the k-means analysis we performed in the previous video?
- The analysis didn't differentiate between customers who bought the same product and customers who didn't buy the same product.
- The analysis didn't come up with the correct number of clusters.
- The analysis didn't distinguish between products that were bought and which customers bought them.
- The analysis didn't give us the information about the customers for each cluster.
Correct

Incorrect
Question 293 of 1443

293. Question
3. When building a forecasting model, it is better to have more variables in the model.

How often is this statement true?
- always
- sometimes
- never
Correct

Incorrect
Question 294 of 1443

294. Question
2. Which of these social media sites share data with 3rd parties?
Select all that apply
- Facebook
- Twitter
- Pinterest
- Foursquare
Correct

Incorrect
Question 295 of 1443

295. Question
3. If your classification model has a perfect, 100% accuracy, which of the following questions should you ask?
Select all that apply
- Is the model consistently 100% accurate?
- Was the model validated and tested?
- Were the data used representative of our target population?
- Are the predictor variables independent?
Correct

Incorrect
Question 296 of 1443

296. Question
2. When would bike demand be the greatest in Washington DC?
- 8 am on a sunny Wednesday in April.
- 8 am on a rainy Wednesday in April.
- 8 am on a sunny Wednesday in December.
- 8 pm on a sunny Wednesday in April.
Correct

Incorrect
Question 297 of 1443

297. Question
3. Match the terms to their definitions.
Sort elements
- Average the shortest path lengths from the node to every other node in the network. Measures who has the broadest reach, which can be useful when determining shipping routes or developing a viral marketing campaign.
- Displays the first given number of elements in the variable.
- Tells you how many nodes exist.
- Closeness centrality
- head( )
- length( )
Correct

Incorrect
Question 298 of 1443

298. Question
3. Match the models to the correlation they are displaying.
Sort elements
- Correlation = 1
- Correlation = 0
- Correlation = -1
Correct

Incorrect
Question 299 of 1443

299. Question
3. Fill in the blank below.
- In a ____ relationship, the value of the dependent variable changes in a non-linear fashion.
Correct

Incorrect
Question 300 of 1443

300. Question
3. Put the steps in order for accessing the Twitter API for free.
- Go to the "Permissions" tab to grant read, write and access direct messages
- Create a new application.
- Go to Twitter.com/signup and create an account
- Press the "Generate My Access Tokens" button and copy both the Access Token Secret and Access Token because you will need to enter these credentials into R later on
- Use the "Keys and Access Tokens" tab to save your API Key and API Secret
Correct

Incorrect
Question 301 of 1443

301. Question
2. What is TRUE about node 7 in the network image below?

Select all that apply
- Node 7 has high eigenvector centrality.
- Node 7 has high betweenness.
- Node 7 is an important member of this network due to its ability to connect communities and disseminate a message.
- Node 7 has low eigenvector centrality and low betweenness.
Correct

Incorrect
Question 302 of 1443

302. Question
2. Fill in the blank below.
- Use the ____ function when combining two data sets into one.
Correct

Incorrect
Question 303 of 1443

303. Question
2. What should you keep in mind as you run a label propagation algorithm?
Select all that apply
- There is no unique solution for the algorithm.
- The algorithm uses network structure to guide community assignment.
- The algorithm is computationally expensive with a larger network.
- The algorithm needs to have a few assigned nodes before it can run properly.
Correct

Incorrect
Question 304 of 1443

304. Question
2. Fill in the blank below.
- To make sure your images are reproducible, you can use the ____ function.
Correct

Incorrect
Question 305 of 1443

305. Question
3. Which of these functions are the "bare bones" of ggplot2?
Select all that apply
- ```
 ggplot() 
```
- ```
 aes() 
```
- ```
 geom() 
```
- ```
 fill() 
```
Correct

Incorrect
Question 306 of 1443

306. Question
3. Why do we invert the y-scale in the rCharts plot?
- So the higher-ranking teams are on top
- So we can separate out the higher-ranking teams
- So we can see all the teams more clearly
- So we can change the scale of the y axis
Correct

Incorrect
Question 307 of 1443

307. Question
3. Why is it important to understand different data types?
Select all that apply
- Data type determines the functions and commands we can use in R
- Data type determines the type of data you have
- Different data types have different data limits
- Data types help us determine how we should format our data
Correct

Incorrect
Question 308 of 1443

308. Question
4. Why do we look at correlations between players' salaries and player statistics?
- So we can see if salary and performance are connected for players.
- So we can determine a player's salary.
- So we can look at how much higher professional athletes' salaries are compared to college athletes.
- So we can see which attribute(s) can determine a player's salary.
Correct

Incorrect
Question 309 of 1443

309. Question
5. Fill in the blank.
- To avoid falsely concluding that one event caused another, a possible method to use is ____.
Correct

Incorrect
Question 310 of 1443

310. Question
5. A company is planning to re-brand one of its products and the product director wants to know how customers feel about some potential product names. Text mining can be most beneficial in which of these situations?
- Summarizing open response feedback from survey respondents
- Summarizing a 1-5 rating from survey respondents
- Identifying the number of times a focus group participant smiles when they hear the new brand name
- Predicting the likelihood that a customer will like the new name based on whether a similar customer likes the name
Correct

Incorrect
Question 311 of 1443

311. Question
4. Which of these are examples of product recommendations?
Select all that apply
- Approximately 35% of Amazon's sales come from customized suggestions.
- Roughly 70% of Netflix movie choices arise based on lists customized to the viewer's viewing history.
- FedEx can predict which customers will defect to a competitor with 65% - 90% accuracy.
- UPS optimizes truck delivery routes, eliminating left turns to decrease travel time.
Correct

Incorrect
Question 312 of 1443

312. Question
4. Match each data quality with its description.
Sort elements
- The data was recorded correctly
- All relevant data was recorded
- Entities are recorded once
- The data is kept up to date
- The data agrees with itself
- Accuracy
- Completeness
- Uniqueness
- Timeliness
- Consistency
Correct

Incorrect
Question 313 of 1443

313. Question
5. Using this network visualization, order the relationships based upon the strength of their connection. Put the strongest connection on top.
- The relationship between A and B
- The relationship between C and B
- The relationship between A and C
- The relationship between C and D
Correct

Incorrect
Question 314 of 1443

314. Question
3. Which of these conditions would result in a high R-squared?
- The independent variables in the model account for most of the variability in the model.
- The dependent variable in the model accounts for most of the variability in the model.
- The model is very generalizable to other data sets.
- The model has 10 variables.
Correct

Incorrect
Question 315 of 1443

315. Question
4. Fill in the blank below.
- The ____ package is great for reformatting data in order to visualize the data the way we want to.
Correct

Incorrect
Question 316 of 1443

316. Question
4. Using the Capital Bikeshare data, what questions can we answer through regression analysis?
Select all that apply
- How does any single given factor (air temperature, humidity, wind speed) affect demand for bikes?
- How do several variables (air temperature, humidity, wind speed, day of the week, holidays, hour of the day) affect demand for bikes?
- How can you factor seasonality and cyclicality when forecasting demand?
- How can you figure out how the color of the bike impacts customer ride satisfaction?
Correct

Incorrect
Question 317 of 1443

317. Question
5. Match the methods of variable selection to the correct descriptions.
Sort elements
- Algorithm starts with a model of 0 variables and continues to add more variables based upon a specified measure
- Starts with a model of all variables, and removes variables based upon a specified measure
- Combination of forward and backward selection that starts with a model of 0 variables and adds variables, but can also remove variables based upon a specified measure
- Forward selection
- Backward selection
- Step-wise selection
Correct

Incorrect
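
The three selection strategies above can be sketched with base R's `step()` function, which uses AIC as its default measure. The built-in `mtcars` data and the chosen candidate variables are assumptions made only for illustration.

```r
# Candidate models on the built-in mtcars data (variable choice is illustrative)
null_model <- lm(mpg ~ 1, data = mtcars)                      # 0 variables
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # all candidates

# Forward: start empty and add variables while AIC improves
forward <- step(null_model, scope = formula(full_model),
                direction = "forward", trace = 0)

# Backward: start full and drop variables while AIC improves
backward <- step(full_model, direction = "backward", trace = 0)

# Step-wise: start empty, add variables, but allow removals as well
both <- step(null_model, scope = formula(full_model),
             direction = "both", trace = 0)

formula(forward)
formula(backward)
formula(both)
```

The three runs need not agree on a final model, which is one reason to compare them rather than trust a single path.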
Question 318 of 1443

318. Question
4. Match the regression analyses we have learned about to the correct visualization.
Sort elements
- Linear
- Multivariate
- Polynomial
- LOESS
Correct

Incorrect
Question 319 of 1443

319. Question
5. Match the symbols to what they represent when calculating eigenvalues.
Sort elements
- a matrix, such as an adjacency matrix
- identity matrix of A
- an eigenvalue of matrix A
- A
- I
- λ
Correct

Incorrect
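
For readers who want to see the symbols in action, here is a small sketch using a made-up 2 x 2 matrix. An eigenvalue λ of A satisfies det(A - λI) = 0, and base R's `eigen()` computes the values directly.

```r
# Made-up symmetric matrix standing in for a small adjacency matrix
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
I <- diag(2)                  # identity matrix with the same dimensions as A

lambda <- eigen(A)$values     # eigenvalues, largest first

# Each eigenvalue makes A - lambda * I singular (determinant of zero)
dets <- sapply(lambda, function(l) det(A - l * I))

print(lambda)
print(dets)
```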
Question 320 of 1443

320. Question
3. Which package can you use to find the political data from the Sunlight Foundation?
- rsunlight
- sunlight
- tidyr
- ggplot
Correct

Incorrect
Question 321 of 1443

321. Question
4. How can you determine if the communities that were detected by label propagation are stable?
- Run the algorithm multiple times and then compile the results.
- Determine the number of clusters beforehand.
- Subset the data so there are fewer iterations for communities.
- Exclude any points that look like they're in their own community.
Correct

Incorrect

Question 322 of 1443

322. Question

5. What R code would give me the information in the 3rd-6th rows of the crime data set?

```
 crime_incidents_2013[c(3:6), ] 
```
```
 crime_incidents_2013[ ,c(3:6)] 
```
```
 crime_incidents_2013[c(3, 6), ] 
```
```
 crime_incidents_2013[ ,c(3,6) ] 
```

Correct

Incorrect
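
To see how the position of the comma changes the result, the sketch below applies the same four indexing patterns to a small made-up data frame. `crime_demo` is a hypothetical stand-in, not the course's `crime_incidents_2013` data.

```r
# Hypothetical stand-in for the crime data set: 8 rows, 7 columns
crime_demo <- data.frame(matrix(1:56, nrow = 8, ncol = 7))

rows_3_to_6  <- crime_demo[c(3:6), ]    # rows 3, 4, 5, 6; all columns
cols_3_to_6  <- crime_demo[, c(3:6)]    # all rows; columns 3, 4, 5, 6
rows_3_and_6 <- crime_demo[c(3, 6), ]   # only rows 3 and 6; all columns
cols_3_and_6 <- crime_demo[, c(3, 6)]   # all rows; only columns 3 and 6

dim(rows_3_to_6)    # 4 rows, 7 columns
dim(cols_3_to_6)    # 8 rows, 4 columns
dim(rows_3_and_6)   # 2 rows, 7 columns
dim(cols_3_and_6)   # 8 rows, 2 columns
```

Whatever comes before the comma selects rows and whatever comes after it selects columns; `3:6` is a range, while `c(3, 6)` names two individual positions.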

Question 323 of 1443

323. Question
5. Why is it advantageous to create interactive graphs for this data set?
Select all that apply
- It makes the data more engaging
- Individuals can select which data to focus on
- It's static and easier to read
- It can switch between data sets automatically
Correct

Incorrect
Question 324 of 1443

324. Question
Based on this decision tree, order the students based on the likelihood that they will be accepted into a graduate program.
- Student with a 4.0 GPA and GRE score in the 70th percentile
- Student with a GRE score in the 80th percentile and GPA less than 2.8
- Student with a 3.8 GPA and GRE score in the 80th percentile
Correct

Incorrect
Question 325 of 1443

325. Question
5. You finished building a predictive model. Which questions may people have about it?
Select all that apply
- What are the most important factors?
- How well does it predict the outcome?
- Where did you get the data?
- What are the general trends?
Correct

Incorrect
Question 326 of 1443

326. Question
5. When does the adjusted R squared increase?
- If the additional variables improve the model more than would be expected by random chance
- If the additional variables improve the model by random chance
- If the additional variables weaken the model more than would be expected by random chance
- If the additional variables don't have an impact on the model
Correct

Incorrect
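
As a worked illustration of how the adjustment behaves, the sketch below recomputes adjusted R-squared by hand from the standard formula, 1 - (1 - R²)(n - 1)/(n - p - 1), and compares it with the value reported by `summary()`. The built-in `mtcars` data and the two predictors are assumptions chosen only for the example.

```r
# Built-in mtcars data; the choice of predictors is illustrative only
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nrow(mtcars)   # number of observations
p <- 2              # number of predictors (wt and hp)

# Adjusted R-squared penalizes every additional predictor
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

c(r_squared = s$r.squared,
  adjusted_by_hand = adj_manual,
  adjusted_from_summary = s$adj.r.squared)
```

Because of the (n - 1)/(n - p - 1) penalty, a new variable raises the adjusted value only when it reduces the unexplained variance by more than the penalty costs.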
Question 327 of 1443

327. Question
5. Why does the space after the caret below matter?

"[^ [:graph:]]"
- The space preserves the spaces between the words.
- The space removes the spaces between the words.
- The space doesn't really matter.
- The space adds spaces between letters.
Correct

Incorrect
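
The effect of that space can be checked directly with `gsub()`. The sketch below uses a made-up string that contains a tab, a space, and a newline.

```r
x <- "good\tmorning to\nyou"   # made-up string: tab, space, newline

# Space listed inside the brackets: spaces are protected from removal
with_space <- gsub("[^ [:graph:]]", "", x)

# No space inside the brackets: spaces are removed along with tab and newline
without_space <- gsub("[^[:graph:]]", "", x)

with_space      # "goodmorning toyou"
without_space   # "goodmorningtoyou"
```

`[:graph:]` matches visible characters but not the space, so the negated class deletes spaces unless the space is listed explicitly.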
Question 328 of 1443

328. Question
5. Which node has the highest PageRank value?
- Node 6
- Node 7
- Node 5
- Node 1
Correct

Incorrect
Question 329 of 1443

329. Question
Good coding habits include:
Select all that apply
- Commenting out your code
- Separating out your code
- Wrapping code for re-use
- Putting multiple functions on one line
Correct

Incorrect
Question 330 of 1443

330. Question
Clustering is a form of what type of comparison?
- Quantitative
- Qualitative
- Realistic
- Contrasting
Correct

Incorrect
Question 331 of 1443

331. Question
What are views in SQL valuable for?
Select all that apply
- Creating simplicity by hiding complex queries from end users of data
- Creating security through hiding fields with private information and/or preventing changes to base tables
- Preventing redundancy and increasing consistency by providing a common source for data users
- Creating a user interface that allows drag and drop search of tables
- Decreasing the number of objects in a database
Correct

Incorrect
Question 332 of 1443

332. Question
Put the steps of running a t-test in the correct order.
- Build a new regression model based on each sample.
- Use the standard deviation to calculate the p-value for the coefficient value in your regression model.
- Take several samples of the data.
- Measure how the coefficient of each variable changes.
Correct

Incorrect
Question 333 of 1443

333. Question
Put the steps of k-NN in order.
- Calculate distance from the point of interest to all other points
- Perform a majority class vote based on the k nearest neighbors
- Select k, the number of neighbors for the majority vote
Correct

Incorrect
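
A minimal base R sketch of k-NN classification is shown below. The training points, labels, and the helper name `knn_predict` are all hypothetical and exist only to make the steps concrete.

```r
# Made-up training data: two numeric features and a class label
train <- data.frame(
  x = c(1, 2, 1.5, 6, 7, 6.5),
  y = c(1, 1.5, 2, 6, 6.5, 7),
  label = c("a", "a", "a", "b", "b", "b")
)

# Hypothetical helper: classify one point with k nearest neighbors
knn_predict <- function(train, point, k) {
  # Euclidean distance from the point of interest to every training point
  d <- sqrt((train$x - point[1])^2 + (train$y - point[2])^2)
  # Labels of the k closest training points
  nearest <- train$label[order(d)[seq_len(k)]]
  # Majority class vote among those neighbors
  names(which.max(table(nearest)))
}

pred_low  <- knn_predict(train, c(1.2, 1.4), k = 3)
pred_high <- knn_predict(train, c(6.8, 6.2), k = 3)
```

An odd k avoids tied votes in a two-class problem; in practice k is usually tuned rather than fixed in advance.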

Question 334 of 1443

334. Question

Given the table above, what type of SQL statement should be used to perform the following tasks?

Sort elements

DROP
INSERT
UPDATE
CASE
DELETE
CAST or CONVERT

Remove the entire table
Add a row for company ABC for 2005 to the table
Change the value 1,500 to 15,000

Return a conditional value (e.g. large company or small company) based on the number of employees

Remove the row with company ABC
Specify that a query return the employees field as an integer

Correct

Incorrect

Question 335 of 1443

335. Question

Match the method to the description (note: there are more methods listed than necessary).

Sort elements

Clustering
Network analysis
Text mining
Forecasting
Regression

Measures similarity between data points to group them and identify key similarities that you can use to find trends

Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively

Digests large amounts of text quickly and finds common themes, messages and patterns.

Correct

Incorrect

Question 336 of 1443

336. Question
Fill in the blank below.
- A big strength of the ____ package is the ability to customize graphs by adding layers and adjusting the data so it doesn't look generic. This is a package that brings a lot of flexibility in visualizing your data beyond bar charts.
Correct

Incorrect
Question 337 of 1443

337. Question
Match the models to the correlation they are displaying.
Sort elements
- Correlation = 1
- Correlation = 0
- Correlation = -1
Correct

Incorrect
Question 338 of 1443

338. Question
Fill in the blank below:
- In entropy, the number ____ indicates 100% of the data is the same, and the number ____ indicates a 50-50 split.
Correct

Incorrect
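
Entropy for a set of class proportions can be computed in a few lines. The sketch below assumes base-2 Shannon entropy, which is the usual choice for decision trees; the helper name `entropy` is hypothetical.

```r
# Shannon entropy (base 2) of a vector of class proportions
entropy <- function(p) {
  p <- p[p > 0]        # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

e_pure  <- entropy(c(1, 0))       # every observation in one class
e_split <- entropy(c(0.5, 0.5))   # even two-class split
e_mixed <- entropy(c(0.9, 0.1))   # mostly one class

c(pure = e_pure, split = e_split, mixed = e_mixed)
```

Values between the two extremes indicate how mixed the classes are, which is what information gain compares before and after a split.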

Question 339 of 1443

339. Question

Match the function names to their descriptions.

Sort elements

COUNTIF()
SUMIF()
AVERAGEIF()
AND()

Counts the frequency of occurrence of data points that meet a specific condition

Sums up data across ranges that meet a specific condition

Takes an average of values across ranges that meet a specific condition

Helps you pick out records that meet a variety of conditions you set

Correct

Incorrect

Question 340 of 1443

340. Question

Match the function names to their descriptions.

Sort elements

COUNTIF()
SUMIF()
AVERAGEIF()
AND()

Counts the frequency of occurrence of data points that meet a specific condition

Sums up data across ranges that meet a specific condition

Takes an average of values across ranges that meet a specific condition

Helps you pick out records that meet a variety of conditions you set

Correct

Incorrect

Question 341 of 1443

341. Question
Fill in the blank below.
- Clustering and data mining are types of data analysis ____, which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
Correct

Incorrect
Question 342 of 1443

342. Question
Match the Plotly function to its symbol:
Sort elements
- Zoom
- Save as png
- Reset axis
- Auto scale
Correct

Incorrect

Question 343 of 1443

343. Question

Match the confusion matrix term with the question that corresponds to it

Sort elements

Accuracy
Misclassification Rate
Specificity
Sensitivity/Recall

Overall, how often is the classifier correct?
Overall, how often is the classifier wrong?

How many actual negative outcomes were correctly predicted as negative?

How many actual positive outcomes were correctly predicted as positive?

Correct

Incorrect
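
The four questions above map onto simple ratios of confusion matrix counts. The sketch below uses made-up counts; note that sensitivity (also called recall) is computed over actual positives, whereas precision is computed over predicted positives, so the two are generally different numbers.

```r
# Made-up confusion matrix counts
tp <- 40; fn <- 10   # actual positives: predicted positive / predicted negative
tn <- 45; fp <- 5    # actual negatives: predicted negative / predicted positive
total <- tp + fn + tn + fp

accuracy          <- (tp + tn) / total   # how often the classifier is correct
misclassification <- (fp + fn) / total   # how often the classifier is wrong
specificity       <- tn / (tn + fp)      # actual negatives predicted as negative
sensitivity       <- tp / (tp + fn)      # actual positives predicted as positive
precision         <- tp / (tp + fp)      # predicted positives that are truly positive

round(c(accuracy = accuracy, misclassification = misclassification,
        specificity = specificity, sensitivity = sensitivity,
        precision = precision), 3)
```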

Question 344 of 1443

344. Question
What are some quick and easy-to-use visualizations that express frequency of words?
Please select all that apply
- Bar charts
- Word clouds
- 3D visualizations
- Geographical mapping
Correct

Incorrect
Question 345 of 1443

345. Question
Please fill in the blank below.
- In perfect clustering, the silhouette value for each point will approach ____.
Correct

Incorrect
Question 346 of 1443

346. Question
What are some use cases for the Jaccard Index?
Select all that apply
- Identify similar patients that may respond to treatment in a similar way
- Identify similar neighborhoods for locating stores or offices
- Identify suppliers that can replace your current ones
- Identify cheaper substitutes for the products and services you purchase
Correct

Incorrect
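
For context, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two made-up shopping baskets; the helper name `jaccard` is hypothetical.

```r
# Jaccard index: shared items divided by all distinct items
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up purchase histories for two customers
basket_1 <- c("milk", "bread", "eggs")
basket_2 <- c("milk", "bread", "coffee")

j <- jaccard(basket_1, basket_2)   # 2 shared items / 4 distinct items
print(j)
```

The same function works for any pair of attribute sets, such as patient symptoms or supplier capabilities.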
Question 347 of 1443

347. Question
Why do we need to validate the model?
- To make sure it works with other data
- To prove that it works with the test data
- To determine what type of data we’re working with
- To answer questions about the data
Correct

Incorrect
Question 348 of 1443

348. Question
Which of the answers below is a function?
Select all that apply
- ```
 subset() 
```
- ```
 ggsave() 
```
- ```
 %in% 
```
- ```
 == 
```
Correct

Incorrect
Question 349 of 1443

349. Question
Which two variables showed the strongest correlations with two clusters?
Select all that apply
- Points per game
- Minutes per game
- Rebounds per game
- Free throws per game
Correct

Incorrect
Question 350 of 1443

350. Question
What is TRUE about this Q-Q Plot?
Select all that apply
- This Q-Q Plot shows that our model has fewer residuals at the tails of the distribution.
- The residuals may not be normally distributed, meaning that we could have achieved results at random.
- This Q-Q Plot shows that our model has more residuals at the tails of the distribution.
- The residuals are normally distributed, meaning that we could not have achieved results at random.
Correct

Incorrect
Question 351 of 1443

351. Question
What does SQL stand for?
- Structured Query Language
- Structured Question Language
- Standard Query Language
- Street Questioning Literature
Correct

Incorrect
Question 352 of 1443

352. Question
Which of the following statements is not true about k-means clustering?
Select all that apply
- The centroid has to be a defined point in the data set.
- k-means clustering is an iterative process.
- The set.seed() function ensures that the results of k-means clustering are reproducible.
- The centroid is the average location of all points in the cluster.
Correct

Incorrect
Question 353 of 1443

353. Question
What does it mean if your model has a smaller standard deviation of residuals?
- Your model is more accurate.
- Your model is less accurate.
- Your model has no errors.
- Your model has innumerable errors.
Correct

Incorrect
Question 354 of 1443

354. Question
What is one of the most effective methods for tuning an algorithm?
- k-fold cross validation
- Information gain
- Testing the data
- Validating the base rate
Correct

Incorrect
Question 355 of 1443

355. Question
Why is it useful to audit your formulas?
- Because you can trace back how you got the results in the cell
- Because you need to check that all the values in the cells are correct
- Because you need to build your formula
- Because you should simplify your formula
Correct

Incorrect
Question 356 of 1443

356. Question
What does this cartoon illustrate?
- Correlation does not always equal causation
- Your sample size should represent your population
- Your model should accurately explain the phenomenon you're observing
- A model is not valid unless it has statistical significance
Correct

Incorrect
Question 357 of 1443

357. Question
Which of the following statements is not true about k-means clustering?
Select all that apply
- The centroid has to be a defined point in the data set.
- k-means clustering is an iterative process.
- The set.seed() function ensures that the results of k-means clustering are reproducible.
- The centroid is the average location of all points in the cluster.
Correct

Incorrect
Question 358 of 1443

358. Question
What is a conclusion we can make from this visualization?
- The Asian student population has the most variance in scores.
- The average score range is most similar to the Hispanic student population.
- Low income students don't have any outliers.
- There are more Asian students than any other student demographic.
Correct

Incorrect
Question 359 of 1443

359. Question
What is the range of outputs you can expect from a logistic regression model?
- 0 to 1
- -1 to 1
- 1 to 100
- -100 to 100
Correct

Incorrect
Question 360 of 1443

360. Question
What happens when you have a higher degree polynomial model?
- It can fit a more exact curve to the data.
- It is always more generalizable.
- It minimizes the complexity of the model.
- It becomes faster than a linear model.
Correct

Incorrect
Question 361 of 1443

361. Question
Which of these are common smoothing techniques?
Please select all that apply
- Moving averages
- Weighted moving averages
- Centered moving averages
- Multiplicative averages
Correct

Incorrect
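
Several moving-average variants can be sketched with `stats::filter()` from base R. The series and the weights below are made up for illustration.

```r
sales <- c(10, 12, 14, 13, 15, 18, 17)   # made-up series

# Trailing 3-period moving average: current value and the two before it
trailing <- stats::filter(sales, rep(1 / 3, 3), sides = 1)

# Centered 3-period moving average: previous, current, and next value
centered <- stats::filter(sales, rep(1 / 3, 3), sides = 2)

# Weighted trailing average: the most recent value gets the largest weight
weighted <- stats::filter(sales, c(0.5, 0.3, 0.2), sides = 1)

cbind(sales, trailing, centered, weighted)
```

Positions without enough neighboring values come back as `NA`, which is why smoothed series are shorter at the edges.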
Question 362 of 1443

362. Question
Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Correct

Incorrect
Question 363 of 1443

363. Question
Which package can you use to convert HTML into a readable list?
- XML package
- HTML package
- Parse package
- API package
Correct

Incorrect
Question 364 of 1443

364. Question
Fill in the blank.
- Classification is a method that ____ the behavior of an object or individual.
Correct

Incorrect
Question 365 of 1443

365. Question
1. Label the variables for the equation of the slope.
Sort elements
- Dependent variable
- Independent variable
- Slope of the line
- Y-intercept
- Y
- X
- M
- B
Correct

Incorrect
Question 366 of 1443

366. Question
1. The first component ggplot2 starts with is the:
- data
- graph type
- fill function
- labels and title
Correct

Incorrect
Question 367 of 1443

367. Question
1. Which function pulls geographical information from Google?
- ```
 geocode() 
```
- ```
 locate() 
```
- ```
 ggmap() 
```
- ```
 geolocate() 
```
Correct

Incorrect
Question 368 of 1443

368. Question
2. Fill in the blank below.
- In perfect clustering, the silhouette value for each point will approach ____.
Correct

Incorrect
Question 369 of 1443

369. Question
1. Which of these questions may be most impacted by seasons or cycles?
- How much revenue will the new summer blockbuster bring in over Memorial Day weekend?
- How many school supplies should we stock during the week before local schools start?
- On average, how much will people spend at the grocery store this week?
- How long are new employees willing to commute to work?
Correct

Incorrect
Question 370 of 1443

370. Question
2. Match the data science team structure with its pros and cons. In each situation, the client may represent either an internal or external team.
Sort elements
- Pros: Standardized processes + strategic goals/vision + client goals met
- Pros: Client goals met
  
  Cons: No strategic goals/vision met and Inconsistent & redundant processes
- Pros: Standardized processes + strategic goals/vision met
  
  Cons: Client goals not met
Correct

Incorrect
Question 371 of 1443

371. Question
2. Choose the correct pair of words to complete the statement.

The objective of a good model is not to fit the data perfectly, it's to have the lowest ___ when applied to new, _____ data.
- error rate, real world
- error rate, testing
- adjusted R squared, varied
- accuracy rate, real world
Correct

Incorrect
Question 372 of 1443

372. Question
1. Remember to always use this function to ensure that R is reading the data as characters before using the grep() function.
- as.character()
- character.as()
- character()
- in.character()
Correct

Incorrect
Question 373 of 1443

373. Question
1. Which visualization below shows a trend with multiplicative seasonality?
Correct

Incorrect
Question 374 of 1443

374. Question
1. Fill in the blank below.
- The eigenvalue is essentially a number that can scale a matrix up and still maintain the ____ of it.
Correct

Incorrect
Question 375 of 1443

375. Question
1. What are two ways to identify clusters from the hclust function?
Select all that apply
- Specify the number of clusters you want.
- Specify the value threshold to split the data.
- Specify the nodes you want in each cluster.
- Specify the linkage type you want between clusters.
Correct

Incorrect
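
As a hedged illustration, base R's `cutree()` can turn an `hclust` result into cluster labels in more than one way. The six points and the height of 3 below are made up for the example.

```r
# Made-up points forming two well-separated groups
pts <- matrix(c(0.0, 0.0,
                0.5, 0.2,
                0.3, 0.6,
                5.0, 5.0,
                5.4, 5.1,
                5.2, 5.6), ncol = 2, byrow = TRUE)

hc <- hclust(dist(pts))          # complete linkage is the default

by_count  <- cutree(hc, k = 2)   # ask for a fixed number of clusters
by_height <- cutree(hc, h = 3)   # cut the dendrogram at a height threshold

by_count
by_height
```

Plotting the dendrogram with `plot(hc)` is a common way to choose a sensible height before cutting.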
Question 376 of 1443

376. Question
1. Match the names to the graphs below.
Sort elements
- Scatter plot
- Line graph
- Bar graph
- Histogram
Correct

Incorrect
Question 377 of 1443

377. Question
2. Match the map pictures to the type
Sort elements
- Satellite
- Roadmap
- Terrain
- Hybrid
Correct

Incorrect
Question 378 of 1443

378. Question
2. Match the picture type to the tile type
Sort elements
- "Stamen.Toner"
- "Stamen.Watercolor"
- "Esri.WorldImagery"
Correct

Incorrect
Question 379 of 1443

379. Question
3. Fill in the blank below.
- ____ distance measures the distance between points by taking the cosine of the angle between them, which measures the similarity between those points both based on the attributes they have and the difference between the attributes they don't have.
Correct

Incorrect
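
The angle-based measure described here can be written in a couple of lines. The sketch below assumes the common convention of defining the distance as one minus the cosine of the angle; the helper names and vectors are hypothetical.

```r
# Cosine of the angle between two numeric vectors
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# One common convention: distance = 1 - similarity
cosine_distance <- function(a, b) 1 - cosine_similarity(a, b)

d_same_direction <- cosine_distance(c(1, 2), c(2, 4))   # parallel vectors
d_orthogonal     <- cosine_distance(c(1, 0), c(0, 1))   # perpendicular vectors

c(same_direction = d_same_direction, orthogonal = d_orthogonal)
```

Because only the angle matters, two vectors that differ in magnitude but point the same way are treated as identical.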
Question 380 of 1443

380. Question
2. Order the steps needed to build a multivariate regression model.
- Run a regression analysis
- Refine your model
- Decide what you want to predict
- Check your test statistics to see measures of accuracy, correlation, and error
- Identify variables and data you think will influence the prediction
Correct

Incorrect
Question 381 of 1443

381. Question
2. Which of the following is a practical challenge that companies face when using data?
- A limited pool of data-literate talent
- Insufficient time to collect data
- Competing digital projects
- Out-dated technology and resources
Correct

Incorrect
Question 382 of 1443

382. Question
2. Match the network to the possible connections.
Sort elements
- followers
- friends
- co-stars
- alma mater
- emails
- what grocery store you go to
- Twitter
- Facebook
- Netflix catalog
- Past presidents
- Your company
- City
Correct

Incorrect
Question 383 of 1443

383. Question
3. Fill in the blank below.
- Since seasonality and cycles affect forecasts and it is easier to predict ____ outcomes than long-term outcomes - timing always matters when building predictive models.
Correct

Incorrect
Question 384 of 1443

384. Question
3. Put the steps in order for testing the resilience of a network.
- Find the most connected nodes and remove them from the data set
- Remove the nodes with the least amount of connectedness and then plot the network again.
- Plot the data again without those nodes.
Correct

Incorrect
Question 385 of 1443

385. Question
2. What is TRUE about correlation?
Select all that apply
- Correlation identifies the strength of the linear relationship between variables on a scale of -1 to 1.
- A correlation of 1 means that the variables move perfectly in tandem - if one variable increases, then the other increases at a fixed rate.
- A correlation of -1 means that the variables move in a perfectly inverse fashion - if one variable decreases, then the other increases at a fixed rate.
- A correlation of 0 means that there is no linear relationship between the change in x and the change in y.
- Correlation always implies causation.
Correct

Incorrect
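
The boundary cases of correlation are easy to reproduce with `cor()`. The vectors below are made up so that each relationship is exact.

```r
x <- c(-2, -1, 0, 1, 2)

r_up   <- cor(x, 2 * x + 1)   # points on a rising straight line
r_down <- cor(x, -3 * x)      # points on a falling straight line
r_none <- cor(x, x^2)         # a clear relationship, but not a linear one

c(rising = r_up, falling = r_down, curved = r_none)
```

The last case shows why a correlation near zero rules out only a linear relationship: here y is completely determined by x.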
Question 386 of 1443

386. Question
2. Match the seasonality analysis use case questions to the business function they relate to.
Sort elements
- What features do we expect customers to like depending on season, time, etc.?
- How much will prices increase on construction supplies next fall?
- How many skis should Head expect to sell next winter?
- How much energy will be consumed in the middle of the day?
- How many newspapers will be sold on Sunday?
- Research & Development
- Buy
- Make
- Ship
- Sell
Correct

Incorrect
Question 387 of 1443

387. Question
2. Match the functions below to the actions they perform.
Sort elements
- creates a text string
- converts the text into an expression that can be manipulated, used in a calculation, etc.
- evaluates the expression
- paste0()
- parse()
- eval()
Correct

Incorrect
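
The three functions are often chained together. The sketch below builds a command as text and then runs it; the built-in `mtcars` data and the column choice are assumptions for illustration.

```r
column <- "mpg"                               # hypothetical column choice

txt    <- paste0("mean(mtcars$", column, ")") # text string "mean(mtcars$mpg)"
expr   <- parse(text = txt)                   # text converted to an R expression
result <- eval(expr)                          # expression evaluated

txt
result
```

This pattern is handy when a variable name is only known at run time, although direct indexing such as `mtcars[[column]]` is usually simpler and safer when it is available.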
Question 388 of 1443

388. Question
2. Match the questions to the 3 or 4 states that someone can have when considering disease spread.
Sort elements
- What is the probability of infection for someone who is susceptible?
- What is the duration of infection?
- Why does someone recover?
- Why does someone become susceptible to disease again?
- Susceptible to infection
- Infected
- Recovered
- Susceptible to infection – again
Correct

Incorrect
Question 389 of 1443

389. Question
3. Which function allows you to remove duplicate data from your data set?
- unique()
- duplicate()
- remove()
- u()

390. Question
3. What are some additional parameters you can use for label propagation?
Select all that apply
- The weights of the edges
- Initial community assignments of nodes
- Fixed community assignments of nodes
- The cutoff point for hclust

391. Question
3. How can you check the warnings to see if there is anything you should be concerned about?
- warnings()
- errors()
- war()
- help()

392. Question
4. Fill in the blank below.
- The ___ function allows you to save graphs in pdf or png formats.

393. Question
4. Fill in the blank below
- The term ___ refers to data in quotation marks - the gsub() function manipulates and replaces patterns in this term.

394. Question
4. What can you type into RStudio to find more information about a function?
Select all that apply
- ```
 help.search() 
```
- ```
 example() 
```
- ```
 ?? 
```
- ```
 ? 
```

395. Question
5. How can you ensure that your analysis is reproducible?
- Use set.seed() at the beginning of the analysis.
- Save the code as a function and run it.
- Use the same data for each iteration.
- Choose the same number of clusters.

396. Question
4. You analyzed your customer data and found it made sense to cluster your customers into three distinct categories. When visualizing the data you see that there are a couple of data points that look to be between clusters. What could you conclude?
- There are a couple of customers that don't act like your other customers.
- There are three main types of customers.
- Some customers are in two different clusters.
- You'll need to re-do the analysis because you should only have two clusters.

397. Question
3. Why might it be advantageous to analyze word combinations or phrases instead of single words when doing a sentiment analysis?
- The meaning of words changes based on context
- Word combinations can be more specific
- Smaller frequencies of word combinations provide more insight
- Word combinations remove unhelpful words (e.g., "the" and "it")

398. Question
4. When Hewlett Packard started tracking a range of employee factors, what were the results?
Select all that apply
- HP increased employee retention
- HP decreased recruiting costs
- HP employees felt more scrutinized
- HP increased professional development costs

399. Question
5. The datasets graphed all have similar summary statistics (including means and variances). What valuable lesson(s) can be learned from comparing the graphs?

Select all that apply
- Exploratory visualizations done before analyzing data can be insightful
- Outliers can really affect statistical properties
- Cleaning can prevent poor visualization
- Summary statistics can be misleading

400. Question
4. Match the common relationships or patterns with the type of network they represent.
Sort elements
- Organizational relationships
- Communication patterns
- Economic, environmental, or geographic relationships
- Connections based on interests, preferences, or similarities
- Manager and employee
- Emails between co-workers
- Citizens in the same tax bracket or state
- Members of the same gym

401. Question
4. What does S represent?
Select all that apply
- The average distance that the observed values fall from the regression line.
- The variability that is accounted for by the model.
- The precision of the model.
- The impact of individual variables.

402. Question
5. Complete the matrix below.
Sort elements
- Airports that likely serve as the entry points into the U.S. Passengers are likely arriving from abroad, then flying domestically throughout the U.S.
- Airports that likely serve as exit points out of the U.S. These are passengers arriving from domestic flights and then transferring to international flights leaving the country.
- high out-degrees but low in-degrees
- high in-degrees but low out-degrees

403. Question
4. Which function adds a linear regression line to your model?
- stat_smooth()
- ggplot()
- geom_point()
- theme()

404. Question
4. After running a Breusch-Pagan test, how would I know that there is no heteroscedasticity?
Select all that apply
- The p-value is very large.
- The p-value is very small.
- The residuals are evenly distributed.
- The residuals are not evenly distributed.

405. Question
5. Match the Twitter terminology to the correct description.
Sort elements
- relates to topics mentioned by other users
- the user name of a person on Twitter
- signals the start of a user's Twitter handle
- friend or someone interested in what you have to say!
- # (Hashtag)
- Twitter handle
- @ (at)
- Follower

406. Question
3. Why do we need to take the largest positive eigenvalue?
- A large positive eigenvalue will ensure that our system of equations has positive and negative values being added together, where no value is very close to zero.
- A large positive eigenvalue will ensure that our system of equations has only negative values being added together.
- A large positive eigenvalue will ensure that our system of equations has only positive values being added together.
- A large positive eigenvalue will ensure that our system of equations has values being added together, where each value is very close to zero.

407. Question
4. Which function allows you to read zip files?
- unz()
- zip()
- file()
- read.file()

408. Question
4. Why is calculating PageRank so much faster than calculating eigenvector centrality?
- PageRank doesn't calculate eigenvector values for each node.
- PageRank ignores less important nodes in a network.
- Eigenvector doesn't calculate eigenvector values for each node.
- The nodes are weighted in PageRank, not in eigenvector centrality.

409. Question
4. What type of data does the grep() function work with?
- Character vectors
- Booleans
- Strings
- Data frames

410. Question
5. What are some ways to save and display your rCharts plot?
Select all that apply
- A standalone HTML file
- Embed it in an existing website
- Publish it to RPubs
- Publish it to GitHub

411. Question
5. Which question(s) can you answer with text analysis?
- What is this text about?
- What is the probability that new texts will be similar?
- What do people read first when reviewing a text?
- Which reference library is most highly correlated?

412. Question
5. When detecting outliers, the chief goal is to:
- Identify data that have a very low probability of occurring.
- Remove data that doesn't fit your expected model.
- Compare artificial and real data points.
- Visualize unusual data from the data set.

413. Question
5. Why is it important to remove multicollinearity?
- Multicollinearity reduces the interpretability of the coefficients of the regression model and can cause fluctuation in the significance of variables
- Multicollinearity increases the interpretability of the coefficients of the regression model and can cause stability in the significance of variables
- Multicollinearity has no impact on the interpretability of the coefficients of the regression model or the significance of variables
- Multicollinearity doesn't have to be removed because it doesn't have significant impact on the model

414. Question
5. Which function allows you to check the structure of a data set that you create?
- str()
- structure()
- stu()
- s()

415. Question
5. Which function can remove "<>" from the data?
- gsub() function
- remove() function
- erase() function
- clean() function

416. Question
Match the function to its purpose.
Sort elements
- searches for text in data
- gives you the length of any vector
- tabulates the number of entries for categorical data
- sorts data by a particular column
- grep()
- length()
- table()
- order()

417. Question
Fill in the blank below.
- ___ is a measure of the extent to which an increase in one variable corresponds to the increase in another variable. This does not imply causation, which determines whether or not a variable causes the effect on another variable - rather, it determines whether or not there is any connection between the variables that we can quantify.

418. Question
Can a table in SQL be joined to itself? (True/False)
- TRUE
- FALSE

419. Question
How do you calculate R squared?
- Subtract the ratio of the randomness to the total variance from the number 1.
- Divide the ratio of the randomness to the total variance from the number 1.
- Add the ratio of the randomness to the total variance from the number 1.
- Multiply the ratio of the randomness to the total variance from the number 1.

420. Question
The aes layer contains:
- The mappings between the data and the graph
- The geom layer
- The titles and the axes labels
- The initial data analysis

421. Question
Match the following terms to the correct definition:
Sort elements
- Pulls rows where the value of the joining field is present in both tables
- Pulls all rows from one table, and only the rows from the second table where the value of the joining field matches a value.
- Pulls all rows from both tables.
- Pulls all possible combinations of rows in all tables.
- INNER JOIN
- (LEFT or RIGHT) OUTER JOIN
- FULL OUTER JOIN
- CROSS JOIN

422. Question
Fill in the blank below.
- Clustering assumes that ___ is a measure of similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.

423. Question
Fill in the blank below.
- The ___ function shows us the structure of the data.

424. Question
How do you measure the explanatory power of your predictive model?
- R squared
- Q squared
- B squared
- C squared

425. Question
How was Target able to identify their pregnant customers?
- They used a decision tree to classify their shoppers
- They used k-means clustering to classify their shoppers
- They used a multiple regression to classify their shoppers
- They conducted customer surveys

426. Question
Please fill in the blank below.
- The ___ function can help you summarize your data in different ways with just a few clicks - this can be faster and more efficient than using VLOOKUP() and HLOOKUP().

427. Question
Please fill in the blank below.
- The ___ function can help you summarize your data in different ways with just a few clicks - this can be faster and more efficient than using VLOOKUP() and HLOOKUP().

428. Question
Fill in the blanks below.
- The ___ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ___ method is part of supervised machine learning and assigns data points to known groups or categories.

429. Question

Match the Leaflet functions to the descriptions below:
Sort elements
- leaflet()
- addTiles()
- geocode()
- addCircles()
- This base function generates the relevant map objects
- Creates the visual that defines the look and feel of the map. Many kinds of tiles are available
- Pulls geographical information from Google Maps. Google limits the number of queries to 2500 per day, so use geocodeQueryCheck() to track how many you have used
- Plots the latitude and longitude positions of all points in our data frame


430. Question
What are some of the negative effects of adding as many variables as possible to our model?
Select all that apply
- Multicollinearity
- Over-fitting
- Heteroscedasticity
- Generalizability

431. Question
Please put the steps in order for scraping text data from a webpage.
- Convert to desired data type (e.g. character vector, String class, corpus object)
- Proceed with text cleaning and analyses
- Inspect the output format
- Read in and save the contents of the page into a variable
- Access web page(s) from R and inspect the page structure

432. Question
In order to ensure that you don't mistake randomness for patterns, what do you need to do?
- Test and validate results
- Reuse your data in a different order
- Build another model
- Use classification models

433. Question
What are some reasons to use APIs?
Please select all that apply
- Convenience
- Don’t have to write custom code for every problem!
- Unified access to data and available tools
- More complicated code is necessary

434. Question
What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model

435. Question
Fill in the blank below.
- You can create a ___ by wrapping code in curly braces. This can help you streamline your code to perform multiple steps in one line, similar to a for() loop.

436. Question
Why do we cluster the basketball data with 3 clusters after we analyze it with 2 clusters?
- Three clusters may give us more information about patterns in the data that two clusters didn't show.
- More clusters are more accurate, so we'll be able to generalize the data better.
- Two clusters are not complex enough to get insights.
- Three clusters are the default recommendation for k-means clustering.

437. Question
When running a variable selection model, how does the computer know when it has found the right variables?
- The Akaike Information Criterion
- Cook's Distance
- Adjusted R Square
- The Breusch-Pagan Test

438. Question
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
- These packages are easier to learn
- This allows you more time to think about the data instead of complex programming
- So we can directly chat to the R community
- These packages are updated regularly, while R is not updated

439. Question
Which function makes sure that the k-means analysis is reproducible?
- set.seed()
- kmeans()
- iterate()
- head()

440. Question
Which function can you use to see the list of contents that your linear regression model produces?
- ls( )
- lm( )
- lr( )
- lo( )

441. Question
What is the complexity parameter?
- It is the amount of improvement in relative error for each node
- It is the level of importance for each node
- It is the amount of complexity of the node
- It determines whether or not a node has multiple categories

442. Question
Which error appears for an invalid cell reference?
- #REF!
- #NULL!
- #NAME?
- #VALUE!

443. Question
What happens as your model becomes more precise?
- The model becomes less generalizable to new data
- The model becomes more generalizable to new data
- The model becomes less descriptive of your data set
- The model increases its validity

444. Question
Why is clustering more powerful than visualizing?
Select all that apply
- Clustering mathematically defines similarity between all the data points, even the ones on the periphery.
- Clustering can work with many more dimensions than we can visualize.
- Clustering is easier than visualizing data.
- Clustering can group data into pre-defined groups.

445. Question
Please fill in the blanks below:
- ___ adjustments affect what data is displayed, while ___ adjustments affect how the data is displayed.

446. Question
Which function can you use to run a logistic regression model?
- glm()
- logreg()
- predict()
- summary()

447. Question
Why is it important to remove multicollinearity?
- Multicollinearity reduces the interpretability of the coefficients of the regression model and can cause fluctuation in the significance of variables
- Multicollinearity increases the interpretability of the coefficients of the regression model and can cause stability in the significance of variables
- Multicollinearity has no impact on the interpretability of the coefficients of the regression model or the significance of variables
- Multicollinearity doesn't have to be removed because it doesn't have significant impact on the model

448. Question
Please fill in the blank below.
- The ___ interval of a regression line, or the standard error, tells you with a certain percentage of certainty where the best-fit line can be.

449. Question
When should you use density-based clustering?
Please select all that apply
- When clusters are irregular
- When clusters are evenly-sized
- When outliers are present
- When noise is present

450. Question
1. Fill in the blank.
- A ___ connects points (nodes) by lines that represent relationships. By studying the interactions between people, places and events, you can determine how messages, ideas, and diseases spread and how a change in one thing can cause a cascading set of effects.

451. Question
1. How can you differentiate between categorical variables in a regression?
- Assign each category to a numerical value.
- Rename the categorical variables to numbers in word form.
- Merge all the categories into a continuous scale.
- Find other variables that are quantitative.

452. Question
2. Fill in the blanks below.
- Data can be ___, like temperature, which increases incrementally and can be fluid, or ___, like country names, which cannot be divided.

453. Question
2. Drag the term to the box next to the correct description.
Sort elements
- Vector
- Matrix
- List
- Data frame
- Collection of elements of the same type
- Multiple rows and columns of the same data type
- Collection of elements of different types
- Multiple rows and columns of different data types

454. Question
1. Which industries can benefit from using predictive analytics?
- Politics
- Finance
- Telecommunications
- Human Resources

455. Question
2. Choose the correct pair of words to complete the statement.

The objective of a good model is not to fit the data perfectly, it's to have the lowest ___ when applied to new, _____ data.
- error rate, real world
- error rate, testing
- adjusted R squared, varied
- accuracy rate, real world

456. Question
1. Which of the following data science skill areas are necessary, if an organization wants to use data to drive decision making?
Select all that apply
- Data visualization and communication
- Data processing
- Data architecture
- Statistics

457. Question
1. Which of these questions may be most impacted by seasons or cycles?
Select all that apply
- How much revenue will the new summer blockbuster bring in over Memorial Day weekend?
- How many school supplies should we stock during the week before local schools start?
- On average, how much will people spend at the grocery store this week?
- How long are new employees willing to commute to work?

458. Question
1. Which questions will you be able to answer by the end of this course?
Select all that apply
- How do many variables combine in complex relationships to predict how many (bike rentals, boat rentals, concert tickets, etc.) people will buy?
- How do seemingly non-quantifiable variables (e.g., whether it rains) affect a numerical outcome (e.g., number of visitors)?
- What is the pattern of behavior across time?
- How do relationships change over time?

459. Question
2. Fill in the blank below.
- A ___ interval is a range of possible values based on the standard deviation of the data.

460. Question
1. Why do we define eigenvector centrality based on the number of followers that an account has?
- The more followers an account has, the more it has the ability to influence public opinion.
- The more followers an account has, the less it has the ability to influence public opinion.
- The fewer followers an account has, the more it has the ability to influence public opinion.
- There is no reasoning for defining eigenvector centrality based on the number of followers an account has.

461. Question
1. What does modularity measure?
- Modularity measures the number of communities.
- Modularity measures the probability that identified communities exist.
- Modularity measures the number of nodes in each community.
- Modularity measures the number of communities in each network.

462. Question
1. Which two functions do you need to save your image output as a PDF?
Select all that apply
- pdf()
- dev.off()
- save()
- off()

463. Question
2. Which of the answers below is a function?
Select all that apply
- ```
 subset() 
```
- ```
 ggsave() 
```
- ```
 %in% 
```
- ```
 == 
```

464. Question
3. Fill in the blank below
- ___ are a common format for storing and manipulating geospatial data such as complex borders, polygons, and so on.

465. Question
3. Match the function to the question.
Sort elements
- How do you optimize the design of your products?
- Where can your organization get the resources it needs to create its good or service?
- How can you ensure the quality of the product and forecast demand?
- How can you improve routes and identify the most efficient yet low cost strategy for the number of delivery vehicles and staff?
- Which people are you targeting with your advertising?
- Research and Development
- Procurement
- Production
- Shipment
- Sales

466. Question
3. Fill in the blank.
- Since seasonality and cycles affect forecasts and it is easier to predict short-term outcomes than long-term outcomes, ___ always matters when building predictive models.

467. Question
2. What questions should you ask about your data?
Select all that apply
- Are the data points a representative sample of the population you are investigating?
- Are there missing values or duplicate values in the data?
- What is the original source of the data?
- How much of a data sample do you need to create a valid model?

468. Question
3. Put the events in order to describe how an economic network may expand. Place the first event on top.
- A company decides to build a new factory in town
- The factory requires the presence of construction crews and equipment
- New factories and stores are built to meet an increase in demand
- The families of the construction crews spend money in the city

469. Question
3. Fill in the blank below based on this chart.
- If you add the ___ row and the ___ row, then you get the observed data row.

470. Question
2. Fill in the blank below.
- The ___ function tells R that you'll be working with the vertices or nodes of the graph, and encloses the graph data frame we created earlier.

471. Question
3. Put the steps of running a t-test in the correct order.
- Measure how the coefficient of each variable changes.
- Build a new regression model based on each sample.
- Take several samples of the data.
- Use the standard deviation to calculate the p-value for the coefficient value in your regression model.

472. Question
3. When do you need to do a logarithmic transformation?
- If you have data that’s growing at very different rates
- If you have data that's growing at the same rates
- If you have data that doesn't need to be transformed
- If you have data that doesn't have any outliers

473. Question
3. Fill in the blank below.
- Use the ___ function to rename the columns of the files that you create.

474. Question
3. What type of model is the example below?

Common cold: someone can catch a cold, recover, enjoy a period of immunity and then be susceptible to another cold.
- SIRS model
- SIR model
- SI model
- S model

475. Question
3. Match these three formatting functions to the correct actions.
Sort elements
- converts any color to the 3-number color identifier
- converts the RGB colors to an HSV format that we can use to add transparency to the color
- adds the level of transparency using the alpha argument
- col2rgb()
- rgb2hsv()
- hsv()

476. Question
2. What was Google's main ingredient of its successful algorithm?
- They had the ability to define what was important.
- They had more powerful computers than their competitors.
- They were the first search engine on the internet.
- They had already catalogued all the webpages on the internet.

477. Question
5. Which function calls up all the files in a folder for an overview?
- ```
 dir() 
```
- ```
 read.csv() 
```
- ```
 library() 
```
- ```
 help.search() 
```

478. Question
5. If your data are not compelling at first,
Select all that apply
- go out and collect more data
- manipulate your data and visualize it differently
- focus on the story you want to tell and adjust the graph
- start collecting data again to get the results you want

479. Question
4. What is the main reason to set the 'standalone' parameter to TRUE for saving an HTML file?
- So it can be shared, independent of the computer it's being viewed on
- So it's protected by intellectual property law
- So others can add on more data and functionality
- So you can print it out from anywhere

480. Question
4. What are some qualities that make a good visualization?
Select all that apply
- They should be easy to understand
- They should convey information and data intuitively
- They should be visually appealing
- They should contain as much information as possible

481. Question
5. How does industry knowledge help us understand our analysis?
- There may be latent factors that we wouldn't know unless we had expertise in that field.
- You can't make an impact unless you're an expert in the field.
- Only athletes can understand how athletes are paid.
- Industry knowledge indicates a better understanding of data analysis.

482. Question
Fill in the blank.
- One of the benefits of ___ is that you can evaluate many different factors or dimensions (more than humans can see) when looking for patterns.

483. Question
4. Fill in the blank.
- Ideas, such as sentiments about gun control or abortion, can be evaluated using sentiment analysis, a branch of text mining. To do this analysis you will need to use a scale that positions people on a ___ according to the intensity of their feelings.

484. Question
5. The example below is an example of which aspect of the 3 V's?

"1 flight of a Boeing 737 across the continental United States generates as much data as is stored in the U.S. Library of Congress."
- Volume
- Velocity
- Variety
- Veracity

485. Question
3. You are looking to fill a data analyst position. The following descriptions of previous work experience were found in different resumes. Which resume should you continue reading?
- Cleaned and manipulated large data sets (proficiency in R)
- Organized Brown Bag lunches on data visualization
- Maintained the website for a rapidly growing fashion magazine (proficiency in WordPress)
- Created a strong social media presence and doubled the number of Twitter followers in four months

486. Question
5. There is a new employee starting at the firm. Using the graph of current employees, which employee do you predict the new employee will have the strongest ties with?

The new employee (CTO) majored in Literature and previously worked at Microsoft. They have one child and vacation every year in Jamaica. They will be located at the San Francisco office.
- Allie
- Bob
- Cara
- Dave

487. Question
4. Fill in the blank below.
- ___ stands for locally weighted scatterplot smoothing and uses linear regression, giving more weight to points closer to each point being fitted.
  
  ___ stands for local regression and fits a polynomial function to small segments of the data.

488. Question
4. What conclusions were we able to draw from visualizing the network?
Select all that apply
- The cities with the greatest number of connections are Anchorage, Chicago, Atlanta, Detroit and Washington, D.C.
- The biggest hubs should be most guarded to protect the integrity of the airline system
- The airlines that have the most frequent domestic flights
- The number of airline passengers that travel across the United States in one day

489. Question
5. True or False. Once you have the results of your model you have conclusively determined the trend of the data and/or have an accurate representation of what is happening in the world.
- FALSE
- TRUE

490. Question
4. You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function each was used for.
- Make: Life insurance companies predict the age of death in order to approve policies and set pricing
- Ship: Energex (Australian utility) predicts 20 years of electricity demand growth to direct infrastructure investment
- Sell: Harrah's Hotel and Casino in Las Vegas predicts how much a customer will spend over the years, estimating their lifetime value to the casino
- R&D: As much as 40% of trading on the London Stock Exchange is estimated to be driven by trading algorithms
- Buy: Ski manufacturers predict demand for skis each winter, stocking up on supplies
Correct

Incorrect
Question 491 of 1443

491. Question
4. What is TRUE about Twitter?
Select all that apply
- Twitter is a micro blogging site with over 300 million users where people can post messages of up to 140 characters
- 2 people don't have to follow each other, which makes Twitter a directed network
- Users can re-broadcast other users' messages, so we can see how news propagates through the network
- Twitter's following model creates more of an interest network, as opposed to a social network
Correct

Incorrect
Question 492 of 1443

492. Question
4. What is TRUE about the network image below?
- The eigenvector centrality values are equal to one because no node is more central than any other node.
- The eigenvector centrality values are equal to zero because no node is more central than any other node.
- There is no way to tell based on this image what the eigenvector centrality values are.
- Some nodes are more central than other nodes.
Correct

Incorrect
Question 493 of 1443

493. Question
4. Match the functions to the correct actions.
Sort elements
- saves the data set
- checks the structure of the data
- tells R what to do if a condition is not met
- checks the sum of the output
- write.csv()
- str()
- ifelse()
- sum()
Correct

Incorrect
Question 494 of 1443

494. Question
4. What are some of the conclusions from the analysis we did on the political data set?
Select all that apply
- The people who are best connected also have similar connections.
- Three of the top 20 nodes are either associations or groups.
- According to Louvain Modularity, legislators who are in communities 5 and 6 tend to be the best funded.
- Most associations or groups only contribute to one political party.
Correct

Incorrect
Question 495 of 1443

495. Question
3. Why might bar graphs be misleading?
- Because the x axis is usually categorical even if it looks numerical
- Because the x and y axes are flipped
- Because the bars are not indicative of categorical data
- Because the bar graph is usually not labeled properly
Correct

Incorrect
Question 496 of 1443

496. Question
5. Which function results in an interactive slider?
- ```
 sliderInput() 
```
- ```
 sliderOutput() 
```
- ```
 slider() 
```
- ```
 interInput() 
```
Correct

Incorrect
Question 497 of 1443

497. Question
5. You are creating a feedback survey to send your customers. You already know their zip code, education level, and age. Which additional survey item captures a different type of information and may add explanatory power to your model?
- What is your address?
- What is the highest degree or level of school you have completed?
- What generation are you in?
- What is your gender?
Correct

Incorrect
Question 498 of 1443

498. Question
5. What should you keep in mind as you're developing and running your model?
Select all that apply
- Where did the data come from?
- Is your sample data representative of the population?
- Does your model account for a lot of variability in the data?
- Have you trained and validated your model?
Correct

Incorrect
Question 499 of 1443

499. Question
5. What is TRUE about this Q-Q Plot?
Select all that apply
- This Q-Q Plot shows that our model has fewer residuals at the tails of the distribution.
- The residuals may not be normally distributed, meaning that we could have achieved results at random.
- This Q-Q Plot shows that our model has more residuals at the tails of the distribution.
- The residuals are normally distributed, meaning that we could not have achieved results at random.
Correct

Incorrect
Question 500 of 1443

500. Question
5. Which function makes the output randomized?
- sample()
- random()
- R()
- randomized()
Correct

Incorrect
Question 501 of 1443

501. Question
6. Which of these safety tips are correct? Select all that apply.
- Point the needle towards you while applying the safety
- Never bend a sharp
- Leave the sharp covered until it is time to use it
- Keep your fingers away from the tip of the sharp object
Correct

Incorrect
Question 502 of 1443

502. Question
Which type of data contains sets of categories?
- Factor
- Character
- Boolean
- String
Correct

Incorrect
Question 503 of 1443

503. Question
How can you ensure that your analysis is reproducible?
- Use set.seed() at the beginning of the analysis.
- Save the code as a function and run it.
- Use the same data for each iteration.
- Choose the same number of clusters.
Correct

Incorrect
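
As a rough illustration of why seeding matters for reproducibility, here is a minimal Python sketch. The quiz refers to R's set.seed(); Python's `random.Random(seed)` is swapped in as an analogue, and the helper name `draw_sample` is hypothetical.

```python
import random

def draw_sample(seed, n=5):
    """Return n pseudo-random integers drawn after fixing the seed."""
    # A private generator seeded once, similar in spirit to calling
    # set.seed() at the start of an R analysis.
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(n)]

first_run = draw_sample(seed=42)
second_run = draw_sample(seed=42)

# Same seed, same sequence of draws, so the "analysis" is repeatable.
print(first_run == second_run)
```

Re-running with a different seed will generally give different draws, which is why the seed should be recorded alongside the analysis.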
Question 504 of 1443

504. Question
Which of the following SQL functions can be used on text fields?
Select all that apply
- LEFT()
- MAX()
- CONCAT()
- SUBSTRING()
- STDEV()
- DATENAME()
Correct

Incorrect
Question 505 of 1443

505. Question
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
- Does the variance of the residuals change with the predicted value?
- Do the forces affecting the dependent variable change in some parts of the data and should our model reflect that?
- Does it make a difference to identify outliers or bias in the residuals in our model?
- Is there any way to change our model or does it always have to stay the same?
Correct

Incorrect
Question 506 of 1443

506. Question
Match the function names to their descriptions.
Sort elements
- labs()
- coord_flip()
- facet_wrap()
- geom_area()
- Sets labels for axes and title
- Flips the axes of a graph
- Splits up data by category to give smaller individual graphs
- Creates an area plot
Correct

Incorrect
Question 507 of 1443

507. Question
Given the table above, what SQL function would you use to perform the following tasks?
Sort elements
- SUM(Employees)
- MIN(Employees)
- LEN(Company)
- LEFT(Company,2)
- COUNT(*)
- SUBSTRING(Company,2,1)
- Return the total number of Employees in the table
- Return the smallest number of Employees in the table
- Return the number of characters in the Company field
- Return the first 2 characters of the Company field
- Return the number of records in the table
- Return the second character of the Company field
Correct

Incorrect
Question 508 of 1443

508. Question
How does clustering help when there are more than 3 attributes in the data?
- Clustering helps identify groups with many attributes that you can't easily visualize.
- Clustering can only cluster when there are more than three attributes.
- Clustering is the best method to visualize more than 4 attributes at a time.
- Clustering can gather data more accurately when there are many attributes.
Correct

Incorrect
Question 509 of 1443

509. Question
Fill in the blank below.
- In graphics, the transparency argument is called ____, where 0 is entirely transparent, and the default of 1 is entirely opaque.
Correct

Incorrect
Question 510 of 1443

510. Question
Match the terms to their definitions.
Sort elements
- Measure of how dispersed the data is
- Standardized measure of how dispersed the data is
- Check if there is bias in the data or the model
- Measure of linear relationship between variables (positive/negative)
- Measure of strength of linear relationship between variables (positive/negative)
- How a change in variable x will affect variable y
- % of variation in y that can be explained by the variation in x
- The probability that the pattern exists through random chance, in the absence of a relationship between variables
- Variance
- Standard deviation
- Distribution and "normality"
- Covariance
- Correlation
- Slope
- R squared
- p-values
Correct

Incorrect
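
Several of the terms in the matching question above can be computed by hand. The following is a minimal Python sketch on made-up data (the `x` and `y` values are hypothetical), using sample (n - 1) formulas and simple one-variable regression.

```python
import statistics

# Hypothetical toy data: y rises roughly two units per unit of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)

variance_x = statistics.variance(x)   # dispersion, in squared units
std_dev_x = statistics.stdev(x)       # square root of variance, in original units
std_dev_y = statistics.stdev(y)

# Covariance: sign shows whether x and y move together or in opposite directions.
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Correlation: covariance rescaled to the range [-1, 1], so it also shows strength.
correlation = covariance / (std_dev_x * std_dev_y)

# Slope of the least-squares line: change in y per unit change in x.
slope = covariance / variance_x

# R squared: share of the variation in y explained by x (simple regression only).
r_squared = correlation ** 2

print(variance_x, slope, r_squared)
```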
Question 511 of 1443

511. Question
Put the four steps of building a classification tree in order.
- Conditional on the previous answer, select the next question
- Create a new question branch after the previous one
- Stop growing the tree when there is no more information gain
- Ask the question with the most information
Correct

Incorrect
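
"Information" in a classification tree is commonly measured as information gain, the drop in entropy after a split. The sketch below assumes that measure (the quiz does not say which one the course used); the feature names and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: owns_home separates the classes perfectly,
# has_degree does not separate them at all.
rows = [
    {"owns_home": "yes", "has_degree": "yes"},
    {"owns_home": "yes", "has_degree": "no"},
    {"owns_home": "no", "has_degree": "yes"},
    {"owns_home": "no", "has_degree": "no"},
]
labels = ["buy", "buy", "skip", "skip"]

gains = {feature: information_gain(rows, labels, feature) for feature in rows[0]}
best_question = max(gains, key=gains.get)  # the question to ask first
print(best_question, gains)
```

A tree builder would repeat this selection inside each branch and stop once no remaining question yields any gain.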
Question 512 of 1443

512. Question
What are descriptive statistics?
- They provide a quantitative summary of the data
- They discover trends in the data
- They make predictions about the data
- They provide the statistics for unsupervised machine learning
Correct

Incorrect
Question 513 of 1443

513. Question
Please fill in the blank below.
- In order to freeze a reference, you can use the ____.
Correct

Incorrect
Question 514 of 1443

514. Question
Fill in the blank below.
- The main goal of clustering is to ____ intra-cluster distance (the distance between points in a cluster) and ____ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
Correct

Incorrect
Question 515 of 1443

515. Question
Please fill in the blank below:
- ____, the data visualization expert, argues that the visualization initially presented to NASA did not make the dangers clear for the Challenger shuttle launch; the shuttle exploded shortly after launch.
Correct

Incorrect
Question 516 of 1443

516. Question
What are some of the advantages of using LASSO over Ridge?
Please select all that apply
- It's quality control to reduce the effects of having too many or highly correlated variables.
- It can be used for variable selection.
- It measures heteroscedasticity.
- It increases the variance in the model.
Correct

Incorrect
Question 517 of 1443

517. Question
Identify the seasonality time frames for each example.
Sort elements
- Hourly
- Daily
- Weekly
- Monthly
- Patterns of TV commercials
- Electricity use
- Typical office hours
- Cable bills
Correct

Incorrect
Question 518 of 1443

518. Question
How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
- You can send special offers to individuals who are more likely to use them.
- You could stock your shelves with products that are bought together more frequently.
- You could make better product recommendations that complement the purchase patterns.
- You could let your clients know you have their data so they can make more deliberate purchases.
Correct

Incorrect
Question 519 of 1443

519. Question
Put the basics of making API calls in order.
- Construct URI with API key and query parameters (if any)
- The response received from the server
- The request sent to the server
Correct

Incorrect
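
The "construct URI" step can be sketched in Python as follows. The endpoint, the `key` parameter name, and the helper `build_request_uri` are all hypothetical; real APIs document their own parameter names and authentication scheme. The sketch only builds the URI and does not send a request.

```python
from urllib.parse import urlencode

def build_request_uri(base_url, api_key, **query):
    """Combine a base URL, an API key, and optional query parameters into one URI."""
    params = {"key": api_key, **query}   # "key" is an assumed parameter name
    return f"{base_url}?{urlencode(params)}"  # urlencode escapes spaces and symbols

uri = build_request_uri(
    "https://api.example.com/v1/search",  # placeholder endpoint
    "YOUR_API_KEY",
    q="data science",
    limit=10,
)
print(uri)
```

The request would then be sent to this URI, and the server's response parsed by the caller.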
Question 520 of 1443

520. Question
What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Correct

Incorrect
Question 521 of 1443

521. Question
Why do we need to validate the model?
- To make sure it works with other data
- To prove that it works with the test data
- To determine what type of data we’re working with
- To answer questions about the data
Correct

Incorrect
Question 522 of 1443

522. Question
How can we address the limitations of our analysis to see the data differently?
Select all that apply
- We could change the starting point or set.seed.
- We could look at different variables.
- We could normalize the data.
- We could add data to the data set.
Correct

Incorrect
Question 523 of 1443

523. Question
What does the Akaike Information Criterion (AIC) do?
Select all that apply
- Measures the "quality" of several statistical models in comparison to each other.
- Provides an estimate of the information lost when the variables in the model are adjusted.
- Explains if heteroscedasticity is likely present in the regression model.
- Measures how much the variance of a regression coefficient is increased due to collinearity.
Correct

Incorrect
Question 524 of 1443

524. Question
What type of data does the grep() function work with?
- Character vectors
- Booleans
- Strings
- Data frames
Correct

Incorrect
Question 525 of 1443

525. Question
What is an important step so that R can read numbers as categories?
- Use the as.factor() function on the data.
- Use the as.character() function on the data.
- Use the as.numerical() function on the data.
- Use the as.category() function on the data.
Correct

Incorrect
Question 526 of 1443

526. Question
Which picture below displays the standard error of a best fit line?
Correct

Incorrect
Question 527 of 1443

527. Question
Which of these is not an attribute of classification?
Select all that apply
- Discovers patterns in data
- Assigns data points to known groups or categories
- Calculates probabilities of events occurring or group membership
- Exploratory data analysis (EDA)
Correct

Incorrect
Question 528 of 1443

528. Question
Why should you avoid using pie charts when visualizing data?
- The human eye does not distinguish differences between areas very well
- The pie chart can only use integers
- The human eye does not distinguish between colors very well
- Excel cannot create pie charts
Correct

Incorrect
Question 529 of 1443

529. Question
Which standard delimiters can Excel identify to split text into multiple columns?
- Tab
- Semicolon
- Comma
- Space
Correct

Incorrect
Question 530 of 1443

530. Question
What is one of the dangers of increasing the number of clusters?
- You could overfit the data so it doesn't generalize well.
- You could increase the complexity of the algorithm beyond a computer's capability.
- You could distort the original data.
- You could discount some data points.
Correct

Incorrect
Question 531 of 1443

531. Question
Which function can you use to check the number of queries you have left for geocode()?
Note: Google limits your queries to 2500 per day, so make sure to check this if you are going to be geocoding a lot of data points.
- geocodeQueryCheck()
- geocode()
- geoCheck()
- checkGeo()
Correct

Incorrect
Question 532 of 1443

532. Question
Why do we want to minimize false negatives in the bank marketing example?
- So that we don't miss any potential customers who are likely to buy the product.
- So that we only reach out to the customers who are 100% likely to buy the product.
- So that we spend the most amount of resources to capture attention.
- So that we get the highest percentage of customers who buy the product relative to the customers who we reach out to.
Correct

Incorrect
Question 533 of 1443

533. Question
What is multicollinearity?
- When 2 or more independent variables are strongly correlated to one another.
- When 2 or more independent variables show no correlation to one another.
- When 2 or more dependent variables are strongly correlated to one another.
- When 2 or more dependent variables show no correlation to one another.
Correct

Incorrect
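
One common way to quantify correlation among predictors is the variance inflation factor (VIF). The Python sketch below covers only the two-predictor case, where VIF reduces to 1 / (1 - r^2) with r the correlation between the two predictors. The variable names and data are hypothetical.

```python
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(a, b):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(a, b)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical housing predictors.
square_feet = [800, 1000, 1200, 1500, 2000]
rooms = [2, 3, 3, 4, 6]           # tracks square_feet closely
age_years = [30, 5, 40, 12, 25]   # nearly unrelated to square_feet

vif_high = vif_two_predictors(square_feet, rooms)
vif_low = vif_two_predictors(square_feet, age_years)
print(round(vif_high, 1), round(vif_low, 3))
```

A VIF near 1 suggests little shared information between the predictors, while a large value signals that they are strongly correlated.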
Question 534 of 1443

534. Question
What does it mean if you have a small p-value?
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
Correct

Incorrect
Question 535 of 1443

535. Question
What inputs do you need to give to dbscan()?
Please select all that apply
- eps
- minPts
- silhouette
- k
Correct

Incorrect
Question 536 of 1443

536. Question
1. Classify each document into one of these three categories: Talent, Research and development, Budget.
Sort elements
- Talent
- Budget
- Research and development
- Budget
- Research and development
- Talent
- Meeting minutes on recruitment
- Meeting minutes on expenditures
- Meeting minutes on A/B testing
- Meeting minutes on new vendors
Correct

Incorrect
Question 537 of 1443

537. Question
1. Fill in the blank below.
- Highly unusual data and other anomalies in a data set are called ____.
Correct

Incorrect
Question 538 of 1443

538. Question
1. How much experience do you have in Excel?
- Never used it before 1 2 3 4 5 I'm an expert
Correct

Incorrect
Question 539 of 1443

539. Question
1. Why is it important to format the axes correctly?
Select all that apply
- To prevent assumptions that the axes represent data similarly, even if that is not the case
- To clarify the analysis of the statistically-smoothed line
- To make the trends more obvious and accurate to the observer
- To help format the legends and titles
Correct

Incorrect
Question 540 of 1443

540. Question
1. Drag the description into the box next to the appropriate term.
Sort elements
- whole number
- number with decimals
- data type written out in words
- either TRUE or FALSE
- variable that can be assigned value
- Integer
- Double
- String/characters
- Boolean/logicals
- Factor
Correct

Incorrect
Question 541 of 1443

541. Question
1. Match the reason data are valuable with its description.
Sort elements
- Maintaining accurate and secure data for a long period of time may help prove accountability and avoid penalties.
- Through the use of data, processes that were previously manual may be made more efficient.
- Descriptive statistics can reveal what has already happened and may provide surface-level insights.
- The use of data science methods can help extract novel, powerful insights, anticipate behaviors, and build tools.
- Compliance
- Automation
- Dashboards
- Predictive analytics
Correct

Incorrect
Question 542 of 1443

542. Question
2. If the average high temperature in January is 40-50 degrees Fahrenheit, order the temperatures from most likely outlier (top) to least likely outlier (bottom).
- 40 degrees
- 45 degrees
- 63 degrees
- 22 degrees
Correct

Incorrect
Question 543 of 1443

543. Question
1. What does every row in the customer data set represent?
- Each row represents a dimension.
- Each row represents a product.
- Each row represents a characteristic.
- Each row represents a business.
Correct

Incorrect
Question 544 of 1443

544. Question
2. If the average high temperature in January is 40-50 degrees Fahrenheit, order the temperatures from most likely outlier (top) to least likely outlier (bottom).
- 63 degrees
- 22 degrees
- 45 degrees
- 40 degrees
Correct

Incorrect
Question 545 of 1443

545. Question
1. Fill in the blank below.
- Regression is a model that captures the ____ between two or more variables.
Correct

Incorrect
Question 546 of 1443

546. Question
2. Fill in the blank below.
- ____ stands for LOcal regrESSion, which is a black box method that fits a polynomial function to small segments of the data.
Correct

Incorrect
Question 547 of 1443

547. Question
1. What are some examples that prove what happens to one node can happen to other nodes in a network?
Select all that apply
- Diseases and messages spread
- Information technology and internet networks collapse
- Ecosystems grow or shrink
- Economic boom and bust cycles work
Correct

Incorrect
Question 548 of 1443

548. Question
2. Please fill in the blanks below:
- If two nodes are in the same community, then their delta is equal to ____; otherwise, it’s equal to ____.
Correct

Incorrect
Question 549 of 1443

549. Question
2. What are two ways to import data from your computer into RStudio?
Select all that apply
- Tools > Import Dataset > From Text File
- ```
 variable = read.csv("Name of file") 
```
- Session > Load Workspace...
- ```
 variable = read.data("Name of file") 
```
Correct

Incorrect
Question 550 of 1443

550. Question
2. Fill in the blank below.
- R is a powerful tool for ____ because the graphics tie in with the functions used to analyze data.
Correct

Incorrect
Question 551 of 1443

551. Question
3. Fill in the blank below.
- The merge() function or join() function can be used to combine two ____.
Correct

Incorrect
Question 552 of 1443

552. Question
2. Please fill in the blank below.
- When recognized and credentialed experts incorrectly predict outcomes, they may claim that despite poor results, their reasoning was sound or there were poorly timed serendipitous events. Greater ____ can help ensure incorrect predictions have real consequences.
Correct

Incorrect
Question 553 of 1443

553. Question
3. To do outlier detection you need:
- To know the underlying pattern
- To visualize all the data available
- To first classify the data into categories
- To map on a regression line and see which points are above the line
Correct

Incorrect
Question 554 of 1443

554. Question
3. Put the steps of the data chain in order.
- Visualization
- Storage
- Analysis
- Acquisition
- Cleaning
Correct

Incorrect
Question 555 of 1443

555. Question
3. What is an adjacency matrix?
- It represents the connections in a network in a chart.
- It represents a node and its adjacent nodes.
- It indicates the type of relationships in a network.
- It is an interactive visualization of a network.
Correct

Incorrect
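
As an illustration, a small directed network can be stored as an adjacency matrix in which row i, column j is 1 when node i points to node j. The Python sketch below uses a hypothetical four-node network (not the one pictured in the quiz) and reads in-degree and out-degree off the matrix.

```python
# Hypothetical directed network: each pair is (source, target).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]

index = {name: i for i, name in enumerate(nodes)}

# matrix[i][j] == 1 means there is a connection from nodes[i] to nodes[j].
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] = 1

# Out-degree is a row sum; in-degree is a column sum.
out_degree = {name: sum(matrix[index[name]]) for name in nodes}
in_degree = {name: sum(row[index[name]] for row in matrix) for name in nodes}

for row in matrix:
    print(row)
print(out_degree, in_degree)
```

For an undirected network the matrix would be symmetric, and in-degree and out-degree would coincide.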
Question 556 of 1443

556. Question
3. To do outlier detection, you need:
Select all that apply
- To know the underlying pattern.
- To visualize all the data available.
- To first classify the data into categories.
- To map on a regression line and see which points are above the line.
Correct

Incorrect
Question 557 of 1443

557. Question
2. Which symbol do you use to tell R that you are ending the use of your function()?
- }
- )
- -
- ]
Correct

Incorrect
Question 558 of 1443

558. Question
2. What does it mean if you have a small p-value?
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
Correct

Incorrect
Question 559 of 1443

559. Question
2. Put the steps for standardizing scales in order.
- For normally distributed data, this centers the data at 0 and gives it a standard deviation of 1
- Divide each observation by the standard deviation of the data
- Subtract the mean from each observation type
Correct

Incorrect
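
Standardizing a scale (computing z-scores) can be sketched in a few lines of Python. The data and the helper name `standardize` are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def standardize(values):
    """Center values at 0 and rescale them to a standard deviation of 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical measurements
z_scores = standardize(heights_cm)

print(z_scores)
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```

After standardizing, variables measured in different units can be compared on the same scale.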
Question 560 of 1443

560. Question
2. Fill in the blank below.
- You can remove punctuation in your data with the ____ function.
Correct

Incorrect
Question 561 of 1443

561. Question
2. Match the steps in identifying message diffusion to the correct image.
Sort elements
- 1. Identify the point where the message originates from.
- 2. Find the neighbors (direct connections) of the original point.
- 3. Run the algorithm that uses the probability/parameters to determine which (if any) neighbors receive the message
- 4. Recalculate the algorithm on the neighbors of the points who received the message
- 5. Continue until the parameters you set have been met
Correct

Incorrect
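
The diffusion steps listed above can be sketched as a simple simulation. The Python below assumes an independent-cascade style rule (each neighbor receives the message with a fixed probability), which may differ from the algorithm the course used; the graph, the function name `simulate_diffusion`, and its parameters are hypothetical.

```python
import random

def simulate_diffusion(graph, origin, probability, max_rounds, seed=0):
    """Spread a message outward from origin and return the set of nodes reached."""
    rng = random.Random(seed)      # seeded so the simulation is repeatable
    received = {origin}            # step 1: the point where the message originates
    frontier = [origin]
    for _ in range(max_rounds):    # step 5: stop when the round limit is reached
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):   # step 2: direct connections
                # step 3: each neighbor receives the message with the given probability
                if neighbor not in received and rng.random() < probability:
                    received.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:      # nobody new received the message
            break
        frontier = next_frontier   # step 4: repeat from the new recipients
    return received

# Hypothetical follower graph: node -> nodes that see its messages.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

everyone = simulate_diffusion(graph, "A", probability=1.0, max_rounds=10)
nobody_else = simulate_diffusion(graph, "A", probability=0.0, max_rounds=10)
print(sorted(everyone), sorted(nobody_else))
```

With probabilities between 0 and 1 the reach varies from run to run, so such simulations are usually repeated many times and averaged.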
Question 562 of 1443

562. Question
2. Which code below can be used to remove the record in the 312th row of data?
- Selected_names_party = Selected_names_party[-c(312), ]
- Selected_names_party = Selected_names_party[+c(312), ]
- Selected_names_party = Selected_names_party[-c(300), ]
- Selected_names_party = Selected_names_party[-c(312)]
Correct

Incorrect
Question 563 of 1443

563. Question
3. Please fill in the blank below:
- The sum of all PageRank values for a network is ____.
Correct

Incorrect
Question 564 of 1443

564. Question
4. Fill in the blank below.
- ____ is a programming term that means "named"; we use this term to indicate that data has been input into the R environment.
Correct

Incorrect
Question 565 of 1443

565. Question
4. Why did the linear model line break into 5 different ribbons with the "fill = Continent" code?
- The code was inherited from the ggplot function to the other layers
- We specified to fill the line by continent
- Ggplot2 wanted to group the data by linear model
- The code was isolated in the ggplot layer
Correct

Incorrect
Question 566 of 1443

566. Question
4. Which function enables the display size to adjust to the size of the browser window?
- ```
 fluidPage() 
```
- ```
 shinyUI() 
```
- ```
 titlePanel() 
```
- ```
 sidebarLayout() 
```
Correct

Incorrect
Question 567 of 1443

567. Question
5. Interactive visualization allows users to:
Select all that apply
- explore the data at their own pace.
- choose which information they want to see.
- swap in their own data to visualize that instead.
- determine the type of analysis that will be done on the data.
Correct

Incorrect
Question 568 of 1443

568. Question
4. How can we address the limitations of our analysis to see the data differently?
Select all that apply
- We could change the starting point or set.seed.
- We could look at different variables.
- We could normalize the data.
- We could add data to the data set.
Correct

Incorrect
Question 569 of 1443

569. Question
3. You already have a successful business in Philadelphia. You want to expand into another, similar city. Use the visualization to help determine what city you should launch in next.
- Chicago
- New York City
- San Diego
- Lancaster
Correct

Incorrect
Question 570 of 1443

570. Question
5. If you wanted to replicate the case study on Newton and extract political dispositions among Twitter users, order the steps you would take.
- Collect data from Twitter
- Sort tweets according to topic
- Use domain knowledge to interpret the results
- Visualize frequency of tweets on each topic
Correct

Incorrect
Question 571 of 1443

571. Question
4. The 3 V's of data are:
Select all that apply
- Volume
- Velocity
- Variety
- Veracity
Correct

Incorrect
Question 572 of 1443

572. Question
5. Fill in the missing words.
- 5. Accountability is important to consider prior to launching a data science project.
  
  Accountability: a clear project owner and the key stakeholders of the project need to be clearly defined – and this depends on the project. It may be the ____ or another C-suite executive such as the Chief Data Scientist.
Correct

Incorrect
Question 573 of 1443

573. Question
4. How many in-degrees and out-degrees does Cara have?
- Out-degree of 2, in-degree of 1
- Out-degree of 1, in-degree of 2
- Out-degree of 3, in-degree of 0
- Out-degree of 0, in-degree of 3
Correct

Incorrect
Question 574 of 1443

574. Question
5. Identify the seasonality time frames for each example.
Sort elements
- Hourly
- Daily
- Weekly
- Monthly
- Patterns of TV commercials
- Electricity use
- Typical office hours
- Cable bills
Correct

Incorrect
Question 575 of 1443

575. Question
4. Which calculation does the "closeness" function in the igraph package perform?
- Divides one by the sum of the lengths of the shortest paths to or from all other nodes in the graph.
- Multiplies one by the sum of the lengths of the shortest paths to or from all other nodes in the graph.
- Subtracts one from the sum of the lengths of the shortest paths to or from all other nodes in the graph.
- Adds one to the sum of the lengths of the shortest paths to or from all other nodes in the graph.
Correct

Incorrect
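
A reciprocal-of-summed-distances closeness measure can be sketched in Python with a breadth-first search. This is an illustration of the arithmetic only, not igraph's implementation; it assumes an unweighted, connected, undirected graph, and the example graph is hypothetical.

```python
from collections import deque

def shortest_path_lengths(graph, start):
    """Breadth-first search: number of hops from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def closeness(graph, node):
    """One divided by the sum of shortest-path lengths to all other nodes."""
    distances = shortest_path_lengths(graph, node)
    total = sum(d for other, d in distances.items() if other != node)
    return 1.0 / total  # assumes at least one other reachable node

# Hypothetical path graph A - B - C, stored as an undirected adjacency list.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

print(closeness(graph, "B"))  # B is one hop from both A and C
print(closeness(graph, "A"))  # A is one hop from B and two hops from C
```

Nodes with higher closeness can reach the rest of the network in fewer steps.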
Question 576 of 1443

576. Question
4. What does it mean if your model has a smaller standard deviation of residuals?
- Your model is more accurate.
- Your model is less accurate.
- Your model has no errors.
- Your model has innumerable errors.
Correct

Incorrect
Question 577 of 1443

577. Question
5. Match the key terms below to their descriptions.
Sort elements
- Check if there is bias in the data or the model
- The probability that the pattern exists through random chance
- Test for multicollinearity and independent variable interaction
- Check the residuals for heteroscedasticity (pattern contingent on fitted values)
- Check for information loss when selecting the right model for your data
- Q-Q plot/ distribution of errors
- p-values
- VIF
- Breusch-Pagan test
- AIC
Correct

Incorrect
Question 578 of 1443

578. Question
4. Which function lets you see how many calls you have left from the Twitter API?
- getCurRateLimitInfo()
- getinfo()
- twitteRinfo()
- twitterapi()
Correct

Incorrect
Question 579 of 1443

579. Question
3. Why is sorting your data by betweenness and eigenvector centrality based on the number of followers helpful?
- It shows you the people in your network that are the most important.
- It shows you the people in your network that are the least important.
- It isn't all that helpful to sort your data by betweenness and eigenvector centrality.
- It is helpful for unknown reasons.
Correct

Incorrect
Question 580 of 1443

580. Question
4. Fill in the blank below.
- ____ stands for Hue, Saturation, and Value.
Correct

Incorrect
Question 581 of 1443

581. Question
5. How can a politician use these metrics to help with their fundraising?
- A politician can reach out to organizations that have donated to similar politicians, but haven't donated to her.
- A politician can use this information to quit and become a lobbyist.
- A politician can publish these results to put pressure on organizations to donate to her campaign.
- A politician can use this to ask for more money from the donors she already has.
Correct

Incorrect
Question 582 of 1443

582. Question
4. Visualization is an iterative process. Put the steps in order starting with "Analyze"
- Graph
- Repeat
- Manipulate
- Analyze
Correct

Incorrect
Question 583 of 1443

583. Question
5. Which part of the "Simple App" application is the user input?
- When the user slides the slider to a different location
- When the graph adjusts itself based on the slider
- When the user views the graph
- When the server determines the number of breaks in the graph
Correct

Incorrect
Question 584 of 1443

584. Question
5. You finished building a predictive model. Which questions may people have about it?
- What are the most important factors?
- How well does it predict the outcome?
- Where did you get the data?
- What are the general trends?
Correct

Incorrect
Question 585 of 1443

585. Question
5. Why is it important to sanity check yourself before you move on to the next step of your analysis?
Select all that apply
- It helps you to determine whether or not the data and your analysis make sense.
- It helps create a more efficient process for realizing when you need to go back and redo any of the steps.
- It helps you better understand the functions of an organization.
- It helps you learn the common applications of network analysis across industries.
Correct

Incorrect
Question 586 of 1443

586. Question
5. Which visualizations below show time-series analysis?
Select all that apply
Correct

Incorrect
Question 587 of 1443

587. Question
5. What conclusions can be drawn from the visualization below showing cumulative dispersion of Tweets over time?

Select all that apply
- Tweeting a message just once won’t get it heard by everyone you want to reach.
- You may want to re-post a message several times to spread it.
- Tweeting a message just once is enough.
- Don't re-post a message because it will be too repetitive to users.
Correct

Incorrect
Question 588 of 1443

588. Question
7. When should you report a needlestick injury to your regional manager?
- Never
- First 24 hours
- Same day
- 1 month later
Correct

Incorrect
Question 589 of 1443

589. Question
Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
- ```
 %in% 
```
- ```
 in 
```
- ```
 %% 
```
- ```
 <- 
```
Correct

Incorrect
Question 590 of 1443

590. Question
What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
- Some players with good statistics are paid much less than other players with similar statistics.
- There are players who are paid a lot, but don't have good statistics.
- There's no correlation between minutes per game and points per game.
- The most talented players are the lowest paid.
Correct

Incorrect
Question 591 of 1443

591. Question
Order these logical operators from fastest to slowest in terms of query performance:
- >,>=,<,<=
- =
- LIKE
- <>
Correct

Incorrect
Question 592 of 1443

592. Question
After running a Breusch-Pagan test, how would you know that there is no heteroscedasticity?
Select all that apply
- The p-value is very large.
- The p-value is very small.
- The residuals are evenly distributed.
- The residuals are not evenly distributed.
Correct

Incorrect
Question 593 of 1443

593. Question
Fill in the blank below.
- The 'gg' in ggplot2 stands for ____.
Correct

Incorrect
Question 594 of 1443

594. Question
Table: Company_Employees

Which SQL code will result in a frequency distribution of the Company field?
- SELECT Company, COUNT(*) FROM Company_Employees GROUP BY Company
- SELECT * FROM Company_Employees
- SELECT COUNT(*) FROM Company_Employees
- SELECT Year, COUNT(*) FROM Company_Employees GROUP BY Company
Correct

Incorrect
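A frequency distribution counts how many rows fall under each distinct value of a field. The sketch below runs that kind of grouped count against an in-memory SQLite database. The rows and the `Year` column type are made up for illustration, since the quiz does not show the table's contents, and SQLite stands in for whichever SQL engine the course uses.

```python
import sqlite3

# Hypothetical rows: the real Company_Employees table is not shown in the quiz.
rows = [("Acme", 2019), ("Acme", 2020), ("Globex", 2020)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company_Employees (Company TEXT, Year INTEGER)")
conn.executemany("INSERT INTO Company_Employees VALUES (?, ?)", rows)

# One output row per distinct Company, paired with the number of rows in that group.
frequency = conn.execute(
    "SELECT Company, COUNT(*) FROM Company_Employees "
    "GROUP BY Company ORDER BY Company"
).fetchall()
conn.close()

print(frequency)
```

The `ORDER BY` clause is only there to make the printed result deterministic; it is not needed for the count itself.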

Question 595 of 1443

595. Question

Match the terms to the descriptions.

Sort elements

Betweenss
Withinss
Totss

Sum of all the squared distances between data points in different clusters

Sum of all the squared distances between points within the same cluster

Total sum of squares

Correct

Incorrect

Question 596 of 1443

596. Question
Fill in the blanks below.
- In order to best visualize your data, you may have to transform it to a different format. ____ data displays multiple occurrences per row and is easier to read in tables, while ____ data displays one observation per row and is easier to plot in ggplot2.
Correct

Incorrect
Question 597 of 1443

597. Question
What do you need to run an F-test?
Select all that apply
- The number of coefficients in the model excluding the y-intercept
- The degrees of freedom
- The F-statistic
- The Cook's distance
Correct

Incorrect

Question 598 of 1443

598. Question

Match the attributes to the decision tree calculation.

Sort elements

Entropy
Gini impurity

Categorical attributes

Finds the largest class in the data

Uses algorithms

Continuous variables

Finds groups of classes that make up over 50% of their data

Minimizes misclassification

Correct

Incorrect

Question 599 of 1443

599. Question

Match the description to the statistics term

Sort elements

Mean
Mode
Median
Variance

The average value you should expect to get out of a set of numbers

The number that occurs most frequently in a data set

The middle value when a set of numbers is arranged in either a decreasing or increasing order

Measures the dispersion of the data

Correct

Incorrect
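As a rough illustration of the four statistics named above, the following sketch computes each one with Python's standard `statistics` module. The sample values are invented, and population variance is used here as one possible choice; a course might use sample variance instead.

```python
import statistics

values = [2, 3, 3, 5, 7]  # made-up sample for illustration

mean = statistics.mean(values)           # sum of the values divided by their count
mode = statistics.mode(values)           # most frequently occurring value
median = statistics.median(values)       # middle value once the data is sorted
variance = statistics.pvariance(values)  # average squared distance from the mean

print(mean, mode, median, variance)
```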

Question 600 of 1443

600. Question
What do What-If analysis tools do?
- They change the values of variables in cells to test out multiple scenarios of a model
- They give you lists of dependent and independent variables to test out
- They give you a list of results based on the dimensions that you want
- They give you a list of questions to ask about your model
Correct

Incorrect
Question 601 of 1443

601. Question
Fill in the blank below.
- The ____ method plots the percentage of variance explained by clustering for different numbers of clusters, which allows us to see how the variance differs with the number of clusters that you choose. It can usually be visualized with the graph below:
Correct

Incorrect
Question 602 of 1443

602. Question
What are some aspects of a good visualization?
Please select all that apply
- They are easy to understand
- They convey information intuitively
- They are visually appealing
- They create more confusion about the data
Correct

Incorrect
Question 603 of 1443

603. Question
Please fill in the blank below
- Group learning is called ____ learning, and is made up of many 'weak learners', which is the term for a classification algorithm that performs better than random chance. This type of approach to classification trees is called a random forest.
Correct

Incorrect

Question 604 of 1443

604. Question

Match the types of time series analyses to their descriptions.

Sort elements

Spectral analysis
Wavelet analysis
Auto-correlation
Cross-correlation

Analyzes how time-series data can decompose into different frequencies that constitute underlying patterns

Studies wave patterns used to extract information from audio signals and images

The delayed correlation of a signal with itself (the past predicts the future)

Measures similarity between 2 different series and understands dependency between variables in the context of time

Correct

Incorrect

Question 605 of 1443

605. Question
Please fill in the blank below.
- DBSCAN uses a center-based approach to estimate ____ for a particular point by counting the number of points present in a specified radius (eps) of that point.
Correct

Incorrect
Question 606 of 1443

606. Question
What are some quick and easy-to-use visualizations that express frequency of words?
Please select all that apply
- Bar charts
- Word clouds
- 3D visualizations
- Geographical mapping
Correct

Incorrect
Question 607 of 1443

607. Question
Which of these is an example of exploratory data analysis?
- Using customer attributes to group customers and find new patterns
- Grouping customers into groups that have been created already
- Identifying outliers in the data based on the model
- Analyzing the data to match expected outcomes
Correct

Incorrect
Question 608 of 1443

608. Question
What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model
Correct

Incorrect
Question 609 of 1443

609. Question
What does SQL stand for?
- Structured Query Language
- Structured Question Language
- Standard Query Language
- Street Questioning Literature
Correct

Incorrect
Question 610 of 1443

610. Question
What is heteroscedasticity?
- Bias in the residuals as a function of predicted value.
- Bias as a result of outliers in the data.
- No bias evident in the residuals as a function of predicted value.
- No bias because there are no outliers in the data.
Correct

Incorrect
Question 611 of 1443

611. Question
Which function eliminates duplicate rows?
- ```
 unique() 
```
- ```
 order() 
```
- ```
 duplicate() 
```
- ```
 grep() 
```
Correct

Incorrect
Question 612 of 1443

612. Question
Why is clustering more powerful than visualizing?
Select all that apply
- Clustering mathematically defines similarity between all the data points, even the ones on the periphery.
- Clustering can work with many more dimensions than we can visualize.
- Clustering is easier than visualizing data.
- Clustering can group data into pre-defined groups.
Correct

Incorrect
Question 613 of 1443

613. Question
What is TRUE about correlation?
Select all that apply
- Correlation identifies the strength of the linear relationship between variables on a scale of -1 to 1.
- A correlation of 1 means that the variables move perfectly in tandem - if one variable increases, then the other increases at a fixed rate.
- A correlation of -1 means that the variables move in a perfectly inverse fashion - if one variable decreases, then the other increases at a fixed rate.
- A correlation of 0 means that there is no linear relationship between the change in x and the change in y.
- Correlation always implies causation.
Correct

Incorrect
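The statements about the -1 to 1 scale can be checked numerically. This is a minimal sketch of the Pearson correlation coefficient written from its textbook definition; the function name `pearson` and the sample data are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4]
r_up = pearson(x, [2, 4, 6, 8])      # y rises at a fixed rate as x rises
r_down = pearson(x, [8, 6, 4, 2])    # y falls at a fixed rate as x rises
# y = x**2 on symmetric x: clearly related, but not linearly
r_none = pearson([-2, -1, 1, 2], [4, 1, 1, 4])

print(r_up, r_down, r_none)
```

The last case shows why a correlation of 0 only rules out a linear relationship: the two variables are perfectly dependent, yet the coefficient is zero.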
Question 614 of 1443

614. Question
Which attribute does not belong to clustering?
Select all that apply
- Exploratory data analysis (EDA)
- Discovers and forms new groups or categories of data
- Calculates probabilities of events occurring or group membership
- Supervised machine learning technique
Correct

Incorrect
Question 615 of 1443

615. Question
What tab can you find the charting functionality in?
- Insert
- Tables
- Graphs
- Plots
Correct

Incorrect
Question 616 of 1443

616. Question
Which function would I use to transform the text below to all capital letters?
```
 coUpE --> COUPE
```
- ```
 UPPER() 
```
- ```
 LOWER() 
```
- ```
 CAPS() 
```
- ```
 PROPER() 
```
Correct

Incorrect
Question 617 of 1443

617. Question
What is one of the most effective methods for tuning an algorithm?
- k-fold cross validation
- Information gain
- Testing the data
- Validating the base rate
Correct

Incorrect
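For readers unfamiliar with k-fold cross validation, the sketch below shows only the splitting step: the data is divided into k folds, and each fold takes one turn as the validation set while the rest is used for training. The helper name `k_fold_indices` is hypothetical, and a real workflow would usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(6, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be fitted on each `train` split and scored on the matching `validation` split, with the k scores averaged to compare parameter settings.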
Question 618 of 1443

618. Question
Which piece of code will pass along the output from one function to the input of another function?
- ```
 %>% 
```
- ```
 %in% 
```
- ```
 fwd() 
```
- ```
 %pass% 
```
Correct

Incorrect
Question 619 of 1443

619. Question
What type of regression offsets the inclusion of too many or irrelevant variables?
- Penalized regression
- Logistic regression
- Linear regression
- Multiple regression
Correct

Incorrect
Question 620 of 1443

620. Question
4. Which image below shows a polynomial regression?
Correct

Incorrect
Question 621 of 1443

621. Question
What type of approach is support vector machines?
- Classification
- Unsupervised
- Neural network
- Network analysis
Correct

Incorrect
Question 622 of 1443

622. Question
Which function in R can help you scale your data?
- scale()
- scale.data()
- cluster()
- dbscan()
Correct

Incorrect
Question 623 of 1443

623. Question
1. Fill in the blank.
- There are many methods for analyzing data. When forecasting or predicting future events, the two most common methods are classification and ____.
Correct

Incorrect
Question 624 of 1443

624. Question
1. Fill in the blank below.
- A ____ is a web of connections.
Correct

Incorrect
Question 625 of 1443

625. Question
1. How can you search for different types of visualizations?
- ?visNetwork
- visNetwork?
- help?
- ?help
Correct

Incorrect
Question 626 of 1443

626. Question
4. Fill in the blank below.
- ____ is a programming term that means "named"; we use this term to indicate that data has been input into the R environment.
Correct

Incorrect
Question 627 of 1443

627. Question
1. Why might three dimensional graphs be a good visualization option?
Select all that apply
- You can display additional variables on the same visualization
- It can be more engaging to the audience
- It automatically makes the data easier to interpret
- It is better at conveying simple data
Correct

Incorrect
Question 628 of 1443

628. Question
2. According to Edward Tufte, what is one of the main reasons why the Challenger Space Shuttle was allowed to take off?
- The dangers of the cold weather were not conveyed well through visualizations.
- NASA decided that the risk was low enough to launch the shuttle.
- The dangers of the timing of the launch were not conveyed well through visualizations.
- A final safety check was not completed before the launch.
Correct

Incorrect
Question 629 of 1443

629. Question
1. When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Sort elements
- Having the right staff with the right skills
- Getting the right data, the right sample size, and statistical significance
- Using data that may not have been collected with your intended use in mind
- Putting all the pieces together to extract meaningful insights from your data and use them in a responsible way
- Practical challenge
- Epistemological challenge
- Ethical challenge
- Grand challenge
Correct

Incorrect
Question 630 of 1443

630. Question
1. Match the opportunity for using data with its business function.
Sort elements
- How can you optimize the design of your products and decrease the development time?
- How can you decrease costs and optimize inventory levels?
- How do you ensure the quality of the product and forecast demand to ensure that you are not producing too few or too many goods?
- How can you ensure that you are optimizing routing and have the most efficient yet low cost strategy?
- How can you find customers for your products, which people should you target with your advertising, and how can you predict how much of your goods and services people will want?
- R&D
- Buy
- Make
- Ship
- Sell
Correct

Incorrect
Question 631 of 1443

631. Question
2. Fill in the blank below.
- Classification is a method that ____ the behavior of an object or individual so we can forecast future events.
Correct

Incorrect

Question 632 of 1443

632. Question

1. Match the method to the questions below.

Sort elements

Categorization (Clustering + Text Mining)
Categorization (Classification)
Regression
Relationship Learning (Network Analysis)
Regression

How do people group together based on preferences?
How can you anticipate what people will like?
How can you anticipate how much someone will buy?

How can you reach the maximum number of people most efficiently?

How can you predict overall sales?

Correct

Incorrect

Question 633 of 1443

633. Question
1. Fill in the blank below.
- To speed up our work and avoid running calculations on each data point manually, we can use the ____ loop function, which performs a set of operations as many times as you tell it to.
Correct

Incorrect
Question 634 of 1443

634. Question
1. Which questions can we use local regression to answer?
Select all that apply
- How do relationships change over time?
- How can you predict demand for bike rentals in the short term in the absence of rich, contextual data?
- How do categorical variables affect a numerical outcome?
- How do we forecast and identify short term trends with historical data?
Correct

Incorrect
Question 635 of 1443

635. Question
1. Put the steps in order that you go through in R to understand message diffusion.
- Continue until the parameters you set have been met
- Recalculate the algorithm on the neighbors of the points who received the message
- Find the neighbors (direct connections) of original point
- Run the algorithm that uses the probability/parameters to determine which (if any) neighbors receive the message
- Identify the point where the message originates from
Correct

Incorrect
Question 636 of 1443

636. Question
2. Does node 6 belong in a community with node 1?
- Yes
- No
Correct

Incorrect
Question 637 of 1443

637. Question
3. What are two ways to load data from the Internet into RStudio?
Select all that apply
- Tools > Import Dataset > From Web URL
- ```
 variable = read.csv("Web URL") 
```
- Session > Set working directory > Choose directory
- ```
 variable = load.data("Web URL") 
```
Correct

Incorrect
Question 638 of 1443

638. Question
3. Fill in the blank below
- Web pages today are ____ - they constantly update as the relevant information changes.
Correct

Incorrect
Question 639 of 1443

639. Question
2. Which of these functions can be used for applying functions to data frames?
- ```
 ddply() 
```
- ```
 ifelse() 
```
- ```
 rename() 
```
- ```
 cbind() 
```
Correct

Incorrect
Question 640 of 1443

640. Question
2. What is Big Data?
- The accumulation of large volumes of a variety of data at an unprecedented velocity
- A revolution in measurement
- A point of view which guides how decisions should be made
- A resource
Correct

Incorrect
Question 641 of 1443

641. Question
3. Fill in the missing word.
- Some of the most interesting data generated and publicly available today can be accessed through something called an ____. Instead of downloading finite-sized Excel spreadsheets or data sets, it allows you direct access to a database. You don’t have to download all of it; you can query for the sections and types of data you want or siphon data at a much higher speed than a regular download.
Correct

Incorrect
Question 642 of 1443

642. Question
2. Which task can take up a considerable amount of a data scientist's time?
- Data preparation and cleaning
- Data retrieval
- Data analysis
- Data visualization
Correct

Incorrect
Question 643 of 1443

643. Question
3. Match each person in the network with their strength.
Sort elements
- Greatest ability to spread a message (degree centrality)
- Can reach the most people in the shortest amount of time (closeness centrality)
- Is a great connector (betweenness centrality)
- A person with a lot of connections
- A person whose average path to all others is shortest
- A person who lies on the most shortest paths
Correct

Incorrect
Question 644 of 1443

644. Question
2. Match the networks to their network analysis use cases.
Sort elements
- Prevent or control the spread of disease.
- Support participation and contributions from many types of users.
- Provide immediate assistance to customers who have problems or complaints, and anticipate problems before they occur.
- Band together to better understand their communities and government, or take collective action.
- Healthcare organizations
- Websites
- Businesses
- Individuals
Correct

Incorrect
Question 645 of 1443

645. Question
3. Fill in the blank below.
- vertex_connectivity() can tell you how many ____ need to be removed from a graph in order for any 2 nodes to become disconnected.
Correct

Incorrect
Question 646 of 1443

646. Question
2. Fill in the blank below.
- You can run the correlation analysis using the function ____.
Correct

Incorrect
Question 647 of 1443

647. Question
3. What is the moving average?
- An average of a set number of points.
- An average of all data points.
- An average of a single point.
- An average of an undetermined number of points.
Correct

Incorrect
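To make the idea of a moving average concrete, here is a minimal sketch of a simple (unweighted) one: each output value averages a fixed-size window of consecutive points. The function name, the window size of 3, and the sample series are assumptions for illustration.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

series = [2, 4, 6, 8, 10]  # made-up series
smoothed = moving_average(series, 3)
print(smoothed)  # [4.0, 6.0, 8.0]
```

The output is shorter than the input by `window - 1` points, because the first full window is only available once enough points have been seen.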
Question 648 of 1443

648. Question
2. Fill in the blank below.
- The ____ package contains nice, pre-set color schemes that are useful when creating an interactive network visualization.
Correct

Incorrect
Question 649 of 1443

649. Question
3. What does the "ego" function do?
- Finds all vertices within a pre-set limit from a given vertex
- Finds all edges within a pre-set limit from a given vertex
- Tells R to subset certain vertices or nodes from a graph
- Tells R to subset certain edges from a graph
Correct

Incorrect
Question 650 of 1443

650. Question
2. Which statement below is TRUE?
- Having high school classmates as mutual friends indicates a strong connection.
- Having high-profile celebrities as mutual friends indicates a strong connection.
- Having high-profile politicians as mutual friends indicates a strong connection.
- Making contributions to political campaigns on both sides of the aisle says a lot about community membership.
Correct

Incorrect
Question 651 of 1443

651. Question
2. Why might it be better to use the PageRank metric instead of the eigenvector metric for a large network?
Select all that apply
- PageRank takes the broader structure of the network into account.
- PageRank's computational time is much faster for large networks.
- PageRank has much higher accuracy.
- Eigenvector doesn't work on large networks.
Correct

Incorrect
Question 652 of 1443

652. Question
3. Why do we use '==' instead of '=' to pull the day shift data?
- We are not defining a variable, we are telling R what data to select
- We need to define the data as a variable
- We need to indicate that the data is in column form, not row form
- We need to reformat the data as a vector
Correct

Incorrect
Question 653 of 1443

653. Question
4. Fill in the blank below.
- The ____ function adds a layer to a graph without having to create a data.frame and map it to the scales. This allows us to easily add text or other information right onto our visualization.
Correct

Incorrect
Question 654 of 1443

654. Question
4. How do you stop the application from running?
- Click the small "Stop Sign" at the corner of the console window
- Close the browser window of the application
- Type "stopApp("application")" into the window or console
- Click on the "Source" button in the script window
Correct

Incorrect
Question 655 of 1443

655. Question
3. While you may have access to big data, it does not mean that you have
- insights and increased revenue.
- a lot of information.
- untapped data.
- the potential for insights.
Correct

Incorrect
Question 656 of 1443

656. Question
3. Why can't we use the View function to view the k-means results?
- Since k-means created lists with different numbers of rows, you'd get an error with the 'View' function.
- Because the data can't be seen in an organized format.
- K-means creates data that is too complicated to be visualized in R.
- The k-means results are not numerical.
Correct

Incorrect
Question 657 of 1443

657. Question
4. Ensemble learning, a concept in machine learning, happens when a group of learners are used together to arrive at a more accurate decision. Based on this concept, which Yelp! review would you consider when deciding whether or not to dine at a restaurant?
- The most recent review of a restaurant
- The first review of a restaurant
- A group of reviews of a restaurant
- None of the reviews because you don't know if the reviewers are unbiased
Correct

Incorrect
Question 658 of 1443

658. Question
4. If you wanted to use text mining to know the road conditions, which source would you try using first?
- Updated website posts
- Radio talk show updates
- TV news updates
- Images of updated traffic maps
Correct

Incorrect
Question 659 of 1443

659. Question
3. Which of these is not a type of data?
- GPS coordinates
- Traveling speed
- Social media "Likes"
- Petabyte
Correct

Incorrect
Question 660 of 1443

660. Question
4. When evaluating a work sample provided by a candidate for your data science team, which of these questions might you ask?
Select all that apply
- How did you know that your analysis was valid?
- How did you tailor or customize your work for the company?
- What was the question you sought to answer with this work?
- What new insights did you gain from your work?
Correct

Incorrect
Question 661 of 1443

661. Question
4. Which of the following is MOST characteristic of an opinion leader?
- Someone with a blog that is followed by many other people
- Someone who knows the most people
- Someone with the easiest access to the most powerful people
- Someone who follows a lot of blogs
Correct

Incorrect

Question 662 of 1443

662. Question

4. Match the situation with your next step.

Sort elements

Next, you should run a regression analysis.
Next, you should run a multivariate regression analysis.
Next, you should run a polynomial regression analysis.
Next, you should run a LOWESS or LOESS regression analysis.

You want to see how two variables interact.
You want to see how five variables interact.

You want to see if you can get a better fit with your five variables.

You want to see if your five variables are being influenced by seasonal changes.

Correct

Incorrect

Question 663 of 1443

663. Question
4. ignore.case = TRUE is useful when
- You want to make sure you will not get an error due to lower/upper case of the letters.
- You are looking for similar cases to the one you are working with.
- You want to make sure that all of the data has to be in capital letters.
- You are looking to change the case of the word you specify.
Correct

Incorrect
Question 664 of 1443

664. Question
4. Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
- qqnorm()
- qqline()
- qqplot()
- qqlwd()
Correct

Incorrect
Question 665 of 1443

665. Question
4. Which image below shows a polynomial regression?
Correct

Incorrect
Question 666 of 1443

666. Question
5. Match the searchTwitter() arguments to the correct functions.
Sort elements
- Search query to issue to twitter. Use "+" to separate query terms
- Maximum number of tweets to return
- Languages to use
- Earliest date to pull tweets, formatted as YYYY-MM-DD
- Latest date to pull tweets, formatted as YYYY-MM-DD
- Tweets by users located within a given radius of the given latitude/longitude
- Tweets with IDs greater (i.e. newer) than the specified ID
- Tweets with IDs smaller (i.e. older) than the specified ID
- searchString
- n
- lang
- since
- until
- geocode
- sinceID
- maxID
Correct

Incorrect
Question 667 of 1443

667. Question
4. Why is it important to save dynamic plots as html files?
Select all that apply
- Dynamic plots render faster in a web browser.
- You can share your dynamic plots more easily with others.
- Dynamic plots are harder to read in a web browser.
- It makes it more difficult to share your plots.
Correct

Incorrect
Question 668 of 1443

668. Question
5. Match the functions to what they do.
Sort elements
- renames the columns
- renames the rows
- views the output
- saves the output
- colnames()
- rownames()
- View()
- write.csv()
Correct

Incorrect
Question 669 of 1443

669. Question
4. In order for the str_detect() function to work, what format does the data need to be in?
- character format
- factor format
- numeric format
- Boolean format
Correct

Incorrect
Question 670 of 1443

670. Question
4. How do you know that this is an undirected graph?
- There are no arrows
- The thickness of lines varies
- It's circular and not hierarchical
- Not all nodes are connected
Correct

Incorrect
Question 671 of 1443

671. Question
5. Why is the 'if(!require)' function used in the code below?
```
if (!require("devtools"))
  install.packages("devtools")
devtools::install_github("rstudio/shinyapps")
```
- To ensure that the package is installed if it is not already
- To automatically install the package when run
- To prioritize the installation of the package
- To uninstall the package before the code is run
Correct

Incorrect
Question 672 of 1443

672. Question
5. When detecting outliers, the chief goal is to:
- Identify data that have a very low probability of occurring
- Remove data that doesn't fit your expected model
- Compare artificial and real data points
- Visualize unusual data from the data set
Correct

Incorrect
Question 673 of 1443

673. Question
5. What can we learn from the IBM case study?
- Network analysis, when used effectively, can have a hugely positive impact on businesses and organizations.
- Network analysis is nice to have as a tool, but not essential to business success.
- There are more efficient ways to find answers than to use network analysis.
- Network analysis rarely impacts the actions of businesses and organizations because it is too difficult to implement.
Correct

Incorrect
Question 674 of 1443

674. Question
5. Which function can help you identify periodicity quickly?
- acf()
- diff()
- str()
- ts()
Correct

Incorrect
Question 675 of 1443

675. Question
5. Which outputs do you get from the dispersion simulation?
Select all that apply
- HTML file that contains the simulation output
- Folder that contains all the images for the animation
- "js" folder that includes the execution files (you can open these in R)
- "css" folder that defines the graphics of the output (you can open these in R)
Correct

Incorrect
Question 676 of 1443

676. Question
1. Data science is at the intersection of which three domains?
- Industry Knowledge, Machine Learning, Programming
- Programming, Mathematics and Statistics, Industry Knowledge
- Mathematics and Statistics, Programming, Big Data
- Programming, Industry Knowledge, Big Data
Correct

Incorrect
Question 677 of 1443

677. Question
Put the six Data Science control cycle steps in order, starting with “Ask”
- Validate
- Model
- Test
- Ask
- Research
- Interpret
Correct

Incorrect
Question 678 of 1443

678. Question
How does industry knowledge help us understand our analysis?
- There may be latent factors that we wouldn't know unless we had expertise in that field.
- You can't make an impact unless you're an expert in the field.
- Only athletes can understand how athletes are paid.
- Industry knowledge indicates a better understanding of data analysis.
Correct

Incorrect
Question 679 of 1443

679. Question
Match the following SQL Server components to their definition
Sort elements
- A program that provides database services to other computer programs
- A container of data/information organized into tables (and other structures) so that it can be easily managed and accessed
- data stored in a tabular format with rows of named columns
- An application used to configure, manage, and administer components of SQL server (i.e. the user interface for accessing servers, databases, and tables to launch commands to the server)
- Server
- Database
- Table
- SQL Server Management Studio
Correct

Incorrect
Question 680 of 1443

680. Question
Match the methods of variable selection to the correct descriptions.
Sort elements
- Algorithm starts with a model of 0 variables and continues to add more variables based upon a specified measure
- Starts with a model of all variables, and removes variables based upon a specified measure
- Combination of forward and backward selection that starts with a model of 0 variables and adds variables, but can also remove variables based upon a specified measure
- Forward selection
- Backward selection
- Step-wise selection
Correct

Incorrect
Question 681 of 1443

681. Question
Identify the three things necessary to make a graph in ggplot2.
Select all that apply
- The data
- The shapes on the screen (such as bars or points)
- A way to map the data to the shapes
- Titles and labeled axes
Correct

Incorrect
Question 682 of 1443

682. Question
Match the table type to the statement that would create it:
Sort elements
- INTO ##TableName
- INTO #TableName
- INTO TableName
- Global Temporary Table
- Local Temporary Table
- Permanent table
Correct

Incorrect
Question 683 of 1443

683. Question
In order for us to determine how much variation our clusters account for, we need to:
- divide the inter-cluster variance by the total variance.
- divide the total variance by the inter-cluster variance.
- divide the intra-cluster variance by the inter-cluster variance.
- divide the inter-cluster variance by the intra-cluster variance.
Correct

Incorrect
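The sums of squares behind this calculation can be worked through on a toy example. The sketch below uses made-up one-dimensional points already assigned to two clusters, and borrows the names `totss`, `withinss`, and `betweenss` from the output of R's `kmeans()`. It derives the between-cluster term as total minus within, which is an assumption about how the course defines it.

```python
def mean(values):
    return sum(values) / len(values)

# Made-up 1-D data, already split into two clusters.
clusters = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
points = [p for cluster in clusters for p in cluster]

grand_mean = mean(points)

# Total sum of squares: squared distance of every point from the overall mean.
totss = sum((p - grand_mean) ** 2 for p in points)

# Within-cluster sum of squares: squared distance of each point from its own cluster mean.
withinss = sum(
    sum((p - mean(cluster)) ** 2 for p in cluster) for cluster in clusters
)

# Whatever variation the clusters do not leave inside themselves lies between them.
betweenss = totss - withinss

explained = betweenss / totss
print(totss, withinss, betweenss, round(explained, 3))
```

With tight, well-separated clusters like these, almost all of the variation sits between the clusters, so the ratio is close to 1.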
Question 684 of 1443

684. Question
Clustering is a form of what type of comparison?
- Quantitative
- Qualitative
- Realistic
- Contrasting
Correct

Incorrect
Question 685 of 1443

685. Question
Put the steps of running a t-test in the correct order.
- Build a new regression model based on each sample.
- Use the standard deviation to calculate the p-value for the coefficient value in your regression model.
- Take several samples of the data.
- Measure how the coefficient of each variable changes.
Correct

Incorrect
Question 686 of 1443

686. Question
Put the steps of k-NN in order.
- Select k, the number of neighbors for the majority vote
- Perform a majority class vote based on the k nearest neighbors
- Calculate distance from the point of interest to all other points
Correct

Incorrect
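The k-NN procedure is short enough to sketch in full. The version below assumes two-dimensional points, Euclidean distance, and made-up training labels; the function name `knn_predict` is hypothetical, and ties are broken arbitrarily.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every labelled point, nearest first.
    distances = sorted(
        (math.dist(point, query), label) for point, label in train
    )
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up training data: ((x, y), label)
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"),
]

label = knn_predict(train, (1, 1), k=3)
print(label)
```

Choosing an odd k is a common way to avoid tied votes when there are only two classes.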
Question 687 of 1443

687. Question
Please fill in the blank below.
- If the data is ____ to one side or another, then there are more data points either greater or less than the average, which affects what you can deduce from the data.
Correct

Incorrect

Question 688 of 1443

688. Question

Match the What-If analysis tool to its description:

Sort elements

Goal Seek
Scenario Manager
Data Table

Determines how to get a desired result for a dependent variable within the analysis

Allows you to consider multiple combinations of values for independent input variables in an analysis

Sees the effects of varying one or two variables in an analysis

Correct

Incorrect

Question 689 of 1443

689. Question
Fill in the blank below:
- In entropy, the number ____ indicates 100% of the data is the same, and the number ____ indicates a 50-50 split.
Correct

Incorrect
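The two extremes mentioned above can be computed directly. This is a minimal sketch of Shannon entropy in bits (log base 2) for a list of class labels; the function name and the example labels are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of the class proportions in `labels`."""
    n = len(labels)
    return sum(
        -(count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )

pure = entropy(["yes", "yes", "yes", "yes"])   # every label identical
split = entropy(["yes", "yes", "no", "no"])    # two classes, 50-50
print(pure, split)
```

For two classes the value always falls between these two results, rising as the classes become more evenly mixed.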
Question 690 of 1443

690. Question
Please fill in the blanks below
- Shiny applications have two basic components - the ____ script, which defines the appearance of the app, and the ____ script, which contains actions to perform based on user input.
Question 691 of 1443

691. Question
Please put the flow of AdaBoost.M1 in order below.
- Compute misclassification error
- Compute new weights using adjustment term
- Compute adjustment term
- Run Classification Algorithm
- Assign initial (equal) weights to data points
- Run classification algorithm again and repeat
Question 692 of 1443

692. Question
Please fill in the blank below.
- The process of dividing the time series into its components is called ____. It divides the series into four components: level, trend, seasonality, and random error.

Question 693 of 1443

693. Question

Please match the point names to the descriptions.

Core points
Border points
Noise points

The points which are present in the interior of the dense region

The points which are present on the edge of a dense region
The points which are in a sparsely occupied region


Question 694 of 1443

694. Question
Put the steps of simplified Luhn's method in order below:
- Break up each document into sentences.
- Select only sentences that are "important" for our summary
- Get the list of the first n most frequently occurring words in the corpus
- Select only the sentences whose indices were returned from the previous step
Question 695 of 1443

695. Question
1. Why did we choose the R programming language over other languages?
- It’s the language of choice for statisticians
- It has a large library of tools and packages
- It’s mainly used for programming
- It is flexible and creates powerful visualizations
Question 696 of 1443

696. Question
What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Question 697 of 1443

697. Question
Which functions below allow you to compare your data set to a normal distribution and plot a bifurcating line on the graph?
Select all that apply
- qqnorm()
- qqline()
- qqplot()
- qqlwd()
Question 698 of 1443

698. Question
Which statement is not true if you receive a positive result from a cancer test that is 95% accurate with a base rate of 1 out of 5,000 people a month?
- There is still a small chance that you have cancer
- There is a 95% chance that you have cancer
- You should get more tests before starting any treatments
- A higher base rate would lead to a smaller chance that the result is a true positive
Question 699 of 1443

699. Question
What would be the output of the following code for vector 'v':
```
 v[2:7] 
```
- The second term through the seventh term
- The data in the second row and seventh column
- The second term and the seventh term
- The terms with numbers '2' and '7'
Question 700 of 1443

700. Question
What is one of the dangers of increasing the number of clusters?
- You could overfit the data so it doesn't generalize well.
- You could increase the complexity of the algorithm beyond a computer's capability.
- You could distort the original data.
- You could discount some data points.
Question 701 of 1443

701. Question
What is R Squared?
- A number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable.
- A number that indicates the confidence we can have in the results of our data.
- A number that indicates the difference between the dependent variable and the independent variable.
- A number that indicates the length of the y-axis in comparison with the x-axis of our model.
Question 702 of 1443

702. Question
Which of these is not a strength of kNN?
Select all that apply
- It's easy to explain
- It's computationally inexpensive to run, with fast results in real time
- New data can be added any time to the algorithm
- It's fast for recommendation engines once the metrics have been calculated and stored
Question 703 of 1443

703. Question
Which of these is regression not designed to do?
Select all that apply
- Predict probabilities
- Confirm causation
- Predict the value of a variable
- Accurately predict the exact number of a future event
Question 704 of 1443

704. Question
Please fill in the blank below.
- These two functions allow you to look up a value in a table, and return the desired value from another column in that table. The ____ is for vertical tables, and the ____ is for horizontal tables.
Question 705 of 1443

705. Question
What is the complexity parameter?
- It is the amount of improvement in relative error for each node
- It is the level of importance for each node
- It is the amount of complexity of the node
- It determines whether or not a node has multiple categories
Question 706 of 1443

706. Question
Which analogy best describes the UI and server script?
- The UI script is the outside of the car and the server is the engine.
- The UI script is a tree, and the server is the roots.
- The UI script is a computer and the server is the monitor.
- The UI script is the coffee and the server is the coffee beans.
Question 707 of 1443

707. Question
What should you do before applying Ridge regressions?
- Standardize the predictors
- Select the correct value of the tuning parameter
- Remove multiple variables from the model
- Sum up all the data points
Question 708 of 1443

708. Question
What is a powerful R package you can use for text mining?
- tm
- XML
- TextMining
- ReadHTML
Question 709 of 1443

709. Question
Which SVM classifier plots a 'worst-fit line'?
- Maximal margin
- Transformation
- Transductive
- Bayesian
Question 710 of 1443

710. Question
Which of the following are applications of principal component analysis?
Please select all that apply
- Image analysis
- Facial recognition
- Community detection
- Risk analysis
Question 711 of 1443

711. Question
1. Fill in the blank.
- ____ regression is just like univariate or linear regression, but instead of using just two variables to build a model, it can factor in more variables when building the forecasting model.
Question 712 of 1443

712. Question
1. Fill in the blank below.
- ____ plots intermediate points between two locations.
Question 713 of 1443

713. Question
1. How does R determine whether two nodes belong in the same community?
- The modularity will increase.
- The modularity will decrease.
- The modularity stays the same.
- The modularity becomes negative.
Question 714 of 1443

714. Question
1. Fill in the blank below.
- ____ is a term that means only looking at a portion of the data. It is denoted in R by a '$' symbol.
Question 715 of 1443

715. Question
1. Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
- ```
 %in% 
```
- ```
 in 
```
- ```
 %% 
```
- ```
 <- 
```
Question 716 of 1443

716. Question
2. When Hewlett Packard started tracking a range of employee factors, what were the results?
Select all that apply
- HP increased employee retention
- HP decreased recruiting costs
- HP employees felt more scrutinized
- HP increased professional development costs
Question 717 of 1443

717. Question
1. Put the steps of a data science project in order.
- Storage- store and access data
- Cleaning- re-format and check data for accuracy
- Acquisition- acquire or collect data
- Visualization- present your data
- Analysis- analyze the data
Question 718 of 1443

718. Question
1. Why did the Oakland A's 'Moneyball' strategy succeed?
- The leader understood the potential of analytics in his business and was able to implement changes in the organization.
- The data was not available until that year.
- The methods were only perfected at that time.
- Paul DePodesta was able to convince Billy Beane to try the analytics for baseball players.
Question 719 of 1443

719. Question
1. What are some pitfalls of clustering?
Select all that apply
- Poor data quality may result in clustering that is not accurate.
- Too many dimensions may increase the computing power required.
- Non-spherical data or uneven clusters may result in poor clustering.
- Categories may not be present in clustering.
Question 720 of 1443

720. Question
2. How can you determine what people are interested in?
Select all that apply
- Mine customer reviews on Amazon.
- Mine email customer feedback.
- Perform sentiment analysis on tweets.
- Look at behaviors of interest groups.
Question 721 of 1443

721. Question
1. Fill in the blank below.
- ____ can have a very negative impact on linear regressions if they are not identified and handled properly.
Question 722 of 1443

722. Question
1. What are some sanity check questions you should ask yourself when building regression models?
Select all that apply
- Is the data that you used temporal? Are there seasonal trends? Do the relationships between variables change over time?
- What is the distribution of the data? Is it "normal", are there biases?
- Is there multicollinearity or heteroscedasticity?
- How many variables are you using and what is the adjusted R squared?
Question 723 of 1443

723. Question
1. Which function allows you to select rows that you will keep in your data set?
- lapply()
- rows()
- selectrows()
- apply()
Question 724 of 1443

724. Question
2. What is the underlying concept of edge betweenness?
- It's the percentage of shortest paths in a network that include a given edge.
- It's the number of connections in the shortest path.
- It's the number of nodes that have the highest number of shortest paths.
- It's the number of paths within a given network.
Question 725 of 1443

725. Question
2. Which piece of code will retrieve only the third column of a matrix called 'm'?
- ```
 m[, 3] 
```
- ```
 m[3, ] 
```
- ```
 m[3, 3] 
```
- ```
 m[[, 3]] 
```
Question 726 of 1443

726. Question
2. Why is faceting the baseball plot not a good option?
Select all that apply
- It's difficult to make direct comparisons between teams
- The plots are small and hard to see
- It's hard to see each team individually
- It's missing important data
Question 727 of 1443

727. Question
3. Fill in the blank below.
- ____ data is also valuable for the initial exploratory data analysis.
Question 728 of 1443

728. Question
2. Which of the following is a practical challenge that companies face when using data?
- A limited pool of data-literate talent
- Insufficient time to collect data
- Competing digital projects
- Out-dated technology and resources
Question 729 of 1443

729. Question
2. Which three words are used to describe Big Data?
Select all that apply
- Variety
- Velocity
- Version
- Vision
- Virtual
- Volume
Question 730 of 1443

730. Question
3. Identify where in the data chain the issues below could be addressed.

NOTE: Preparing and visualizing data are often iterative processes, and you should remember to perform "sanity checks" to ensure that you're continuing to move in the right direction.
- the maintenance of data quality
- your audience
- errors and outliers
- When cleaning data, consider...
- When visualizing data, consider...
- When both cleaning and visualizing data, consider...
Question 731 of 1443

731. Question
3. Fill in the blank below.
- The ____ Index measures similarity between people or places in a network. It’s a simple calculation that takes the number of things in common divided by the total unique number of things or connections that each node in the network has.
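The calculation described in this statement can be sketched in a few lines of base R. The function name `similarity_index` and the two example friend lists are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: shared items divided by total unique items.
similarity_index <- function(a, b) {
  shared <- length(intersect(a, b))       # things the two nodes have in common
  total_unique <- length(union(a, b))     # all distinct things across both nodes
  shared / total_unique
}

# Made-up connection lists for two people in a network
alice <- c("Bob", "Carol", "Dan")
erin  <- c("Carol", "Dan", "Frank", "Grace")

score <- similarity_index(alice, erin)    # 2 shared / 5 unique = 0.4
print(score)
```

The score ranges from 0 (nothing in common) to 1 (identical sets); the sketch does not handle the case where both sets are empty.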
Question 732 of 1443

732. Question
3. The best way to gain new insights about people and places is to
- Look at events and interactions in the broader context and understand the network and ecosystem that you are investigating.
- Ignore the context of the data that you are investigating to ensure that you don't lose focus.
- Zoom into the specific context without considering the broader implications to draw stronger conclusions.
- Look at one type of network interaction and assume that it applies in a universal context.
Question 733 of 1443

733. Question
2. What is NOT an example of a regression analysis output you will create in this course?
- Linear regression
- Multivariate regression
- Polynomial regression
- LOESS regression
- Elastic Net regression
Question 734 of 1443

734. Question
2. Which package can you use to create a 3D plot?
- scatterplot3d
- ggplot3D
- ggally3D
- ggmap3D
Question 735 of 1443

735. Question
2. Match the functions below to the correct actions.
- converts data into time series data
- calculates the seasonality factors in data
- compares the lagged difference in the residuals
- finds the periodicity
- runs a linear regression model
- calculates the moving average
- ts()
- decompose()
- diff()
- acf()
- lm()
- filter()
Question 736 of 1443

736. Question
3. Match the arguments used to plot an interactive graph to what they do.
- defines the network structure
- determines the attributes of the nodes such as colors
- is where the edges start
- is where the edges go to
- is the thickness of the edges
- defines the names of the points
- defines the colors of the points
- determines the size of the node labels
- determines the transparency of the elements in the graph
- defines how zoomed in or out the graph is, positive to zoom in, negative to zoom out
- Links =
- Nodes =
- Source =
- Target =
- Value =
- NodeID =
- Group =
- fontSize =
- opacity =
- charge =
Question 737 of 1443

737. Question
2. Fill in the blank below.
- Use the ____ function to ensure there are no duplicate records in your data.
Question 738 of 1443

738. Question
3. How do you down-weight famous people and boost less famous ones?
- Use the weighted Jaccard Index
- Use the regular Jaccard Index
- Use regression analysis
- Use unweighted logarithms
Question 739 of 1443

739. Question
3. Which function did we use to calculate the total number of donors for each legislator?
- table()
- summarise()
- PageRank()
- combine()
Question 740 of 1443

740. Question
5. Fill in the blank below
- In order to select specific values from our crime data, we first need to tell R that the data we want to use is in a ____ format.
Question 741 of 1443

741. Question
4. When should you set the 'flip.y' argument equal to FALSE?
- When you want your X and Y axes to be identical
- When you want your X and Y axes to be different
- When you add a third axis
- When you map more than four variables on the graph
Question 742 of 1443

742. Question
3. Which UI component adds a checkbox to the application?
- ```
 checkboxInput() 
```
- ```
 selectInput() 
```
- ```
 addCheckbox() 
```
- ```
 inputCheckbox() 
```
Question 743 of 1443

743. Question
4. The real value of more accurate information is in the
- incremental improvement above the status quo.
- increased amount of data structure.
- decreased amount of data.
- increased number of employees.
Question 744 of 1443

744. Question
4. What do these four columns represent when the 'centers' are called up from the k-means analysis?
- Each column represents a centroid from the data set.
- Each column represents a customer's purchases.
- Each column represents the types of cheese in the data.
- Each column represents the customer purchases on different days.
Question 745 of 1443

745. Question
5. If you want to know how accurate your classification model is, what information do you need?
- The number of cases in which customers are correctly classified
- The total number of cases classified
- The probability of randomly classifying correctly
- The number of variables in the algorithm
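As a rough illustration of how classification accuracy is usually computed, the sketch below divides the number of matching predictions by the total number of cases. The labels are made up for the example and are not taken from the course data.

```r
# Made-up actual and predicted class labels for eight customers
actual    <- c("buy", "buy", "skip", "skip", "buy", "skip", "buy", "skip")
predicted <- c("buy", "skip", "skip", "skip", "buy", "buy", "buy", "skip")

n_correct <- sum(actual == predicted)   # cases where the prediction matches
n_total   <- length(actual)             # all classified cases
accuracy  <- n_correct / n_total        # 6 / 8 = 0.75

# A confusion table shows where the two misclassifications fall
print(table(actual, predicted))
print(accuracy)
```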
Question 746 of 1443

746. Question
4. Label each aspect of the graph.
- regression line
- Actual data points
- independent variable
- dependent variable
- D
- C
- A
- B
Question 747 of 1443

747. Question
4. What's new about Big Data?
- You can access data on specific individuals to make targeted recommendations.
- You can use new statistical models on big data.
- Big data is automatically cleaned before analysis.
- Most computers can fit big data on their hard drives.
Question 748 of 1443

748. Question
4. Data science is at the intersection of which three domains?
- Industry Knowledge, Machine Learning, Programming
- Programming, Mathematics and Statistics, Industry Knowledge
- Mathematics and Statistics, Programming, Big Data
- Programming, Industry Knowledge, Big Data
Question 749 of 1443

749. Question
5. Fill in the blank below.
- ____ centrality is a metric that evaluates a node’s or a person’s importance by giving consideration to the importance of the nodes or people connected to it.
Question 750 of 1443

750. Question
4. Which of the following are common causes of outliers?
Select all that apply
- Someone entered their data incorrectly.
- A computer or machine recorded or transferred the data incorrectly.
- Pure chance.
- The data weren't visualized properly.
Question 751 of 1443

751. Question
5. The "-", or minus, symbol before the "grep" function tells R to
- exclude the following items.
- include the following items.
- combine the following items.
- change the following items.
Question 752 of 1443

752. Question
4. How does Cook's distance help to identify outliers that can skew your analysis?
- Measures the effect an observation has on a regression model.
- Measures the distance of an outlier from the median point of a regression model.
- Measures the average value of all the outliers in a regression model.
- Measures the difference between the largest and smallest observation of a regression model.
Question 753 of 1443

753. Question
5. How did we know our polynomial regression model was stronger than our linear regression model?
Select all that apply
- The adjusted R squared had a 50% increase over the linear model
- The residual standard error was lower
- The residuals looked normally distributed
- The residual standard error was higher
Question 754 of 1443

754. Question
4. Which package do you need to load to use the setnames() function?
- data.table
- plyr
- ggplot2
- tidyr
Question 755 of 1443

755. Question
4. What are some things that infection can depend on?
Select all that apply
- The severity of the disease
- Duration of the disease
- The number of adjacent nodes infected
- Probability of infection given exposure
Question 756 of 1443

756. Question
4. What can a basic link prediction algorithm be used for?
Select all that apply
- You can predict which nodes will be the next ones to link.
- You can make product recommendations and friendship recommendations based on the predictions.
- You can use it for fraud detection and duplicate accounts.
- You can use it to make aesthetic changes to your network analysis visualizations.
Question 757 of 1443

757. Question
5. Put the steps to extract an email address from an email in order.
- Convert the e-mail files from an mbox format to an individual e-mail txt file format.
- Remove extraneous characters from the rows you've selected.
- Install and load the two packages to read in emails.
- Select only the rows that contain the email address.
Question 758 of 1443

758. Question
1. Data science is at the intersection of which three domains?
- Industry Knowledge, Machine Learning, Programming
- Programming, Mathematics and Statistics, Industry Knowledge
- Mathematics and Statistics, Programming, Big Data
- Programming, Industry Knowledge, Big Data
Question 759 of 1443

759. Question
5. What should you keep in mind when scraping information?
Select all that apply
- The selector pattern of HTML tags surrounding the data you want
- The type of information you want to gather
- The other sites linked to the page you're scraping
- The types of font displayed on the page
Question 760 of 1443

760. Question
Was this easy?
- Yes
- No
Question 761 of 1443

761. Question
5. Which argument will determine the color of each region of the map?
- col =
- fill =
- bg =
- lwd =
Question 762 of 1443

762. Question
5. What term do we use to refer to the growth factor (slope of the best-fit line) that adjusts the level (y-intercept of the best-fit line)?
- Trend
- Seasonality
- Periodicity
- Exponential
Question 763 of 1443

763. Question
5. What are some ways that we can measure the relationship between 2 individuals?
Select all that apply
- The duration of the connection
- The number of friends in common
- The number of links they share with each other
- Favorite vacation spot
Question 764 of 1443

764. Question
2. Data scientists’ responsibilities may include:
- Visualizing data
- Analyzing data
- Asking questions about data
- Formatting data

Question 765 of 1443

765. Question

Match the method to the description (note: there are more methods listed than necessary).

Clustering
Network analysis
Text mining
Forecasting
Regression

Measures similarity between data points to group them and identify key similarities that you can use to find trends

Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively

Digests large amounts of text quickly and finds common themes, messages and patterns.


Question 766 of 1443

766. Question
Which of these is not a common data problem?
- Consistent data that reconciles to data sources
- Unreliable or unusable data
- Inaccurate interpretation of fields
- Issues with joining fields
Question 767 of 1443

767. Question
Match the table combination types with their definitions:
- brings columns from 2 different tables into a combined table
- appends records from 2 tables into a combined table
- JOIN
- UNION
Question 768 of 1443

768. Question
What are some key things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity and correlation among the variables
- Adjusted R squared
- Model bias and distribution of residuals (Q-Q plot)
- Standard deviation of residuals to assess model fit
- Heteroscedasticity / pattern of residuals vs. fitted values
Question 769 of 1443

769. Question
Fill in the blank below.
- While it may look like a scatter plot, this ____ plot maps a third variable to the size of its points so that it can give more information in one graph about the variables in the data.
Question 770 of 1443

770. Question
What are views in SQL valuable for?
Select all that apply
- Creating simplicity by hiding complex queries from end users of data
- Creating security through hiding fields with private information and/or preventing changes to base tables
- Preventing redundancy and increasing consistency by providing a common source for data users
- Creating a user interface that allows drag and drop search of tables
- Decreasing the number of objects in a database
Question 771 of 1443

771. Question
What are some conclusions from the visualization of Congress?
Select all that apply
- The Democrats are not as tightly clustered as the Republicans.
- There are some members of Congress who don't follow their parties' voting patterns.
- Democrats and Republicans vote similarly.
- Democrats are more tightly clustered than Republicans.
Question 772 of 1443

772. Question
Fill in the blank below.
- ____ is a measure of the extent to which an increase in one variable corresponds to an increase in another variable. This does not imply causation, which determines whether or not one variable causes an effect on another; rather, it determines whether or not there is any connection between the variables that we can quantify.
Question 773 of 1443

773. Question
How do you calculate R squared?
- Subtract the ratio of the randomness to the total variance from the number 1.
- Divide the ratio of the randomness to the total variance from the number 1.
- Add the ratio of the randomness to the total variance from the number 1.
- Multiply the ratio of the randomness to the total variance from the number 1.
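For reference, the usual textbook formula for R squared can be checked against R's own output. The sketch below assumes that "randomness" means the residual (unexplained) sum of squares, and it uses the built-in `cars` data set purely as a stand-in.

```r
# Fit a simple linear model on the built-in cars data (stand-in example)
fit <- lm(dist ~ speed, data = cars)

# Unexplained variation: squared residuals around the fitted line
ss_residual <- sum(residuals(fit)^2)
# Total variation: squared deviations of the response around its mean
ss_total <- sum((cars$dist - mean(cars$dist))^2)

# R squared as one minus the share of variation left unexplained
r_squared <- 1 - ss_residual / ss_total

# Should agree with the value reported by summary()
print(all.equal(r_squared, summary(fit)$r.squared))
```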
Question 774 of 1443

774. Question
Put the six Data Science control cycle steps in order, starting with “Ask”
- Validate
- Interpret
- Test
- Ask
- Model
- Research

Question 775 of 1443

775. Question

Match the correlation values to their meaning.

1
-1
0

The variables move perfectly in tandem
The variables move in a perfectly inverse fashion

There is no linear relationship between the variance of the variables
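The three correlation values can be reproduced with base R's `cor()`. The vectors below are made up for illustration; the last pair is chosen to show that a correlation near zero rules out a linear relationship, not a relationship of any kind.

```r
x <- -5:5

r_tandem  <- cor(x, 2 * x + 1)  # y rises in lockstep with x
r_inverse <- cor(x, -3 * x)     # y falls in lockstep as x rises
r_none    <- cor(x, x^2)        # y depends on x, but not linearly

print(c(r_tandem, r_inverse, r_none))
```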


Question 776 of 1443

776. Question
Put the steps of Goal Seek in order, starting with Click on the "What-If Analysis Group" button.
- Click on the What-If Analysis Group button
- In the To value prompt, declare the target value that will be calculated in the Set cell location
- If you wish to accept the new value, click OK
- In the Set cell prompt, declare the cell that contains the formula that calculates the target value of interest
- Click OK, Excel will overwrite the previous cell value with the new one
- In the By changing cell prompt, declare the location of the model input that will be changed by the function
Question 777 of 1443

777. Question
How was Target able to identify their pregnant customers?
- They used a decision tree to classify their shoppers
- They used k-means clustering to classify their shoppers
- They used a multiple regression to classify their shoppers
- They conducted customer surveys
Question 778 of 1443

778. Question
Put the steps for creating a Shiny application in order by dragging them below.
- Run the application
- Build the server.R script
- Create a new folder with the name of the application
- Save the server.R script in the folder
- Save the ui.R script in the folder
- Build the ui.R script
Question 779 of 1443

779. Question
What are some advantages of boosting?
Please select all that apply
- It's useful for large amounts of data.
- It provides variable importance.
- It is best used for linear regression.
- It doesn't give weight to any of the variables.
Question 780 of 1443

780. Question
What is the name of the plot below that plots the autocorrelation function for different values of time lag?
- Correlogram
- Line graph
- Autocorrelation graph
- Correlation graph
Question 781 of 1443

781. Question
Please put the steps of n-fold cross-validation in order
- Repeat the process for every subset you create
- Use each subset as the test data set and use the rest of the data as the training data set
- Split the data set into several subsets ("n" number of subsets) of equal size
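The cross-validation steps listed in this question can be sketched in base R. This is an illustration under assumptions rather than a prescribed recipe: the built-in `mtcars` data, the model `mpg ~ wt`, the choice of four folds, and RMSE as the error measure are all picked for the example.

```r
set.seed(42)
n_folds <- 4

# Randomly assign each of the 32 rows to one of 4 equal-sized subsets
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(mtcars)))

rmse <- numeric(n_folds)
for (i in seq_len(n_folds)) {
  test  <- mtcars[fold_id == i, ]   # this subset is held out for testing
  train <- mtcars[fold_id != i, ]   # the rest is used for training
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

# Average the held-out error across all subsets
print(mean(rmse))
```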
Question 782 of 1443

782. Question
3. What does it mean to “practice” coding?
Select all that apply
- Figure out how to fix bugs
- Type out the code
- Study the material
- Play the scales
Question 783 of 1443

783. Question
3. Fill in the blank below.
- Excel's visualization capabilities can be limited by its menus. R does not have such a constraint. As a result, you can create beautiful, varied visualizations and make more nuanced changes with R than with Excel.
Question 784 of 1443

784. Question
Which of these is an example of exploratory data analysis?
- Using customer attributes to group customers and find new patterns
- Grouping customers into groups that have been created already
- Identifying outliers in the data based on the model
- Analyzing the data to match expected outcomes
Question 785 of 1443

785. Question
What does it mean if your model has a smaller standard deviation of residuals?
- Your model is more accurate.
- Your model is less accurate.
- Your model has no errors.
- Your model has innumerable errors.
Question 786 of 1443

786. Question
What function do you need to use to perform the k Nearest Neighbors algorithm?
- ```
 kNN() 
```
- ```
 nearest() 
```
- ```
 nfld() 
```
- ```
 k() 
```
Question 787 of 1443

787. Question
Which piece of code will retrieve only the third column of a matrix called 'm'?
- ```
 m[, 3] 
```
- ```
 m[3, ] 
```
- ```
 m[3, 3] 
```
- ```
 m[[, 3]] 
```
Question 788 of 1443

788. Question
What types of data are difficult to cluster?
Select all that apply
- Circular/elliptical data
- Data that are unequally distributed
- Data that don't have similar density
- Data that has an uneven concentration of points in a cluster
Question 789 of 1443

789. Question
What does it mean if you have a small p-value?
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
Question 790 of 1443

790. Question
Which one of these diagrams shows an entropy of 1?
Question 791 of 1443

791. Question
What is the Analysis ToolPak?
- An Excel add-in program that provides data analysis tools
- An Excel add-in program that provides data visualization tools
- An Excel add-in program that automatically detects descriptive statistics
- An Excel add-in program that is necessary for Excel to do basic calculations
Question 792 of 1443

792. Question
Which function looks up a particular value in a table and produces the row in which the value is located?
- MATCH()
- VLOOKUP()
- INDEX()
- FIND()
Question 793 of 1443

793. Question
Which of these is not an attribute of classification?
Select all that apply
- Discovers patterns in data
- Assigns data points to known groups or categories
- Calculates probabilities of events occurring or group membership
- Exploratory data analysis (EDA)
Question 794 of 1443

794. Question
Which function enables the display size to adjust to the size of the browser window?
- ```
 fluidPage() 
```
- ```
 shinyUI() 
```
- ```
 titlePanel() 
```
- ```
 sidebarLayout() 
```
Question 795 of 1443

795. Question
What is the only difference between the Ridge penalty and LASSO penalty?
- Ridge uses the square of the coefficients, while LASSO uses absolute value.
- LASSO uses the square of the coefficients, while Ridge uses absolute value.
- Ridge minimizes the coefficients, while LASSO maximizes the coefficients.
- Ridge maximizes the coefficients, while LASSO minimizes the coefficients.
Question 796 of 1443

796. Question
Please fill in the blank below.
- The term ____ means "data about data". It contains context ("resource descriptions") for objects of interest, such as MP3 files, library books, or satellite images.
Question 797 of 1443

797. Question
Please fill in the blank below.
- The ____ product is the sum of the products of all the same dimensions, while the ____ product is the sum of the products of all the different dimensions.
Question 798 of 1443

798. Question
Please fill in the blank below.
- ____ is used for dimensionality reduction of the data by decomposing a matrix into three different matrices.
Question 799 of 1443

799. Question
1. Fill in the blank.
- Highly unusual data and other anomalies in a data set are called ____.
Question 800 of 1443

800. Question
1. Before combining data from two separate data sets using the "join" command it is important to
- make sure the columns you are trying to combine have a common title.
- use an addition operation to combine the data.
- have different column names for all of the columns in the two data sets.
- organize the data in the column in decreasing numerical order.
Question 801 of 1443

801. Question
1. How can you make sure that R doesn't take the weights of a network graph into account?
- Set the 'weights' argument equal to NULL.
- Reformat the data to delete the weights.
- Set the 'weights' argument to zeroes.
- Set the edge weights to the node numbers.
Question 802 of 1443

802. Question
1. Match the functions to the actions that they perform in R.
Sort elements
- sets up the network data
- checks the structure of the output
- pulls attributes of graph vertices
- pulls attributes of graph edges
- graph.data.frame()
- str()
- V()
- E()
Question 803 of 1443

803. Question
1. Match the names to the graphs below.
Sort elements
- Scatter plot
- Line graph
- Bar graph
- Histogram
Question 804 of 1443

804. Question
2. Fill in the blank below.
- Clustering and data mining are types of ____ data analysis.

Question 805 of 1443

805. Question

2. Match each question with the best method for answering it.

Sort elements

Clustering
Classification
Clustering
Classification
Clustering
Classification

Based on their shopping history, what commonalities are there among our customers?

Based on a customer's shopping patterns, is it likely that this customer is pregnant?

What do people think about our brand?
Is it likely that this shopper will purchase our product?

When a disease spreads, are there any patterns in its spreading?

With the symptoms exhibited, what diagnosis might a doctor propose?


Question 806 of 1443

806. Question
1. Put the 5 functions of an organization in order.
- Production
- Procurement
- Sales
- Shipment
- Research and development
Question 807 of 1443

807. Question
2. How do businesses use probabilistic algorithms?
Select all that apply
- Researchers can predict hit songs and movie blockbusters.
- Insurance companies predict the life span of their clients.
- FedEx predicts the defection rate of its customers.
- HP predicts sales outcomes of certain divisions.
Question 808 of 1443

808. Question
1. Fill in the blank below.
- Networks are not always obvious; they are ____ in the vastness of the increasing amounts of data collected today.
Question 809 of 1443

809. Question
1. Which package helps us to visualize all the correlations in the data at once so we can get a better sense for the variables that may have the greatest predictive power?
- GGally
- GGplot
- GGmap
- GGcorr
Question 810 of 1443

810. Question
2. Fill in the blank below.
- Data ____ is extracting information from large quantities of data to find insights, patterns, and latent connections.
Question 811 of 1443

811. Question
1. Which function counts the number of nodes (vertices) in a graph?
- gorder()
- order()
- V()
- O()
Question 812 of 1443

812. Question
1. Put the five steps of label propagation in order.
- Nodes having the same labels are all assigned to the same community.
- At every subsequent step, each node adopts the most popular label amongst its neighbors.
- As the labels propagate through the network, densely connected groups adopt the same label.
- Label each node in a network with a unique label.
- Nodes assume the labels of adjacent nodes that they’re connected to at random.
Question 813 of 1443

813. Question
2. Which of the answers below is a function?
Select all that apply
- subset()
- ggsave()
- %in%
- ==
Question 814 of 1443

814. Question
2. Which of these functions is similar to the 'search and replace' function in other programs?
- gsub()
- search()
- gather()
- %>%

Question 815 of 1443

815. Question

3. Match the method to the description (note: there are more methods listed than necessary).

Sort elements

Clustering
Network analysis
Text mining
Forecasting
Regression

Measures similarity between data points to group them and identify key similarities that you can use to find trends

Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively

Digests large amounts of text quickly and finds common themes, messages and patterns.


Question 816 of 1443

816. Question
2. When categorizing observations, you should...
- find similarities or patterns among groups, people, or objects
- answer the question, "What or who is this object or person like?"
- identify the single, underlying factor that separates the groups, people, or objects
- build a dataset that includes all the factors that could connect your observations

Question 817 of 1443

817. Question

3. Data science teams produce two types of products - functional and one-off products. Match the type of product with the description.

Sort elements

Functional product
Functional product
One-off product
One-off product
Not a data science team product

An interactive tool that is used repeatedly to gain new information

A tool that your client interacts with to get updated information

An analysis used to deliver information in a presentation
A visualization that supports a story


Question 818 of 1443

818. Question
2. Put the steps of the functions of a business in order, starting with "Research and Development"
- Research and Development: optimize design and decrease time
- Buy: manage inventory
- Make: ensure quality and gauge demand
- Sell: target clients and suggest products
- Ship: optimize route
Question 819 of 1443

819. Question
2. What happened when Wal-Mart decided to combine the data from its loyalty card system with that from its point of sale systems?
- They found that beer and diapers were highly correlated on Friday afternoons.
- They found that their loyalty card system was incentivizing customers to buy more products.
- They found gaps in their beer inventory.
- They found that their point of sale systems were not synchronized with their loyalty card system.
Question 820 of 1443

820. Question
2. Put the 5 similar functions of an organization in the correct order.
- Research and development
- Buy
- Ship
- Make
- Sell
Question 821 of 1443

821. Question
3. Fill in the blank below.
- The key to staying current with your data analysis skills is to continue reading, ____, and staying up to date with the latest tools and applications.
Question 822 of 1443

822. Question
3. Match the arguments in 3D plotting to their actions.
Sort elements
- set the limits of the x-axis
- set the limits of the y-axis
- set the limits of the z-axis
- size of the points
- whether to display a box around the grid
- whether to display a grid between the x and y axes
- can be used to automatically create a color gradient
- adds vertical lines that lead to the points
- specifies what type of line should lead up to the points if type ="h"
- angle between the x and y axes
- xlim
- ylim
- zlim
- pch
- box
- grid
- highlight.3d
- type = "h"
- lty.hplot
- angle
Question 823 of 1443

823. Question
3. Which function do you need to run an exponential smoothing model?
- HoltWinters()
- ggplot()
- str()
- decompose()
Question 824 of 1443

824. Question
2. Why is it important to keep scale in mind when creating visualizations?
- Scaling makes the graphs easier to read.
- Scaling makes the graphs more difficult to read.
- Scaling helps clean the data.
- Scaling is not important to keep in mind.
Question 825 of 1443

825. Question
3. Which function denotes when you are looking for vertices?
- V()
- v()
- vert()
- vertices()
Question 826 of 1443

826. Question
3. Which function can you use to merge two files into one?
- cbind()
- c()
- subset()
- bind()
Question 827 of 1443

827. Question
2. Which two packages do you need to load to parse email data?
Select all that apply
- tm
- tm.plugin.mail
- gmail
- mail.reader
Question 828 of 1443

828. Question
4. Why does it make sense to separate the code in multiple steps?
Select all that apply
- To follow the code more easily
- To decrease the chance of making mistakes
- To make the code shorter
- To make the code longer and more complex
Question 829 of 1443

829. Question
5. Fill in the blank below.
- In order to plot points of different sizes, you need to set the renderer argument to ____.
Question 830 of 1443

830. Question
4. Why is the second line of code below (italicized for emphasis) important for the pull-down menu function?
```
selectInput('pTypeSubset', 'Provider Type Subset',
            c("All", sort(as.vector(unique(medicare$provider_type)))))
```
- It pulls all the categories from the existing data set
- It creates new options from the existing data set
- It creates names for the healthcare providers
- It pulls all the averages of the categories
Question 831 of 1443

831. Question
5. What are some predictions that have been made based on social media, such as LinkedIn, blogs, Facebook, etc?
Select all that apply
- Heart disease
- Loan risk
- Reliability in task completion
- Propensity for travel
Question 832 of 1443

832. Question
5. How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
- You can send special offers to individuals who are more likely to use them.
- You could stock your shelves with products that are bought together more frequently.
- You could make better product recommendations that complement the purchase patterns.
- You could let your clients know you have their data so they can make more deliberate purchases.
Question 833 of 1443

833. Question
3. You are testing a new classification algorithm. Which of the following results may suggest that your algorithm is performing accurately?
- True positives
- False negatives
- False positives
- True negatives
Question 834 of 1443

834. Question
5. If most of the data points cluster around a regression line, it may be the case that:
- The variables are highly correlated
- Not a lot of randomness can be explained
- The dependent variable and independent variable need to be switched
- There is a large distance between the average expected value and actual value

Question 835 of 1443

835. Question

5. Match the analytics methods to the companies.

Sort elements

Kabbage
Capital One
President Obama's campaign

Uses a variety of data sources about the business and analyzes data from customer ratings and reviews to sales trends in real time to provide loans.

Used predictive analytics algorithms to provide customized credit offers to its customers.

Used an analytics team to build a predictive model that was deployed among millions of swing voters to persuade them to vote for the President.


Question 836 of 1443

836. Question
3. Why is clustering more powerful than visualizing?
Select all that apply
- Clustering mathematically defines similarity between all the data points, even the ones on the periphery.
- Clustering can work with many more dimensions than we can visualize.
- Clustering is easier than visualizing data.
- Clustering can group data into pre-defined groups.
Question 837 of 1443

837. Question
4. Sort the network from the broadest community to the most niche community.
- Music lovers
- Rock and Pop concert goers
- Casual listeners of music and podcasts
- Alternative and Top 40 bloggers and columnists
Question 838 of 1443

838. Question
4. Which method can you use to find opinion leaders?
- Network analysis
- Regression
- Classification
- Clustering
Question 839 of 1443

839. Question
3. Which package is useful to parse text?
- stringr
- plyr
- ggmap
- tidyr
Question 840 of 1443

840. Question
4. How do you measure the explanatory power of your predictive model?
- R squared
- Q squared
- B squared
- C squared
Question 841 of 1443

841. Question
4. Which statement below is TRUE?
- A better fit to the data does not always mean that you have a better model.
- A better fit to the data always means that you have a better model.
- A better fit to the data always means that you can generalize your model.
- A better fit to the data always means that you can make predictions about the future.
Question 842 of 1443

842. Question
3. Which notation below would correctly identify punctuation?
- [[:punct:]]
- [punct]
- [[punct]]
- :punct:
Question 843 of 1443

843. Question
4. Which symbol below means "not" in R?
- !
- ~
- ()
- ^
Question 844 of 1443

844. Question
4. How can you remove individuals from your data that do not have connections?
- Take out all connections with a Jaccard similarity score of zero.
- Take out all connections with a Jaccard similarity score greater than zero.
- Leave in all connections with a Jaccard similarity score of zero.
- Leave in all connections with a Jaccard similarity score less than zero.
Question 845 of 1443

845. Question
4. Please fill in the blank below:
- The ____() function finds the contents of a variable and the ____() function applies that first function to the list of variable names in a data set.
Question 846 of 1443

846. Question
4. Big data is like
- a library
- money under the mattress
- diamonds in a vault
- a bank of computer servers
Question 847 of 1443

847. Question
5. Which plot would best help us visualize nested data (lists within lists)?
- A tree diagram
- A Sankey plot
- An rCharts graph
- A map
Question 848 of 1443

848. Question
5. Which industries can benefit from using predictive analytics?
Select all that apply
- Politics
- Finance
- Telecommunications
- Human Resources
Question 849 of 1443

849. Question
5. What is something we CAN'T measure?
- The number of participants in a network.
- The number of connections among participants in a network.
- The distance in path of communication.
- The strength or frequency of connection or communication.
- The degree to which participants in a network believe in networks.
Question 850 of 1443

850. Question
5. Which statement below is TRUE?
- You should never rely on numbers produced by a computer blindly.
- You should always rely on numbers produced by a computer blindly.
- You should never calculate multiplicative seasonality forecasting.
- You should never update the level and trend values in your model.
Question 851 of 1443

851. Question
5. What are some use cases for the Jaccard Index?
Select all that apply
- Identify similar patients that may respond to treatment in a similar way
- Identify similar neighborhoods for locating stores or offices
- Identify suppliers that can replace your current ones
- Identify cheaper substitutes for the products and services you purchase
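For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below is a minimal illustration under that standard definition; the function name `jaccard_index` and the symptom sets are hypothetical and are not taken from the course material.

```python
def jaccard_index(a, b):
    """Share of items the two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: treat as "no overlap" rather than dividing by zero.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical example: symptom sets recorded for two patients.
patient_a = {"fever", "cough", "fatigue"}
patient_b = {"cough", "fatigue", "headache"}
score = jaccard_index(patient_a, patient_b)  # 2 shared items out of 4 distinct
```

A score of 0 means the two sets share nothing, and a score of 1 means they are identical.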
Question 852 of 1443

852. Question
4. Big data is like
- a library
- money under the mattress
- diamonds in a vault
- a bank of computer servers
Question 853 of 1443

853. Question
Fill in the blank below.
- Clustering assumes that ____ is a measure of similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.
Question 854 of 1443

854. Question
Which of the following are methods for importing data into SQL Server?
- SQL Server Integration Services (SSIS)
- Import/Export Wizard
- Bulk Inserts
- All of the above
Question 855 of 1443

855. Question
How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
- Run a quantile-quantile (QQ) plot analysis
- Run a standard deviation (SD) analysis
- Run a confidence interval (CI) analysis
- Run a linear regression line (LRL) analysis
Question 856 of 1443

856. Question
What are the two ways you can use categorical variables in regression models?
Select all that apply
- If the categories have a natural sequence, they can be assigned unique values.
- If you can’t order categorical variables, you can treat each category as its own separate variable.
- If the categories have no order, they can be assigned unique values.
- If the categorical variables can't be ordered you can group them together as one variable.
Question 857 of 1443

857. Question
Fill in the blank below.
- A big strength of the ____ package is the ability to customize graphs by adding layers and adjusting the data so it doesn't look generic. This is a package that brings a lot of flexibility in visualizing your data beyond bar charts.
Question 858 of 1443

858. Question
Can a table in SQL be joined to itself? (True/False)
- TRUE
- FALSE
Question 859 of 1443

859. Question
Which function runs thirty different tests and aggregates the results from each one?
- NbClust
- The elbow method
- k-means clustering
- Total variance
Question 860 of 1443

860. Question
How can you ensure that your analysis is reproducible?
- Use set.seed() at the beginning of the analysis.
- Save the code as a function and run it.
- Use the same data for each iteration.
- Choose the same number of clusters.
Question 861 of 1443

861. Question
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
- Does the variance of the residuals change with the predicted value?
- Do the forces affecting the dependent variable change in some parts of the data and should our model reflect that?
- Does it make a difference to identify outliers or bias in the residuals in our model?
- Is there any way to change our model or does it always have to stay the same?
Question 862 of 1443

862. Question
How did GE use predictive analytics to offer tailored products to their customers?
- They clustered their customers based on credit card usage and profitability.
- They sent out surveys to all their existing customers.
- They used a marketing company to create new products.
- They mined social media to create customer profiles.
Question 863 of 1443

863. Question
Please fill in the blank below.
- The ____, illustrated by the red arrows in the chart below, is the distance from the actual data points to the average expected value.
Question 864 of 1443

864. Question
Please fill in the blank below:
- A ____ analysis quickly analyzes the outcome under a range of scenarios. It can be performed with one-way or two-way data tables.
Question 865 of 1443

865. Question
Put the four steps of building a classification tree in order.
- Create a new question branch after the previous one
- Ask the question that yields the most information
- Stop growing the tree when there is no more information gain
- Conditional on the previous answer, select the next question
Question 866 of 1443

866. Question
Please fill in the blank below.
- The two variables, ____ and ____, are how the UI and server script communicate with each other. One of them is passed in as the user puts in a command, and the other is passed back out with the results.
Question 867 of 1443

867. Question
Please fill in the blank below.
- Forecasting means to ____ something after looking at the available information.
Question 868 of 1443

868. Question
What are the three parameters that the HoltWinters() function requires?
Please select all that apply
- Alpha
- Beta
- Gamma
- Delta
Question 869 of 1443

869. Question
Please put the general steps of principal component analysis in order.
- Calculating eigenvalues and eigenvectors of the covariance matrix
- Data preprocessing
- Calculation of covariance matrix from the preprocessed data
- Transform the samples onto new subspace
- Choosing the components and forming a feature vector
Question 870 of 1443

870. Question
3. Fill in the blank below.
- Excel's visualization capabilities can be limited by its menus. R does not have such a constraint. As a result, you can create beautiful, varied visualizations and make more nuanced changes with R than with Excel.
Question 871 of 1443

871. Question
4. Fill in the blank below.
- It is best practice to annotate your code with ____, which you can do by putting a hash mark at the beginning of a line.
Question 872 of 1443

872. Question
What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.
Question 873 of 1443

873. Question
Which function can you use to see the list of contents that your linear regression model produces?
- ls( )
- lm( )
- lr( )
- lo( )
Question 874 of 1443

874. Question
What is one of the most effective methods for tuning an algorithm?
- k-fold cross validation
- Information gain
- Testing the data
- Validating the base rate
Question 875 of 1443

875. Question
Why do we use '==' instead of '=' to pull the day shift data?
- We are not defining a variable, we are telling R what data to select
- We need to define the data as a variable
- We need to indicate that the data is in column form, not row form
- We need to reformat the data as a vector
Question 876 of 1443

876. Question
Which of these functions are the "bare bones" of ggplot2?
Select all that apply
- ggplot()
- aes()
- geom()
- fill()
Question 877 of 1443

877. Question
You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function each was used for.
- Sell: Harrah's Hotel and Casino in Las Vegas predicts how much a customer will spend over the years, estimating their lifetime value to the casino
- Make: Life insurance companies predict the age of death in order to approve policies and set pricing
- Ship: Energex (Australian utility) predicts 20 years of electricity demand growth to direct infrastructure investment
- Buy: Ski manufacturers predict demand for skis each winter, stocking up on supplies
- R&D: As much as 40% of trading on the London Stock Exchange is estimated to be driven by trading algorithms
Question 878 of 1443

878. Question
What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model
Question 879 of 1443

879. Question
Which term represents the degree of slant in the data?
- Skewness
- Kurtosis
- Median
- Variance
Question 880 of 1443

880. Question
Why is it useful to audit your formulas?
- Because you can trace back how you got the results in the cell
- Because you need to check that all the values in the cells are correct
- Because you need to build your formula
- Because you should simplify your formula
Question 881 of 1443

881. Question
Which attribute does not belong to clustering?
Select all that apply
- Exploratory data analysis (EDA)
- Discovers and forms new groups or categories of data
- Calculates probabilities of events occurring or group membership
- Supervised machine learning technique
Question 882 of 1443

882. Question
Which function results in an interactive slider?
- sliderInput()
- sliderOutput()
- slider()
- interInput()
Question 883 of 1443

883. Question
What type of penalty shrinks coefficients towards zero to control variance?
- Ridge penalty
- Logistic penalty
- Penalized coefficient
- Multiple penalty
Question 884 of 1443

884. Question
Which text mining approach focuses on word counts without regarding the words' positions in sentences, part of speech, or meaning?
- Bag of words
- Probabilistic
- Natural language processing
- Deep learning
Question 885 of 1443

885. Question
Which special operator does R have to calculate the dot product?
- %*%
- %>%
- %%
- %.%
Question 886 of 1443

886. Question
Which plot (pictured below) is an exploratory graph used for generalization of the simple two-variable scatterplot?
- Biplots
- Elbow graph
- Exploratory data analysis
- Scatterplot
Question 887 of 1443

887. Question
1. Fill in the missing word.
- Algorithms and ____ are data science tools. They go beyond running an analysis. When they are customized they take in your questions and data and yield specific, and often actionable, answers and outputs.
Question 888 of 1443

888. Question
1. Fill in the blank below.
- Forecasting means to ____ something after looking at the available information.
Question 889 of 1443

889. Question
1. Please fill in the blank below:
- The ____ centrality of a node is the percentage of shortest paths in a network that include that node. This metric lets you assess which nodes are prominent connectors in a network: an individual who is a vital connector, or a node that is a critical liability in a computer network or supply chain.
Question 890 of 1443

890. Question
1. Why is it hard to visualize the baseball data with ggplot2?
Select all that apply
- Because the data is too messy
- Because the data doesn't really answer any questions
- Because the data is not accurate
- Because the axes are plotted incorrectly
Question 891 of 1443

891. Question
1. Put the six Data Science control cycle steps in order, starting with “Ask”
- Interpret
- Research
- Validate
- Model
- Test
- Ask
Question 892 of 1443

892. Question
1. Fill in the blank.
- Some classification algorithms can go beyond determining whether someone will buy your product or not. The benefit of these algorithms is that they can tell you the ____ that someone will buy your product.
Question 893 of 1443

893. Question
1. What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.
Question 894 of 1443

894. Question
1. Match the description to the correct data set.
Sort elements
- Fits an initial model that can predict an outcome or category from one or more predictor variables
- Tunes the details of your model and checks the model assumptions
- Assesses the performance of an algorithm as if the data were from the real world
- Training data set
- Validation data set
- Test data set
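As an illustration of the three data sets named above, here is a minimal sketch of partitioning one data set into three non-overlapping parts. The 60/20/20 proportions, the fixed seed, and the function name `three_way_split` are assumptions chosen for the example, not values given in the quiz.

```python
import random

def three_way_split(rows, train_frac=0.6, validation_frac=0.2, seed=42):
    """Shuffle the rows once, then cut them into three disjoint partitions."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test

# Hypothetical data: 100 row identifiers.
train, validation, test = three_way_split(range(100))
```

Shuffling before cutting keeps any ordering in the original data from leaking into one partition.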
Question 895 of 1443

895. Question
1. What are some of the practical applications of network analysis?
Select all that apply
- In marketing, network analysis can help you figure out how to reach the greatest number of people at the lowest cost.
- In healthcare, network analysis can help you understand how a disease will spread across a population.
- In finance, network analysis can help you understand how money flows through the system so you can identify suspicious transfers, money laundering, or terrorism financing activities.
- In politics, network analysis can help you identify key influencers to understand who affects donors and voters most.
Question 896 of 1443

896. Question
1. When is adjusted R squared significantly different from regular R squared?
- When the denominator is very small or when the number of data points is close to the number of predictive variables.
- When the denominator is very large or when the number of data points is far away from the number of predictive variables.
- When the numerator is very small or when the number of data points is close to the number of dependent variables.
- When the numerator is very large or when the number of data points is very small.
Question 897 of 1443

897. Question
1. What are the goals of this course?
Select all that apply
- Identify key influencers and discover which connectors are most important
- Create a strategy to spread your message most effectively and test your model in a simulation!
- Visualize networks with several interactive visualizations that will allow you to drill deeper into network data
- Learn the basics of data collection and regression analysis
Question 898 of 1443

898. Question
1. Which functions do you need to use to plot a network where the nodes change color as they become affected or receive the message?
Select all that apply
- E()
- V()
- C()
- B()
Correct

Incorrect
Question 899 of 1443

899. Question
1. What is an issue that Google ran into with its search algorithm?
- Eigenvector centrality was too computationally expensive.
- They hadn't catalogued all the pages on the internet.
- The internet wasn't big enough for their search engine.
- They didn't know how to rank pages in terms of importance.
Correct

Incorrect
Question 900 of 1443

900. Question
3. Fill in the blank below.
- It's important to look at the raw data before you ____ it so you can see what it looks like and find initial patterns and insights without doing too much analysis.
Question 901 of 1443

901. Question
3. How does the 'expanded' layout change how the information is presented?
- It visualizes the data in terms of percentages
- It visualizes the data by stacking it cumulatively
- It visualizes the data by centering it over the x axis
- It visualizes the data by expanding the x axis scale
Question 902 of 1443

902. Question
3. What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.
Question 903 of 1443

903. Question
2. Naïve Bayes is a probabilistic classification method commonly used for text classification. Most spam filters are based on a variant of Naïve Bayes.

Order the steps that a spam filter takes when deciding whether or not a new email should be placed in the spam folder. Place the first step at the top.
- Based on the results of the search, the probability of the email being spam is determined
- The spam filter searches for keywords or word combinations that are frequently found in spam
- The probability is evaluated against a threshold to decide whether to sort the email as spam or not
- The text of the new email is analyzed
Question 904 of 1443

904. Question
2. Which of the following are questions that should be asked when building a model?
Select all that apply
- Which variables should we use as inputs?
- How should we visualize the data?
- How fast will it run?
- Under what conditions will it break?
- How will we collect the data?
Question 905 of 1443

905. Question
3. What is the Ark of the Covenant similar to in your data strategy?
- A novel insight you didn't know about your data
- Your data warehouse
- Stacks of documents
- A data algorithm
Question 906 of 1443

906. Question
3. What outcome can an association rule predict?
Select all that apply
- The purchase of hamburgers if onions and buns are purchased.
- The purchase of a hard drive if a customer buys a laptop.
- The division of data into previously unknown groups.
- The likelihood of two words occurring together in a document.
Question 907 of 1443

907. Question
3. Fill in the blank.
- At each step of the process, important decisions need to be made, and you should be using ____ to make the right choices.
Question 908 of 1443

908. Question
2. What can regression do?
- Predict the numerical value of a variable based on the value of another variable(s)
- Predict probabilities
- Confirm causation
- Predict group membership
Question 909 of 1443

909. Question
3. What is multicollinearity?
- When 2 or more independent variables are strongly correlated to one another.
- When 2 or more independent variables show no correlation to one another.
- When 2 or more dependent variables are strongly correlated to one another.
- When 2 or more dependent variables show no correlation to one another.
Question 910 of 1443

910. Question
2. What is TRUE about finding alpha?
Select all that apply
- You are minimizing the squared error of the model.
- The best alpha will make the average of the squared error 0, which means the forecast is unbiased
- R automatically calculates alpha the same way that it calculates the best fit line by minimizing the sum of squared errors
- Finding alpha is never important.
Question 911 of 1443

911. Question
3. Match the function to the action it takes.
Sort elements
- creates the visualization
- further customizes the presentation of the network
- adds interactivity to the visualization
- defines the motion of the nodes and edges
- visNetwork()
- visOptions()
- visInteraction()
- visPhysics()
Question 912 of 1443

912. Question
2. What does running a "while" loop do?
- Tells R to run a series of commands enclosed in curly brackets as long as the criteria defined in the "while" command are met.
- Tells R to run a single command.
- Tells R to run a command for a "while".
- Tells R to run a series of commands enclosed in curly brackets.
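As a minimal sketch of how a `while` loop behaves in R (the variable names `value` and `steps` are illustrative, not taken from the course), the commands inside the curly brackets repeat for as long as the condition in parentheses holds:

```r
# Count how many doublings it takes for 1 to exceed 100.
value <- 1
steps <- 0
while (value <= 100) {   # the condition is re-checked before every pass
  value <- value * 2     # body: runs only while the condition is TRUE
  steps <- steps + 1
}
print(steps)
print(value)
```

Once `value` is no longer less than or equal to 100, the condition is FALSE and R moves on to the code after the closing bracket.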
Question 913 of 1443

913. Question
2. What does the gather function do?
Select all that apply
- Takes multiple columns and collapses them into key-value pairs.
- Reduces the number of columns in the data set.
- Converts your wide data into long form.
- Takes a single column and expands it into multiple columns.
Question 914 of 1443

914. Question
3. What type of data does metadata contain?
Select all that apply
- The time the email was sent
- The date the email was sent on
- The IP addresses of the emails
- The subject matter of the email
Question 915 of 1443

915. Question
3. Fill in the blank below.
- You can create a ____ by wrapping code in curly braces.
Question 916 of 1443

916. Question
3. Why do these warning messages appear when we map dc_map?
- Some of the data points are outside the mapping area
- Some of the data points are not formatted correctly
- Some of the data points are overlapping
- Some of the data points are not located in Washington DC
Question 917 of 1443

917. Question
5. How does the regression feature help us understand this data set?
- It tells us the average that Medicare paid in comparison to the charges that were submitted
- It tells us how much each provider covered with Medicare
- It tells us how each category compares to the overall average charges made
- It adjusts the numbers to the logarithmic scale
Question 918 of 1443

918. Question
4. Why is it important to understand the purpose behind various data science methods?
- So you can choose the right method to direct your research.
- So you don't have to learn any new methods when they arise.
- So you will be able to calculate them both manually and in R.
- So you will better understand when to use R and when to use Python.
Question 919 of 1443

919. Question
4. Which function implements k-means clustering with cosine distance?
- skmeans()
- kmeans()
- cosine()
- spherical.kmeans()
Question 920 of 1443

920. Question
4. If your classification model has a perfect, 100% accuracy, which of the following questions should you ask?
- Is the model consistently 100% accurate?
- Was the model validated and tested?
- Were the data used representative of our target population?
- Are the predictor variables independent?
Question 921 of 1443

921. Question
4. If two factors are strongly correlated, such as "temperature" and "what the temperature feels like" in our Bikeshare example, then we are...
- double counting
- skewing our results
- reducing the interpretability of the model
- not finished refining our model
Question 922 of 1443

922. Question
4. Which of these questions can you answer with data mining?
Select all that apply
- Are there people ready to buy who you’re not contacting?
- What do people actually think about your products?
- How much would someone pay for your products or services?
- Is there a pattern in the customers who buy my products?
Question 923 of 1443

923. Question
4. Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Question 924 of 1443

924. Question
5. Networks can contain a wealth of information. Which of the following questions can best be answered by measuring an aspect of the network (assuming the necessary data are available)?
Select all that apply
- How many other people/objects can someone/something reach?
- How important is someone/something as a connector to the structure of the network?
- How long does it take for information to travel through a network?
- How tight knit is a network or community?
Question 925 of 1443

925. Question
3. Which of these are types of classification?
Select all that apply
- Decision trees
- Naïve Bayes
- k-means
- Hierarchical clustering
Question 926 of 1443

926. Question
4. Which function is the correct way to tell R to read the data as characters?
- as.character()
- ascharacter()
- character()
- as.character.()
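For illustration, a short sketch of converting numeric values to character strings with base R's `as.character()` (the ZIP-code-like values are made up for this example):

```r
# Numeric identifiers that should be treated as text rather than numbers.
zip_codes <- c(2139, 10001, 94105)

# as.character() returns the same values stored as character strings.
zip_text <- as.character(zip_codes)

class(zip_codes)  # "numeric"
class(zip_text)   # "character"
```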
Question 927 of 1443

927. Question
5. Match the terms to their definitions.
Sort elements
- Measure of how dispersed the data is
- Standardized measure of how dispersed the data is
- Check if there is bias in the data or the model
- Measure of linear relationship between variables (positive/negative)
- Measure of strength of linear relationship between variables (positive/negative)
- How a change in variable x will affect variable y
- % of variation in y that can be explained by the variation in x
- The probability that the pattern exists through random chance, in the absence of a relationship between variables
- Variance
- Standard deviation
- Distribution and "normality"
- Covariance
- Correlation
- Slope
- R squared
- p-values
Question 928 of 1443

928. Question
4. Which visualizations below helped us to identify the periodicity of errors in our model?
Select all that apply
Question 929 of 1443

929. Question
4. Which function will make the data set horizontal?
- t()
- tran()
- tpose()
- p()
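A small sketch of transposing a data set with base R's `t()`, assuming a hypothetical `scores` data frame that is not part of the course material:

```r
# Three rows (players) and two columns (variables).
scores <- data.frame(player = c("A", "B", "C"),
                     points = c(12, 7, 20))

# t() swaps rows and columns, so each original column becomes a row.
# The result is a matrix; with mixed column types its entries become character.
wide <- t(scores)

dim(scores)  # 3 rows, 2 columns
dim(wide)    # 2 rows, 3 columns
```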
Question 930 of 1443

930. Question
4. What does the "select_nodes" function help you do?
Select all that apply
- Creates a list of the names of the newly affected nodes.
- Adds the previously affected nodes.
- Subsets all the affected nodes from the graph object.
- Deletes all of the nodes.
Question 931 of 1443

931. Question
4. Please fill in the blanks below:
- ____ is the most popular hierarchical clustering method; it's a bottom-up approach, while ____ does the opposite and is a top-down approach.
Question 932 of 1443

932. Question
3. Which function will convert a website response to an R-readable format?
- fromJSON() function
- JSONtoR() function
- RtoJSON() function
- toJSON() function
Question 933 of 1443

933. Question
5. Match the job with the description.
Sort elements
- Asks questions about the data to find patterns and interpret results
- Builds models to answer questions and understand the data source
- Organizes the data and creates some basic visualizations to give an overview of the data
- Manipulates the data into proper format
- Data scientist
- Data modeler
- Data analyst
- Data wrangler
Question 934 of 1443

934. Question
5. What are some of the advantages of interactive visualizations?
Select all that apply
- They allow users to view only the data they want
- They engage the audience
- They can display different visualization types within the same application
- They can be created with R packages
Question 935 of 1443

935. Question
5. Which one of these is not a way for Kabbage to gather information on lending decisions?
Select all that apply
- Instagram
- Facebook
- Amazon
- Accounting software
Question 936 of 1443

936. Question
5. Which function can you use to eliminate specific pieces of data?
- ifelse( )
- View( )
- as.character( )
- for( )
Question 937 of 1443

937. Question
5. Which function can you use to create a forecast using the LOESS model?
- predict()
- loess()
- str()
- plot()
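To show how fitting and forecasting are separate steps, here is a minimal sketch using R's built-in `cars` data set (an assumption for illustration; the course data is not shown here). `loess()` fits the local regression and `predict()` produces values for new inputs:

```r
# Fit a LOESS curve of stopping distance against speed.
fit <- loess(dist ~ speed, data = cars)

# New speeds to predict for; these lie inside the observed speed range.
new_speeds <- data.frame(speed = c(10, 15, 20))

# predict() applies the fitted model to the new data.
predicted <- predict(fit, newdata = new_speeds)
print(predicted)
```

With default settings, `loess()` does not extrapolate, so `predict()` returns `NA` for inputs outside the range of the data used for fitting.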
Question 938 of 1443

938. Question
5. Which function allows you to combine two separate data sets?
- rbind()
- bind()
- combine()
- r()
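A brief sketch of stacking two data sets with `rbind()`, assuming two hypothetical monthly data frames that share the same column names:

```r
# Two data sets with identical columns.
jan <- data.frame(city = c("DC", "NYC"), rides = c(120, 340))
feb <- data.frame(city = c("DC", "NYC"), rides = c(150, 310))

# rbind() appends the rows of the second data frame below the first.
both <- rbind(jan, feb)

nrow(both)   # rows from both inputs
names(both)  # columns are unchanged
```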
Question 939 of 1443

939. Question
5. Match the job with the description.
Sort elements
- Asks questions about the data to find patterns and interpret results
- Builds models to answer questions and understand the data source
- Organizes the data and creates some basic visualizations to give an overview of the data
- Manipulates the data into proper format
- Data scientist
- Data modeler
- Data analyst
- Data wrangler
Question 940 of 1443

940. Question
How does clustering help when there are more than 3 attributes in the data?
- Clustering helps identify groups with many attributes that you can't easily visualize.
- Clustering can only cluster when there are more than three attributes.
- Clustering is the best method to visualize more than 4 attributes at a time.
- Clustering can gather data more accurately when there are many attributes.
Question 941 of 1443

941. Question
Which of the following SQL functions can be used on date fields?
Select all that apply
- MAX()
- DATEDIFF()
- DAY()
- MIN()
- SUM()
- ROUND()
Question 942 of 1443

942. Question
3. Match the output elements shown in the console below to what information they are providing.
Sort elements
- includes the values of the boxplot levels. The five rows include the bottom whisker, the 25th percentile, the 50th percentile, the 75th percentile and the top whisker
- includes the number of values in each variable
- includes something called notches or the median plus and minus roughly one point five times the inter-quartile range
- includes the values of the outliers, which you can also see in the boxplot
- $stats
- $n
- $conf
- $out
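The console output the question refers to is not reproduced here. As a stand-in, this sketch uses `boxplot.stats()` from grDevices, which returns the same four elements for a single made-up vector:

```r
# One clearly extreme value among small numbers.
x <- c(1, 2, 3, 4, 5, 100)

s <- boxplot.stats(x)

s$stats  # five values: lower whisker, lower hinge, median, upper hinge, upper whisker
s$n      # number of non-missing observations
s$conf   # lower and upper ends of the notch around the median
s$out    # points lying beyond the whiskers
```

According to the R documentation, the notch limits are computed as the median plus or minus 1.58 times the inter-quartile range divided by the square root of `n`.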
Question 943 of 1443

943. Question
Which approach allows for the inclusion of categorical variables with multiple levels in regression models?
- Dummy coding
- Masking variables
- Variable coding
- Categorical coding
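As an illustration of dummy coding, this sketch builds the design matrix R would use for a regression on a three-level factor. The `rentals` data frame and its values are hypothetical:

```r
# A categorical predictor with three levels and a numeric outcome.
rentals <- data.frame(
  weather = factor(c("clear", "rain", "hail", "clear")),
  count   = c(90, 40, 10, 85)
)

# model.matrix() expands the factor into 0/1 indicator columns.
# With R's default treatment contrasts, the first level ("clear")
# is the baseline and gets no column of its own.
design <- model.matrix(count ~ weather, data = rentals)
colnames(design)
print(design)
```

Each non-baseline level gets its own indicator column, which is how a multi-level categorical variable can enter a regression model.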
Question 944 of 1443

944. Question
Fill in the blank below.
- The ____ function shows us the structure of the data.
Question 945 of 1443

945. Question
Which of the following SQL functions can be used on text fields?
Select all that apply
- LEFT()
- MAX()
- CONCAT()
- SUBSTRING()
- STDEV()
- DATENAME()
Question 946 of 1443

946. Question
Fill in the blank below.
- Clustering and data mining are types of data analysis ____, which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
Question 947 of 1443

947. Question
What are some conclusions we see when we graph points per game by minutes per game with three clusters?
Select all that apply
- Some players with good statistics are paid much less than other players with similar statistics.
- There are players who are paid a lot, but don't have good statistics.
- There's no correlation between minutes per game and points per game.
- The most talented players are the lowest paid.
Question 948 of 1443

948. Question
After running a Breusch-Pagan test, how would you know that there is no heteroscedasticity?
Select all that apply
- The p-value is very large.
- The p-value is very small.
- The residuals are evenly distributed.
- The residuals are not evenly distributed.
Question 949 of 1443

949. Question
The example below is an example of which aspect of the 3 V's?

"1 flight of a Boeing 737 across the continental United States generates as much data as is stored in the U.S. Library of Congress."
- Volume
- Velocity
- Variety
- Veracity
Question 950 of 1443

950. Question
After running a Breusch-Pagan test, how would I know that there is no heteroscedasticity?
Select all that apply
- The p-value is very large.
- The p-value is very small.
- The residuals are evenly distributed.
- The residuals are not evenly distributed.
Question 951 of 1443

951. Question
What are some of the benefits of Scenario Manager?
Select all that apply
- You can switch between scenarios without changing the values manually
- You can easily merge multiple scenarios from different teams
- You can create a summary report of the scenarios
- You can program the models to tell you which one to use

Question 952 of 1443

952. Question

Match the attributes to the decision tree calculation.
Sort elements
- Entropy
- Gini impurity
- Categorical attributes
- Finds the largest class in the data
- Uses algorithms
- Continuous variables
- Finds groups of classes that make up over 50% of their data
- Minimizes classification


Question 953 of 1443

953. Question
Put the steps in order for how the UI script communicates with the server script.
- The UI passes information to the server
- The server computes some new output
- The UI displays the information for the user
- The user interacts with the UI
- The server sends the output to the UI
Question 954 of 1443

954. Question
What can regression do?
- Predict the numerical value of a variable based on the value of another variable(s)
- Predict probabilities
- Confirm causation
- Predict group membership
Question 955 of 1443

955. Question
Match the terms to their definitions.
Sort elements
- Measure of how dispersed the data is
- Standardized measure of how dispersed the data is
- Check if there is bias in the data or the model
- Measure of linear relationship between variables (positive/negative)
- Measure of strength of linear relationship between variables (positive/negative)
- How a change in variable x will affect variable y
- % of variation in y that can be explained by the variation in x
- The probability that the pattern exists through random chance, in the absence of a relationship between variables
- Variance
- Standard deviation
- Distribution and "normality"
- Covariance
- Correlation
- Slope
- R squared
- p-values
Question 956 of 1443

956. Question
Match the types of trust to their definitions.
Sort elements
- Rational decision about whether to trust someone based on the potential costs and benefits of the decisions involved
- Trust developed over a long relationship
- Trust based on likeness in preferences, opinions
- Trust based on the guarantees promised by an institution, such as defined benefit pension plans or a government's promise to protect its people
- Calculation-based trust
- Personal-based trust
- Similarity-based trust
- Institution-based trust
Question 957 of 1443

957. Question
4. Fill in the blank below.
- It is best practice to annotate your code with ____, which you can do by putting a hashmark at the beginning of a line.
Question 958 of 1443

958. Question
5. Why will the following code give you an error?
```
a <- "Hello"
A
```
- Because there is no double equals sign
- Because the variable is uppercase instead of lowercase
- Because the quotation marks should be single, not double
- Because there is a missing parenthesis
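Because R treats names as case-sensitive, `a` and `A` are different objects. The sketch below checks for both names and captures the error message instead of stopping. The `tryCatch()` wrapper and the `msg` variable are additions for illustration:

```r
a <- "Hello"

exists("a")  # TRUE: the lowercase name was assigned above
exists("A")  # FALSE, assuming nothing else has defined an uppercase A

# Looking up the missing name raises an error; tryCatch() captures its message.
msg <- tryCatch(get("A"), error = function(e) conditionMessage(e))
print(msg)
```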
Question 959 of 1443

959. Question
Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Question 960 of 1443

960. Question
Which picture below displays the standard error of a best fit line?
Question 961 of 1443

961. Question
What is the complexity parameter?
- It is the amount of improvement in relative error for each node
- It is the level of importance for each node
- It is the amount of complexity of the node
- It determines whether or not a node has multiple categories
Question 962 of 1443

962. Question
Why does it make sense to separate the code in multiple steps?
Select all that apply
- To follow the code more easily
- To decrease the chance of making mistakes
- To make the code shorter
- To make the code longer and more complex
Question 963 of 1443

963. Question
Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
- ```
 cumsum() 
```
- ```
 ddply() 
```
- ```
 numcolwise() 
```
- ```
 cumulate() 
```
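A minimal sketch of a running total in base R, using made-up daily values:

```r
# Daily counts.
daily <- c(5, 3, 8, 2)

# cumsum() returns, at each position, the sum of all values up to that point.
running_total <- cumsum(daily)
print(running_total)  # 5 8 16 18
```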
Question 964 of 1443

964. Question
Which questions can datafication help us answer?
Select all that apply
- How do categorical variables affect a numerical outcome?
- How do different types of weather (rain, hail, etc.) affect demand for bike rentals?
- How does bias in the data affect the validity of our model?
- How does the correlation between variables affect the significance of the outcome?
Question 965 of 1443

965. Question
What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Question 966 of 1443

966. Question
Which function measures the kurtosis of your data?
- KURT()
- KURTOSIS()
- SKEW()
- KURTOS()
Question 967 of 1443

967. Question
Which error appears for an invalid cell reference?
- #REF!
- #NULL!
- #NAME?
- #VALUE!
Question 968 of 1443

968. Question
What is the complexity parameter?
- It is the amount of improvement in relative error for each node
- It is the level of importance for each node
- It is the amount of complexity of the node
- It determines whether or not a node has multiple categories
Question 969 of 1443

969. Question
Which part of the "SimpleApp" application is the user input?
- When the user slides the slider to a different location.
- When the graph adjusts itself based on the slider.
- When the user views the graph.
- When the server determines the number of breaks in the graph.
Question 970 of 1443

970. Question
Why is bagging useful when you are working with a small dataset?
- Because it creates 'new' datasets by using sampling with replacement to train and test the model.
- Because it generates new data points from the existing data set.
- Because it re-arranges existing data points in a new order.
- Because it automatically runs each data point through different types of algorithms.
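The sampling step behind bagging (bootstrap aggregating) can be sketched in a few lines of base R. This shows only how one resampled data set is drawn, not a full bagged model, and the size `n` is an arbitrary assumption:

```r
set.seed(42)  # makes the random draw reproducible
n <- 10       # pretend the data set has 10 rows

# Draw n row indices with replacement: some rows repeat, others are left out.
boot_idx <- sample(n, size = n, replace = TRUE)

# Rows never drawn ("out-of-bag") can serve as a test set for this resample.
oob_idx <- setdiff(seq_len(n), boot_idx)

print(sort(boot_idx))
print(oob_idx)
```

Repeating the draw many times yields many slightly different training sets from the same small data set.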
Question 971 of 1443

971. Question
What does the syntax below mean in R?

"[^A-z]"
- Do not include any alphabetical characters
- Do not include uppercase letters
- Do not include lowercase letters
- Only include alphabetical characters
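In a regular expression, `[^...]` is a negated character class, so `"[^A-z]"` matches any character outside the range from `A` to `z`. The sketch below uses it with `gsub()` to strip such characters. The sample strings and `perl = TRUE` are assumptions, the latter added so that the range is interpreted by character code:

```r
raw <- "R2-D2 & C-3PO!"

# Replace every character that is NOT in the A-to-z range with nothing.
letters_only <- gsub("[^A-z]", "", raw, perl = TRUE)
print(letters_only)  # "RDCPO"

# Caveat: by character code, A-z also spans a few symbols such as "_",
# so "[^A-Za-z]" is the stricter way to keep letters only.
keeps_underscore <- gsub("[^A-z]", "", "snake_case", perl = TRUE)
strict <- gsub("[^A-Za-z]", "", "snake_case", perl = TRUE)
print(keeps_underscore)  # "snake_case"
print(strict)            # "snakecase"
```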
Question 972 of 1443

972. Question
Which R package contains the SVM model functions?
- e1071
- suppmachine
- stats
- Base R
Question 973 of 1443

973. Question
What is an example of how the direction of trust does not always go both ways?
- Young children trust their parents, but parents may not trust their young children.
- A patient trusts her doctor and the doctor always trusts the patient.
- A website developer trusts a web hosting service and the service trusts its users.
- A customer trusts a bank and a bank always trusts its customers to be honest.
Question 974 of 1443

974. Question
1. Internal data may be some of a company's most valuable data. Which of the following may be valuable sources of internal data?
Select all that apply
- HR and performance data
- Sales data
- Wikipedia articles
- Census data
Question 975 of 1443

975. Question
1. Which function can you use to see the list of contents that your linear regression model produces?
- ls( )
- lm( )
- lr( )
- lo( )
Question 976 of 1443

976. Question
1. Please fill in the blank below:
- ____ communication played a key role in the infamous bankruptcy of energy trader ENRON.
Question 977 of 1443

977. Question
1. Which of the answers below passes the output of one function into the input of another function?
- ```
 %>% 
```
- ```
 gather 
```
- ```
 nPlot() 
```
- ```
 ddply() 
```
Question 978 of 1443

978. Question
2. How does clustering help when there are more than 3 attributes in the data?
- Clustering helps identify groups with many attributes that you can't easily visualize.
- Clustering can only cluster when there are more than three attributes.
- Clustering is the best method to visualize more than 4 attributes at a time.
- Clustering can gather data more accurately when there are many attributes.
Question 979 of 1443

979. Question
1. With network analysis, which of the following is the least common metric to be determined?
- The probability of a new connection
- The number of participants in a network
- The number of connections between participants
- The strength of a connection
Question 980 of 1443

980. Question
2. How did GE use predictive analytics to offer tailored products to their customers?
- They clustered their customers based on credit card usage and profitability.
- They sent out surveys to all their existing customers.
- They used a marketing company to create new products.
- They mined social media to create customer profiles.
Question 981 of 1443

981. Question
2. Fill in the blank below.
- Each ____ represents an individual or an object in the network. Each ____ represents a relationship between two people, places or objects.
Question 982 of 1443

982. Question
2. Match the function to its purpose.
Sort elements
- setting your working directory
- load your data
- load the ggmap package
- see how many calls you have remaining
- setwd()
- read.csv()
- install.packages("ggmap")
- geocodeQueryCheck()
Question 983 of 1443

983. Question
1. Which questions can datafication help us answer?
Select all that apply
- How do categorical variables affect a numerical outcome?
- How do different types of weather (rain, hail, etc.) affect demand for bike rentals?
- How does bias in the data affect the validity of our model?
- How does the correlation between variables affect the significance of the outcome?
Question 984 of 1443

984. Question
1. What is TRUE about API?
Select all that apply
- API stands for application programming interface.
- APIs allow you to use the data of sites like Google Maps, Pinterest, Twitter, WalMart and Best Buy.
- APIs allow you to download large amounts of data and custom select the subset of data that you want to use.
- APIs only allow you to download small amounts of data that you cannot subset for your specific purposes.
Question 985 of 1443

985. Question
1. Put the steps in order for creating a function that will render an animation for the dispersion simulation.
- Close the while() loop
- Update the variable from step 2 to tell R that we’re on the next step of the simulation
- For each step in the simulation update the color of the points
- Plot the 3rd graph
- Plot the 2nd graph
- Set the layout of the image defining how the 3 or more graphs should be laid out
- Plot the 1st graph
- Run a while() loop that runs starting with the number of the variable in step 2 and until every node in your graph is reached
- Set a new variable that denotes the step of the simulation equal to 1
- Use the saveHTML function in the animation package
- Set the file name, image size and other output options
Question 986 of 1443

986. Question
1. What does it mean when R says a graph is acyclic?
- The graph has dead ends.
- The graph doesn't have time-series cycles.
- The network has loops.
- The network has multiple shortest paths.
Question 987 of 1443

987. Question
2. Fill in the blanks below.
- ____ data displays multiple occurrences per row and is easier to read in tables, while ____ data displays one observation per row and is easier to plot with ggplot2.
Question 988 of 1443

988. Question
2. Put the steps for creating a Shiny application in order
- Build the ui.R script
- Build the server.R script
- Save the ui.R script in the folder
- Save the server.R script in the folder
- Run the application
- Create a new folder with the name of the application
Question 989 of 1443

989. Question
2. Which of the following statements is not true about k-means clustering?
- The centroid has to be a defined point in the data set.
- k-means clustering is an iterative process.
- The set.seed() function ensures that the results of k-means clustering are reproducible.
- The centroid is the average location of all points in the cluster.
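To make the role of the centroid concrete, this sketch runs `kmeans()` from the stats package on six made-up points and compares one reported center with the mean of the points assigned to that cluster:

```r
set.seed(1)  # k-means starts from random centers; the seed makes the run repeatable

# Six 2-D points forming two visibly separate groups.
pts <- matrix(c(1, 1,  1, 2,  2, 1,
                8, 8,  8, 9,  9, 8),
              ncol = 2, byrow = TRUE)

km <- kmeans(pts, centers = 2)

# The center reported for cluster 1 ...
km$centers[1, ]

# ... is the column-wise mean of the points assigned to cluster 1.
manual <- colMeans(pts[km$cluster == 1, , drop = FALSE])
print(manual)
```

Since a centroid is an average, it need not coincide with any observed data point.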
Question 990 of 1443

990. Question
2. Match the network to the possible connections.
Sort elements
- followers
- friends
- co-stars
- alma mater
- emails
- what grocery store you go to
- Twitter
- Facebook
- Netflix catalog
- Past presidents
- Your company
- City
Question 991 of 1443

991. Question
2. Please fill in the blank below:
- The opposite of big data is ____.
Question 992 of 1443

992. Question
2. Which of the following statements is not true about k-means clustering?
- The centroid has to be a defined point in the data set.
- k-means clustering is an iterative process.
- k-means clustering minimizes the distance between a central location and other data points in the cluster.
- The centroid is the average location of all points in the cluster.
Question 993 of 1443

993. Question
3. Based on this word cloud, generated from hotel reviews, order the words by frequency. Put the most frequently used words at the top.
- Staff
- Golf
- Beach
- Room
Question 994 of 1443

994. Question
2. Networks can be comprised of
Select all that apply
- E-mail usage
- Shopping patterns
- Factories and stores
- Actors in Hollywood
Question 995 of 1443

995. Question
2. Match the variables in the equation "y = mx + b" to what they represent in the equation.
Sort elements
- The output variable, or the dependent variable
- The independent variable, or the variable you plug in to determine the output
- The y-intercept, or the value of y when x = 0
- The rate of change, or the slope
- y
- x
- b
- m
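As a worked illustration of `y = mx + b`, the sketch below generates points from a line with slope 3 and intercept 2 (values chosen arbitrarily) and recovers both with `lm()`:

```r
# Points generated from y = 3x + 2, so m = 3 and b = 2.
x <- c(0, 1, 2, 3)
y <- 3 * x + 2

# lm() estimates the intercept (b) and the slope (m) of the best-fit line.
fit <- lm(y ~ x)
coef(fit)  # "(Intercept)" corresponds to b, "x" corresponds to m
```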
Question 996 of 1443

996. Question
2. Fill in the blank below.
- You can create a ____ plot to check if the residuals in your model are normally distributed.
Question 997 of 1443

997. Question
2. Which function allows you to create a forecast for your model?
- forecast.HoltWinters()
- forecast()
- HoltWinters()
- HoltWinters.forecast()
Question 998 of 1443

998. Question
3. What does it mean when someone communicates a lot or has a lot of followers but has no incoming messages and follows few others?
- They could be an opinion leader or a celebrity.
- They could be a bot.
- They could be a spy.
- They could be a fraudulent account.
Question 999 of 1443

999. Question
3. Match the functions to their actions.
Sort elements
- Reads each line as a separate character
- Split the characters based on the condition in the quotes
- Creates a data frame that shows the number of nodes that received the message
- Plots a graph with given data
- readLines()
- strsplit()
- sapply()
- plot()
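A small sketch of how the first three functions chain together in base R. The two comma-separated lines stand in for a file and are invented here; in the course, `sapply()` is applied to simulation results instead:

```r
# textConnection() lets readLines() read from a character vector as if it were a file.
con <- textConnection(c("ann,bob", "bob,cat,dan"))
lines <- readLines(con)   # one character string per line
close(con)

# strsplit() breaks each line apart wherever the separator occurs.
parts <- strsplit(lines, ",")

# sapply() applies a function to every element and simplifies the result.
counts <- sapply(parts, length)
print(counts)  # 2 3
```

The resulting `counts` vector could then be passed to `plot()` to draw it.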
Question 1000 of 1443

1000. Question
2. What can identification of communities help uncover?
Select all that apply
- Political factions
- Genetic families
- Cyber-communities in social networks
- Terrorist groups
Question 1001 of 1443

1001. Question
3. What's the appropriate syntax for calling up sequential file names in a loop?
- eval(parse(text = paste0("filename", i)))
- parse(eval(text = paste0("filename", i)))
- eval(text = paste0("filename", i))
- parse(text = "filename")
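To see what `eval(parse(text = ...))` does inside a loop, this sketch assumes two hypothetical variables named `filename1` and `filename2` and looks each one up by building its name as a string:

```r
# Sequentially named objects (illustrative values).
filename1 <- "jan.csv"
filename2 <- "feb.csv"

found <- character(0)
for (i in 1:2) {
  # paste0() builds the text "filename1", then "filename2";
  # parse() turns that text into an R expression and eval() evaluates it.
  found <- c(found, eval(parse(text = paste0("filename", i))))
}
print(found)
```

For a plain variable lookup like this, `get(paste0("filename", i))` returns the same value.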
Question 1002 of 1443

1002. Question
4. Why is it useful to create functions?
Select all that apply
- So you can reuse it later with different data
- So you can let other people input their data
- So you can use only the same data set to perform the analysis
- So you don't have to retype the same code multiple times
Question 1003 of 1443

1003. Question
4. Fill in the blank below.
- As it turns out, a ____ map of DC looks somewhat similar to the crime map of DC.
Question 1004 of 1443

1004. Question
3. Which two functions can pass rCharts objects to Shiny?
Select all that apply
- ```
 renderChart2() 
```
- ```
 showOutput() 
```
- ```
 renderShiny() 
```
- ```
 shinyUI() 
```
Question 1005 of 1443

1005. Question
4. Fill in the blanks below.
- The ____ method discovers new groups or categories of data, while the ____ method assigns data points to known groups or categories.
Question 1006 of 1443

1006. Question
3. Why wouldn't a silhouette value be computed with one cluster?
- Because there wouldn't be an inter-cluster distance.
- Because there wouldn't be an intra-cluster distance.
- Because there need to be at least three clusters for the silhouette value to be calculated.
- Because the silhouette value relies on k-means, which needs at least two clusters.
Question 1007 of 1443

1007. Question
5. Increasing the number of variables in a predictive model may not be beneficial because...
- it could decrease its generalizability
- it could lessen its accuracy
- it would reduce the amount of data available for validation tests
- N/A. It is always beneficial to add variables to your model.

Question 1008 of 1443

1008. Question

4. Match the situation with your next step.
Sort elements
- Next, you should run a regression analysis.
- Next, you should run a multivariate regression analysis.
- Next, you should run a polynomial regression analysis.
- Next, you should run a LOWESS or LOESS regression analysis.
- You want to see how two variables interact.
- You want to see how five variables interact.
- You want to see if you can get a better fit with your five variables.
- You want to see if your five variables are being influenced by seasonal changes.


Question 1009 of 1443

1009. Question
4. Why did the Literary Digest incorrectly forecast a presidential win?
- They did not correctly sample the general population.
- Their sample deliberately lied about who they would vote for.
- They did not store their data correctly.
- They did not understand important statistical methods.
Question 1010 of 1443

1010. Question
5. How might finding the purchase patterns of different groups help you with your customers?
Select all that apply
- You can send special offers to individuals who are more likely to use them.
- You could stock your shelves with products that are bought together more frequently.
- You could make better product recommendations that complement the purchase patterns.
- You could let your clients know you have their data so they can make more deliberate purchases.
Question 1011 of 1443

1011. Question
4. How can you use association rules in different industries?
Sort elements
- Organize store layout, create catalogs, design discount patterns
- Track behavior of users, improve ad positioning, detect intrusions
- Find functional/structural patterns for a set of proteins
- Find words that your customers use frequently to evaluate marketing campaigns and feature improvements
- Retail
- Site developers
- Bioinformatics
- Social Media marketing
Question 1012 of 1443

1012. Question
4. The process of extracting information from large quantities of data to find insights, patterns and other latent information is referred to as
- Data mining
- Networking
- Visualizing
- Data dumping
Question 1013 of 1443

1013. Question
5. Match the functions to their purpose.
Sort elements
- to see how many calls you have remaining
- to see the warnings
- to combine two columns of your data set
- to rename columns
- to save your file so that you don't have to re-run the code again
- geocodeQueryCheck()
- warnings()
- cbind()
- as.data.frame()
- write.csv()
Question 1014 of 1443

1014. Question
4. What do you need to run an F-test?
Select all that apply
- The number of coefficients in the model excluding the y-intercept
- The degrees of freedom
- The F-statistic
- The Cook's distance
Question 1015 of 1443

1015. Question
3. Which visualization below is showing the periodicity of the data?
Question 1016 of 1443

1016. Question
4. Which package allows you to save your output as an HTML file?
- magrittr
- plyr
- networkD3
- htmlr
Question 1017 of 1443

1017. Question
4. Which function allows you to create a data frame?
- data.frame()
- dataframe()
- data()
- frame()
Question 1018 of 1443

1018. Question
5. Hierarchical clustering assumes that points with the shortest distance between them are:
- most similar
- most different
- connected by multiple paths
- disconnected
Question 1019 of 1443

1019. Question
4. What does it mean if someone on Twitter has a high in-degree and low out-degree?
Select all that apply
- The person has a lot of followers.
- The person is more likely to be an influencer.
- The person is more likely to be a bot.
- The person is more likely to be famous.
Question 1020 of 1443

1020. Question
2. Data scientists’ responsibilities may include:
Select all that apply
- Visualizing data
- Analyzing data
- Asking questions about data
- Formatting data
Question 1021 of 1443

1021. Question
5. Which of these features are available in RStudio? (Go into Global Options)
Select all that apply
- Spell check
- Keyboard shortcuts
- Automatic package loading and installing
- Font and background changes
Question 1022 of 1443

1022. Question
5. Which of these are examples of dark data?
Select all that apply
- Misnamed customer data files.
- Data on dark matter used to prove its existence.
- Forgotten shipping histories of freight.
- Murder rates in the United States.
Question 1023 of 1443

1023. Question
5. Why is calculating closeness centrality useful?
Select all that apply
- You learn who or what is the most central hub in a network.
- You learn who or what can reach every other node in the network in the shortest amount of time.
- You learn who or what is the most frequent connector of a network.
- You learn who or what is the most likely to break down in a network.
Question 1024 of 1443

1024. Question
5. What can you do if you have missing data?
Select all that apply
- You can repeat the values before and after the missing data, which holds the value constant.
- You can use regression to interpolate the trend, or draw a best-fit line between two points.
- You can use average or median value to fill in the missing data.
- You don't need to do anything to deal with missing data.
Question 1025 of 1443

1025. Question
5. Which package has the "ddply" function?
- plyr
- tidyr
- ggplot
- rsunlight
Question 1026 of 1443

1026. Question
Put the six Data Science control cycle steps in order, starting with “Ask”
- Validate
- Model
- Ask
- Research
- Interpret
- Test

Question 1027 of 1443

1027. Question

Match the terms to the descriptions.
Sort elements
- Betweenss
- Withinss
- Totss
- Sum of all the squared distances between data points in different clusters
- Sum of all the squared distances between points within the same cluster
- Total sum of squares


Question 1028 of 1443

1028. Question
Which of the following SQL functions can be used on date fields?
Select all that apply
- GETDATE()
- DATEDIFF()
- DATEADD()
- MIN()
- PATINDEX()
- RTRIM()
Question 1029 of 1443

1029. Question
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
- If you have a 95% confidence interval.
- If you plot your residuals in a histogram and they have a normal distribution.
- If you plot your residuals in a histogram and they have a skewed distribution.
- If you have a <50% confidence interval.
Question 1030 of 1443

1030. Question
Sort the variables as either continuous or discrete.
Sort elements
- Number of cars, number of buildings, temperature
- Days of the week, months of the year
- Colors, types of weather, names
- Continuous variables
- Discrete variables
- Discrete variables without a defined sequence
Question 1031 of 1443

1031. Question
Fill in the blank below.
- In graphics, the transparency argument is called ____, where 0 is entirely transparent, and the default of 1 is entirely opaque.
Question 1032 of 1443

1032. Question
Order these logical operators from fastest to slowest in terms of query performance:
- LIKE
- <>
- =
- >,>=,<,<=
Question 1033 of 1443

1033. Question
Fill in the blanks below.
- The ____ method is part of unsupervised machine learning and discovers new patterns or groups of data, while the ____ method is part of supervised machine learning and assigns data points to known groups or categories.
Question 1034 of 1443

1034. Question
How does industry knowledge help us understand our analysis?
- There may be latent factors that we wouldn't know unless we had expertise in that field.
- You can't make an impact unless you're an expert in the field.
- Only athletes can understand how athletes are paid.
- Industry knowledge indicates a better understanding of data analysis.
Question 1035 of 1443

1035. Question
Match the methods of variable selection to the correct descriptions.
Sort elements
- Algorithm starts with a model of 0 variables and continues to add more variables based upon a specified measure
- Starts with a model of all variables, and removes variables based upon a specified measure
- Combination of forward and backward selection that starts with a model of 0 variables and adds variables, but can also remove variables based upon a specified measure
- Forward selection
- Backward selection
- Step-wise selection
Question 1036 of 1443

1036. Question
The 3 V's of data are:
Select all that apply
- Volume
- Velocity
- Variety
- Veracity
Question 1037 of 1443

1037. Question
How do you measure the explanatory power of your predictive model?
- R squared
- Q squared
- B squared
- C squared
Question 1038 of 1443

1038. Question
What are the three parts of an optimization model?
- Target cell, Changing cells, Constraints
- Target cell, Fixed cells, Constraints
- Fixed cell, Changing cells, Constraints
- Target cell, Constraints, Cell parameters
Question 1039 of 1443

1039. Question
Put the four steps of building a classification tree in order.
- Conditional on the previous answer, select the next question
- Stop growing the tree when there is no more information gain
- Ask the question that yields the most information
- Create a new question branch after the previous one
Question 1040 of 1443

1040. Question
What are two ways to run a Shiny application?
Select all that apply
- Click the "runApp" button in the script window
- Type 'runApp' in the script or console window
- Click the 'run' button in the script window
- Type run("application") in the script window
Question 1041 of 1443

1041. Question
Using the Capital Bikeshare data, what questions can we answer through regression analysis?
Select all that apply
- How does any single given factor (air temperature, humidity, wind speed) affect demand for bikes?
- How do several variables (air temperature, humidity, wind speed, day of the week, holidays, hour of the day) affect demand for bikes?
- How can you factor seasonality and cyclicality when forecasting demand?
- How can you figure out how the color of the bike impacts customer ride satisfaction?
Question 1042 of 1443

1042. Question
Fill in the blank below based on this chart.
- If you add the ____ row and the ____ row, then you get the observed data row.
Question 1043 of 1443

1043. Question
Please fill in the blank below.
- The ____ Index is a way of measuring the extent of similarity between two people or objects.
Question 1044 of 1443

1044. Question
5. Why will the following code give you an error?
```
a <- "Hello"
A
```
- Because there is no double equals sign
- Because the variable is uppercase instead of lowercase
- Because the quotation marks should be single, not double
- Because there is a missing parenthesis
Question 1045 of 1443

1045. Question
1. Why is it useful to use the script window for writing code?
Select all that apply
- You can save the code
- You can write many lines at once without executing them
- You can see the output in the window
- The code runs immediately after each line
Question 1046 of 1443

1046. Question
Which of the following statements is not true about k-means clustering?
Select all that apply
- The centroid has to be a defined point in the data set.
- k-means clustering is an iterative process.
- The set.seed() function ensures that the results of k-means clustering are reproducible.
- The centroid is the average location of all points in the cluster.
Question 1047 of 1443

1047. Question
What is TRUE about correlation?
Select all that apply
- Correlation identifies the strength of the linear relationship between variables on a scale of -1 to 1.
- A correlation of 1 means that the variables move perfectly in tandem - if one variable increases, then the other increases at a fixed rate.
- A correlation of -1 means that the variables move in a perfectly inverse fashion - if one variable decreases, then the other increases at a fixed rate.
- A correlation of 0 means that there is no linear relationship between the change in x and the change in y.
- Correlation always implies causation.
Question 1048 of 1443

1048. Question
Which of these is not an attribute of classification?
Select all that apply
- Discovers patterns in data
- Assigns data points to known groups or categories
- Calculates probabilities of events occurring or group membership
- Exploratory data analysis (EDA)
Question 1049 of 1443

1049. Question
Which of the answers below is a function?
Select all that apply
- ```
 subset() 
```
- ```
 ggsave() 
```
- ```
 %in% 
```
- ```
 == 
```
Question 1050 of 1443

1050. Question
Which function converts wide data to long data?
- ```
 gather() 
```
- ```
 ddply() 
```
- ```
 summarize() 
```
- ```
 convert() 
```
Question 1051 of 1443

1051. Question
What is a good way to test for multicollinearity?
- Variance-inflation factors
- Cook's Distance
- Box plots
- Q-Q plot
Question 1052 of 1443

1052. Question
Which of these are examples of dark data?
Select all that apply
- Misnamed customer data files.
- Data on dark matter used to prove its existence.
- Forgotten shipping histories of freight.
- Murder rates in the United States.
Question 1053 of 1443

1053. Question
What does covariance measure?
Select all that apply
- Measure how one variable affects another variable
- Measures if a relationship is positive or negative
- Easily measures comparison of relationships between variables
- Measures the impact that the variable has on the model
Question 1054 of 1443

1054. Question
What tab can you find the charting functionality in?
- Insert
- Tables
- Graphs
- Plots
Question 1055 of 1443

1055. Question
Which of these is not an attribute of classification?
Select all that apply
- Discovers patterns in data
- Assigns data points to known groups or categories
- Calculates probabilities of events occurring or group membership
- Exploratory data analysis (EDA)
Question 1056 of 1443

1056. Question
Why can't we calculate the denominator of the Naive Bayes formula?
- Because we don't know if the variables are conditionally independent
- Because we don't know the probability of the variables
- Because the denominator is canceled out by the numerator
- Because the numerator can't be calculated
Question 1057 of 1443

1057. Question
Which statement is NOT true about random forest?
- Decision trees are merged together when running the model
- Each tree is grown to the largest extent possible
- A new test data point is run through each decision tree in the forest.
- The label that gets the most votes is how the point is classified.
Question 1058 of 1443

1058. Question
Why is it important to convert all words to lower case before removing 'stop words'?
- Because R reads lowercase and uppercase letters as two different letters
- Because R can't process punctuation
- Because R cannot process proper nouns
- Because R reads lowercase and uppercase letters as the same letters
Question 1059 of 1443

1059. Question
What should you do to a non-linearly separable data set?
- You need to transform it.
- You need to keep it the same.
- You need to use maximal margin SVM.
- You need to use neural networks.
Question 1060 of 1443

1060. Question
What is TRUE about the Jaccard Index?
- The higher the Jaccard Index, the more the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The lower the Jaccard Index, the more the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The higher the Jaccard Index, the less the behavior of one person or entity is likely to accurately predict the behavior of another person or entity.
- The Jaccard Index cannot help you to predict the behavior of another person or entity.
Question 1061 of 1443

1061. Question
1. Fill in the blank below.
- William Deming, who is known for training hundreds of engineers, managers, and scholars in statistical process control in Japan after World War 2, has a saying: “In God we trust. All others must bring ____.”
Question 1062 of 1443

1062. Question
1. Fill in the blank below.
- ____ measures how changes in one variable affect another variable.
Question 1063 of 1443

1063. Question
1. How do you automate the process for reading and formatting multiple files?
- Use for() loops.
- Rerun your code multiple times with varying data sets.
- Combine variables with different data.
- Use repeat() loops
Question 1064 of 1443

1064. Question
1. Why is it useful to use the script window for writing code?
Select all that apply
- You can save the code
- You can write many lines at once without executing them
- You can see the output in the window
- The code runs immediately after each line
Question 1065 of 1443

1065. Question
1. What is the main problem of visualizing the ebola data set in ggplot?
- Some of the countries are on different scales
- There are different diseases being visualized
- Not all the countries are in the same geographic area
- There isn't enough information for analysis
Question 1066 of 1443

1066. Question
1. Fill in the blank below.
- The main goal of clustering is to ____ intra-cluster distance (the distance between points in a cluster) and ____ inter-cluster distance (the distance between clusters).
Question 1067 of 1443

1067. Question
2. Fill in the blank.
- You can use ____ to help answer questions, such as "Who do your customers trust?" and "How does information spread within your company?"
Question 1068 of 1443

1068. Question
1. Match the reason data are valuable with its description.
Sort elements
- Maintaining accurate and secure data for a long period of time may help prove accountability and avoid penalties.
- Through the use of data, processes that were previously manual may be made more efficient.
- Descriptive statistics can reveal what has already happened and may provide surface-level insights.
- The use of data science methods can help extract novel, powerful insights, anticipate behaviors, and build tools.
- Compliance
- Automation
- Dashboards
- Predictive analytics
Question 1069 of 1443

1069. Question
1. Which of these relationship types could be in a network?
Select all that apply
- Environmental relationships
- Economic relationships
- Geographic relationships
- Communication patterns
Question 1070 of 1443

1070. Question
1. Why is it important to think about the purpose of the analysis before you manipulate your data?
Select all that apply
- To have a clear idea of what the data should look like in order to analyze or visualize it.
- To minimize the amount of time that it will take to complete the work.
- To make conclusions about trends in the data.
- To better understand the purpose of data science and analysis.
Question 1071 of 1443

1071. Question
1. Fill in the blanks below.
- You always have to ____ your models and check for potential ____ even before you can test them on new data!
Question 1072 of 1443

1072. Question
1. What is the "for" loop used for in R?
- The "for" loop automatically performs a designated operation for as many times as you tell it to.
- The "for" loop can perform a designated operation a single time.
- The "for" loop automatically saves your data.
- The "for" loop subsets the data into whatever designated columns you tell it to.
Question 1073 of 1443

1073. Question
1. What problems will you solve in this course?
Select all that apply
- Measure trust between people and among groups to identify influence
- Detect communities to understand how people and objects interact and group together
- Apply Google's PageRank algorithm to determine the most important nodes in a network
- Learn the basics of regression analysis
Question 1074 of 1443

1074. Question
2. How can you search for multiple terms in one row?
- Use the & sign.
- Use the "and" term.
- Use the "%and%" term.
- Use the "%in%" term.
Question 1075 of 1443

1075. Question
2. Fill in the blank below.
- The str() function shows us the ____ of the data.
Question 1076 of 1443

1076. Question
3. Which analogy best describes the UI and server script?
- The UI script is the outside of the car and the server is the engine
- The UI script is a tree, and the server is the roots
- The UI script is a computer and the server is the monitor
- The UI script is the coffee and the server is the coffee beans
Question 1077 of 1443

1077. Question
2. Match the ggplot function to its purpose.
Sort elements
- geom_point
- ggtitle
- xlab
- ylab
- scale_shape_manual
- Specifies to map the data as points.
- Specifies the title of the graph.
- Labels the x axis.
- Labels the y axis.
- Specifies how the legend and data points should be displayed.
Question 1078 of 1443

1078. Question
3. Put the events in order to describe how an economic network may expand. Place the first event on top.
- The factory requires the presence of construction crews and equipment
- New factories and stores are built to meet an increase in demand
- The families of the construction crews spend money in the city
- A company decides to build a new factory in town
Question 1079 of 1443

1079. Question
3. When you have data,
- quantitative analysis always wins over qualitative analysis and intuition.
- qualitative analysis always wins over quantitative analysis and intuition.
- you should use your intuition to find patterns in the data.
- you should segment it to support your hypothesis.
Question 1080 of 1443

1080. Question
2. Why are the coordinates of the cheese data set centroids in decimals?
- Because it represents the average between values that are either 0 or 1.
- Because the cheese is sold in partial pounds.
- Because the centroid coordinates can never be whole numbers.
- Because the centroid always represents a particular data point.
Question 1081 of 1443

1081. Question
2. Fill in the blank below.
- The ____ fallacy involves making a decision based solely on easy quantitative observations and ignoring all other latent observations.
Question 1082 of 1443

1082. Question
3. Select the statement below that is TRUE.
- A network is comprised of vertices or nodes, and edges or connecting lines between the vertices.
- A network is a program that we can use to analyze data.
- A network is very difficult to measure or quantify.
- A network is an unreliable way to evaluate and analyze relationships and transactions within a business.
Question 1083 of 1443

1083. Question
3. What is the first thing you need to do when you begin working in R?
- Set the working directory
- Load the data
- Load the packages you need
- Set the column headings on your data
Question 1084 of 1443

1084. Question
3. Fill in the blank below.
- ____ coding is an approach that allows for the inclusion of categorical variables with multiple levels in regression models.
Question 1085 of 1443

1085. Question
3. What do you need to do after every forecast period?
Select all that apply
- Update the level values.
- Update the trend values.
- Find the difference of the residuals.
- Find the outliers in the data.
Question 1086 of 1443

1086. Question
2. Fill in the blank below.
- You can use the ____ function from the "data.table" package to make sure the columns have the same name so you can join the two data sets.
Question 1087 of 1443

1087. Question
2. Which code below correctly shows how to take all the points that are reached by the 20th iteration of the simulation and make them blue?
- V(Twitter_network_simulation_1_graph_simple)$color[V(Twitter_network_simulation_1_graph_simple) %in% reached[[20]]] = "blue"
- V(Twitter_network_simulation_1_graph_simple)$color[V(Twitter_network_simulation_1_graph_simple) %in% reached[[20]]] = "orange"
- E(Twitter_network_simulation_1_graph_simple)$color = "light blue"
- pdf("Twitter network simulation_color.pdf", width = 10, height = 10)
Question 1088 of 1443

1088. Question
3. Match the definition of the type of linkage with the correct term.
Sort elements
- Minimum distance between points
- Maximum distance between points
- Group distance average
- Distance between centroids
- Single linkage
- Complete linkage
- Average linkage
- Centroid linkage
Question 1089 of 1443

1089. Question
2. What are some properties of data in JSON format?
Select all that apply
- It's written in JavaScript notation.
- It is language independent.
- It is language dependent.
- It is good for holding large amounts of data.
Question 1090 of 1443

1090. Question
5. Good coding habits include:
Select all that apply
- Commenting out your code
- Separating out your code
- Wrapping code for re-use
- Putting multiple functions on one line
Question 1091 of 1443

1091. Question
5. Fill in the blank below.
- The ____ package makes it easier to work with dates in R.
Question 1092 of 1443

1092. Question
4. Why is it useful to host apps on external sites?
Select all that apply
- So anyone can access them without installing R or RStudio
- So you can increase the number of R users in your community
- So others can edit the code of the application
- So you can easily share your work with others
Question 1093 of 1443

1093. Question
3. Why do we cluster only Democrat-introduced bills instead of both Democrat- and Republican-introduced bills?
- So we limit the number of variables we have to account for.
- Because we only want to look at Democrats.
- So we can have a bigger dataset to analyze.
- So we can predict how Democrats will vote on Republican-introduced bills.
Question 1094 of 1443

1094. Question
4. Why do we look at 4 and 9 clusters when they only have an explained variance of 13.4%?
- It's the highest number compared to other numbers of clusters.
- We don't actually need a high explained variance in sk-means analysis.
- We want a low number for sk-means and a high number for k-means.
- The explained variance doesn't matter as much as the number of clusters.
Question 1095 of 1443

1095. Question
4. Match the common relationships or patterns with the type of network they represent.
Sort elements
- Organizational relationships
- Communication patterns
- Economic, environmental, or geographic relationships
- Connections based on interests, preferences, or similarities
- Parent and child
- Emails between co-workers
- Citizens in the same tax bracket or state
- Members of the same gym
Question 1096 of 1443

1096. Question
4. Which of the following are common causes of outliers?
- Someone entered their data incorrectly
- A computer or machine recorded or transferred the data incorrectly
- Pure chance
- The data weren't visualized properly
Question 1097 of 1443

1097. Question
5. Fill in the blank below:
- Remember, like all resources, ____ - data that cannot fit on a single computer or server - has to be cost effective.
Question 1098 of 1443

1098. Question
4. Fill in the blank below.
- You need to understand how much your algorithm can explain the results in your data and how much of it is ____ that is always present in large data sets and cannot be accounted for by your model.
Question 1099 of 1443

1099. Question
5. A company is planning to re-brand one of its products and the product director wants to know how customers feel about some potential product names. Text mining can be most beneficial in which of these situations?
- Summarizing open response feedback from survey respondents
- Summarizing a 1-5 rating from survey respondents
- Identifying the number of times a focus group participant smiles when they hear the new brand name
- Predicting the likelihood that a customer will like the new name based on whether a similar customer likes the name
Question 1100 of 1443

1100. Question
5. Besides network analysis, what are some other ways to approach data mining?
Select all that apply
- Visualization
- Descriptive statistics
- Clustering
- Text mining
Correct

Incorrect
Question 1101 of 1443

1101. Question
3. Fill in the blank below.
- You can automate a customized analysis by creating a ____!
Question 1102 of 1443

1102. Question
3. Which function should you use to save your ggpairs analysis in R?
- pdf()
- doc()
- save()
- ggpairs()
Question 1103 of 1443

1103. Question
5. Exponential smoothing assumes data is made up of which 2 components?
Select all that apply
- A level or average value
- An error value around that level or average
- Exponential smoothing doesn't assume anything about the data
- There is no error value
Question 1104 of 1443

1104. Question
4. What are some examples of the functionality you can have in your interactive visualization?
Select all that apply
- The drop-down menu allows you to select individual nodes.
- You can zoom in and out, center, and move the graph.
- Edges have pop-up labels with the number of tweets.
- The nodes have pop-ups with clearly visible names.
Question 1105 of 1443

1105. Question
3. Which function do you need to use to finish the saving process?
- dev.off()
- save()
- save.work()
- off()
Question 1106 of 1443

1106. Question
3. Why do you need to include two data sets in the graph.data.frame() function?
- Because there is too much data for one data set.
- Because one data set contains node attributes and one contains edge attributes.
- Because one data set contains network data, and one contains supplemental information.
- Because there isn't enough information to graph a network with one data set.
Correct

Incorrect

Question 1107 of 1443

1107. Question

5. Match the method with the description.

Sort elements

Closeness centrality
Betweenness centrality
Eigenvector centrality
PageRank score
Jaccard Index
Hierarchical clustering

Measures how quickly someone can spread a message that reaches every other point in the network

Measures how significant a node is as a connector of the ecosystem.

Measures how important someone or something is based on what they’re connected to.

Measures how important someone is based on their relative role in the network.

Measures how similar or redundant 2 elements of a network are and helps detect fraud

Takes the Jaccard Index to the next level of application and identifies communities by measuring the similarities of nodes.

Correct

Incorrect

Question 1108 of 1443

1108. Question
2. R can read all these different file types except
- JPEG
- CSV
- TXT
- DOC
Correct

Incorrect
Question 1109 of 1443

1109. Question
5. Which method helps you model and understand how a disease spreads?
- Network analysis
- Text mining
- Clustering
- Regression
Correct

Incorrect
Question 1110 of 1443

1110. Question
5. Which of the examples below demonstrates the "Shipping" step of business functions?
Select all that apply
- Finding the most fuel efficient routes for trucks in a city
- Optimizing the way products are stacked on a cargo plane
- Identifying which customers to sell to
- Designing the most aerodynamic product
Correct

Incorrect
Question 1111 of 1443

1111. Question
5. Which symbol below do you use to end the for( ) loop?
- }
- )
- `
- -
Correct

Incorrect
Question 1112 of 1443

1112. Question
5. Why is it important to make sure your data looks clean?
- It makes the data easier to work with.
- It gives you all the information you need.
- It isn't really that important to have clean data sets.
- It makes the data harder to work with.
Correct

Incorrect
Question 1113 of 1443

1113. Question
5. Why do you use the "set seed" function?
Select all that apply
- Make the graph reproducible
- Assign the graph to a variable
- Set the thickness of the lines
- Save the graphic image
Correct

Incorrect
Question 1114 of 1443

1114. Question
2. R can read all these different file types except
- JPEG
- CSV
- TXT
- DOC
Correct

Incorrect
Question 1115 of 1443

1115. Question
In order for us to determine how much variation our clusters account for, we need to:
- divide the inter-cluster variance by the total variance.
- divide the total variance by the inter-cluster variance.
- divide the intra-cluster variance by the inter-cluster variance.
- divide the inter-cluster variance by the intra-cluster variance.
Correct

Incorrect
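As an illustration of this ratio, the sketch below runs `kmeans()` on the built-in `iris` measurements and compares the between-cluster sum of squares with the total. The data set, the choice of three clusters, and the seed are assumptions for the example.

```r
set.seed(1)                                   # make the random starts repeatable
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Share of the total variation that the clusters account for
explained <- km$betweenss / km$totss

# The total splits into a between-cluster part and a within-cluster part
total_check <- km$betweenss + km$tot.withinss
```

A value of `explained` close to 1 suggests that most of the variation lies between clusters rather than inside them.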
Question 1116 of 1443

1116. Question
Put the following SQL clauses in the correct standard query template order:
- GROUP BY
- HAVING
- SELECT
- INTO
- ORDER BY
- WHERE
- FROM
Correct

Incorrect
Question 1117 of 1443

1117. Question
In a non-biased model errors will be random. If errors are not random it means...
Select all that apply
- there is a "bias" in the model.
- you're not taking something into account.
- your model has a normal distribution of errors.
- your model has a skewed distribution of errors.
Correct

Incorrect
Question 1118 of 1443

1118. Question
How do we know we still need to refine our model further?
- The standard error of the residuals is still very large.
- The p-value is almost zero.
- There is a small difference between R squared and adjusted R squared.
- Each categorical variable had been split into its components.
Correct

Incorrect
Question 1119 of 1443

1119. Question
Fill in the blanks below.
- In order to best visualize your data, you may have to transform it to a different format. ____ data displays multiple occurrences per row and is easier to read in tables, while ____ data displays one observation per row and is easier to plot with in ggplot2.
Correct

Incorrect
Question 1120 of 1443

1120. Question
Match the following SQL Server components to their definition
Sort elements
- a program that provides database services to other computer programs
- a container of data/information organized into tables (and other structures) so that it can be easily managed and accessed in the same fashion.
- data stored in a tabular format with rows of named columns
- An application used to configure, manage, and administer components of SQL server (i.e. the user interface for accessing servers, databases, and tables to launch commands to the server)
- Server
- Database
- Table
- SQL Server Management Studio
Correct

Incorrect
Question 1121 of 1443

1121. Question
Fill in the blank below.
- The main goal of clustering is to ____ intra-cluster distance (the distance between points in a cluster) and ____ inter-cluster distance (the distance between clusters). This ensures that the clusters are as defined and separated as possible.
Correct

Incorrect
Question 1122 of 1443

1122. Question
How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
- Run a quantile-quantile (QQ) plot analysis
- Run a standard deviation (SD) analysis
- Run a confidence interval (CI) analysis
- Run a linear regression line (LRL) analysis
Correct

Incorrect
Question 1123 of 1443

1123. Question
What are some key things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity and correlation among the variables
- Adjusted R squared
- Model bias and distribution of residuals (Q-Q plot)
- Standard deviation of residuals to assess model fit
- Heteroscedasticity / pattern of residuals vs. fitted values
Correct

Incorrect
Question 1124 of 1443

1124. Question
What are some advantages of Excel?
Select all that apply
- Easy to learn
- Has third-party add-ins
- Write custom functions
- Great for creating presentations of data summaries
- Very flexible and interactive visualizations
Correct

Incorrect
Question 1125 of 1443

1125. Question
How do you calculate R squared?
- Subtract the ratio of the randomness to the total variance from the number 1.
- Divide the ratio of the randomness to the total variance from the number 1.
- Add the ratio of the randomness to the total variance from the number 1.
- Multiply the ratio of the randomness to the total variance from the number 1.
Correct

Incorrect
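To make the arithmetic concrete, this sketch computes R squared by hand for a simple regression on the built-in `mtcars` data and compares it with the value that `summary()` reports. The model `mpg ~ wt` is only an assumed example.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# "Randomness": the variation left over in the residuals
ss_residual <- sum(residuals(fit)^2)

# Total variation of the response around its mean
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r_squared <- 1 - ss_residual / ss_total
reported <- summary(fit)$r.squared      # value R computes for the same model
```

The two numbers should agree up to floating-point rounding.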
Question 1126 of 1443

1126. Question
Match the algorithm to the situation below:
Sort elements
- Generalized Reduced Gradient (GRG) Nonlinear
- Simplex LP
- Evolutionary
- A problem that is smooth and nonlinear
- A problem that is linear
- A problem that is non-smooth
Correct

Incorrect

Question 1127 of 1443

1127. Question

Match the attributes to the decision tree calculation.

Sort elements

Entropy
Gini impurity

Categorical attributes

Finds the largest class in the data

Uses algorithms

Continuous variables

Finds groups of classes that make up over 50% of their data

Minimizes classification

Correct

Incorrect
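For reference, both impurity measures can be computed from class proportions alone. The helper names below (`class_proportions`, `entropy`, `gini_impurity`) and the toy labels are assumptions for this sketch.

```r
# Proportion of each class that actually occurs in the labels
class_proportions <- function(labels) {
  counts <- table(labels)
  counts <- counts[counts > 0]          # drop empty classes to avoid log2(0)
  as.numeric(counts) / length(labels)
}

# Entropy in bits: 0 for a pure node, 1 for an even two-class split
entropy <- function(labels) {
  p <- class_proportions(labels)
  -sum(p * log2(p))
}

# Gini impurity: chance that two random draws have different classes
gini_impurity <- function(labels) {
  p <- class_proportions(labels)
  1 - sum(p^2)
}

mixed <- c("yes", "yes", "no", "no")
pure <- c("yes", "yes", "yes")
```

Here `entropy(mixed)` is 1 and `gini_impurity(mixed)` is 0.5, while both measures are 0 for `pure`.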

Question 1128 of 1443

1128. Question

Match the R script to its corresponding description.

Sort elements

global.R
server.R
ui.R

Contains all the data and packages needed to run the application

Contains the computational logic needed to display results that depend on user input

Contains the graphical user interface, i.e. what the app looks like and the controls that the user interacts with

Correct

Incorrect

Question 1129 of 1443

1129. Question
Put the steps in order for regression using random forest:
- Each tree is grown as long as possible without pruning
- Random p predictors are selected for splitting the node of trees
- Random forest selects n observations randomly with replacement (bootstrapping)
- New data is predicted by taking average of the predictions of all n trees
Correct

Incorrect

Question 1130 of 1443

1130. Question

Match the kernel type to the description below.

Sort elements

Linear
Polynomial
RBF (radial basis)
Sigmoid

Tells the SVM function that the data can be separated by a straight line

A function in the form of a polynomial

A function that maps / projects data that is non-linearly separable

A function that maps / projects non-linearly separable data, but doesn't exist under certain circumstances

Correct

Incorrect

Question 1131 of 1443

1131. Question
Please fill in the blank below.
- The ____ Index will calculate the similarity between politicians and their donors by comparing the people that they are connected to.
Correct

Incorrect
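A set-based similarity of this kind can be sketched in a few lines: the size of the overlap divided by the size of the combined set. The function name `set_similarity` and the donor lists are invented for illustration.

```r
# Shared connections divided by all distinct connections
set_similarity <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical donors connected to two politicians
donors_politician_1 <- c("donor_a", "donor_b", "donor_c")
donors_politician_2 <- c("donor_b", "donor_c", "donor_d")

similarity <- set_similarity(donors_politician_1, donors_politician_2)
```

Two of the four distinct donors are shared, so `similarity` is 0.5.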
Question 1132 of 1443

1132. Question
1. Why did we choose the R programming language over other languages?
Select all that apply
- It’s the language of choice for statisticians
- It has a large library of tools and packages
- It’s mainly used for programming
- It is flexible and creates powerful visualizations
Correct

Incorrect
Question 1133 of 1443

1133. Question
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
- These packages are easier to learn
- This allows you more time to think about the data instead of complex programming
- So we can directly chat to the R community
- These packages are updated regularly, while R is not updated
Correct

Incorrect
Question 1134 of 1443

1134. Question
Which function makes sure that the k-means analysis is reproducible?
- set.seed()
- kmeans()
- iterate()
- head()
Correct

Incorrect
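The effect of fixing the random number generator can be checked directly: two k-means runs that start from the same seed should give the same assignments. The seed value and the use of `iris` are assumptions for this sketch.

```r
measurements <- as.matrix(iris[, 1:4])

set.seed(123)                                  # fix the random starting centers
first_run <- kmeans(measurements, centers = 3)

set.seed(123)                                  # same seed, same starting centers
second_run <- kmeans(measurements, centers = 3)

same_result <- identical(first_run$cluster, second_run$cluster)
```

Without the seed, the starting centers are drawn afresh each time, so cluster labels can differ between runs.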
Question 1135 of 1443

1135. Question
What is R Squared?
- A number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable.
- A number that indicates the confidence we can have in the results of our data.
- A number that indicates the difference between the dependent variable and the independent variable.
- A number that indicates the length of the y-axis in comparison with the x-axis of our model.
Correct

Incorrect
Question 1136 of 1443

1136. Question
Which attribute does not belong to clustering?
Select all that apply
- Exploratory data analysis (EDA)
- Discovers and forms new groups or categories of data
- Calculates probabilities of events occurring or group membership
- Supervised machine learning technique
Correct

Incorrect
Question 1137 of 1443

1137. Question
Fill in the blank below.
- You can create a ____ by wrapping code in curly braces. This can help you streamline your code to perform multiple steps in one line, similar to a for() loop.
Correct

Incorrect
Question 1138 of 1443

1138. Question
Why does R put an 'X' in front of numerical column names?
Select all that apply
- Because variables can't start with numbers
- Because the '$' syntax won't work with numerals
- So as to always differentiate between numbers and years
- Because R is reading it in as strings
Correct

Incorrect
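The renaming can be reproduced with a small temporary file. This sketch writes a CSV whose headers are years and reads it back twice; the file contents are made up, and `check.names = FALSE` is shown only as one way to keep the original headers.

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("country,2015,2016", "A,1,2"), csv_file)

# Default behaviour: headers are turned into syntactically valid names
default_names <- names(read.csv(csv_file))

# Optional: keep the headers exactly as they appear in the file
kept_names <- names(read.csv(csv_file, check.names = FALSE))
```

With kept names, a column has to be accessed with backticks or brackets, for example ``data$`2015` ``.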
Question 1139 of 1443

1139. Question
Which package do we use to run the vif() function?
- car
- ggplot
- plyr
- tidyr
Correct

Incorrect
Question 1140 of 1443

1140. Question
When you’re removing duplicates, what should you do to ensure that you do not permanently lose the data?
- Copy and paste the data to a new worksheet
- Retype all the cell values in a different column
- Use the Ctrl + Z keys to undo any deletions
- Highlight the numbers you want to delete
Correct

Incorrect
Question 1141 of 1443

1141. Question
Why should you be careful when using summary statistics?
- Summary statistics don’t always capture the shape of the data well
- Summary statistics are not widely accepted in the statistics community
- Summary statistics visualize data incorrectly
- Summary statistics can negate supervised machine learning
Correct

Incorrect
Question 1142 of 1443

1142. Question
What keys should you press to freeze panes?
- Alt + W + F + F
- Alt + A + T
- Alt + A + M
- Ctrl + E + S
Correct

Incorrect
Question 1143 of 1443

1143. Question
Which of the following is a practical challenge that companies face when using data?
- A limited pool of data-literate talent
- Insufficient time to collect data
- Competing digital projects
- Out-dated technology and resources
Correct

Incorrect
Question 1144 of 1443

1144. Question
Which of these is NOT an idea behind the Naive Bayes classifier?
- Attributes are assumed to be independent of one another.
- Attributes are assumed to be dependent on each other.
- The results are approximations of the likelihood of the classification.
- The algorithm is easy to implement with multiple classes in the data.
Correct

Incorrect
Question 1145 of 1443

1145. Question
What is true of the boosting approach?
Please select all that apply
- Bootstrapping converts weak learners into strong learners.
- Boosting doesn't involve bootstrap sampling.
- Boosting does involve bootstrap sampling.
- Multiple models are built simultaneously and then whittled down.
Correct

Incorrect
Question 1146 of 1443

1146. Question
What is a Term Document Matrix?
- It is a matrix that shows the frequency of words in a corpus.
- It is a matrix that shows the syntax of words in a corpus.
- It is a matrix that shows the parts of speech of words in a corpus.
- It is a matrix that shows the grammatical usage of words in a corpus.
Correct

Incorrect
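To show what such a matrix looks like, here is a minimal base R sketch that builds one by hand for two tiny documents, with terms in rows and documents in columns. The documents and variable names are invented, and a text mining package would normally do this work.

```r
docs <- c(doc1 = "data science is fun", doc2 = "data mining is fun too")

# Split each document into lowercase words
tokens <- strsplit(tolower(docs), "\\s+")
names(tokens) <- names(docs)

# Vocabulary: every distinct word across the corpus
terms <- sort(unique(unlist(tokens)))

# Count how often each term occurs in each document
tdm <- vapply(
  tokens,
  function(words) as.integer(table(factor(words, levels = terms))),
  integer(length(terms))
)
rownames(tdm) <- terms
```

In this example the row for "data" holds 1 and 1, while the row for "mining" holds 0 and 1.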
Question 1147 of 1443

1147. Question
What is the implication of overfitting?
Please select all that apply
- The model won't generalize well.
- The predictive accuracy of new data in the model will decrease.
- The predictive accuracy of new data in the model will increase.
- The model will outperform previous models on new data.
Correct

Incorrect
Question 1148 of 1443

1148. Question
Which statement below is TRUE?
- Having high school classmates as mutual friends indicates a strong connection.
- Having high-profile celebrities as mutual friends indicates a strong connection.
- Having high-profile politicians as mutual friends indicates a strong connection.
- Making contributions to political campaigns on both sides of the aisle says a lot about community membership.
Correct

Incorrect
Question 1149 of 1443

1149. Question
1. Fill in the blank below.
- Clustering assumes that ____ is a measure for similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
Correct

Incorrect
Question 1150 of 1443

1150. Question
1. How do you calculate R squared?
- Subtract the ratio of the randomness to the total variance from the number 1.
- Divide the ratio of the randomness to the total variance from the number 1.
- Add the ratio of the randomness to the total variance from the number 1.
- Multiply the ratio of the randomness to the total variance from the number 1.
Correct

Incorrect
Question 1151 of 1443

1151. Question
1. What would be the output of the following code for vector 'v':
```
 v[2:7] 
```
- The second term through the seventh term
- The data in the second row and seventh column
- The second term and the seventh term
- The terms with numbers ‘2’ and ‘7’
Correct

Incorrect
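The indexing expression can be tried on any vector. In this sketch `v` is an assumed example vector of ten values.

```r
v <- seq(10, 100, by = 10)   # 10, 20, ..., 100

# 2:7 expands to the positions 2, 3, 4, 5, 6, 7
middle <- v[2:7]
```

Here `middle` contains 20, 30, 40, 50, 60, 70.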
Question 1152 of 1443

1152. Question
2. Match each plot to the layout
Sort elements
- Stacked
- Expanded
- Streamed
Correct

Incorrect
Question 1153 of 1443

1153. Question
1. What is an important step so that R can read numbers as categories?
- Use the as.factor() function on the data.
- Use the as.character() function on the data.
- Use the as.numerical() function on the data.
- Use the as.category() function on the data.
Correct

Incorrect
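A short sketch of the difference between numbers stored as numbers and numbers stored as categories. The `cylinders` values are made up for the example.

```r
cylinders <- c(4, 6, 8, 4, 6)          # numeric: R treats these as quantities

cylinder_groups <- as.factor(cylinders) # factor: R treats these as categories

group_levels <- levels(cylinder_groups) # the distinct categories, as text
group_counts <- table(cylinder_groups)  # how many observations per category
```

Once converted, modeling and plotting functions handle the values as groups rather than as points on a numeric scale.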
Question 1154 of 1443

1154. Question
1. The strength of a relationship or ties can vary. Match the description to either weak or strong ties.
Sort elements
- Strong ties
- Weak ties
- Weak ties
- Strong ties
- People we really trust and rely on
- Help a company learn and expand its reach
- People we're connected to with different perspectives
- Help a company through difficult times and gain a reputation
Correct

Incorrect
Question 1155 of 1443

1155. Question
2. Fill in the blank below:
- Data is not a burdensome workload; rather, it is a ____ that companies can tap to discover new insights and drive their business forward.
Correct

Incorrect
Question 1156 of 1443

1156. Question
1. With network analysis, which of the following is the least common metric to be determined?
- The probability of a new connection
- The number of participants in a network
- The number of connections between participants
- The strength of a connection
Correct

Incorrect
Question 1157 of 1443

1157. Question
1. Match the measures to their descriptions.
Sort elements
- Measures the number of edges that each node has. Useful if you are thinking about who to target first in a marketing campaign.
- Measures the average of the shortest path lengths from the node to every other node in the network. Useful to understand a viral marketing campaign or when selecting the best shipping routes.
- Measures the percentage of shortest paths in a network that include a given node. Useful to assess the extent to which someone is a prominent connector in a network.
- Degree centrality
- Closeness centrality
- Betweenness centrality
Correct

Incorrect
Question 1158 of 1443

1158. Question
2. Fill in the blank below.
- A good explanatory model will have residuals whose variance does not depend on the ____ (predictor) variables.
Correct

Incorrect
Question 1159 of 1443

1159. Question
1. Match the functions to the actions they perform in R.
Sort elements
- applies a function to a list and outputs a list.
- searches for an object and returns its contents.
- executes a function onto a list of arguments.
- saves your work so you don't lose the data!
- lapply()
- get()
- do.call()
- write.csv()
Correct

Incorrect
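The four functions can be seen side by side in a small sketch. The `scores` list and the temporary output file are assumptions for this example.

```r
scores <- list(a = 1:3, b = 4:6)

# lapply(): apply sum() to each list element; the result is again a list
totals <- lapply(scores, sum)

# get(): look up an object by its name, given as a string
found <- get("scores")

# do.call(): pass the list elements as arguments to one rbind() call
combined <- do.call(rbind, scores)

# write.csv(): store the result on disk
out_file <- tempfile(fileext = ".csv")
write.csv(combined, out_file)
```

Here `totals` holds 6 and 15, and `combined` is a matrix with two rows and three columns.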
Question 1160 of 1443

1160. Question
1. Match the types of trust to their definitions.
Sort elements
- Rational decision about whether to trust someone based on the potential costs and benefits of the decisions involved
- Trust developed over a long relationship
- Trust based on likeness in preferences, opinions
- Trust based on the guarantees promised by an institution, such as defined benefit pension plans or a government's promise to protect its people
- Calculation-based trust
- Personal-based trust
- Similarity-based trust
- Institution-based trust
Correct

Incorrect
Question 1161 of 1443

1161. Question
1. Please fill in the blank below:
- An ____ address has 4 numbers between 0 and 255 separated by periods and is assigned to an individual accessing a website. The site can often identify which specific computer is being used.
Correct

Incorrect

Question 1162 of 1443

1162. Question

2. Which piece of code will change the heatmap colors so they have a low of green and a high of red?

scale_fill_gradient(low = "green", high = "red")

scale_fill_gradient(low = "red", high = "green")

scale_fill_brewer(low = "green", high = "red")

scale_fill_discrete(low = "green"  high = "red")

Correct

Incorrect

Question 1163 of 1443

1163. Question
2. Put the steps in order for how the UI script communicates with the server script
- The server computes some new output
- The UI passes information to the server
- The UI displays the information for the user
- The user interacts with the UI
- The server sends the output to the UI
Correct

Incorrect
Question 1164 of 1443

1164. Question
2. Which function runs thirty different tests and aggregates the results from each one?
- NbClust
- The elbow method
- k-means clustering
- Total variance
Correct

Incorrect
Question 1165 of 1443

1165. Question
3. Match each person in the network with their strength.
Sort elements
- Greatest ability to spread a message
- Can reach the most people in the shortest amount of time
- Is a great connector
- A person with a lot of connections
- A person whose average path to all others is shortest
- A person with the most shortest paths
Correct

Incorrect
Question 1166 of 1443

1166. Question
2. Match the function to the question.
Sort elements
- How do you optimize the design of your products?
- Where can your organization get the resources it needs to create its good or service?
- How can you ensure the quality of the product and forecast demand?
- How can you improve routes and identify the most efficient yet low cost strategy for the number of delivery vehicles and staff?
- Which people are you targeting with your advertising?
- Research and Development
- Procurement
- Production
- Shipment
- Sales
Correct

Incorrect
Question 1167 of 1443

1167. Question
3. What are some limitations of the k-means analysis on cheese customers?
Select all that apply
- It does not differentiate between 2 customers who bought the same product and 2 customers who did not buy the same product.
- It does not identify the underlying reason for the customer purchase.
- It does not identify similar groups of customers based on purchases.
- It does not find products that are bought together.
Correct

Incorrect
Question 1168 of 1443

1168. Question
2. What can we assume when there were no more spikes in tweets of sympathy after the Newtown shooting occurred?
- People stopped mourning the Newtown tragedy.
- People on Twitter stopped tweeting their support about Newtown.
- Twitter stopped highlighting the Newtown tragedy.
- There was another shooting that overtook the Newtown shooting.
Correct

Incorrect
Question 1169 of 1443

1169. Question
2. Using the for loop function, or for(), is useful because
Select all that apply
- It reduces manual labor.
- It speeds up what you are trying to do.
- It helps you understand the data.
- It gives you more information about a function.
Correct

Incorrect
Question 1170 of 1443

1170. Question
2. Fill in the blank below.
- ____ is, on average, how widely actual data is dispersed around a reference value (the predicted values, the mean, etc.).
Correct

Incorrect
Question 1171 of 1443

1171. Question
2. What are the two ways you can use categorical variables in regression models?
Select all that apply
- If the categories have a natural sequence, they can be assigned unique values.
- If you can’t order categorical variables you can treat each category as its own, separate variable.
- If the categories have no order, they can be assigned unique values.
- If the categorical variables can't be ordered you can group them together as one variable.
Correct

Incorrect
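Both encodings can be sketched in base R. The variable names and category values are invented, and `model.matrix()` is used here as one common way to build indicator columns.

```r
# Ordered categories: map each one to a position in its natural sequence
sizes <- c("small", "large", "medium", "small")
size_code <- as.integer(factor(sizes, levels = c("small", "medium", "large")))

# Unordered categories: one 0/1 indicator column per category
paint <- factor(c("red", "green", "blue", "red"))
paint_indicators <- model.matrix(~ paint - 1)
```

`size_code` becomes 1, 3, 2, 1, and `paint_indicators` has one column each for blue, green, and red, with exactly one 1 per row.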
Question 1172 of 1443

1172. Question
3. What is TRUE about measuring for seasonality?
Select all that apply
- Seasonality can hide in many ways that are not immediately obvious.
- If you don't measure for seasonality, you may miss what is actually happening in the data.
- Measuring for seasonality is not that important in data analysis.
- Seasonality always shows up clearly in the data.
Correct

Incorrect
Question 1173 of 1443

1173. Question
3. Match the measures of centrality to the correct description.
Sort elements
- % of shortest paths in a network that include a given node
- a count of the number of ties directed to the node
- the number of ties that the node directs to others
- Betweenness Centrality
- In-degree Centrality
- Out-degree Centrality
Correct

Incorrect
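For a directed network stored as an adjacency matrix, the two degree counts fall out of row and column sums. The three-node network below is hypothetical, and the convention that row i, column j means a tie from i to j is an assumption of this sketch.

```r
nodes <- c("A", "B", "C")
adjacency <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))

# Ties: A -> B, A -> C, B -> C
adjacency["A", "B"] <- 1
adjacency["A", "C"] <- 1
adjacency["B", "C"] <- 1

out_degree <- rowSums(adjacency)   # ties each node directs to others
in_degree <- colSums(adjacency)    # ties directed at each node
```

Node A sends two ties and receives none, while node C receives two and sends none.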
Question 1174 of 1443

1174. Question
2. Fill in the blanks below.
- par() stands for ____ and mar() stands for ____.
Correct

Incorrect
Question 1175 of 1443

1175. Question
2. Which function creates a vector of sequential colors in a hexadecimal format?
- The rainbow() function
- The palette() function
- The hexadec() function
- The iris() function
Correct

Incorrect
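A quick way to see what such a color vector looks like is to generate a handful of colors and inspect the strings. The count of five is arbitrary.

```r
# Five colors spread evenly around the color wheel
palette_colors <- rainbow(5)

# Each entry is a hexadecimal color string starting with "#"
is_hex <- grepl("^#[0-9A-Fa-f]{6}([0-9A-Fa-f]{2})?$", palette_colors)
```

The resulting vector can be passed to the `col` argument of most base plotting functions.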
Question 1176 of 1443

1176. Question
2. Fill in the blank below.
- To make sure your images are reproducible, you can use the ____ function.
Correct

Incorrect
Question 1177 of 1443

1177. Question
4. Fill in the blank below.
- While it may look like a scatter plot, a ____ maps a third variable to the size of its points.
Correct

Incorrect
Question 1178 of 1443

1178. Question
3. Fill in the blank below
- You can create a ____ by wrapping code in curly braces.
Correct

Incorrect
Question 1179 of 1443

1179. Question
3. Why is SelectorGadget useful for web scraping?
- It shows us the HTML tags for the areas of the website that we specify
- It identifies which HTML tags are most important for the information we want
- It highlights text green, yellow and red
- It formats the HTML code into R code
Correct

Incorrect

Question 1180 of 1443

1180. Question

5. Match the terms to the descriptions.

Sort elements

Betweenss
Withinss
Totss

Sum of all the squared distances between data points in different clusters

Sum of all the squared distances between points within the same cluster

Total sum of squares

Correct

Incorrect

Question 1181 of 1443

1181. Question
5. It's very important that you don't mistake randomness for...
- patterns.
- analysis.
- conclusive results.
- clustering.
Correct

Incorrect
Question 1182 of 1443

1182. Question
5. Using this network visualization, order the relationships based upon the strength of their connection. Put the strongest connection on top.
- The relationship between C and D
- The relationship between C and B
- The relationship between A and B
- The relationship between A and C
Correct

Incorrect
Question 1183 of 1443

1183. Question
4. Which statement about APIs is true?
- Only R can be used to access APIs
- You can't point and click on an API to access data
- APIs are private and not accessible
Correct

Incorrect
Question 1184 of 1443

1184. Question
3. Which of the following are epistemological challenges?
Select all that apply
- Determining causality from correlation
- Data collection processes
- Identifying sample biases
- Developing proprietary algorithms
Correct

Incorrect
Question 1185 of 1443

1185. Question
5. How else can you apply clustering?
Select all that apply
- Grouping products based on similar attributes.
- Identifying new groups of voters based on past histories.
- Predicting how someone will vote based on pre-existing groups.
- Using credit histories to send targeted offers to groups of customers.
Correct

Incorrect
Question 1186 of 1443

1186. Question
4. Your target demographic is educated, single women who are 25-40 years old. If you use text mining to analyze their comments on your website, what might you find?
Select all that apply
- The most common words they use
- Key topics of discussion
- An estimate of their income level
- The best time of day to update your website
Correct

Incorrect
Question 1187 of 1443

1187. Question
4. Match the questions with the correct step in the Data Science Control Cycle.
Sort elements
- What is the problem we need to solve?
- What data do you need for your analysis and how can you get it?
- Which method(s) is appropriate to use?
- Do the model and assumptions work as expected?
- How does the model generalize to real world data?
- How can we use the conclusions in the real world?
- Step 1: Ask
- Step 2: Research
- Step 3: Model
- Step 4: Validate
- Step 5: Test
- Step 6: Interpret
Correct

Incorrect
Question 1188 of 1443

1188. Question
4. Which code below shows the correct way of using a function you have created?
Select all that apply
- my_function(5, 6)
- my_function(first, second)
- my_function(input_2 = second, input_1 = first)
- my_function
Correct

Incorrect
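The call styles in the options can be tried against a small user-defined function. The body of `my_function` below (a subtraction) is an assumption, chosen so that argument order visibly matters.

```r
# Hypothetical definition with two named inputs
my_function <- function(input_1, input_2) {
  input_1 - input_2
}

first <- 5
second <- 6

by_value <- my_function(5, 6)                                # literal values
by_position <- my_function(first, second)                    # matched by position
by_name <- my_function(input_2 = second, input_1 = first)    # matched by name
```

All three calls return -1. Typing `my_function` without parentheses prints the definition instead of running it.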
Question 1189 of 1443

1189. Question
4. Match the function to the visualization it helps to create.
Sort elements
- ggpairs()
- ggplot()
- corrplot()
- boxplot()
- cooks.distance()
Correct

Incorrect
Question 1190 of 1443

1190. Question
4. What are two conclusions we can draw from the visualization below?
Select all that apply
- Seasonality is occurring at regular intervals.
- The pattern in the errors shows that we're missing some effects.
- Seasonality is occurring at irregular intervals.
- The pattern in the errors shows that we are not missing anything in our model.
Correct

Incorrect
Question 1191 of 1443

1191. Question
4. Match the network analysis application to the correct part of the data control cycle.
Sort elements
- What does an economic system rely on?
- How do you optimize your supplier network?
- Determine concentration of risk in the production process.
- Optimize distribution of goods in your warehouses.
- Who has a reliable customer base? How can you extend it?
- R&D
- Buy
- Make
- Ship
- Sell
Correct

Incorrect
Question 1192 of 1443

1192. Question
4. Match the parts of the code below to their purposes.

matrix(c(1, 1, 2, 2,
1, 1, 3, 3),
nrow = 2,
ncol = 4,
byrow = TRUE)
Sort elements
- creates a matrix from the given set of values
- encloses the values of the matrix
- sets the number of rows
- sets the number of the columns
- tells R to order the data by rows instead of by columns
- matrix()
- c()
- nrow =
- ncol =
- byrow =
Correct

Incorrect
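Running the call from the question, and the same call with `byrow = FALSE` for contrast, shows what the fill order does. The variable names are assumptions for this sketch.

```r
values <- c(1, 1, 2, 2,
            1, 1, 3, 3)

# Fill row by row: the first four values become row 1
by_row <- matrix(values, nrow = 2, ncol = 4, byrow = TRUE)

# Fill column by column (the default): values run down each column first
by_column <- matrix(values, nrow = 2, ncol = 4, byrow = FALSE)
```

`by_row` has rows 1 1 2 2 and 1 1 3 3, whereas the first row of `by_column` is 1 2 1 3.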
Question 1193 of 1443

1193. Question
4. What negative effect does the image below illustrate?
- The chain effect
- Cluster breakage
- Misappropriated clustering
- Cluster incongruity
Correct

Incorrect
Question 1194 of 1443

1194. Question
4. How do you know that this is an undirected graph?
- There are no arrows to denote the flow of information.
- The edge widths are not dependent on the number of tweets.
- The edge widths are dependent on the number of tweets.
- The size of the nodes depends on the number of followers.
Correct

Incorrect
Question 1195 of 1443

1195. Question
1. Put the six data science control cycle steps in order, starting with “Ask”.
- Validate
- Ask
- Test
- Interpret
- Research
- Model
Correct

Incorrect
Question 1196 of 1443

1196. Question
5. Which of these are examples of clustering?
Select all that apply
- Identifying customer shopping patterns based on previous behavior.
- Identifying voting patterns in a population.
- Identifying how a voter will vote.
- Identifying the seasonal effects in time-series data.
Correct

Incorrect
Question 1197 of 1443

1197. Question
5. Why would you use a confusion matrix?
- To identify the number of correct classifications and incorrect classifications so we can assess the accuracy of the classification algorithm.
- To determine which categories should be used in order to increase the accuracy of the classification algorithm.
- To interpret the results from the training and validation data sets.
- To adjust the classification model to increase accuracy.
Correct

Incorrect
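A confusion matrix can be built with `table()` by cross-tabulating predictions against the true labels. The labels below are made up for illustration.

```r
class_levels <- c("no", "yes")
actual <- factor(c("yes", "yes", "no", "no", "yes", "no"), levels = class_levels)
predicted <- factor(c("yes", "no", "no", "no", "yes", "yes"), levels = class_levels)

# Rows are predictions, columns are the true classes
confusion <- table(Predicted = predicted, Actual = actual)

# Correct classifications sit on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
```

Four of the six cases are classified correctly here, so `accuracy` is about 0.67.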
Question 1198 of 1443

1198. Question
5. What is the first relationship we are going to look at using the Capital Bikeshare data?
- Relationship between the number of bikers and temperature.
- Relationship between the number of bikers and neighborhood.
- Relationship between the length of ride and temperature.
- Relationship between the damage of bikes and neighborhood.
Correct

Incorrect
Question 1199 of 1443

1199. Question
5. Which function helps you to remove duplicate data from your data set?
- duplicated()
- duplicate()
- d()
- duplication()
Correct

Incorrect
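Note that flagging repeated rows and dropping them are two separate steps. In this sketch the `orders` data frame is invented.

```r
orders <- data.frame(
  id = c(1, 2, 2, 3),
  item = c("a", "b", "b", "c")
)

# TRUE for every row that repeats an earlier row
is_repeat <- duplicated(orders)

# Keep only the rows that are not repeats
unique_orders <- orders[!is_repeat, ]
```

The third row repeats the second, so `unique_orders` keeps three of the four rows.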
Question 1200 of 1443

1200. Question
5. Why is it helpful to sort the data in decreasing order by the weighted Jaccard similarity?
Select all that apply
- Makes it easier for you to see which nodes are most similar.
- Allows you to see who is most likely to donate to the same or similar politicians in the future.
- Allows you to see who receives donations from the same contributors.
- It is not particularly helpful to sort the data this way.
Correct

Incorrect
Question 1201 of 1443

1201. Question
2. What are two ways to import data from your computer into RStudio?
Select all that apply
- Tools > Import Dataset > From Text File
- ```
 variable = read.csv("Name of file") 
```
- Session > Load Workspace...
- ```
 variable = read.data("Name of file") 
```
Correct

Incorrect
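The code-based import can be tried end to end with a temporary file. The file contents and the name `variable` (borrowed from the options above) are placeholders.

```r
# Create a small CSV file to read back in
csv_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(name = c("Ann", "Bob"), score = c(90, 85)),
  csv_file,
  row.names = FALSE
)

# Read the file into a data frame
variable <- read.csv(csv_file)
```

In practice the argument would be the path to a file on your own computer.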
Question 1202 of 1443

1202. Question
What are some conclusions from the visualization of Congress?
Select all that apply
- The Democrats are not as tightly clustered as the Republicans.
- There are some members of Congress who don't follow their parties' voting patterns.
- Democrats and Republicans vote similarly.
- Democrats are more tightly clustered than Republicans.
Correct

Incorrect
Question 1203 of 1443

1203. Question
What are the three types of relationships between tables?
Select all that apply
- One-to-Many
- One-to-One
- Many-to-Many
- multi relational
- joined
Correct

Incorrect
Question 1204 of 1443

1204. Question
Fill in the blank below.
- To speed up our work and avoid running calculations on each data point manually, we can use the ____ loop function, which performs a set of operations as many times as you tell it to. The advantage of this loop is that you can run different types of data through the same operations, and it does so automatically.
Question 1205 of 1443

1205. Question
What are some things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity
- Bias in the residuals
- Inliers
Question 1206 of 1443

1206. Question
Which of these is not a common data problem?
- Consistent data that reconciles to data sources
- Unreliable or unusable data
- Inaccurate interpretation of fields
- Issues with joining fields
Question 1207 of 1443

1207. Question
Match the table combination types with their definitions:
Sort elements
- brings columns from 2 different tables into a combined table
- appends records from 2 tables into a combined table
- JOIN
- UNION
Question 1208 of 1443

1208. Question
Fill in the blank below.
- The ____ method plots the percentage of variance explained by clustering for different numbers of clusters, which allows us to see how the variance differs with the number of clusters that you choose. It can usually be visualized with the graph below:
Question 1209 of 1443

1209. Question
3. Match the output elements shown in the console below to what information they are providing.
Sort elements
- includes the values of the boxplot levels. The five rows include the bottom whisker, the 25th percentile, the 50th percentile, the 75th percentile and the top whisker
- includes the number of values in each variable
- includes the bounds of the notches: the median plus and minus roughly 1.58 times the interquartile range divided by the square root of the number of values
- includes the values of the outliers, which you can also see in the boxplot
- $stats
- $n
- $conf
- $out
Question 1210 of 1443

1210. Question
What are the two ways you can use categorical variables in regression models?
Select all that apply
- If the categories have a natural sequence, they can be assigned unique values.
- If you can’t order the categorical variables, you can treat each category as its own separate variable.
- If the categories have no order, they can be assigned unique values.
- If the categorical variables can't be ordered you can group them together as one variable.
Question 1211 of 1443

1211. Question
Match the Excel features to their function
Sort elements
- Columns
- Rows
- Name Box
- Formula Bar
- Worksheet / tab
- Alphabetically labeled vertical cells
- Numerically labeled horizontal cells
- Tells you what cell you are in
- Shows you the formula for the highlighted cell
- A sheet with individual rows and columns
Question 1212 of 1443

1212. Question
What did the polling practices of the 1936 U.S. election illustrate?
Select all that apply
- A larger sample size does not mean more accurate results
- Your data should properly represent the population you're sampling
- The more established company is more accurate
- Using a landline for polling will give you the best data

Question 1213 of 1443

1213. Question

Match the method to the description (note: there are more methods listed than necessary).

Sort elements

Clustering
Network analysis
Text mining
Forecasting
Regression

Measures similarity between data points to group them and identify key similarities that you can use to find trends

Looks at how people, places, and other entities are connected, which can help you determine a sphere of influence and how to propagate your message quickly and effectively

Digests large amounts of text quickly and finds common themes, messages and patterns.


Question 1214 of 1443

1214. Question
Order these data formats from most to least structured.
- Data with identifiable patterns in its presentation but without clear labels or organization
- Data without a pre-defined format, such as sound or video data
- Data with labels and organization but not in a table
- Data presented in a table
Question 1215 of 1443

1215. Question
What are some advantages of using Shiny?
Please select all that apply
- You can publish interactive dashboards to the public.
- It can replace expensive visualization software.
- The visualizations are very customizable.
- You need to know JavaScript to build a Shiny dashboard.
Question 1216 of 1443

1216. Question
What are some ways that you can fix multicollinearity?
Please select all that apply
- Remove the correlated independent variables.
- Combine highly correlated variables.
- Add more independent variables.
- Transform variables with logarithms or other functions.
Question 1217 of 1443

1217. Question
What are some of the pros of SVM?
- It is easy to train.
- It performs well on unconventional problems.
- Captures non-linear relationships between data points
- It is immune to noise and overlapping classes
Question 1218 of 1443

1218. Question
How do you down-weight famous people and boost less famous ones?
- Use the weighted Jaccard Index
- Use the regular Jaccard Index
- Use regression analysis
- Use unweighted logarithms
Question 1219 of 1443

1219. Question
3. What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model
Question 1220 of 1443

1220. Question
What type of data does the grep() function work with?
- Character vectors
- Booleans
- Strings
- Data frames
Question 1221 of 1443

1221. Question
What is an important step so that R can read numbers as categories?
- Use the as.factor() function on the data.
- Use the as.character() function on the data.
- Use the as.numerical() function on the data.
- Use the as.category() function on the data.
Question 1222 of 1443

1222. Question
What do you need to run an F-test?
Select all that apply
- The number of coefficients in the model excluding the y-intercept
- The degrees of freedom
- The F-statistic
- The Cook's distance
Question 1223 of 1443

1223. Question
Which of these is not a strength of kNN?
Select all that apply
- It's easy to explain
- It's computationally inexpensive to run, with fast results in real time
- New data can be added any time to the algorithm
- It's fast for recommendation engines once the metrics have been calculated and stored
Question 1224 of 1443

1224. Question
Why do we need to validate the model?
- To make sure it works with other data
- To prove that it works with the test data
- To determine what type of data we’re working with
- To answer questions about the data
Question 1225 of 1443

1225. Question
Why do we use packages for data manipulation instead of just using built-in R functions?
Select all that apply
- These packages are easier to learn
- This allows you more time to think about the data instead of complex programming
- So we can directly chat to the R community
- These packages are updated regularly, while R is not
Question 1226 of 1443

1226. Question
What is TRUE about this Q-Q Plot?
Select all that apply
- This Q-Q Plot shows that our model has fewer residuals at the tails of the distribution.
- The residuals may not be normally distributed, meaning that we could have achieved results at random.
- This Q-Q Plot shows that our model has more residuals at the tails of the distribution.
- The residuals are normally distributed, meaning that we could not have achieved results at random.
Question 1227 of 1443

1227. Question
What keys should you press to freeze panes?
- Alt + W + F + F
- Alt + A + T
- Alt + A + M
- Ctrl + E + S
Question 1228 of 1443

1228. Question
What is the objective of a good model?
- Explain as much variance as possible, while accurately representing the event you want to describe
- Explain as much variance as possible to your own data set
- Represent your data in a way that confirms your initial hypothesis
- Present the outcome that reflects most positively on your research
Question 1229 of 1443

1229. Question
Why do you need to change the NA values to '#N/A'?
- Excel interprets #N/A values as not available values
- Excel interprets NA values as not available values
- #N/A ensures that the data can't be alphabetically sorted
- Excel interprets #N/A as zeroes
Question 1230 of 1443

1230. Question
Which of these languages cannot be used for data analysis?
- Python
- R
- SQL
- HTML
Question 1231 of 1443

1231. Question
Which number in this confusion matrix represents the number of false positives?
- 468
- 560
- 1
- 1439
Question 1232 of 1443

1232. Question
What does it mean if an observation has more weight in a decision tree?
- It means the observation is more prone to misclassification.
- It means the observation is more important in the dataset.
- It means the observation is less likely to be misclassified.
- It means the observation is always the first node in the tree.
Question 1233 of 1443

1233. Question
When would bike demand be the greatest in Washington DC?
- 8 am on a sunny Wednesday in April.
- 8 am on a rainy Wednesday in April.
- 8 am on a sunny Wednesday in December.
- 8 pm on a sunny Wednesday in April.
Question 1234 of 1443

1234. Question
Why did NAs appear when we initially read in the data?
- R read the zeroes as missing data.
- That's how R displays zeroes in data.
- The NAs are a category of data.
- R read it in as individuals not buying products.
Question 1235 of 1443

1235. Question
What is TRUE if more people know each other in a community?
Select all that apply
- The more they dislike each other
- The faster a message spreads
- The more resilient the network
- The easier it is to make product recommendations to "similar" people
Question 1236 of 1443

1236. Question
1. Google's API (application programming interface) allows users to:
- pull information from its data repository to map and analyze data
- view the code to Google's webpage
- access Google's algorithms
- improve their privacy settings on Google
Question 1237 of 1443

1237. Question
1. How can you apply k-Nearest Neighbors for general functions of an organization?
Sort elements
- Classify products among competing companies
- Evaluate which parts are likely to fail based on usage history of similar parts
- Evaluate job applicants in a quantitative way to eliminate bias
- Quantitatively determine which customers are most similar and should be either approached in the same way or avoided entirely
- Research and development
- Procurement
- Production and manufacturing
- Sales and marketing
Question 1238 of 1443

1238. Question
1. Fill in the blank below.
- A regression model with several variables is called ____ regression.
Question 1239 of 1443

1239. Question
1. Which operator pulls rows that contain specified terms you're searching for to create a new dataset with only those rows?
- ```
 %in% 
```
- ```
 in 
```
- ```
 %% 
```
- ```
 <- 
```
Question 1240 of 1443

1240. Question
1. Fill in the blanks below
- Shiny applications have two basic components: the ____ script and the ____ script.
Question 1241 of 1443

1241. Question
1. Fill in the blank below.
- The ____ method plots the percentage of variance explained by clustering for different numbers of clusters, which allows us to see how the variance differs with the number of clusters that you choose.
Question 1242 of 1443

1242. Question
1. Which of the following sources would need to be datafied or converted into numbers, so that you can run analyses and gain greater insight?
- Text in last year's emails
- The viral videos from this week
- Predictions for next quarter's revenue
- Growth rate of someone's Instagram following
Question 1243 of 1443

1243. Question
1. Fill in the blank below.
- While Big Data is a resource, ____ is the domain that will draw insights from the data.
Question 1244 of 1443

1244. Question
2. Fill in the blank below.
- You can use ____ to help answer questions, such as "Who do your customers trust?" and "How does information spread within your company?"
Question 1245 of 1443

1245. Question
1. Fill in the blank below.
- You have to use the ____ function for randomized algorithms to ensure consistency of outputs.
Question 1246 of 1443

1246. Question
1. What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
- Does the variance of the residuals change with the predicted value?
- Do the forces affecting the dependent variable change in some parts of the data and should our model reflect that?
- Does it make a difference to identify outliers or bias in the residuals in our model?
- Is there any way to change our model or does it always have to stay the same?
Question 1247 of 1443

1247. Question
1. Which package has the "join" command that allows you to combine datasets?
- plyr
- networkD3
- tidyr
- scatterplot3d
Question 1248 of 1443

1248. Question
1. Match the two sections of Congress to their correct descriptions.
Sort elements
- Made up of 435 elected representatives, apportioned among the states by population. Each representative serves a 2-year term, with no limit on re-election.
- Made up of 100 elected senators, two per state. Each senator serves a 6-year term, with no limit on re-election.
- House of Representatives
- Senate
Question 1249 of 1443

1249. Question
10. After a needlestick injury, how soon should you have your blood drawn?
- First 12 hours
- 10 days
- 1 month
- First 24 hours
Question 1250 of 1443

1250. Question
3. How can you display the plot once it's assigned to a variable?
Select all that apply
- Use the print function
- Type in the variable and run the line
- Copy and paste the code again
- Use the View() function
Question 1251 of 1443

1251. Question
3. What are two ways to run a Shiny application?
Select all that apply
- Click the "runApp" button in the script window
- Type 'runApp' in the script or console window
- Click the 'run' button in the script window
- Type run("application") in the script window
Question 1252 of 1443

1252. Question
3. What is one of the dangers of increasing the number of clusters?
- You could overfit the data so it doesn't generalize well.
- You could increase the complexity of the algorithm beyond a computer's capability.
- You could distort the original data.
- You could discount some data points.
Question 1253 of 1443

1253. Question
2. There is a new employee starting at the firm. Using the graph of current employees, which employee do you predict the new employee will have the strongest ties with?

The new employee (CTO) majored in Literature and previously worked at Microsoft. They have one child and vacation every year in Jamaica. They will be located at the San Francisco office.
- Allie
- Bob
- Cara
- Dave
Question 1254 of 1443

1254. Question
3. Data science will allow you to:
Select all that apply
- Identify new opportunities
- Anticipate events
- Use fewer resources
- Hire more data analysts

Question 1255 of 1443

1255. Question

3. Match each question with the best method for answering it.

Sort elements

Clustering
Classification
Clustering
Classification
Clustering
Classification

Based on their shopping history, what commonalities are there among our customers?

Based on a customer's shopping patterns, is it likely that this customer is pregnant?

What do people think about our brand?
Is it likely that this shopper will purchase our product?

When a disease spreads, are there any patterns in its spreading?

With the symptoms exhibited, what diagnosis might a doctor propose?


Question 1256 of 1443

1256. Question
3. Why might Item Response Theory have trouble analyzing sentiment from tweets?
Select all that apply
- The tweet is purely made of uncommon emojis.
- The tweet was written to be ironic.
- The tweet is in a foreign language.
- The tweet is in reference to another tweet.
Question 1257 of 1443

1257. Question
2. Which package can we use to plot a network graph and measure centrality?
- igraph
- ggmap
- tidyr
- ggplot2
Question 1258 of 1443

1258. Question
3. In an unbiased model, the errors are random. If the errors are not random, it means...
Select all that apply
- there is a "bias" in the model.
- you're not taking something into account.
- your model has a normal distribution of errors.
- your model has a skewed distribution of errors.
Question 1259 of 1443

1259. Question
3. What is a good way to test for multicollinearity?
- Variance-inflation factors
- Cook's Distance
- Box plots
- Q-Q plot
Question 1260 of 1443

1260. Question
3. What is TRUE about LOESS?
Select all that apply
- LOESS is best to use when the nearest data points are most relevant.
- LOESS is computationally complex.
- You should only use LOESS when you don’t need to easily explain how you did the calculation or built the model.
- LOESS is best to use when the data points are farther away and least relevant.
Question 1261 of 1443

1261. Question
2. What does directed betweenness help you understand?
- The flow of communication in a network.
- The number of nodes in a network.
- The number of edges in a network.
- The type of network that is being analyzed.
Question 1262 of 1443

1262. Question
2. Match the networks to their network analysis use cases.
Sort elements
- How can I reach the greatest number of people at the lowest cost?
- How does a disease spread across a population?
- Are money flows suspicious?
- How to identify key influencers in a network to affect voters and donors?
- Marketing
- Healthcare
- Finance
- Politics
Question 1263 of 1443

1263. Question
3. What is the name of the chart below?
- Adjacency matrix
- Identity matrix
- Network matrix
- Confusion matrix
Question 1264 of 1443

1264. Question
2. What do you insert the needle through when drawing up medication?
- Rubber Stopper
- Vial
- Rubber cap
- None of the above
Question 1265 of 1443

1265. Question
5. The aes layer contains:
- The mappings between the data and the graph
- The geom layer
- The titles and the axes labels
- The initial data analysis
Question 1266 of 1443

1266. Question
4. Why is it useful to create functions?
Select all that apply
- So you can reuse it later with different data
- So you can let other people input their data
- So you can use only the same data set to perform the analysis
- So you don't have to retype the same code multiple times
Question 1267 of 1443

1267. Question
4. Match the HTML tags to their description (note: there are more answer choices than necessary)
Sort elements
- paragraph
- hyperlink
- table
- line break
- tab
- <p>
- <a>
- <tb>
Question 1268 of 1443

1268. Question
4. Which function makes sure that the k-means analysis is reproducible?
- set.seed()
- kmeans()
- iterate()
- head()
Question 1269 of 1443

1269. Question
4. What happened when Telenor started contacting its customers?
- Telenor's customers started defecting to other companies.
- Telenor's customers renewed their contracts with Telenor.
- Telenor's customers upgraded their contracts with Telenor.
- Telenor's customers convinced their friends and family to sign up for Telenor.
Question 1270 of 1443

1270. Question
4. Which of the following is MOST characteristic of an opinion leader?
- Someone with a blog that is followed by many other people
- Someone who knows the most people
- Someone with the easiest access to the most powerful people
- Someone who follows a lot of blogs
Question 1271 of 1443

1271. Question
5. Twitter is an example of a company that has an API. Which of the following data could you access via its API?
Select all that apply
- Tweet (text)
- Author of the tweet
- Names of the author's followers
- People who re-tweeted a tweet
Question 1272 of 1443

1272. Question
5. Fill in the blank.
- To avoid falsely concluding that one event caused another, a possible method to use is ____ testing.
Question 1273 of 1443

1273. Question
5. Increasing the number of variables in a predictive model may not be beneficial because...
- it could decrease its generalizability
- it could lessen its accuracy
- it would reduce the amount of data available for validation tests
- N/A. It is always beneficial to add variables to your model.
Question 1274 of 1443

1274. Question
5. Fill in the blank below.
- ____ is the conversion of seemingly immeasurable information into something that we can measure.
Question 1275 of 1443

1275. Question
3. Which theory posits that people are more interconnected than we may realize?
- Theory of 6 degrees of separation
- Theory of relativity
- Theory of evolution
- The big bang theory
Question 1276 of 1443

1276. Question
4. What are some use case questions that can be answered by analyzing networks?
Select all that apply
- How vulnerable is your supply chain?
- How do you optimize your shipping strategy?
- What are the different strengths of connections in your network?
- How does a network structure relate to geographic location?
Question 1277 of 1443

1277. Question
4. Which function below allows you to add a best-fit plane to your 3D plot?
- s3d$plane3d()
- 3dplane3d()
- s3d()
- plane$s3rd()
Question 1278 of 1443

1278. Question
4. What is the seasonality factor?
- The seasonal pattern for each time period.
- The season that the data was collected in.
- The seasonal weather for each time period.
- The season that causes the most outliers in the data.
Question 1279 of 1443

1279. Question
5. How can you avoid variance in the Twitter user data?
Select all that apply
- Create another loop to pull data for several thousand accounts.
- Take the last instance of each record we pulled.
- Leave the data exactly as is.
- Take the first instance of each record we pulled.
Question 1280 of 1443

1280. Question
5. Fill in the blanks below.
- The ____ argument defines the size of the axis markers and the ____ argument determines the size of the axis labels.
Question 1281 of 1443

1281. Question
5. Match the graphs to the modularity scores.
Sort elements
- Modularity of 0.9
- Modularity of 0.7
- Modularity of 0.55
- Modularity of 0.15
Question 1282 of 1443

1282. Question
5. Which package should you install to plot an interactive network visualization?
- networkD3
- ggplot
- scatterplot3d
- plyr
Question 1283 of 1443

1283. Question
5. Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
- ```
 cumsum() 
```
- ```
 ddply() 
```
- ```
 numcolwise() 
```
- ```
 cumulate() 
```
Question 1284 of 1443

1284. Question
5. What types of data are difficult to cluster?
Select all that apply
- Circular/elliptical data
- Data that are unequally distributed
- Data that don't have similar density
- Data that have an uneven concentration of points in a cluster
Question 1285 of 1443

1285. Question
5. Which of these situations below can you apply Naïve Bayes to?
Select all that apply
- Spam filters
- Finding new voter groups
- Categorizing reviews
- Likelihood of a product purchase
Question 1286 of 1443

1286. Question
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
- If you have a 95% confidence interval.
- If you plot your residuals in a histogram and they have a normal distribution.
- If you plot your residuals in a histogram and they have a skewed distribution.
- If you have a <50% confidence interval.
Question 1287 of 1443

1287. Question
5. Why is creating a visNetwork useful?
Select all that apply
- It allows you to zoom into the network.
- It gives you additional information about the nodes and edges of the network.
- It doesn't allow you to zoom into the network.
- It gives you the visualization in 3D.
Question 1288 of 1443

1288. Question
5. Which of these are limitations of clustering that you should consider as the data set increases in size?
Select all that apply
- The amount of processing power
- The amount of time
- The number of rows
- The number of variables
Question 1289 of 1443

1289. Question
3. What are two ways to load data from the Internet into RStudio?
Select all that apply
- Tools > Import Dataset > From Web URL
- ```
 variable = read.csv("Web URL") 
```
- Session > Set working directory > Choose directory
- ```
 variable = load.data("Web URL") 
```
Question 1290 of 1443

1290. Question
Which function runs thirty different tests and aggregates the results from each one?
- NbClust
- The elbow method
- k-means clustering
- Total variance

Question 1291 of 1443

1291. Question

Given the table above, what type of SQL statement should be used to perform the following tasks?

Sort elements

DROP
INSERT
UPDATE
CASE
DELETE
CAST or CONVERT

Remove the entire table
Add a row for company ABC for 2005 to the table
Change the value 1,500 to 15,000

Return a conditional value (e.g. large company or small company) based on the number of employees

Remove the row with company ABC
Specify that a query return the employees field as an integer


Question 1292 of 1443

1292. Question
Match the functions to the actions they perform in R.
Sort elements
- produces the variance of a data set
- creates a data frame
- view the output
- calculates the standard deviation
- var()
- data.frame()
- View()
- sd()
Question 1293 of 1443

1293. Question
Match the key terms below to their descriptions.
Sort elements
- Check if there is bias in the data or the model
- The probability that the pattern exists through random chance
- Test for multicollinearity and independent variable interaction
- Check the residuals for heteroscedasticity (pattern contingent on fitted values)
- Check for information loss when selecting the right model for your data
- Q-Q plot/ distribution of errors
- p-values
- VIF
- Breusch-Pagan test
- AIC
Question 1294 of 1443

1294. Question
Which of the following are methods for importing data into SQL Server?
- SQL Server Integration Services (SSIS)
- Import/Export Wizard
- Bulk Inserts
- All of the above
Question 1295 of 1443

1295. Question
Match the function to its purpose.
Sort elements
- searches for text in data
- gives you the length of any vector
- tabulates the number of entries for categorical data
- sorts data by a particular column
- grep()
- length()
- table()
- order()
Question 1296 of 1443

1296. Question
The aes layer contains:
- The mappings between the data and the graph
- The geom layer
- The titles and the axes labels
- The initial data analysis
Question 1297 of 1443

1297. Question
5. What are two ways you can have increased certainty in your model's accuracy?
Select all that apply
- If you have a 95% confidence interval.
- If you plot your residuals in a histogram and they have a normal distribution.
- If you plot your residuals in a histogram and they have a skewed distribution.
- If you have a <50% confidence interval.
Question 1298 of 1443

1298. Question
Which approach allows for the inclusion of categorical variables with multiple levels in regression models?
- Dummy coding
- Masking variables
- Variable coding
- Categorical coding
Question 1299 of 1443

1299. Question
Please fill in the blank below.
- In order to freeze a reference, you can use the ____.
Question 1300 of 1443

1300. Question
Match the terms to their definitions.
Sort elements
- Measure of how dispersed the data is
- Standardized measure of how dispersed the data is
- Measure of linear relationship between variables (positive/negative)
- Measure of strength of linear relationship between variables (positive/negative)
- How a change in variable x will affect variable y
- The probability that the pattern exists through random chance, in the absence of a relationship between variables
- Variance
- Standard deviation
- Covariance
- Correlation
- Slope
- p-values
Question 1301 of 1443

1301. Question
Fill in the blank below.
- Clustering assumes that ____ is a measure of similarity. In other words, data points that are “closer” together physically in a graph are probably more similar than data points that are farther apart.
Question 1302 of 1443

1302. Question
When using data you may encounter any of the challenges listed below. Match each challenge with its description.
Sort elements
- Having the right staff with the right skills
- Getting the right data, the right sample size, and statistical significance
- Using data that may not have been collected with your intended use in mind
- Putting all the pieces together to extract meaningful insights from your data and use them in a responsible way
- Practical challenge
- Epistemological challenge
- Ethical challenge
- Grand challenge
Question 1303 of 1443

1303. Question
How do you stop the Shiny application from running?
- Click the small "Stop Sign" at the corner of the console window.
- Close the browser window of the application.
- Type 'stopApp("application")' into the window or console.
- Click on the "Source" button in the script window.
Question 1304 of 1443

1304. Question
What are some ways to standardize different scales?
- Subtract the mean from each observation type.
- Divide each observation by the standard deviation of the data.
- Take the logarithm of the data.
- Take the absolute values of the data.
Question 1305 of 1443

1305. Question
In the SVM sample, why do we only compare the x coordinates of each data point against the lines?
- Because the lines are vertical
- Because the lines are horizontal
- Because the points are one-dimensional
- Because the points are multi-dimensional
Question 1306 of 1443

1306. Question
What can identification of communities help uncover?
Select all that apply
- Political factions
- Genetic families
- Cyber-communities in social networks
- Terrorist groups
Question 1307 of 1443

1307. Question
2. Why do we need to validate the model?
- To make sure it works with other data
- To prove that it works with the test data
- To determine what type of data we’re working with
- To manipulate the data
Question 1308 of 1443

1308. Question
Which function eliminates duplicate rows?
- ```
 unique() 
```
- ```
 order() 
```
- ```
 duplicate() 
```
- ```
 grep() 
```
Question 1309 of 1443

1309. Question
Why is clustering more powerful than visualizing?
Select all that apply
- Clustering mathematically defines similarity between all the data points, even the ones on the periphery.
- Clustering can work with many more dimensions than we can visualize.
- Clustering is easier than visualizing data.
- Clustering can group data into pre-defined groups.
Correct

Incorrect
Question 1310 of 1443

1310. Question
What does it mean if you have a small p-value?
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very low probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are not connected.
- It means that there is a very high probability that our results occurred without a relationship between the variables, so most likely, the variables we’re testing are connected.
Correct

Incorrect
Question 1311 of 1443

1311. Question
Which of these functions are the "bare bones" of ggplot2?
Select all that apply
- ```
 ggplot() 
```
- ```
 aes() 
```
- ```
 geom() 
```
- ```
 fill() 
```
Correct

Incorrect
Question 1312 of 1443

1312. Question
What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model
Correct

Incorrect
Question 1313 of 1443

1313. Question
What does it mean when there is a high positive correlation between two attributes?
- It shows a strong relationship where as one attribute increases, the other attribute increases as well.
- It shows a strong relationship where as one attribute increases, the other attribute decreases.
- It shows a strong relationship where one attribute's increase causes the other attribute's increase.
- It shows a strong relationship where one attribute's increase causes the other attribute's decrease.
Correct

Incorrect
Question 1314 of 1443

1314. Question
When running a variable selection model, how does the computer know when it has found the right variables?
- The Akaike Information Criterion
- Cook's Distance
- Adjusted R Square
- The Breusch-Pagan Test
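
For context, base R ships a stepwise variable selection routine, `step()`, which compares candidate models by an information criterion. The sketch below is only an illustration on the built-in `mtcars` data set; the choice of data set and of backward selection are assumptions, not part of the quiz.

```r
# Start from a model that uses every other column as a predictor.
full <- lm(mpg ~ ., data = mtcars)

# Backward stepwise selection: drop one term at a time and keep a
# smaller model only while the criterion improves.
# trace = 0 silences the step-by-step log.
selected <- step(full, direction = "backward", trace = 0)

# Names of the coefficients that survived selection.
kept <- names(coef(selected))
```

Comparing `AIC(full)` with `AIC(selected)` shows how much the criterion changed between the starting model and the selected one.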
Correct

Incorrect
Question 1315 of 1443

1315. Question
Why do you need to change the NA values to '#N/A'?
- Excel interprets #N/A values as not available values
- Excel interprets NA values as not available values
- #N/A ensures that the data can't be alphabetically sorted
- Excel interprets #N/A as zeroes
Correct

Incorrect
Question 1316 of 1443

1316. Question
Fill in the blank below.
- You can create a plot, as seen below, to check if the residuals in your model are normally distributed.
Correct

Incorrect
Question 1317 of 1443

1317. Question
Which of these data types can you select in the 'validation criteria'?
Please select all that apply
- Whole number
- Decimal
- Date
- Text length
Correct

Incorrect
Question 1318 of 1443

1318. Question
What questions should you ask about your data?
Select all that apply
- Are the data points a representative sample of the population you are investigating?
- Are there missing values or duplicate values in the data?
- What is the original source of the data?
- How much of a data sample do you need to create a valid model?
Correct

Incorrect
Question 1319 of 1443

1319. Question
What happens when the ROC curve is closer to a right angle?
- The model is more accurate under different thresholds.
- The model is less accurate under different thresholds.
- The AUC decreases.
- The ROC curve transforms into an oval.
Correct

Incorrect
Question 1320 of 1443

1320. Question
Which feature was the most important feature identified by the boosted tree algorithm?
- Folic acid
- Prenatal vitamins
- Implied gender
- Ginger ale
Correct

Incorrect
Question 1321 of 1443

1321. Question
Which of these questions may be most impacted by seasons or cycles?
Select all that apply
- How much revenue will the new summer blockbuster bring in over Memorial Day weekend?
- How many school supplies should we stock during the week before local schools start?
- On average, how much will people spend at the grocery store this week?
- How long are new employees willing to commute to work?
Correct

Incorrect
Question 1322 of 1443

1322. Question
Which function implements k-means clustering with cosine distance?
- skmeans()
- kmeans()
- cosine()
- spherical.kmeans()
Correct

Incorrect
Question 1323 of 1443

1323. Question
What type of clean text data do you need for annotation and parts of speech tagging?
- Clean, but not stemmed and containing punctuation
- Clean, not stemmed, and no punctuation
- Clean, stemmed, with no punctuation
- Clean, stemmed, with no vowels
Correct

Incorrect
Question 1324 of 1443

1324. Question
1. Conditional statements are useful when:
- you want an outcome to occur based on a test or statement that you are evaluating
- you aren’t sure which function you want to use based on the data
- you want to pull a subset from the data
- you want to create a new function
Correct

Incorrect
Question 1325 of 1443

1325. Question
1. If an algorithm has 25% probability of correct classification and then is increased to 50%, what is the percent increase in accuracy?
- 100% more accurate
- 25% more accurate
- No increase in accuracy
- 500% more accurate
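
The arithmetic behind this question can be written out in R. The sketch separates the change in percentage points from the relative (percent) change, since the two are easy to confuse; the variable names are illustrative.

```r
old_rate <- 0.25   # probability of correct classification before
new_rate <- 0.50   # probability of correct classification after

# Absolute change, expressed in percentage points.
change_in_points <- (new_rate - old_rate) * 100

# Relative change: the increase as a share of the starting value.
relative_change_percent <- (new_rate - old_rate) / old_rate * 100
```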
Correct

Incorrect
Question 1326 of 1443

1326. Question
1. Fill in the blank below.
- ____ seasonality is a seasonal pattern that is repeated plus a certain value.
Correct

Incorrect
Question 1327 of 1443

1327. Question
1. Fill in the blank below.
- The 'gg' in ggplot2 stands for ____.
Correct

Incorrect
Question 1328 of 1443

1328. Question
1. Fill in the blank below
- The two variables, ____ and ____, are how the UI and server scripts communicate with each other.
Correct

Incorrect
Question 1329 of 1443

1329. Question
2. Fill in the blank below.
- ____ is a measure of the extent to which an increase in one variable corresponds to an increase in another variable.
Correct

Incorrect
Question 1330 of 1443

1330. Question
2. Fill in the blank.
- Through ____, the Securities and Exchange Commission was able to identify what Enron leadership was communicating about via email.
Correct

Incorrect
Question 1331 of 1443

1331. Question
1. Order these data formats from most to least structured.
- Data without a pre-defined format, such as sound or video data
- Data with labels and organization that is not in a table
- Data presented in a table
- Data with identifiable patterns in its presentation but without clear labels and organization
Correct

Incorrect
Question 1332 of 1443

1332. Question
1. The strength of a relationship or ties can vary. Match the description to either weak or strong ties.
Sort elements
- Strong ties
- Weak ties
- Weak ties
- Strong ties
- People we really trust and rely on
- Help a company learn and expand its reach
- People we're connected to with different perspectives
- Help a company through difficult times and gain a reputation
Correct

Incorrect
Question 1333 of 1443

1333. Question
1. Fill in the blank below.
- The number of shortest paths going through a given node is called ____ centrality.
Correct

Incorrect
Question 1334 of 1443

1334. Question
1. When running a variable selection model, how does the computer know when it has found the right variables?
- The Akaike Information Criterion
- Cook's Distance
- Adjusted R Square
- The Breusch-Pagan Test
Correct

Incorrect
Question 1335 of 1443

1335. Question
2. In a communication network, what does it mean when someone "listens" or "follows" others but has no followers or outgoing communication?
- Could be a bot
- Could be a spy
- Could be a fraudulent account
- Could be a boogeyman
Correct

Incorrect
Question 1336 of 1443

1336. Question
1. Which function can you use to replace NAs from your data set?
- gsub()
- replace()
- sub()
- g()
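
As a brief illustration of handling missing values in base R, the sketch below fills the `NA` entries of a made-up vector `x`. The fill value of 0 is an arbitrary assumption; in practice the right replacement depends on the data.

```r
# Hypothetical vector with two missing values.
x <- c(4, NA, 7, NA)

# is.na() marks the missing positions; replace() writes the
# new value into exactly those positions and returns a copy.
filled <- replace(x, is.na(x), 0)

# Count of missing values before and after.
missing_before <- sum(is.na(x))
missing_after <- sum(is.na(filled))
```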
Correct

Incorrect
Question 1337 of 1443

1337. Question
1. Fill in the blank below.
- The 'gg' in ggplot2 stands for ____.
Correct

Incorrect
Question 1338 of 1443

1338. Question
3. Which function creates a new layer with statistical smoothing of the plot?
- ```
 stat_smooth() 
```
- ```
 geom_smooth() 
```
- ```
 stat_line() 
```
- ```
 geom_line() 
```
Correct

Incorrect
Question 1339 of 1443

1339. Question
2. Why do we change the scale of the axes to a logarithmic scale?
- So as to make the data look more organized
- So as to create new data points
- So as to eliminate negative values from the data set
- So as to ensure that the data is accurate
Correct

Incorrect
Question 1340 of 1443

1340. Question
3. What does it mean when there is a high positive correlation between two attributes?
- It shows a strong relationship where as one attribute increases, the other attribute increases as well.
- It shows a strong relationship where as one attribute increases, the other attribute decreases.
- It shows a strong relationship where one attribute's increase causes the other attribute's increase.
- It shows a strong relationship where one attribute's increase causes the other attribute's decrease.
Correct

Incorrect
Question 1341 of 1443

1341. Question
3. What could result from people in your network having more ties?
- The network becomes more resilient
- A message could spread faster
- It becomes easier to make recommendations
- The brand becomes diluted
Correct

Incorrect
Question 1342 of 1443

1342. Question
3. When a volcano erupted in Europe in 2010, where did IBM foresee a bottleneck in shipping?
- Hong Kong
- Iceland
- Seattle
- London
Correct

Incorrect
Question 1343 of 1443

1343. Question
3. Ensemble learning, a concept in machine learning, happens when a group of learners are used together to arrive at a more accurate decision. Based on this concept, which Yelp! review would you consider when deciding whether or not to dine at a restaurant?
- The most recent review of a restaurant.
- The first review of a restaurant.
- A group of reviews of a restaurant.
- None of the reviews because they are not unbiased.
Correct

Incorrect

Question 1344 of 1443

1344. Question

3. Sets of variables are provided. Identify whether the set is most likely an example of correlation or causation.
Sort elements
- Causation
- Correlation
- Correlation
- Correlation
- Causation
- Causation
- Number of school hours missed and student achievement
- Number of fire trucks dispatched and amount of property damage
- Amount of money spent each week on vegetables and changes in weight
- Number of purchases of rain boots and percent of delayed flights

Correct

Incorrect

Question 1345 of 1443

1345. Question
3. Put the steps we took to visualize flights in order.
- Graph it with the igraph package, both by frequency and just by route.
- Tally up the number of flight paths.
- Create unique IDs for flight paths between cities.
Correct

Incorrect
Question 1346 of 1443

1346. Question
3. How can you determine if the histogram of your residuals is really a normal, unbiased distribution?
- Run a quantile-quantile (QQ) plot analysis
- Run a standard deviation (SD) analysis
- Run a confidence interval (CI) analysis
- Run a linear regression line (LRL) analysis
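
To show what a normality check on residuals can look like in base R, the sketch below fits a simple regression on the built-in `cars` data set (an arbitrary choice for illustration) and computes the points of a quantile-quantile plot without drawing it.

```r
# Simple linear model on a built-in data set.
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# qqnorm() pairs each residual with the quantile expected under a
# normal distribution; plot.it = FALSE returns the points only.
qq <- qqnorm(res, plot.it = FALSE)

# Points lying close to a straight line suggest roughly normal
# residuals; the correlation is a crude summary of that straightness.
straightness <- cor(qq$x, qq$y)
```

Calling `qqnorm(res)` followed by `qqline(res)` draws the plot itself, which is usually how the check is read in practice.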
Correct

Incorrect
Question 1347 of 1443

1347. Question
2. What are some things you should always check for in your model?
Select all that apply
- Outliers
- Multicollinearity
- Bias in the residuals
- Inliers
Correct

Incorrect
Question 1348 of 1443

1348. Question
2. Fill in the blank below.
- The objective of a model is not to perfectly fit data you already have, it's to have the highest predictive rate on new data.
Correct

Incorrect
Question 1349 of 1443

1349. Question
3. Which connectors spread the most information to the rest of a network?
- Connectors with high betweenness and high out-degree centrality
- Connectors with low betweenness and low out-degree centrality
- Connectors with high betweenness and low out-degree centrality
- Connectors with low betweenness and high out-degree centrality
Correct

Incorrect
Question 1350 of 1443

1350. Question
3. Put the functions of a business in the correct order.
- R&D - Optimize design and decrease time
- Sell - Target clients and suggest products
- Buy - Manage inventory
- Make - Ensure quality and gauge demand
- Ship - Optimize route
Correct

Incorrect
Question 1351 of 1443

1351. Question
3. Match the variables to their definition, where C represents a community.
Sort elements
- sum of the weights of the links inside community C
- sum of the weights of the links to nodes in community C
- sum of the weights of the links to the given node v
- sum of the weights of the links from the given node v to the community C
- sum of the weights of all the links in the network
- Li
- Lo
- Lv
- LC
- Ls
Correct

Incorrect
Question 1352 of 1443

1352. Question
3. Please fill in the blank below:
- After injecting the patient, remove the ____ from the syringe.
Correct

Incorrect
Question 1353 of 1443

1353. Question
4. Fill in the blank below.
- A big strength of ggplot is the ability to ____ graphs by adding layers and adjusting the data so it doesn't look generic.
Correct

Incorrect
Question 1354 of 1443

1354. Question
5. Good coding habits include:
Select all that apply
- commenting out your code
- separating out your code
- wrapping code for re-use
- putting multiple functions on one line
Correct

Incorrect
Question 1355 of 1443

1355. Question
4. Which type of data would the Sankey layout visualize best?
- Clickstream data
- Nested data
- Purchase histories
- Time series data
Correct

Incorrect
Question 1356 of 1443

1356. Question
4. What are some conclusions from the visualization of Congress?
- The Democrats are not as tightly clustered as the Republicans.
- There are some members of Congress who don't follow their parties' voting patterns.
- Democrats and Republicans vote similarly.
- Democrats are more tightly clustered than Republicans.
Correct

Incorrect
Question 1357 of 1443

1357. Question
Fill in the blank below.
- 3. While Big Data is a resource, ____ is the analysis that will draw insights from the data.
Correct

Incorrect
Question 1358 of 1443

1358. Question
5. Google's search function is based on a scoring algorithm called PageRank. PageRank determines a website's importance by the number of important pages linked to it. Use the website's links to rank the websites according to their importance. Put the most important website at the top.
- A website with Scientific American, US Weekly, and MTV linked to it
- A website with Peet's Coffee, Corner Bakery, and Dutch Bros. Coffee linked to it
- A website with the New York Times, Facebook, and Apple linked to it
- A website with your Etsy and Instagram accounts linked to it
Correct

Incorrect
Question 1359 of 1443

1359. Question
4. Order the steps a data scientist takes when working on a project. Put the first step on top.
- Communicate the results via visualizations, presentations, and products
- Wrangle the data (gather, clean, and sample)
- Confirm cause and effect relationships
- Understand the business problem
- Explore data to identify trends
- Make predictions about events and behaviors
Correct

Incorrect
Question 1360 of 1443

1360. Question
4. Analysis techniques need to provide a measurable benefit greater than the cost of data storage and management.
- True
- False
Correct

Incorrect
Question 1361 of 1443

1361. Question
4. Fill in the blank below.
- Some classification algorithms can go beyond determining whether someone will buy your product or not; they can quantify it by telling you the ____ that someone will buy your product.
Correct

Incorrect
Question 1362 of 1443

1362. Question
4. What methods can you use for datafication?
Select all that apply
- Factor Analysis
- Item Response Theory
- Latent Dirichlet Allocation
- Non-negative matrix factorization
Correct

Incorrect
Question 1363 of 1443

1363. Question
4. What do networks represent?
Select all that apply
- Organizational relationships
- Communication patterns
- Economic relationships
- Connections based on interests, preferences and similarities
Correct

Incorrect
Question 1364 of 1443

1364. Question
5. What are some caveats to keep in mind when working with network metrics?
Select all that apply
- Direction of edges in a communication network contains extremely important information about how people interact and relate to one another.
- The metrics you learned to compute in R only calculate correctly when there are no disconnected sections of a network. If there are breaks or isolated communities in your network, then your results will be misleading.
- Network metrics for disconnected networks need to be calculated separately for each group!
- You have learned a lot in this course, but there is much more to learn about network analysis - check out Data Society to find more courses!
Correct

Incorrect
Question 1365 of 1443

1365. Question
4. What are some ways to fix multicollinearity?
Select all that apply
- Remove factors that are highly correlated
- Combine highly correlated variables together into a single variable
- Keep factors that are highly correlated
- Separate highly correlated variables
Correct

Incorrect
Question 1366 of 1443

1366. Question
4. Put the equations you need for calculating multiplicative seasonality forecasts in order.
- level1 = alpha * (forecast1 / seasonality-4) + (1 - alpha) * (level0 + trend0)
- seasonality1 = gamma * (forecast1 / level1) + (1 - gamma) * seasonality-4
- forecast1 = (level0 + trend0) * seasonality-4
- trend1 = beta * (level1 - level0) + (1 - beta) * trend0
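
For illustration, one update step of multiplicative triple exponential smoothing can be sketched in R. This is a minimal sketch under stated assumptions: all starting values, the smoothing parameters, and the observation `observed1` are invented, and the level and seasonality updates use the observed value for the period, as in the standard Holt-Winters formulation, rather than the `forecast1` term written in the options above.

```r
# Smoothing parameters (assumed values).
alpha <- 0.2; beta <- 0.1; gamma <- 0.3

# State carried over from the previous period (assumed values).
level0 <- 100
trend0 <- 2
seasonality_prev <- 1.1   # seasonal index from four periods earlier

# Observed value for the new period (assumed).
observed1 <- 115

# Forecast for the new period from the prior state.
forecast1 <- (level0 + trend0) * seasonality_prev

# Update level, then trend, then the seasonal index.
level1 <- alpha * (observed1 / seasonality_prev) +
  (1 - alpha) * (level0 + trend0)
trend1 <- beta * (level1 - level0) + (1 - beta) * trend0
seasonality1 <- gamma * (observed1 / level1) +
  (1 - gamma) * seasonality_prev
```

Each quantity depends only on values computed before it, which is why the order of the calculations matters.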
Correct

Incorrect
Question 1367 of 1443

1367. Question
4. Why is calculating betweenness centrality important?
Select all that apply
- Knowing betweenness centrality allows you to quantitatively assess the degree to which someone is a prominent connector.
- Knowing betweenness centrality can help you provide more targeted content and grow your audience more effectively.
- Knowing betweenness centrality is not very important in network analysis.
- Knowing betweenness centrality allows you to quantitatively assess the degree to which someone is being followed in a directed network.
Correct

Incorrect
Question 1368 of 1443

1368. Question
3. Why is it important to set the x-axis and y-axis limits?
- So that the axes don't change with every iteration.
- So that the axes change with every iteration.
- So that you can color the axes different colors.
- So that you can remove the axes.
Correct

Incorrect
Question 1369 of 1443

1369. Question
4. Which of the statements below are true for calculating modularity?
Select all that apply
- The calculation only contains nodes in the same community.
- A lower modularity indicates a higher likelihood of random communities.
- A more interconnected network leads to a higher modularity score.
- An increased number of nodes leads to an increased number of communities.
Correct

Incorrect
Question 1370 of 1443

1370. Question
4. Fill in the blank below.
- While it may look like a scatter plot, a ____ maps a third variable to the size of its points.
Correct

Incorrect
Question 1371 of 1443

1371. Question
5. What types of data are mapped in the aes() function?
Select all that apply
- The data for the x and y axes
- Any aesthetics mapped to the data
- Any aesthetics mapped to a set value
- The data for the geom layer
Correct

Incorrect
Question 1372 of 1443

1372. Question
5. Which of these statements is true?
- You have to use set.seed() with sk-means.
- sk-means() calculates both Euclidean and cosine distances.
- Cosine distances are based on the angle between a point and the other points in the cluster.
- Spherical k-means uses Euclidean distances to measure similarity.
Correct

Incorrect
Question 1373 of 1443

1373. Question
5. Which of these questions should you ask for classification?
Select all that apply
- Are the categories statistically balanced?
- Does your data satisfy the underlying assumptions of the classification algorithm?
- Do you have a team that is experienced in supervised learning?
- Do you have data that needs to be explored for patterns?
Correct

Incorrect
Question 1374 of 1443

1374. Question
5. Which picture below displays the standard error of a best fit line?
Correct

Incorrect
Question 1375 of 1443

1375. Question
5. What is important to keep in mind when working with large amounts of network data?
Select all that apply
- You need to do some analysis on large amounts of data before creating visualizations.
- Subset the key nodes or sub-networks that can be more digestible.
- You don't need any additional analysis on large amounts of data.
- Take all the data exactly as it is and don't make any changes.
Correct

Incorrect
Question 1376 of 1443

1376. Question
5. Which function calculates and visualizes the communities in a network?
- cluster_louvain()
- louvain()
- modularity()
- plot()
Correct

Incorrect
Question 1377 of 1443

1377. Question
5. Which function calls up all the files in a folder for an overview?
- ```
 dir() 
```
- ```
 read.csv() 
```
- ```
 library() 
```
- ```
 help.search() 
```
Correct

Incorrect
Question 1378 of 1443

1378. Question
Fill in the blank below.
- Clustering and data mining are types of data analysis ____, which is a type of data analysis where the intent is to see what the data can tell us beyond modeling or hypothesis testing.
Correct

Incorrect
Question 1379 of 1443

1379. Question
Match the following terms to the correct definition:
Sort elements
- Pulls rows where the value of the joining field is present in both tables
- Pulls all rows from one table, and only the rows from the second table where the value of the joining field matches a value.
- Pulls all rows from both tables.
- Pulls all possible combinations of rows in all tables.
- INNER JOIN
- (LEFT or RIGHT) OUTER JOIN
- FULL OUTER JOIN
- CROSS JOIN
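
Although the question is about SQL, the same four join types have close analogues in base R's `merge()`. The sketch below uses two made-up tables, `customers` and `orders`, joined on a shared `id` column; the table names and contents are assumptions for illustration.

```r
# Hypothetical tables: ids 2 and 3 appear in both.
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
orders <- data.frame(id = c(2, 3, 4), total = c(20, 30, 40))

# Only rows whose id is present in both tables.
inner <- merge(customers, orders, by = "id")

# All rows from customers, matched rows from orders (NA otherwise).
left <- merge(customers, orders, by = "id", all.x = TRUE)

# All rows from both tables.
full <- merge(customers, orders, by = "id", all = TRUE)

# Every combination of rows: by = NULL yields the Cartesian product.
cross <- merge(customers, orders, by = NULL)
```

Counting rows with `nrow()` on each result is a quick way to see how the join types differ on the same inputs.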
Correct

Incorrect
Question 1380 of 1443

1380. Question
Fill in the blank below.
- ____ can have a very negative impact on linear regressions if they are not identified and handled properly because they can skew the algorithm. It's important to identify them early and determine why they do not conform to the majority of the data points in case you need to adjust your model. You can identify them with Cook's distance or boxplots.
Correct

Incorrect
Question 1381 of 1443

1381. Question
Fill in the blank below:
- In entropy, the number ____ indicates that 100% of the data is the same, and the number ____ indicates a 50-50 split.
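
As background, Shannon entropy for a set of class proportions can be computed with a few lines of R. The helper name `entropy` is hypothetical, and base-2 logarithms are an assumption (they give entropy in bits for a two-class split).

```r
# Entropy of a vector of class proportions that sum to 1.
# Zero proportions are dropped first because 0 * log2(0) is
# treated as 0 by convention.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

pure_set <- entropy(c(1, 0))         # every item in one class
even_split <- entropy(c(0.5, 0.5))   # two equally likely classes
skewed <- entropy(c(0.9, 0.1))       # somewhere in between
```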
Correct

Incorrect
Question 1382 of 1443

1382. Question
Which of the following SQL functions can be used on date fields?
Select all that apply
- MAX()
- DATEDIFF()
- DAY()
- MIN()
- SUM()
- ROUND()
Correct

Incorrect
Question 1383 of 1443

1383. Question
Which type of data contains sets of categories?
- Factor
- Character
- Boolean
- String
Correct

Incorrect
Question 1384 of 1443

1384. Question
Match the function names to their descriptions.
Sort elements
- labs()
- coord_flip()
- facet_wrap()
- geom_area()
- Sets labels for axes and title
- Flips the axes of a graph
- Splits up data by category to give smaller individual graphs
- Creates an area plot
Correct

Incorrect
Question 1385 of 1443

1385. Question
In a non-biased model errors will be random. If errors are not random it means...
Select all that apply
- there is a "bias" in the model.
- you're not taking something into account.
- your model has a normal distribution of errors.
- your model has a skewed distribution of errors.
Correct

Incorrect
Question 1386 of 1443

1386. Question
Sort the variables as either continuous or discrete.
Sort elements
- Number of cars, number of buildings, temperature
- Days of the week, months of the year
- Colors, types of weather, names
- Continuous variables
- Discrete variables
- Discrete variables without a defined sequence
Correct

Incorrect
Question 1387 of 1443

1387. Question
Put the 5 functions of an organization in order.
- Procurement
- Sales
- Research and development
- Shipment
- Production
Correct

Incorrect
Question 1388 of 1443

1388. Question
Fill in the blank below.
- ____ can have a very negative impact on linear regressions if they are not identified and handled properly, as they can skew the data because they lie outside the majority of data points.
Correct

Incorrect
Question 1389 of 1443

1389. Question
How does clustering help when there are more than 3 attributes in the data?
- Clustering helps identify groups with many attributes that you can't easily visualize.
- Clustering can only cluster when there are more than three attributes.
- Clustering is the best method to visualize more than 4 attributes at a time.
- Clustering can gather data more accurately when there are many attributes.
Correct

Incorrect
Question 1390 of 1443

1390. Question
Please fill in the blank below:
- Remember, like all resources, ____ (data that cannot fit on a single computer or server) has to be cost-effective. This term does not refer to data analytics, although it is sometimes conflated to mean the same thing.
Correct

Incorrect
Question 1391 of 1443

1391. Question
Please fill in the blank below:
- Naive Bayes makes the assumption that the variables are ____, which means that the presence of one variable does not affect the presence of another variable.
Correct

Incorrect
Question 1392 of 1443

1392. Question
Please fill in the blank below.
- In a ____ relationship, the value of the dependent variable changes in a non-linear fashion, so we may need more than one coefficient to predict trends.
Correct

Incorrect
Question 1393 of 1443

1393. Question
Please fill in the blank below.
- Clustering assumes that ____ is a measure of similarity. In other words, data points that are “closer” together are probably more similar than data points that are farther apart.
Correct

Incorrect
Question 1394 of 1443

1394. Question
Match the definition of the type of linkage with the correct term.
Sort elements
- Minimum distance between points
- Maximum distance between points
- Group distance average
- Distance between centroids
- Single linkage
- Complete linkage
- Average linkage
- Centroid linkage
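
The four linkage types listed above correspond to `method` values accepted by base R's `hclust()`. The sketch below fits one hierarchical clustering per linkage on made-up random points; the data, the seed, and the choice of two clusters are assumptions for illustration.

```r
set.seed(1)                         # reproducible random points
pts <- matrix(rnorm(20), ncol = 2)  # 10 hypothetical 2-D points
d <- dist(pts)                      # pairwise Euclidean distances

# One fit per linkage rule. Note: hclust's "centroid" method is
# intended for squared distances; plain distances are used here
# only to keep the sketch short.
linkages <- c("single", "complete", "average", "centroid")
fits <- sapply(linkages, function(m) hclust(d, method = m),
               simplify = FALSE)

# Cut the single-linkage tree into two clusters.
groups <- cutree(fits$single, k = 2)
```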
Correct

Incorrect
Question 1395 of 1443

1395. Question
4. What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Correct

Incorrect
Question 1396 of 1443

1396. Question
What would be the output of the following code for vector 'v':
```
 v[2:7] 
```
- The second term through the seventh term
- The data in the second row and seventh column
- The second term and the seventh term
- The terms with numbers ‘2’ and ‘7’
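
To see how the colon operator behaves inside square brackets, here is a small sketch with a made-up vector `v`. It contrasts a range of positions with an explicit pair of positions.

```r
# Hypothetical vector of eight values.
v <- c(10, 20, 30, 40, 50, 60, 70, 80)

# 2:7 expands to the positions 2, 3, 4, 5, 6, 7.
range_slice <- v[2:7]

# c(2, 7) selects only positions 2 and 7.
pair_slice <- v[c(2, 7)]
```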
Correct

Incorrect
Question 1397 of 1443

1397. Question
What is one of the dangers of increasing the number of clusters?
- You could overfit the data so it doesn't generalize well.
- You could increase the complexity of the algorithm beyond a computer's capability.
- You could distort the original data.
- You could discount some data points.
Correct

Incorrect
Question 1398 of 1443

1398. Question
You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function each was used for.
- Sell: Harrah's Hotel and Casino in Las Vegas predicts how much a customer will spend over the years, estimating their lifetime value to the casino
- Buy: Ski manufacturers predict demand for skis each winter, stocking up on supplies
- R&D: As much as 40% of trading on the London Stock Exchange is estimated to be driven by trading algorithms
- Ship: Energex (Australian utility) predicts 20 years of electricity demand growth to direct infrastructure investment
- Make: Life insurance companies predict the age of death in order to approve policies and set pricing
Correct

Incorrect
Question 1399 of 1443

1399. Question
Which function adds up the data in the previous cells to create a new column or row with cumulative sums?
- ```
 cumsum() 
```
- ```
 ddply() 
```
- ```
 numcolwise() 
```
- ```
 cumulate() 
```
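
As an illustration of running totals in base R, the sketch below adds a cumulative column to a made-up data frame named `sales`.

```r
# Hypothetical monthly unit sales.
sales <- data.frame(month = 1:4, units = c(5, 3, 7, 2))

# cumsum() returns the running total at each position.
sales$running_total <- cumsum(sales$units)
```

The last entry of `running_total` equals `sum(sales$units)`, which is a handy sanity check.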
Correct

Incorrect
Question 1400 of 1443

1400. Question
What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Correct

Incorrect
Question 1401 of 1443

1401. Question
Which two variables showed the strongest correlations with two clusters?
Select all that apply
- Points per game
- Minutes per game
- Rebounds per game
- Free throws per game
Correct

Incorrect
Question 1402 of 1443

1402. Question
Data science is at the intersection of three domains:
- Industry Knowledge, Machine Learning, Programming
- Programming, Mathematics and Statistics, Industry Knowledge
- Mathematics and Statistics, Programming, Big Data
- Programming, Industry Knowledge, Big Data
Correct

Incorrect
Question 1403 of 1443

1403. Question
1. Fill in the blank below.
- ____ is a term that means only looking at a portion of the data. It is denoted in R by a '$' symbol.
Correct

Incorrect
Question 1404 of 1443

1404. Question
1. Match the functions to the actions that they perform in R.
Sort elements
- sets up the network data
- checks the structure of the output
- pulls attributes of graph vertices
- pulls attributes of graph edges
- graph.data.frame()
- str()
- V()
- E()
Correct

Incorrect
Question 1405 of 1443

1405. Question
1. Match the functions to the actions that they perform in R.
Sort elements
- sets up the network data
- checks the structure of the output
- pulls attributes of graph vertices
- pulls attributes of graph edges
- graph.data.frame()
- str()
- V()
- E()
Correct

Incorrect
Question 1406 of 1443

1406. Question
1. When running a variable selection model, how does the computer know when it has found the right variables?
- The Akaike Information Criterion
- Cook's Distance
- Adjusted R Square
- The Breusch-Pagan Test
Correct

Incorrect
Question 1407 of 1443

1407. Question
1. Fill in the blanks below.
- You always have to ____ your models and check for potential ____ even before you can test them on new data!
Correct

Incorrect
Question 1408 of 1443

1408. Question
2. Which type of data contains sets of categories?
- Factor
- Character
- Boolean
- String
Correct

Incorrect
Question 1409 of 1443

1409. Question
2. Fill in the blank below.
- To make sure your images are reproducible, you can use the ____ function.
Correct

Incorrect
Question 1410 of 1443

1410. Question
2. Fill in the blank below.
- To make sure your images are reproducible, you can use the ____ function.
Correct

Incorrect
Question 1411 of 1443

1411. Question
2. What are some things you should always check for in your model?
- Outliers
- Multicollinearity
- Bias in the residuals
- Inliers
Correct

Incorrect
Question 1412 of 1443

1412. Question
2. What does the Akaike Information Criterion (AIC) do?
- Measures the "quality" of several statistical models in comparison to each other.
- Provides an estimate of the information lost when the variables in the model are adjusted.
- Explains if heteroscedasticity is likely present in the regression model.
- Measures how much the variance of a regression coefficient is increased due to collinearity.
Correct

Incorrect
Question 1413 of 1443

1413. Question
3. How can you check the warnings to see if there is anything you should be concerned about?
- warnings()
- errors()
- war()
- help()
Correct

Incorrect
Question 1414 of 1443

1414. Question
3. How can you check the warnings to see if there is anything you should be concerned about?
- warnings()
- errors()
- war()
- help()
Correct

Incorrect
Question 1415 of 1443

1415. Question
3. Match the function to its purpose.
Sort elements
- searches for text in data
- gives you the length of any vector
- tabulates the number of entries for categorical data
- sorts data by a particular column
- grep()
- length()
- table()
- order()
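As a rough illustration of the four base R functions listed above, the sketch below applies each one to a small made-up data frame. The `pets` data and its column names are hypothetical and exist only for this example.

```r
# Hypothetical toy data, invented for illustration only
pets <- data.frame(
  name = c("Rex", "Tom", "Rita", "Max"),
  kind = c("dog", "cat", "dog", "dog"),
  age  = c(5, 3, 8, 1),
  stringsAsFactors = FALSE
)

# grep(): search for text; returns the positions of matching elements
r_names <- grep("^R", pets$name)

# length(): number of elements in a vector
n_names <- length(pets$name)

# table(): count the entries in each category
kind_counts <- table(pets$kind)

# order(): row positions that sort the data by one column
by_age <- pets[order(pets$age), ]
```

Note that `order()` returns row positions rather than sorted data, which is why it is used inside the row index here.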
Question 1416 of 1443

1416. Question
3. What are some key things you should always check for in your model?
- Outliers
- Multicollinearity and correlation among the variables
- Adjusted R squared
- Model bias and distribution of residuals (Q-Q plot)
- Standard deviation of residuals to assess model fit
- Heteroscedasticity / pattern of residuals vs. fitted values
Question 1417 of 1443

1417. Question
3. What is a good way to test for multicollinearity?
- Variance-inflation factors
- Cook's Distance
- Box plots
- Q-Q plot
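For context, a variance-inflation factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the other predictors. The sketch below computes this by hand in base R on the built-in `mtcars` data; the choice of `wt`, `hp`, and `disp` as predictors is an arbitrary assumption for illustration. If the `car` package is installed, `car::vif()` on a fitted model with these numeric predictors should give the same values.

```r
# Arbitrary numeric predictors from the built-in mtcars data
predictors <- mtcars[, c("wt", "hp", "disp")]

# For each predictor, regress it on the remaining predictors
# and convert the resulting R-squared into a VIF
vif_by_hand <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  aux <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(aux)$r.squared)
})

vif_by_hand  # larger values suggest stronger collinearity
```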
Question 1418 of 1443

1418. Question
3. What is heteroscedasticity?
- Bias in the residuals as a function of predicted value.
- Bias as a result of outliers in the data.
- No bias evident in the residuals as a function of predicted value.
- No bias because there are no outliers in the data.
Question 1419 of 1443

1419. Question
4. After running a Breusch-Pagan test, how would I know that there is no heteroscedasticity?
- The p-value is very large.
- The p-value is very small.
- The residuals are evenly distributed.
- The residuals are not evenly distributed.
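To make the idea behind the test concrete, here is a minimal base R sketch of the studentized (Koenker) form of the Breusch-Pagan statistic: regress the squared residuals on the predictors and compare n times that R-squared to a chi-squared distribution. The `mtcars` model is an arbitrary assumption for illustration. In practice, if the `lmtest` package is installed, `lmtest::bptest(fit)` is the usual route.

```r
# Arbitrary example model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_resid = residuals(fit)^2)
aux <- lm(sq_resid ~ wt + hp, data = aux_data)

# Test statistic n * R^2, chi-squared with df = number of predictors
lm_stat <- nrow(mtcars) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

p_value  # a large p-value means no evidence of heteroscedasticity
```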
Question 1420 of 1443

1420. Question
4. How do you know that this is an undirected graph?
Question 1421 of 1443

1421. Question
4. What type of data does the grep() function work with?
- Character vectors
- Booleans
- Strings
- Data frames
Question 1422 of 1443

1422. Question
4. Which package do we use to run the vif() function?
- car
- ggplot
- plyr
- tidyr
Question 1423 of 1443

1423. Question
4. You can use the predict() function to make predictions using the model that you developed. Put the prediction use cases below in order according to the business function each was used for.
- Buy: Ski manufacturers predict demand for skis each winter, stocking up on supplies
- Make: Life insurance companies predict the age of death in order to approve policies and set pricing
- Sell: Harrah's Hotel and Casino in Las Vegas predicts how much a customer will spend over the years, estimating their lifetime value to the casino
- R&D: As much as 40% of trading on the London Stock Exchange is estimated to be driven by trading algorithms
- Ship: Energex (Australian utility) predicts 20 years of electricity demand growth to direct infrastructure investment
Question 1424 of 1443

1424. Question
5. Match the key terms below to their descriptions.
Sort elements
- Check if there is bias in the data or the model
- The probability that the pattern exists through random chance
- Test for multicollinearity and independent variable interaction
- Check the residuals for heteroscedasticity (pattern contingent on fitted values)
- Check for information loss when selecting the right model for your data
- Q-Q plot/ distribution of errors
- p-values
- VIF
- Breusch-Pagan test
- AIC
Question 1425 of 1443

1425. Question
5. Match the methods of variable selection to the correct descriptions.
Sort elements
- Algorithm starts with a model of 0 variables and continues to add more variables based upon a specified measure
- Starts with a model of all variables, and removes variables based upon a specified measure
- Combination of forward and backward selection that starts with a model of 0 variables and adds variables, but can also remove variables based upon a specified measure
- Forward selection
- Backward selection
- Step-wise selection
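The three strategies described above can be sketched with base R's `step()` function, which uses AIC as its measure by default. The built-in `mtcars` data and the four candidate predictors are arbitrary assumptions for illustration, not part of the quiz.

```r
# Starting points: a model with 0 variables and one with all candidates
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)

# Forward: start empty, add variables while AIC improves
forward <- step(null_model, scope = formula(full_model),
                direction = "forward", trace = 0)

# Backward: start with everything, drop variables while AIC improves
backward <- step(full_model, direction = "backward", trace = 0)

# Step-wise: start empty, may add or drop a variable at each step
stepwise <- step(null_model, scope = formula(full_model),
                 direction = "both", trace = 0)

formula(forward)  # inspect which variables were kept
```

The three searches are not guaranteed to end at the same model, so it is worth comparing the resulting formulas.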
Question 1426 of 1443

1426. Question
5. What is TRUE about this Q-Q Plot?
- This Q-Q Plot shows that our model has fewer residuals at the tails of the distribution.
- The residuals may not be normally distributed, meaning that we could have achieved results at random.
- This Q-Q Plot shows that our model has more residuals at the tails of the distribution.
- The residuals are normally distributed, meaning that we could not have achieved results at random.
Question 1427 of 1443

1427. Question
5. Which function eliminates duplicate rows?
- unique()
- order()
- duplicate()
- grep()
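As a small illustration of removing duplicate rows in base R, the sketch below uses a made-up data frame (the `df` object and its columns are hypothetical).

```r
# Hypothetical data with one repeated row (rows 2 and 3 are identical)
df <- data.frame(id = c(1, 2, 2, 3), grp = c("a", "b", "b", "c"))

# unique() keeps the first copy of each distinct row
deduped <- unique(df)

# duplicated() flags the repeats instead of removing them
repeats <- duplicated(df)

nrow(deduped)
```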
Question 1428 of 1443

1428. Question
5. Which one of these is not a component of a needle?
- Shaft
- Pipe
- Bevel
- Plunger
Question 1429 of 1443

1429. Question
5. Which package should you install to plot an interactive network visualization?
- networkD3
- ggplot
- scatterplot3D
- plyr
Question 1430 of 1443

1430. Question
5. Which package should you install to plot an interactive network visualization?
- networkD3
- ggplot
- scatterplot3D
- plyr
Question 1431 of 1443

1431. Question
Fill in the blank below.
- A good explanatory model will have residuals whose variance does not depend on the ____ (predictor) variables.
Question 1432 of 1443

1432. Question
Fill in the blank below.
- ____ can have a very negative impact on linear regressions if they are not identified and handled properly, as they can skew the data because they lie outside the majority of data points.
Question 1433 of 1443

1433. Question
What are some important questions you have to ask in order to be comfortable with your model?
Select all that apply
- Does the variance of the residuals change with the predicted value?
- Do the forces affecting the dependent variable change in some parts of the data and should our model reflect that?
- Does it make a difference to identify outliers or bias in the residuals in our model?
- Is there any way to change our model or does it always have to stay the same?
Question 1434 of 1443

1434. Question
What are some important things to remember when working with outliers in your data?
Select all that apply
- Never remove outliers without understanding why they are present in the data.
- Make sure you understand why removing outliers is the correct course of action for your analysis.
- Never leave outliers in your data because they will always erroneously skew your model.
- It is unimportant to have a common sense justification for why you remove outliers from your data.
Question 1435 of 1443

1435. Question
What are some ways you can identify outliers in your data?
Select all that apply
- Scatterplots
- Box-and-whisker plots
- Cook's distance
- Other methods covered in other courses
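Two of the approaches listed above can be sketched in base R as follows. The `mtcars` model is an arbitrary assumption, and the 4/n cutoff for Cook's distance is a common rule of thumb rather than anything stated in the quiz.

```r
# Arbitrary example model on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# Cook's distance: influence of each observation on the fitted model
cd <- cooks.distance(fit)

# Rule-of-thumb cutoff (an assumption): flag points with D > 4 / n
flagged <- names(cd)[cd > 4 / nrow(mtcars)]

# Box-and-whisker rule: values beyond 1.5 * IQR from the hinges
box_out <- boxplot.stats(mtcars$mpg)$out

flagged
```

Flagged points are candidates for a closer look, not automatic removals.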
Question 1436 of 1443

1436. Question
What does it mean to “practice” coding?
- Figure out how to fix bugs
- Type out the code
- Study the material
- Play the scales
Question 1437 of 1443

1437. Question
What is supervised machine learning?
- Classifying data based on pre-determined categories
- Analysis done under a superior’s supervision
- Data analysis with non-obvious outputs
- An iterative process that creates an accurate model
Question 1438 of 1443

1438. Question
What is unsupervised machine learning?
- Data analysis that leads to new patterns and conclusions
- Classification and regression
- A way to identify who will vote Democrat and Republican
- Data analysis that classifies data based on pre-determined categories
Question 1439 of 1443

1439. Question
Which of these is an example of exploratory data analysis?
- Using customer attributes to group customers and find new patterns
- Grouping customers into groups that have been created already
- Identifying outliers in the data based on the model
- Analyzing the data to match expected outcomes
Question 1440 of 1443

1440. Question
Why do we need to validate the model?
- To make sure it works with other data
- To prove that it works with the test data
- To determine what type of data we’re working with
- To answer questions about the data
Question 1441 of 1443

1441. Question
Put the six Data Science control cycle steps in order, starting with “Ask”
- Interpret
- Test
- Ask
- Validate
- Model
- Research
Question 1442 of 1443

1442. Question
4. How do you know that this is an undirected graph?
Question 1443 of 1443

1443. Question
4. What method do you use to recap a needle?
- Push-It method
- None
- One-Handed Scoop Method
- Hard Cap Method
