EMC Associate - Data Science and Big Data Analytics v2 - DEA-7TT2 Exam Practice Test

Question 1
A data scientist is investigating a new database column that needs to be integrated into their model. The column contains 10,000 labels with 300 unique values. Which data structure should be used when working in R?
Response:

Correct Answer: C
Question 2
Which data asset is an example of quasi-structured data?
Response:

Correct Answer: A
Question 3
Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?
Response:

Correct Answer: C
Question 4
Refer to the exhibit.

The exhibit shows a decision tree for predicting whether someone is a good or bad credit risk. What would be the assigned probability, p(good), of a single male with no known savings?
Response:

Correct Answer: D
Question 5
You do a Student's t-test to compare the average test scores of sample groups from populations A and B. Group A averaged 10 points higher than group B. You find that this difference is significant, with a p-value of 0.03.
What does that mean?
Response:

Correct Answer: A
Question 6
Which word or phrase completes the statement? Business Intelligence is to monitoring trends as Data Science is to ________ trends.
Response:

Correct Answer: B
Question 7
In a t-test with unknown variance, what values are used to calculate the t-statistic?
Response:

Correct Answer: C
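As a rough illustration of Question 7, the sketch below computes a t-statistic when the population variance is unknown and has to be estimated from the sample. It assumes a one-sample test (the question does not say which variant is meant), and the data values and the hypothesized mean mu0 are made up for the example.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t-statistic for H0: population mean == mu0.

    Uses only quantities available from the sample itself:
    the sample mean, the sample standard deviation (n - 1 denominator),
    and the sample size.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation
    standard_error = s / math.sqrt(n)
    return (mean - mu0) / standard_error

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
scores = [12, 14, 16, 18, 20]
t_stat = one_sample_t(scores, mu0=14)
print(round(t_stat, 4))
```

The statistic is then compared against a t-distribution with n - 1 degrees of freedom; a two-sample test follows the same idea, with a standard error built from both samples.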
Question 8
What is the primary function of the NameNode in Hadoop?
Response:

Correct Answer: C
Question 9
Which data asset is an example of unstructured data?
Response:

Correct Answer: B
Question 10
You have the following corpus of texts:
"The cat hit the dog."
"The dog bit the mail carrier."
"The mail carrier chased the truck."
"The truck hit the wall while avoiding the dog that chased the cat."
"The cat climbed the wall."
If the tf-idf metric is used to score relevance for search and retrieval, which term has the highest discriminatory power?
Response:

Correct Answer: D
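The corpus in Question 10 is small enough to check by hand or with a few lines of code. The sketch below counts, for each term, how many of the five texts contain it (document frequency) and derives an inverse document frequency as log(N / df). This is one common idf variant, assumed here for illustration; other smoothed forms exist, and the tokenizer is a simple lowercase word split.

```python
import math
import re
from collections import Counter

corpus = [
    "The cat hit the dog.",
    "The dog bit the mail carrier.",
    "The mail carrier chased the truck.",
    "The truck hit the wall while avoiding the dog that chased the cat.",
    "The cat climbed the wall.",
]

def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def document_frequency(docs):
    """Count the number of documents in which each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # set(): count each term once per document
    return df

df = document_frequency(corpus)
# Unsmoothed idf: terms found in every document score 0.
idf_scores = {term: math.log(len(corpus) / count) for term, count in df.items()}

for term in sorted(idf_scores, key=idf_scores.get, reverse=True):
    print(f"{term:10s} df={df[term]} idf={idf_scores[term]:.3f}")
```

A term that occurs in every text, such as "the", gets an idf of zero and cannot help separate documents, while terms confined to fewer texts score higher.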
Question 11
When is a Naive Bayesian classifier preferred over a Logistic Regression model for classification?
Response:

Correct Answer: A
Question 12
When is a Wilcoxon Rank-Sum test used?
Response:

Correct Answer: C
Question 13
A data scientist plans to classify the sentiment polarity of 10,000 product reviews collected from the Internet. Suppose labeled training data is available. What is the most appropriate model to use?
Response:
Naive Bayesian classifier

Correct Answer: A
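To make the response to Question 13 concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing, written from scratch. The four labeled reviews are invented toy data, and the function names train and predict are hypothetical; a real project would use far more data and most likely an established library.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Build count tables from a list of (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(text, model):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)          # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue                               # skip unseen words
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled reviews, made up for illustration only.
toy_reviews = [
    ("great product works well", "positive"),
    ("love it great value", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("awful product waste of money", "negative"),
]
model = train(toy_reviews)
print(predict("great value", model))
print(predict("terrible waste", model))
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once reviews get longer than a few words.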