
Databricks-Certified-Professional-Data-Scientist Exam Dumps - Databricks Certified Professional Data Scientist Exam

Question # 4

Which of the following skills does a data scientist require?

A.

Web design skills, to present the results of an algorithm with the best visuals.

B.

Should be creative

C.

Should possess good programming skills

D.

Should be very good at mathematics and statistics

E.

Should possess database administration skills.

Question # 5

You are working on an email spam filtering assignment. While working on it, you find that a new word, e.g. HadoopExam, appears in an email. Your solution has never come across this word before, so the estimated probability of this word occurring in either class of email could be zero. Which of the following algorithms can help you avoid a zero probability?

A.

Naive Bayes

B.

Laplace Smoothing

C.

Logistic Regression

D.

All of the above
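As background for the zero-probability problem described in the question, here is a minimal sketch of add-one (Laplace) smoothing, one of the listed options, applied to a word-likelihood estimate. The word counts, the vocabulary size, and the helper name `smoothed_word_prob` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def smoothed_word_prob(word, class_word_counts, vocab_size, alpha=1.0):
    """Estimate P(word | class) with add-alpha (Laplace) smoothing."""
    total = sum(class_word_counts.values())
    return (class_word_counts.get(word, 0) + alpha) / (total + alpha * vocab_size)


# Hypothetical word counts observed in spam emails during training.
spam_counts = Counter({"offer": 3, "free": 5, "win": 2})
# Assumed vocabulary: the three known words plus one slot for the unseen word.
vocab_size = len(spam_counts) + 1

# Without smoothing, an unseen word gets probability zero.
unsmoothed = spam_counts.get("HadoopExam", 0) / sum(spam_counts.values())
# With smoothing, it gets a small non-zero probability: (0 + 1) / (10 + 4).
smoothed = smoothed_word_prob("HadoopExam", spam_counts, vocab_size)

print(unsmoothed)
print(round(smoothed, 4))
```

With these toy counts the unsmoothed estimate is exactly zero, while the smoothed estimate is 1/14, so a product of word likelihoods is no longer forced to zero by a single unseen word.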

Question # 6

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

A.

Expected value

B.

Variance

C.

Linear regression

D.

Quantiles

Question # 7

You are using an approach to classification in which the agent is taught not by giving it explicit categorizations, but by using some sort of reward system to indicate success: the agent might be rewarded for performing certain actions and punished for performing others. Which kind of learning is this?

A.

Supervised

B.

Unsupervised

C.

Regression

D.

None of the above

Question # 8

Suppose a man told you he had a nice conversation with someone on the train. Not knowing anything about this conversation, the probability that he was speaking to a woman is 50% (assuming the train had an equal number of men and women and the speaker was as likely to strike up a conversation with a man as with a woman). Now suppose he also told you that his conversational partner had long hair. It is now more likely he was speaking to a woman, since women are more likely to have long hair than men. ____________ can be used to calculate the probability that the person was a woman.

A.

SVM

B.

MLE

C.

Bayes' theorem

D.

Logistic Regression
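To make the updating step in the scenario concrete, here is a small sketch of a posterior calculation using Bayes' theorem, one of the listed options. The 50% prior comes from the question; the two likelihoods (75% of women and 15% of men having long hair) are assumed values that the question does not supply, and `posterior_woman` is a hypothetical helper name.

```python
def posterior_woman(p_woman, p_long_given_woman, p_long_given_man):
    """Return P(woman | long hair) via Bayes' theorem."""
    p_man = 1.0 - p_woman
    # Total probability of observing long hair across both groups.
    evidence = p_long_given_woman * p_woman + p_long_given_man * p_man
    return p_long_given_woman * p_woman / evidence


# Prior of 0.5 is stated in the question; 0.75 and 0.15 are assumptions.
p = posterior_woman(0.5, 0.75, 0.15)
print(round(p, 3))  # 0.833
```

Under these assumed likelihoods, the long-hair observation raises the probability from 50% to roughly 83%.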

Question # 9

In which lifecycle stage are appropriate analytical techniques determined?

A.

Model planning

B.

Model building

C.

Data preparation

D.

Discovery

Question # 10

Consider the following confusion matrix for a data set with 600 out of 11,100 instances positive:

In this case, Precision = 50%, Recall = 83%, Specificity = 95%, and Accuracy = 95%.

Select the correct statement

A.

Precision is low, which means the classifier is predicting positives best

B.

Precision is low, which means the classifier is predicting positives poorly

C.

The problem domain has a major impact on the measures that should be used to evaluate a classifier within it

D.

A and C

E.

B and C
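The confusion matrix itself is not reproduced on this page. As a sketch, the counts below are one hypothetical set that is consistent with the stated totals (600 positives out of 11,100 instances) and with the quoted percentages after rounding; they are an assumption, not the original exhibit.

```python
# Hypothetical counts consistent with the figures quoted in the question.
tp, fn = 500, 100      # the 600 actual positives
fp, tn = 500, 10_000   # the 10,500 actual negatives

precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
specificity = tn / (tn + fp)                 # share of actual negatives that are found
accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all instances classified correctly

for name, value in [("Precision", precision), ("Recall", recall),
                    ("Specificity", specificity), ("Accuracy", accuracy)]:
    print(f"{name}: {value:.0%}")
```

With heavily imbalanced classes such as these, accuracy and specificity can look high even when half of the predicted positives are wrong, which is why the four measures are reported separately.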

Question # 11

Consider flipping a coin for which the probability of heads is p, where p is unknown, and our goal is to estimate p. The obvious approach is to count how many times the coin came up heads and divide by the total number of coin flips. If we flip the coin 1000 times and it comes up heads 367 times, it is very reasonable to estimate p as approximately 0.367. However, suppose we flip the coin only twice and we get heads both times. Is it reasonable to estimate p as 1.0? Intuitively, given that we only flipped the coin twice, it seems a bit rash to conclude that the coin will always come up heads, and ____________ is a way of avoiding such rash conclusions.

A.

Naive Bayes

B.

Laplace Smoothing

C.

Logistic Regression

D.

Linear Regression
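The two coin-flip cases in the question can be compared numerically. The sketch below uses add-one (Laplace) smoothing, one of the listed options, with a pseudo-count of 1 per outcome; the helper name `laplace_estimate` and the choice of `alpha=1.0` are assumptions for illustration.

```python
def laplace_estimate(heads, flips, alpha=1.0):
    """Smoothed estimate of P(heads): add alpha pseudo-counts to each of the 2 outcomes."""
    return (heads + alpha) / (flips + 2 * alpha)


# Two flips, both heads: the raw estimate is 1.0, the smoothed one is (2 + 1) / (2 + 2).
print(laplace_estimate(2, 2))  # 0.75

# 1000 flips, 367 heads: smoothing barely moves the estimate, 368 / 1002.
print(round(laplace_estimate(367, 1000), 4))  # 0.3673
```

The pseudo-counts dominate when there are only two observations and become negligible with a thousand, which matches the intuition described in the question.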

Question # 12

Refer to the Exhibit.

In the Exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data?

A.

Tree A

B.

Tree B

C.

Tree C

D.

Tree D

Question # 13

Which of the following metrics are useful in measuring the accuracy and quality of a recommender system?

A.

Cluster Density

B.

Support Vector Count

C.

Mean Absolute Error

D.

Sum of Absolute Errors
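For reference, here is a minimal sketch of how mean absolute error, one of the listed options, could be computed for predicted versus actual ratings. The rating values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical user ratings and the ratings a recommender predicted for them.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

# Absolute errors are 0.5, 0.0, 1.0, 1.0, so the mean is 2.5 / 4.
print(mean_absolute_error(actual, predicted))  # 0.625
```

Dividing by the number of ratings makes the result comparable across test sets of different sizes, unlike a plain sum of the absolute errors.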

Question # 14

The figure below shows a plot of the data of a data matrix M that is 1000 x 2. Which line represents the first principal component?

A.

yellow

B.

blue

C.

Neither

Question # 15

Digit recognition is an example of:

A.

Classification

B.

Clustering

C.

Unsupervised learning

D.

None of the above

Question # 16

Which classifier assumes independence among all its features?

A.

Neural networks

B.

Linear Regression

C.

Naive Bayes

D.

Random forests
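To show what a feature-independence assumption looks like in practice, the sketch below scores each class as a prior multiplied by a product of per-feature likelihoods, which is how a Naive Bayes classifier combines evidence. All priors and likelihoods are hypothetical numbers, and `naive_bayes_score` is an illustrative helper name.

```python
import math


def naive_bayes_score(prior, feature_likelihoods):
    """Unnormalized class score: prior times the product of P(feature | class).

    Multiplying the likelihoods is only valid if the features are treated
    as conditionally independent given the class.
    """
    return prior * math.prod(feature_likelihoods)


# Hypothetical two-feature example with classes "spam" and "ham".
spam_score = naive_bayes_score(0.4, [0.8, 0.5])
ham_score = naive_bayes_score(0.6, [0.1, 0.5])

# Normalize the two scores to get a posterior probability for "spam".
posterior_spam = spam_score / (spam_score + ham_score)
print(round(posterior_spam, 3))  # 0.842
```

No joint likelihood over feature combinations is ever estimated here; each feature contributes its own factor, which is what keeps the model simple.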

Question # 17

Which of the following steps will you perform in the discovery phase?

A.

What are the data sources for the project?

B.

Analyze the raw data and its format and structure.

C.

What tools are required for the project?

D.

What network capacity is required?

E.

What Unix server capacity is required?

Question # 18

Select the problems that can be solved using SVMs

A.

SVMs are helpful in text and hypertext categorization

B.

Classification of images can also be performed using SVMs

C.

SVMs are also useful in medical science to classify proteins with up to 90% of the compounds classified correctly

D.

Hand-written characters can be recognized using SVM

Question # 19

A bio-scientist is analyzing cancer cells. To identify whether a cell is cancerous, hundreds of tests, each with small variations, have been performed. Given the test results for a sample of healthy and cancerous cells, which of the following techniques would you use to determine whether a cell is healthy?

A.

Linear regression

B.

Collaborative filtering

C.

Naive Bayes

D.

Identification Test

Question # 20

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?

A.

Association rules

B.

Decision trees

C.

Linear regression

D.

K-means clustering
