Summer Sale Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: v4s65

Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Go to page:
Question # 65

You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want to see if you can improve training speed by removing some features while having a minimum effect on model accuracy. What can you do?

A.

Eliminate features that are highly correlated to the output labels.

B.

Combine highly co-dependent features into one representative feature.

C.

Instead of feeding in each feature individually, average their values in batches of 3.

D.

Remove the features that have null values for more than 50% of the training records.

Full Access
Question # 66

You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristic support this method? (Choose two.)

A.

There are very few occurrences of mutations relative to normal samples.

B.

There are roughly equal occurrences of both normal and mutated samples in the database.

C.

You expect future mutations to have different features from the mutated samples in the database.

D.

You expect future mutations to have similar features to the mutated samples in the database.

E.

You already have labels for which samples are mutated and which are normal in the database.

Full Access
Question # 67

Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

A.

Use a row key of the form .

B.

Use a row key of the form .

C.

Use a row key of the form #.

D.

Use a row key of the form >##.

Full Access
Question # 68

Your company’s customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

A.

Add a node to the MySQL cluster and build an OLAP cube there.

B.

Use an ETL tool to load the data from MySQL into Google BigQuery.

C.

Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.

D.

Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Full Access
Question # 69

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

A.

Linear regression

B.

Logistic classification

C.

Recurrent neural network

D.

Feedforward neural network

Full Access
Question # 70

Your company is using WHILECARD tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:

# Syntax error : Expected end of statement but got “-“ at [4:11]

SELECT age

FROM

bigquery-public-data.noaa_gsod.gsod

WHERE

age != 99

AND_TABLE_SUFFIX = ‘1929’

ORDER BY

age DESC

Which table name will make the SQL statement work correctly?

A.

‘bigquery-public-data.noaa_gsod.gsod‘

B.

bigquery-public-data.noaa_gsod.gsod*

C.

‘bigquery-public-data.noaa_gsod.gsod’*

D.

‘bigquery-public-data.noaa_gsod.gsod*`

Full Access
Question # 71

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud. You want to support transactions that scale horizontally. You also want to optimize data for range queries on nonkey columns. What should you do?

A.

Use Cloud SQL for storage. Add secondary indexes to support query patterns.

B.

Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.

C.

Use Cloud Spanner for storage. Add secondary indexes to support query patterns.

D.

Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.

Full Access
Question # 72

You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)

A.

Configure your Cloud Dataflow pipeline to use local execution

B.

Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions

C.

Increase the number of nodes in the Cloud Bigtable cluster

D.

Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable

E.

Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable

Full Access
Go to page: