
Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Searching for workable clues to ace the Google Professional-Data-Engineer Exam? You're in the right place! ExamCert has realistic, trusted, and authentic exam prep tools to help you achieve your desired credential. ExamCert's Professional-Data-Engineer PDF Study Guide, Testing Engine, and Exam Dumps follow a reliable exam preparation strategy, providing you with the most relevant and updated study material, crafted in an easy-to-learn question-and-answer format. ExamCert's study tools simplify the exam's complex and confusing concepts and introduce you to the real exam scenario, which you can practice with the help of its testing engine and real exam dumps.

Question # 33

When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser to connect through a ____ proxy.

A. HTTPS

B. VPN

C. SOCKS

D. HTTP
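For reference, a minimal sketch of how such a proxy is typically opened, assuming the gcloud CLI is installed; the cluster master name and zone are hypothetical:

```python
import subprocess

MASTER = "my-cluster-m"    # hypothetical Dataproc master node name
ZONE = "us-central1-a"     # hypothetical zone
SOCKS_PORT = 1080

# Start an SSH tunnel to the master node; -D opens a local SOCKS proxy,
# -N keeps the session open without starting a remote shell.
tunnel = subprocess.Popen([
    "gcloud", "compute", "ssh", MASTER, f"--zone={ZONE}",
    "--", "-D", str(SOCKS_PORT), "-N",
])

# With the tunnel up, launch a browser configured to use the proxy, e.g.
# Chrome with --proxy-server="socks5://localhost:1080", and browse to
# http://my-cluster-m:8088 for the YARN ResourceManager web interface.
```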

Question # 34

Which of the following are examples of hyperparameters? (Select 2 answers.)

A. Number of hidden layers

B. Number of nodes in each hidden layer

C. Biases

D. Weights
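To make the distinction concrete, here is a minimal Keras sketch (TensorFlow assumed; the layer sizes are arbitrary) in which the number of hidden layers and nodes per layer are fixed before training, while other values are learned during it:

```python
import tensorflow as tf

NUM_HIDDEN_LAYERS = 2   # hyperparameter: chosen by you before training
NODES_PER_LAYER = 64    # hyperparameter: chosen by you before training

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(10,)))
for _ in range(NUM_HIDDEN_LAYERS):
    model.add(tf.keras.layers.Dense(NODES_PER_LAYER, activation="relu"))
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer="adam", loss="mse")
# Weights and biases, by contrast, are *parameters*: the values that
# model.fit() learns from data, not values you set by hand.
```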

Question # 35

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A. 1 continuous and 2 categorical

B. 3 categorical

C. 3 continuous

D. 2 continuous and 1 categorical
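As a rough illustration, here is how such columns might be typed in pandas (column names and values are invented; how year of birth is treated depends on the modeling choice):

```python
import pandas as pd

df = pd.DataFrame({
    "year_of_birth": [1984, 1992, 2001],
    "country": ["DE", "US", "JP"],
    "income": [52000.0, 61000.5, 43750.0],
})

df["country"] = df["country"].astype("category")  # clearly categorical
# income is a real-valued measurement, hence continuous; whether
# year_of_birth counts as continuous or categorical depends on how
# the model is meant to use it.
print(df.dtypes)
```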

Question # 36

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

A. Set a single global window to capture all the data.

B. Set sliding windows to capture all the lagged data.

C. Use watermarks and timestamps to capture the lagged data.

D. Ensure every data source type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
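As an illustration of the windowing concepts the options mention, here is a minimal Apache Beam (Python SDK) sketch, with a placeholder Pub/Sub topic, that combines event-time windows, a watermark-based trigger, and allowed lateness:

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterProcessingTime, AfterWatermark)

# In practice this runs as a streaming pipeline (--streaming plus a runner).
with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(topic="projects/demo/topics/events")
     | "Window" >> beam.WindowInto(
           window.FixedWindows(60),            # 1-minute event-time windows
           trigger=AfterWatermark(             # fire when the watermark passes
               late=AfterProcessingTime(30)),  # re-fire as late data arrives
           allowed_lateness=3600,              # accept events up to 1 hour late
           accumulation_mode=AccumulationMode.ACCUMULATING)
     | "Count" >> beam.combiners.Count.PerElement())
```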

Question # 37

You are building an application to share financial market data with consumers, who will receive data feeds. Data is collected from the markets in real time. Consumers will receive the data in the following ways:

Real-time event stream

ANSI SQL access to real-time stream and historical data

Batch historical exports

Which solution should you use?

A. Cloud Dataflow, Cloud SQL, Cloud Spanner

B. Cloud Pub/Sub, Cloud Storage, BigQuery

C. Cloud Dataproc, Cloud Dataflow, BigQuery

D. Cloud Pub/Sub, Cloud Dataproc, Cloud SQL
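For context, a hedged sketch of the Pub/Sub and BigQuery pieces such a feed might use (project, topic, and table names are hypothetical):

```python
from google.cloud import bigquery, pubsub_v1

# Real-time feed: publish each market tick to a Pub/Sub topic.
publisher = pubsub_v1.PublisherClient()
topic = publisher.topic_path("my-project", "market-ticks")  # hypothetical
publisher.publish(topic, b'{"symbol": "GOOG", "price": 172.34}')

# ANSI SQL over real-time and historical rows is BigQuery's role;
# Cloud Storage would hold the batch historical exports.
bq = bigquery.Client()
for row in bq.query(
        "SELECT symbol, AVG(price) AS avg_price "
        "FROM `my-project.market.ticks` GROUP BY symbol").result():
    print(row.symbol, row.avg_price)
```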

Question # 38

Government regulations in your industry mandate that you maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

A. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.

B. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.

C. In Cloud SQL, with separate database user names for each user. The Cloud SQL Admin activity logs will be used to provide the auditability.

D. In a bucket on Cloud Storage that is accessible only by an App Engine service that collects user information and logs the access before providing a link to the bucket.
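For background, here is a rough sketch, assuming the google-cloud-logging client, of how BigQuery Data Access audit log entries can be read; the filter and field names follow Cloud Audit Logs conventions but should be treated as illustrative:

```python
# Alias the client library to avoid shadowing the stdlib logging module.
from google.cloud import logging as gcp_logging

client = gcp_logging.Client()
log_filter = (
    'logName:"cloudaudit.googleapis.com%2Fdata_access" '
    'AND protoPayload.serviceName="bigquery.googleapis.com"'
)
for entry in client.list_entries(filter_=log_filter, max_results=10):
    # Each Data Access entry records who accessed what, and when.
    print(entry.timestamp, entry.payload.get("authenticationInfo"))
```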

Question # 39

You are designing BigQuery tables for large volumes of clickstream event data. Your data analyst team will most frequently query by specific event date ranges and filter by user ID (a UUID). You want to optimize the table structure for query cost and performance. What should you do?

A. Partition the table by the event date column and cluster the table by the user ID column.

B. Partition the table by the user ID column and cluster the table by the event date column.

C. Create an ingestion-time partitioned table and cluster it by the user ID column.

D. Cluster the table by both the event date and the user ID columns.
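To illustrate the mechanics behind these options, here is a small sketch with the BigQuery Python client that creates a table partitioned on one column and clustered on another; the table ID and schema are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "my-project.analytics.clickstream",  # hypothetical table ID
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("event", "STRING"),
    ],
)
# Partition pruning on event_date limits the bytes scanned by date-range
# queries; clustering on user_id co-locates rows for the UUID filter.
table.time_partitioning = bigquery.TimePartitioning(field="event_date")
table.clustering_fields = ["user_id"]
client.create_table(table)
```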

Question # 40

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on the test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

A. Increase the share of the test sample in the train-test split.

B. Try to collect more data and increase the size of your dataset.

C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.

D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of vocabularies or n-grams used.
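For reference, a minimal Keras sketch (TensorFlow assumed; layer sizes are arbitrary) of the regularization techniques option C names; whether they help depends on whether the model is actually overfitting:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(300,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),  # normalizes activations per batch
    tf.keras.layers.Dropout(0.3),          # randomly zeroes 30% of units
    tf.keras.layers.Dense(1),              # single regression output
])
model.compile(optimizer="adam", loss="mse")
```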
