
Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Question # 73

You created an analytics environment on Google Cloud so that your data scientist team can explore data without impacting the on-premises Apache Hadoop solution. The data in the on-premises Hadoop Distributed File System (HDFS) cluster is in Optimized Row Columnar (ORC) formatted files with multiple columns of Hive partitioning. The data scientist team needs to be able to explore the data in a similar way as they used the on-premises HDFS cluster with SQL on the Hive query engine. You need to choose the most cost-effective storage and processing solution. What should you do?

A.

Import the ORC files to Bigtable tables for the data scientist team.

B.

Import the ORC files to BigQuery tables for the data scientist team.

C.

Copy the ORC files to Cloud Storage, then deploy a Dataproc cluster for the data scientist team.

D.

Copy the ORC files to Cloud Storage, then create external BigQuery tables for the data scientist team.
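
For readers unfamiliar with the "Hive partitioning" mentioned in the question: Hive-style layouts encode partition columns as key=value directory names in the file path. The following is a minimal sketch of reading those columns back out of a path. The bucket name, table name, and partition columns are hypothetical and do not come from the question.

```python
from typing import Dict


def hive_partitions(path: str) -> Dict[str, str]:
    """Extract key=value partition columns from a Hive-style file path."""
    partitions: Dict[str, str] = {}
    for segment in path.split("/"):
        # Partition directories look like "column=value"; other segments are skipped.
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


# Hypothetical object path with two partition columns (dt and country).
example_path = "gs://example-bucket/sales/dt=2020-01-01/country=US/part-00000.orc"
example_partitions = hive_partitions(example_path)
```

With several partition columns, each additional column adds one more directory level to the path.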

Question # 74

You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level every ten seconds, and send that data to the pipeline when levels reach above 70 dBA. You need to detect the average noise level from a sensor when data is received for a duration of more than 30 minutes, but the window ends when no data has been received for 15 minutes. What should you do?

A.

Use session windows with a 30-minute gap duration.

B.

Use tumbling windows with a 15-minute window and a fifteen-minute withAllowedLateness operator.

C.

Use session windows with a 15-minute gap duration.

D.

Use hopping windows with a 15-minute window, and a thirty-minute period.
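
As background on the session windows named in the options: a session window keeps growing while events keep arriving, and closes once no event arrives within the gap duration. The sketch below illustrates that grouping in plain Python, not in Dataflow or Apache Beam code. The function name, the sample readings, and the 60-second gap are illustrative assumptions, not values taken from the question.

```python
from typing import List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, noise level in dBA)


def session_averages(
    readings: List[Reading], gap_s: float
) -> List[Tuple[float, float, float]]:
    """Group readings into sessions and return (first_ts, last_ts, mean_dba) for each.

    A reading joins the current session when it arrives less than gap_s
    seconds after the previous reading; otherwise it starts a new session.
    """
    sessions: List[List[Reading]] = []
    for ts, level in sorted(readings):
        if sessions and ts - sessions[-1][-1][0] < gap_s:
            sessions[-1].append((ts, level))
        else:
            sessions.append([(ts, level)])

    result = []
    for session in sessions:
        mean = sum(level for _, level in session) / len(session)
        result.append((session[0][0], session[-1][0], mean))
    return result


# Hypothetical readings from one sensor: two bursts separated by a long silence.
sample = [(0, 72.0), (10, 74.0), (20, 76.0), (200, 80.0), (210, 82.0)]
sessions = session_averages(sample, gap_s=60)
```

Unlike tumbling or hopping windows, the length of each session here depends on the data, because only the gap is fixed in advance.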

Question # 75

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

A.

Cloud Dataflow

B.

Cloud Composer

C.

Cloud Dataprep

D.

Cloud Dataproc

Question # 76

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?

A.

Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.

B.

Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.

C.

Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.

D.

Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.
