New Year Sale Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: scxmas70

Data-Engineer-Associate Exam Dumps - AWS Certified Data Engineer - Associate (DEA-C01)

Searching for workable clues to ace the Amazon Web Services Data-Engineer-Associate Exam? You’re on the right place! ExamCert has realistic, trusted and authentic exam prep tools to help you achieve your desired credential. ExamCert’s Data-Engineer-Associate PDF Study Guide, Testing Engine and Exam Dumps follow a reliable exam preparation strategy, providing you the most relevant and updated study material that is crafted in an easy to learn format of questions and answers. ExamCert’s study tools aim at simplifying all complex and confusing concepts of the exam and introduce you to the real exam scenario and practice it with the help of its testing engine and real exam dumps

Go to page:
Question # 25

A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job. The data engineer has set the maximum concurrency for the AWS Glue job to 1.

The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.

What is the likely reason the AWS Glue job is reprocessing the files?

A.

The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.

B.

The maximum concurrency for the AWS Glue job is set to 1.

C.

The data engineer incorrectly specified an older version of AWS Glue for the Glue job.

D.

The AWS Glue job does not have a required commit statement.

Full Access
Question # 26

A media company uses software as a service (SaaS) applications to gather data by using third-party tools. The company needs to store the data in an Amazon S3 bucket. The company will use Amazon Redshift to perform analytics based on the data.

Which AWS service or feature will meet these requirements with the LEAST operational overhead?

A.

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

B.

Amazon AppFlow

C.

AWS Glue Data Catalog

D.

Amazon Kinesis

Full Access
Question # 27

A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted.

The data engineer needs a solution that will prevent unintentional file deletion in the future.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Manually back up the S3 bucket on a regular basis.

B.

Enable S3 Versioning for the S3 bucket.

C.

Configure replication for the S3 bucket.

D.

Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.

Full Access
Question # 28

A media company wants to improve a system that recommends media content to customer based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform.

The company wants to minimize the effort and time required to incorporate third-party datasets.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use API calls to access and integrate third-party datasets from AWS Data Exchange.

B.

Use API calls to access and integrate third-party datasets from AWS

C.

Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.

D.

Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).

Full Access
Question # 29

A data engineer needs to deploy a complex pipeline. The stages of the pipeline must run scripts, but only fully managed and serverless services can be used.

A.

Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.

B.

Use Amazon MWAA to build and schedule the pipeline.

C.

Deploy the script to EC2. Use EventBridge to schedule it.

D.

Use AWS Glue DataBrew and EventBridge to run on a schedule.

Full Access
Question # 30

A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames.

The job is failing with the following error: No space left on device.

Which solution will resolve the error?

A.

Use the AWS Glue Spark shuffle manager.

B.

Deploy an Amazon Elastic Block Store (Amazon EBS) volume for the job to use.

C.

Convert the sort-merge join in the job to be a broadcast join.

D.

Convert the DataFrames to DynamicFrames, and perform a DynamicFrame join in the job.

Full Access
Question # 31

A company wants to ingest streaming data into an Amazon Redshift data warehouse from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. A data engineer needs to develop a solution that provides low data access time and that optimizes storage costs.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Create an external schema that maps to the MSK cluster. Create a materialized view that references the external schema to consume the streaming data from the MSK topic.

B.

Develop an AWS Glue streaming extract, transform, and load (ETL) job to process the incoming data from Amazon MSK. Load the data into Amazon S3. Use Amazon Redshift Spectrum to read the data from Amazon S3.

C.

Create an external schema that maps to the streaming data source. Create a new Amazon Redshift table that references the external schema.

D.

Create an Amazon S3 bucket. Ingest the data from Amazon MSK. Create an event-driven AWS Lambda function to load the data from the S3 bucket to a new Amazon Redshift table.

Full Access
Question # 32

A data engineer is designing a new data lake architecture for a company. The data engineer plans to use Apache Iceberg tables and AWS Glue Data Catalog to achieve fast query performance and enhanced metadata handling. The data engineer needs to query historical data for trend analysis and optimize storage costs for a large volume of event data.

Which solution will meet these requirements with the LEAST development effort?

A.

Store Iceberg table data files in Amazon S3 Intelligent-Tiering.

B.

Define partitioning schemes based on event type and event date.

C.

Use AWS Glue Data Catalog to automatically optimize Iceberg storage.

D.

Run a custom AWS Glue job to compact Iceberg table data files.

Full Access
Go to page: