Labour Day Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: scxmas70

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam Dumps - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Question # 4

Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?

A.

transactionsDf.withColumnRenamed("productId", "productNumber")

B.

transactionsDf.withColumn("productId", "productNumber")

C.

transactionsDf.withColumnRenamed("productNumber", "productId")

D.

transactionsDf.withColumnRenamed(col(productId), col(productNumber))

E.

transactionsDf.withColumnRenamed(productId, productNumber)

Full Access
Question # 5

Which of the following code blocks returns a new DataFrame with only columns predError and values of every second row of DataFrame transactionsDf?

Entire DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.| 4| null| null| 3| 2|null|

8.| 5| null| null| null| 2|null|

9.| 6| 3| 2| 25| 2|null|

10.+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])

B.

transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")

C.

transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")

D.

transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value")

(Correct)

E.

1.transactionsDf.createOrReplaceTempView("transactionsDf")

2.spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

F.

transactionsDf.filter(col(transactionId).isin([3,4,6]))

Full Access
Question # 6

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

A.

transactionsDf.groupBy(col(storeId).avg())

B.

transactionsDf.groupBy("storeId").avg(col("value"))

C.

transactionsDf.groupBy("storeId").agg(avg("value"))

D.

transactionsDf.groupBy("storeId").agg(average("value"))

E.

transactionsDf.groupBy("value").average()

Full Access
Question # 7

Which of the following statements about the differences between actions and transformations is correct?

A.

Actions are evaluated lazily, while transformations are not evaluated lazily.

B.

Actions generate RDDs, while transformations do not.

C.

Actions do not send results to the driver, while transformations do.

D.

Actions can be queued for delayed execution, while transformations can only be processed immediately.

E.

Actions can trigger Adaptive Query Execution, while transformation cannot.

Full Access
Question # 8

Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

A.

transactionsDf.drop(["predError", "value"])

B.

transactionsDf.drop("predError", "value")

C.

transactionsDf.drop(col("predError"), col("value"))

D.

transactionsDf.drop(predError, value)

E.

transactionsDf.drop("predError & value")

Full Access
Question # 9

Which of the following is not a feature of Adaptive Query Execution?

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.

Full Access
Question # 10

The code block shown below should return a copy of DataFrame transactionsDf with an added column cos. This column should have the values in column value converted to degrees and having

the cosine of those converted values taken, rounded to two decimals. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))

A.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. transactionsDf.value

B.

1. withColumnRenamed

2. "cos"

3. cos

4. degrees

5. "transactionsDf.value"

C.

1. withColumn

2. "cos"

3. cos

4. degrees

5. transactionsDf.value

D.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. col("value")

E

. 1. withColumn

2. "cos"

3. degrees

4. cos

5. col("value")

Full Access
Question # 11

Which of the following describes Spark's Adaptive Query Execution?

A.

Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.

B.

Adaptive Query Execution is enabled in Spark by default.

C.

Adaptive Query Execution reoptimizes queries at execution points.

D.

Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.

E.

Adaptive Query Execution applies to all kinds of queries.

Full Access
Question # 12

Which of the following code blocks silently writes DataFrame itemsDf in avro format to location fileLocation if a file does not yet exist at that location?

A.

itemsDf.write.avro(fileLocation)

B.

itemsDf.write.format("avro").mode("ignore").save(fileLocation)

C.

itemsDf.write.format("avro").mode("errorifexists").save(fileLocation)

D.

itemsDf.save.format("avro").mode("ignore").write(fileLocation)

E.

spark.DataFrameWriter(itemsDf).format("avro").write(fileLocation)

Full Access
Question # 13

The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.

Code block:

transactionsDf.withColumn("storeNumber", "storeId")

A.

Instead of withColumn, the withColumnRenamed method should be used.

B.

Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.

C.

Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.

D.

The withColumn operator should be replaced with the copyDataFrame operator.

E.

Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.

Full Access
Question # 14

The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.

Code block:

itemsDf.filter(Column('supplier').isin('et'))

A.

The Column operator should be replaced by the col operator and instead of isin, contains should be used.

B.

The expression inside the filter parenthesis is malformed and should be replaced by isin('et', 'supplier').

C.

Instead of isin, it should be checked whether column supplier contains the letters et, so isin should be replaced with contains. In addition, the column should be accessed using col['supplier'].

D.

The expression only returns a single column and filter should be replaced by select.

Full Access
Question # 15

Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?

A.

1.print(itemsDf.columns)

2.print(itemsDf.types)

B.

itemsDf.printSchema()

C.

spark.schema(itemsDf)

D.

itemsDf.rdd.printSchema()

E.

itemsDf.print.schema()

Full Access
Question # 16

Which of the following code blocks applies the Python function to_limit on column predError in table transactionsDf, returning a DataFrame with columns transactionId and result?

A.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")

(Correct)

B.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) FROM transactionsDf AS result")

C.

1.spark.udf.register("LIMIT_FCN", to_limit)

2.spark.sql("SELECT transactionId, to_limit(predError) AS result FROM transactionsDf")

spark.sql("SELECT transactionId, udf(to_limit(predError)) AS result FROM transactionsDf")

D.

1.spark.udf.register(to_limit, "LIMIT_FCN")

2.spark.sql("SELECT transactionId, LIMIT_FCN(predError) AS result FROM transactionsDf")

Full Access
Question # 17

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to

accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

A.

1. select

2. col("storeId")

3. cast

4. StringType

B.

1. select

2. col("storeId")

3. as

4. StringType

C.

1. cast

2. "storeId"

3. as

4. StringType()

D.

1. select

2. col("storeId")

3. cast

4. StringType()

E.

1. select

2. storeId

3. cast

4. StringType()

Full Access
Question # 18

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

A.

transactionsDf.count("productId").distinct()

B.

transactionsDf.groupBy("productId").agg(col("value").count())

C.

transactionsDf.count("productId")

D.

transactionsDf.groupBy("productId").count()

E.

transactionsDf.groupBy("productId").select(count("value"))

Full Access
Question # 19

Which of the following describes characteristics of the Spark UI?

A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Full Access
Question # 20

Which of the following describes Spark's standalone deployment mode?

A.

Standalone mode uses a single JVM to run Spark driver and executor processes.

B.

Standalone mode means that the cluster does not contain the driver.

C.

Standalone mode is how Spark runs on YARN and Mesos clusters.

D.

Standalone mode uses only a single executor per worker per application.

E.

Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

Full Access
Question # 21

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

A.

transactionsDf.select("storeId").dropDuplicates().count()

B.

transactionsDf.select(count("storeId")).dropDuplicates()

C.

transactionsDf.select(distinct("storeId")).count()

D.

transactionsDf.dropDuplicates().agg(count("storeId"))

E.

transactionsDf.distinct().select("storeId").count()

Full Access
Question # 22

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemDf there should be a separate

row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for rows in DataFrame itemsDf in which the column attributes

contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A.

1. filter

2. array_contains("cozy")

3. select

4. "itemId"

5. explode

6. "attributes"

B.

1. where

2. "array_contains(attributes, 'cozy')"

3. select

4. itemId

5. explode

6. attributes

C.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. map

6. "attributes"

D.

1. filter

2. "array_contains(attributes, cozy)"

3. select

4. "itemId"

5. explode

6. "attributes"

E.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. explode

6. "attributes"

Full Access
Question # 23

Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from

DataFrame transactionsDf and column attributes from DataFrame itemsDf?

A.

transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)

B.

1.transactionsDf.createOrReplaceTempView('transactionsDf')

2.itemsDf.createOrReplaceTempView('itemsDf')

3.

4.spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes")

C.

transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"), "transactionsDf.productId==itemsDf.itemId")

D.

1.transactionsDf \

2. .drop(col('value'), col('storeId')) \

3. .join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))

E.

1.transactionsDf.createOrReplaceTempView('transactionsDf')

2.itemsDf.createOrReplaceTempView('itemsDf')

3.

4.statement = """

5.SELECT * FROM transactionsDf

6.INNER JOIN itemsDf

7.ON transactionsDf.productId==itemsDf.itemId

8."""

9.spark.sql(statement).drop("value", "storeId", "attributes")

Full Access
Question # 24

The code block displayed below contains an error. The code block should use Python method find_most_freq_letter to find the letter present most in column itemName of DataFrame itemsDf and

return it in a new column most_frequent_letter. Find the error.

Code block:

1. find_most_freq_letter_udf = udf(find_most_freq_letter)

2. itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))

A.

Spark is not using the UDF method correctly.

B.

The UDF method is not registered correctly, since the return type is missing.

C.

The "itemName" expression should be wrapped in col().

D.

UDFs do not exist in PySpark.

E.

Spark is not adding a column.

Full Access
Question # 25

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code

block to accomplish this.

spark.sql.shuffle.partitions

__1__.__2__.__3__(__4__, 100)

A.

1. spark

2. conf

3. set

4. "spark.sql.shuffle.partitions"

B.

1. pyspark

2. config

3. set

4. spark.shuffle.partitions

C.

1. spark

2. conf

3. get

4. "spark.sql.shuffle.partitions"

D.

1. pyspark

2. config

3. set

4. "spark.sql.shuffle.partitions"

E.

1. spark

2. conf

3. set

4. "spark.sql.aggregate.partitions"

Full Access
Question # 26

The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns in an ordered fashion, arranging first by

column value, showing smaller numbers at the top and greater numbers at the bottom, and then by column predError, for which all values should be arranged in the inverse way of the order of items

in column value. Find the error.

Code block:

transactionsDf.orderBy('value', asc_nulls_first(col('predError')))

A.

Two orderBy statements with calls to the individual columns should be chained, instead of having both columns in one orderBy statement.

B.

Column value should be wrapped by the col() operator.

C.

Column predError should be sorted in a descending way, putting nulls last.

D.

Column predError should be sorted by desc_nulls_first() instead.

E.

Instead of orderBy, sort should be used.

Full Access
Question # 27

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should

only be listed once.

Sample of DataFrame itemsDf:

1.+------+--------------------+--------------------+-------------------+

2.|itemId| itemName| attributes| supplier|

3.+------+--------------------+--------------------+-------------------+

4.| 1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|

5.| 2|Elegant Outdoors ...|[red, summer, fre...| YetiX|

6.| 3| Outdoors Backpack|[green, summer, t...|Sports Company Inc.|

7.+------+--------------------+--------------------+-------------------+

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()

Full Access