Databricks Certified Associate Developer for Apache Spark 3.5 - Python - Associate-Developer-Apache-Spark-3.5 Exam Practice Test

Question 1

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.
How should this issue be resolved?

A. Switch the deployment mode to cluster mode

B. Increase the driver memory on the client machine

C. Switch the deployment mode to local mode

D. Add more executor instances to the cluster

Correct Answer: A
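
In client mode the driver runs on the machine that submits the job, so a constrained client machine throttles the whole application. In cluster mode the driver is launched inside the cluster instead. The following is a minimal sketch of how the two submissions differ. The helper name build_spark_submit, the script name etl_job.py, the yarn master and the 4g driver memory are assumptions for illustration. Only the spark-submit flags themselves (--master, --deploy-mode, --driver-memory) are real options.

```python
import shlex


def build_spark_submit(app_path, master="yarn", deploy_mode="cluster", driver_memory=None):
    """Return a spark-submit argument list (the command is built, not executed)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if driver_memory is not None:
        # In cluster mode this memory is allocated on a cluster node,
        # not on the machine that runs spark-submit.
        args += ["--driver-memory", driver_memory]
    args.append(app_path)
    return args


# Hypothetical application script and memory size, for illustration only.
cmd = build_spark_submit("etl_job.py", driver_memory="4g")
print(shlex.join(cmd))
```

Passing the list to subprocess.run would submit the job. Building it as a list rather than a string avoids shell-quoting problems.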


Question 2

Given a DataFrame df that has 10 partitions, the following code is run:
df.repartition(20)
How many partitions will the resulting DataFrame have?

A. 5

B. 10

C. Same number as the cluster executors

D. 20

Correct Answer: D
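
repartition() does not modify df in place. It returns a new DataFrame with the requested number of partitions, produced by a full shuffle. A minimal sketch of how this could be checked, assuming df is an existing PySpark DataFrame; the helper name partition_counts is hypothetical.

```python
from typing import TYPE_CHECKING, Tuple

if TYPE_CHECKING:  # needed for type hints only
    from pyspark.sql import DataFrame


def partition_counts(df: "DataFrame", target: int) -> Tuple[int, int]:
    """Return (partitions of df, partitions of df.repartition(target))."""
    before = df.rdd.getNumPartitions()
    # repartition() returns a NEW DataFrame; df itself keeps its partitioning.
    repartitioned = df.repartition(target)
    return before, repartitioned.rdd.getNumPartitions()
```

For the DataFrame in the question, partition_counts(df, 20) would be expected to return (10, 20). Because the question's statement does not assign the result, df itself still has 10 partitions afterwards.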


Question 3

Which configuration can be enabled to optimize the conversion between Pandas and PySpark DataFrames using Apache Arrow?

A. spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

B. spark.conf.set("spark.sql.arrow.pandas.enabled", "true")

C. spark.conf.set("spark.sql.execution.arrow.enabled", "true")

D. spark.conf.set("spark.pandas.arrow.enabled", "true")

Correct Answer: A
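
A minimal sketch of how the setting in option A might be used, assuming spark is an existing SparkSession and that pandas and PyArrow are installed alongside PySpark. The helper names enable_arrow and roundtrip are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only
    import pandas as pd
    from pyspark.sql import SparkSession

ARROW_KEY = "spark.sql.execution.arrow.pyspark.enabled"


def enable_arrow(spark: "SparkSession") -> None:
    """Turn on Arrow-based columnar transfer for pandas <-> PySpark conversion."""
    spark.conf.set(ARROW_KEY, "true")


def roundtrip(spark: "SparkSession", pdf: "pd.DataFrame") -> "pd.DataFrame":
    """Convert pandas -> Spark -> pandas; both conversions can use Arrow once enabled."""
    enable_arrow(spark)
    sdf = spark.createDataFrame(pdf)  # pandas -> Spark
    return sdf.toPandas()             # Spark -> pandas
```

The setting affects only createDataFrame() from a pandas DataFrame and toPandas(). Other DataFrame operations are unchanged.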


Question 4

Which command overwrites an existing JSON file when writing a DataFrame?

A. df.write.json("path/to/file")

B. df.write.mode("overwrite").json("path/to/file")

C. df.write.option("overwrite").json("path/to/file")

D. df.write.mode("append").json("path/to/file")

Correct Answer: B
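
The save mode is set with mode(), not option(). Without it, the writer uses the default mode and fails if the target already exists. A minimal sketch, assuming df is an existing PySpark DataFrame; the helper name write_json and the SAVE_MODES constant are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only
    from pyspark.sql import DataFrame

# Save modes accepted by DataFrameWriter.mode(); "error"/"errorifexists" is the default.
SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def write_json(df: "DataFrame", path: str, mode: str = "overwrite") -> None:
    """Write df as JSON to path, replacing existing output when mode is 'overwrite'."""
    if mode not in SAVE_MODES:
        raise ValueError(f"unsupported save mode: {mode!r}")
    # mode() only configures the writer; json() triggers the actual write.
    df.write.mode(mode).json(path)
```

Note that Spark writes a directory of part files at the given path rather than a single JSON file, and "overwrite" replaces that whole directory.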


Question 5

Which code should be used to display the schema of the Parquet file stored at events.parquet?

A. spark.read.parquet("events.parquet").printSchema()

B. spark.sql("SELECT schema FROM events.parquet").show()

C. spark.sql("SELECT * FROM events.parquet").show()

D. spark.read.format("parquet").load("events.parquet").show()

Correct Answer: A


Question 6

A data engineer is investigating a Spark cluster that is experiencing underutilization during scheduled batch jobs.
After checking the Spark logs, they noticed that tasks are often getting killed due to timeout errors, and there are several warnings about insufficient resources in the logs.
Which action should the engineer take to resolve the underutilization issue?

A. Increase the executor memory allocation in the Spark configuration.

B. Reduce the size of the data partitions to improve task scheduling.

C. Increase the number of executor instances to handle more concurrent tasks.

D. Set the spark.network.timeout property to allow tasks more time to complete without being killed.

Correct Answer: C


Question 7

Given the schema:

event_ts TIMESTAMP,
sensor_id STRING,
metric_value LONG,
ingest_ts TIMESTAMP,
source_file_path STRING

The goal is to deduplicate rows based on event_ts, sensor_id, and metric_value.
Which approach achieves this?

A. dropDuplicates on all columns (wrong criteria)

B. dropDuplicates on the exact matching fields

C. groupBy without aggregation (invalid use)

D. dropDuplicates with no arguments (removes based on all columns)

Correct Answer: B
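
Calling dropDuplicates() with no arguments compares every column, so two rows that differ only in ingest_ts or source_file_path would both survive. Passing the three key columns restricts the comparison to them. A minimal sketch, assuming df has the schema above; the names DEDUP_KEYS and dedupe_events are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only
    from pyspark.sql import DataFrame

# Columns that define a duplicate; ingest_ts and source_file_path are ignored.
DEDUP_KEYS = ["event_ts", "sensor_id", "metric_value"]


def dedupe_events(df: "DataFrame") -> "DataFrame":
    """Keep one row per (event_ts, sensor_id, metric_value) combination."""
    return df.dropDuplicates(DEDUP_KEYS)
```

Which of the duplicate rows is kept is not guaranteed. If a specific row must win (for example the latest ingest_ts), a window function with an explicit ordering would be needed instead.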


Question 8

In the code block below, aggDF contains aggregations on a streaming DataFrame:
aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

A. AGGREGATE

B. REPLACE

C. APPEND

D. COMPLETE

Correct Answer: D
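
Structured Streaming supports the output modes append, update, and complete. Complete mode rewrites the full result table on every trigger and is only allowed when the query contains an aggregation. A minimal sketch with the placeholder filled in, assuming agg_df is a streaming DataFrame with aggregations like aggDF above; the helper name start_console_query is hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only
    from pyspark.sql import DataFrame
    from pyspark.sql.streaming import StreamingQuery


def start_console_query(agg_df: "DataFrame") -> "StreamingQuery":
    """Start a streaming query that prints the whole result table on each trigger."""
    return (
        agg_df.writeStream
        .format("console")
        .outputMode("complete")  # entire result table, not just new or changed rows
        .start()
    )
```

The returned query handle can be kept to call awaitTermination() or stop() later.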


Question 9

A developer needs to write the output of a complex chain of Spark transformations to a Parquet table called events.liveLatest.
Consumers of this table query it frequently with filters on both year and month of the event_ts column (a timestamp).
The current code:
from pyspark.sql import functions as F
final = df.withColumn("event_year", F.year("event_ts")) \
    .withColumn("event_month", F.month("event_ts"))
final.write \
    .bucketBy(42, ["event_year", "event_month"]) \
    .saveAsTable("events.liveLatest")
However, consumers report poor query performance.
Which change will enable efficient querying by year and month?

A. Add .sortBy() after .bucketBy()

B. Change the bucket count (42) to a lower number

C. Replace .bucketBy() with .partitionBy("event_year", "event_month")

D. Replace .bucketBy() with .partitionBy("event_year") only

Correct Answer: C
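
Partitioning by both columns stores each (event_year, event_month) combination in its own directory, so a filter on year and month lets Spark skip every other directory. Bucketing hashes rows into a fixed number of files and does not allow that kind of pruning. A minimal sketch of the partitioned write, assuming df has an event_ts timestamp column. The helper name save_partitioned is hypothetical, and selectExpr with SQL expressions is used here in place of F.year and F.month only to keep the block free of runtime imports.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for type hints only
    from pyspark.sql import DataFrame


def save_partitioned(df: "DataFrame", table: str = "events.liveLatest") -> None:
    """Derive year/month from event_ts and save as a Parquet table partitioned by both."""
    enriched = df.selectExpr(
        "*",
        "year(event_ts) AS event_year",
        "month(event_ts) AS event_month",
    )
    (
        enriched.write
        .format("parquet")
        .partitionBy("event_year", "event_month")  # one directory per year/month pair
        .saveAsTable(table)
    )
```

Consumers benefit only when they filter on the partition columns themselves, for example WHERE event_year = 2024 AND event_month = 6.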


Question 10

Which feature of Spark Connect should be considered when designing an application that interacts with a Spark cluster remotely?

A. It allows for remote execution of Spark jobs

B. It can be used to interact with any remote cluster using the REST API

C. It provides a way to run Spark applications remotely in any programming language

D. It is primarily used for data ingestion into Spark from external sources

Correct Answer: A

