Databricks Certified Associate Developer for Apache Spark 3.5 - Python - Associate-Developer-Apache-Spark-3.5 Exam Practice Test

Question 1
A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.
How should this issue be resolved?

Correct Answer: A
Question 2
Given a DataFrame df that has 10 partitions, after running the code:
df.repartition(20)
How many partitions will the result DataFrame have?

Correct Answer: D
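As a hedged illustration of the behaviour this question probes: `repartition(n)` returns a new DataFrame and leaves `df` unchanged, and either partition count can be read through `rdd.getNumPartitions()`. The helper names `partition_counts` and `demo` are hypothetical, and `demo` assumes a local PySpark installation.

```python
def partition_counts(df, target):
    """Return (partitions of df, partitions of df.repartition(target))."""
    result = df.repartition(target)  # new DataFrame; df itself is not modified
    return df.rdd.getNumPartitions(), result.rdd.getNumPartitions()


def demo():
    # Assumption: PySpark is installed locally. The import is lazy so the
    # helper above can be read and reused without starting Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").getOrCreate()
    df = spark.range(100).repartition(10)  # stand-in for the question's df
    print(partition_counts(df, 20))
    spark.stop()
```

Because the question's snippet does not assign the result of `df.repartition(20)`, the new partition count applies only to the returned DataFrame.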
Question 3
Which configuration can be enabled to optimize the conversion between Pandas and PySpark DataFrames using Apache Arrow?

Correct Answer: A
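For reference, a minimal sketch of turning on Arrow-based transfer in Spark 3.x, where the setting is named `spark.sql.execution.arrow.pyspark.enabled`. The helper name `enable_arrow` is hypothetical, and the sketch assumes an existing `SparkSession` is passed in.

```python
ARROW_KEY = "spark.sql.execution.arrow.pyspark.enabled"


def enable_arrow(spark):
    """Enable Arrow for toPandas() and createDataFrame(pandas_df)."""
    spark.conf.set(ARROW_KEY, "true")
    return spark.conf.get(ARROW_KEY)  # read back to confirm the value


# Usage, assuming `spark` is an active SparkSession:
#   enable_arrow(spark)
#   pdf = spark.range(5).toPandas()
```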
Question 4
Which command overwrites an existing JSON file when writing a DataFrame?

Correct Answer: B
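A minimal sketch of the write pattern this question is about, assuming a DataFrame `df` and a target path supplied by the caller. The helper name `overwrite_json` is hypothetical.

```python
def overwrite_json(df, path):
    """Write df as JSON to path, replacing whatever is already there."""
    # The default save mode raises an error if the path already exists;
    # "overwrite" replaces the existing output instead.
    df.write.mode("overwrite").json(path)
    # Equivalent form: df.write.json(path, mode="overwrite")
```

Spark writes a directory of part files at `path`, so "overwriting a JSON file" here means replacing that output directory.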
Question 5
Which code should be used to display the schema of the Parquet file stored in the location events.parquet?

Correct Answer: A
Question 6
A data engineer is investigating a Spark cluster that is experiencing underutilization during scheduled batch jobs.
After checking the Spark logs, they notice that tasks are often killed due to timeout errors and that the logs contain several warnings about insufficient resources.
Which action should the engineer take to resolve the underutilization issue?

Correct Answer: C
Question 7
Given the schema:

event_ts TIMESTAMP,
sensor_id STRING,
metric_value LONG,
ingest_ts TIMESTAMP,
source_file_path STRING
The goal is to deduplicate based on: event_ts, sensor_id, and metric_value.
Options:

Correct Answer: B
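Since the answer options did not survive on this page, here is a hedged sketch of one common way to deduplicate on the three stated columns, using `dropDuplicates` with a column subset. The names `DEDUP_KEYS` and `dedupe_events` are hypothetical.

```python
DEDUP_KEYS = ["event_ts", "sensor_id", "metric_value"]


def dedupe_events(df):
    """Keep one row per (event_ts, sensor_id, metric_value) combination."""
    # ingest_ts and source_file_path are ignored when comparing rows, so
    # records re-ingested from different files collapse into one. Which of
    # the duplicate rows survives is not guaranteed.
    return df.dropDuplicates(DEDUP_KEYS)
```

Calling `df.distinct()` instead would compare all five columns, so rows that differ only in `ingest_ts` or `source_file_path` would all be kept.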
Question 8
In the code block below, aggDF contains aggregations on a streaming DataFrame:
aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Correct Answer: D
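For context, Structured Streaming defines three output modes. The sketch below summarises them and shows where the mode is passed. `OUTPUT_MODES` and `start_console_query` are hypothetical names, and the function assumes `agg_df` is a streaming DataFrame that contains an aggregation.

```python
OUTPUT_MODES = {
    "append": "only rows added to the result table since the last trigger",
    "update": "only rows that changed since the last trigger",
    "complete": "the entire result table on every trigger",
}


def start_console_query(agg_df, mode="complete"):
    """Start a console sink for agg_df using the given output mode."""
    return (
        agg_df.writeStream
        .format("console")
        .outputMode(mode)  # "complete" rewrites the full result each trigger
        .start()
    )
```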
Question 9
A developer needs to write the output of a complex chain of Spark transformations to a Parquet table called events.liveLatest.
Consumers of this table query it frequently with filters on both year and month of the event_ts column (a timestamp).
The current code:
from pyspark.sql import functions as F
final = df.withColumn("event_year", F.year("event_ts")) \
    .withColumn("event_month", F.month("event_ts"))
final.write \
    .bucketBy(42, ["event_year", "event_month"]) \
    .saveAsTable("events.liveLatest")
However, consumers report poor query performance.
Which change will enable efficient querying by year and month?

Correct Answer: C
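As a hedged sketch of the usual alternative to bucketing for this access pattern: writing with `partitionBy` on the derived year and month columns lays the table out in one directory per (year, month) pair, which lets filters on those columns skip unrelated directories. The helper name `write_partitioned` is hypothetical, and the sketch assumes `df` has an `event_ts` timestamp column.

```python
def write_partitioned(df, table):
    """Save df as a Parquet table partitioned by event year and month."""
    # Derive the partition columns with SQL expressions so that no extra
    # imports are needed in this sketch.
    with_parts = df.selectExpr(
        "*",
        "year(event_ts) AS event_year",
        "month(event_ts) AS event_month",
    )
    (
        with_parts.write
        .format("parquet")
        .partitionBy("event_year", "event_month")
        .saveAsTable(table)
    )
```

Consumers benefit from partition pruning only when they filter on `event_year` and `event_month` directly, not on an expression over `event_ts`.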
Question 10
Which feature of Spark Connect should be considered when designing an application that interacts remotely with a Spark cluster?

Correct Answer: A