Free Practice Questions for Databricks Certified Associate Developer for Apache Spark Certification

    🔄 Last checked for updates: February 24th, 2026

    Study with 353 exam-style practice questions designed to help you prepare for the Databricks Certified Associate Developer for Apache Spark. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.

    Start Practicing

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about Databricks Certified Associate Developer for Apache Spark

    Official study guide: View

    Renewal: retake the full exam every 2 years

    Test aids: none

    Prerequisites: none required; 6 months of hands-on Apache Spark experience recommended

    Delivery method: online proctored or at a test center

    Time limit: 90 minutes

    Number of questions: 45

    Certification validity: 2 years

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Apache Spark Architecture and Components

    Subdomain 1.1: Identify the advantages and challenges of implementing Spark.

    Subdomain 1.2: Identify the role of core components of Apache Spark™'s architecture, including cluster, driver node, worker nodes/executors, CPU cores, and memory.

    Subdomain 1.3: Describe the architecture of Apache Spark™, including DataFrame and Dataset concepts, SparkSession lifecycle, caching, storage levels, and garbage collection.

    Subdomain 1.4: Explain the Apache Spark™ architecture execution hierarchy.

    Subdomain 1.5: Configure Spark partitioning in distributed data processing, including shuffles and partitions.

    Subdomain 1.6: Describe the execution patterns of the Apache Spark™ engine, including actions, transformations, and lazy evaluation.

    Subdomain 1.7: Identify the features of the Apache Spark modules, including Core, Spark SQL, DataFrames, Pandas API on Spark, Structured Streaming, and MLlib.
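
    The transformation/action split in subdomain 1.6 is worth seeing in code. A minimal sketch, assuming a local PySpark install (the app name and data sizes are arbitrary):

    ```python
    from pyspark.sql import SparkSession

    # Local session for illustration; real clusters run under a cluster manager.
    spark = SparkSession.builder.master("local[2]").appName("lazy-eval-demo").getOrCreate()

    df = spark.range(1_000_000)                     # transformation: builds a plan, computes nothing
    evens = df.filter(df.id % 2 == 0)               # still lazy
    doubled = evens.selectExpr("id * 2 AS doubled")  # still lazy

    # Only an action (count, collect, show, write, ...) triggers execution.
    n = doubled.count()
    print(n)  # 500000

    spark.stop()
    ```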

    Domain 2: Using Spark SQL

    Subdomain 2.1: Utilize common data sources such as JDBC, files, etc., to efficiently read from and write to Spark DataFrames using Spark SQL, including overwriting and partitioning by column.

    Subdomain 2.2: Execute SQL queries directly on files, including ORC Files, JSON Files, CSV Files, Text Files, and Delta Files, and understand the different save modes for outputting data in Spark SQL.

    Subdomain 2.3: Save data to persistent tables while applying sorting and partitioning to optimize data retrieval.

    Subdomain 2.4: Register DataFrames as temporary views in Spark SQL, allowing them to be queried with SQL syntax.
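
    Several of these Spark SQL skills fit in one short sketch, assuming a local PySpark install (the path and sample data are made up):

    ```python
    import os, tempfile
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("sql-demo").getOrCreate()

    # Write a small DataFrame out as Parquet; mode("overwrite") is one of the save modes.
    path = os.path.join(tempfile.mkdtemp(), "people")
    spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"]) \
         .write.mode("overwrite").parquet(path)

    # Query the files directly with the format.`path` syntax -- no table registration needed.
    direct = spark.sql(f"SELECT name FROM parquet.`{path}` WHERE id = 1").collect()

    # Or register a temporary view and query it with plain SQL syntax.
    spark.read.parquet(path).createOrReplaceTempView("people")
    names = [r.name for r in spark.sql("SELECT name FROM people ORDER BY id").collect()]
    print(names)  # ['Ada', 'Grace']

    spark.stop()
    ```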

    Domain 3: Developing Apache Spark™ DataFrame/Dataset API Applications

    Subdomain 3.1: Manipulate columns, rows, and table structures by adding, dropping, splitting, renaming column names, applying filters, and exploding arrays.

    Subdomain 3.2: Perform data deduplication and validation operations on DataFrames.

    Subdomain 3.3: Perform aggregate operations on DataFrames, such as count, approximate count distinct, mean, and summary.

    Subdomain 3.4: Manipulate and utilize the Date data type, such as converting a Unix epoch to a date string and extracting date components.

    Subdomain 3.5: Combine DataFrames with operations such as inner join, left join, broadcast join, joins on multiple keys, cross join, union, and union all.

    Subdomain 3.6: Manage input and output operations by writing, overwriting, and reading DataFrames with schemas.

    Subdomain 3.7: Perform operations on DataFrames such as sorting, iterating, printing schema, and conversion between DataFrame and sequence/list formats.

    Subdomain 3.8: Create and invoke user-defined functions with or without stateful operators, including StateStores.

    Subdomain 3.9: Describe different types of variables in Spark, including broadcast variables and accumulators.

    Subdomain 3.10: Describe the purpose and implementation of broadcast joins.
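
    Deduplication, a broadcast join, and aggregation (subdomains 3.2, 3.3, 3.5, 3.10) can be sketched together. A minimal example, assuming a local PySpark install and made-up sample data:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.master("local[2]").appName("df-ops-demo").getOrCreate()

    orders = spark.createDataFrame(
        [(1, "a", 10.0), (1, "a", 10.0), (2, "b", 5.0)],  # note the duplicate row
        ["order_id", "sku", "price"])
    skus = spark.createDataFrame([("a", "apple"), ("b", "banana")], ["sku", "name"])

    deduped = orders.dropDuplicates()                # row-level deduplication
    joined = deduped.join(F.broadcast(skus), "sku")  # broadcast join hint; inner by default
    agg = joined.groupBy("name").agg(
        F.count("*").alias("n"), F.mean("price").alias("avg_price"))

    stats = {r["name"]: (r["n"], r["avg_price"]) for r in agg.collect()}
    print(stats)  # e.g. {'apple': (1, 10.0), 'banana': (1, 5.0)} -- row order may vary

    spark.stop()
    ```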

    Domain 4: Troubleshooting and Tuning Apache Spark DataFrame API Applications

    Subdomain 4.1: Implement performance tuning strategies and optimize cluster utilization, including partitioning, repartitioning, coalescing, identifying data skew, and reducing shuffling.

    Subdomain 4.2: Describe Adaptive Query Execution (AQE) and its benefits.

    Subdomain 4.3: Perform logging and monitoring of Spark applications: publish, customize, and analyze driver logs and executor logs to diagnose out-of-memory errors, cluster underutilization, etc.
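
    The repartition/coalesce distinction in subdomain 4.1 is easy to demonstrate. A minimal sketch, assuming a local PySpark install (AQE is on by default in recent Spark; the configs are shown explicitly for clarity):

    ```python
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.master("local[2]").appName("tuning-demo")
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
             .getOrCreate())

    # repartition(n) performs a full shuffle into an explicit number of partitions.
    df = spark.range(100_000).repartition(50)
    before = df.rdd.getNumPartitions()

    # coalesce(n) merges existing partitions via a narrow dependency -- no shuffle.
    narrowed = df.coalesce(5)
    after = narrowed.rdd.getNumPartitions()
    print(before, after)  # 50 5

    spark.stop()
    ```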

    Domain 5: Structured Streaming

    Subdomain 5.1: Explain the Structured Streaming engine in Spark, including its functions, programming model, micro-batch processing, exactly-once semantics, and fault tolerance mechanisms.

    Subdomain 5.2: Create and write Streaming DataFrames and Streaming Datasets, including the basic output modes and output sinks.

    Subdomain 5.3: Perform basic operations on Streaming DataFrames and Streaming Datasets, such as selection, projection, windowing, and aggregation.

    Subdomain 5.4: Perform Streaming Deduplication in Structured Streaming, both with and without watermark usage.
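
    Streaming deduplication with a watermark (subdomain 5.4) can be sketched without actually starting a query. A minimal example, assuming a local PySpark install; the rate source and the 10-minute watermark are arbitrary choices:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("stream-demo").getOrCreate()

    # The rate source emits (timestamp, value) rows -- handy for experiments.
    events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # The watermark bounds how late data may arrive, letting Spark prune dedup state
    # instead of keeping every key it has ever seen.
    deduped = (events
               .withWatermark("timestamp", "10 minutes")
               .dropDuplicates(["value"]))

    is_stream = deduped.isStreaming
    print(is_stream)  # True; run it with deduped.writeStream.format("console").start()

    spark.stop()
    ```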

    Domain 6: Using Spark Connect to deploy applications

    Subdomain 6.1: Describe the features of Spark Connect.

    Subdomain 6.2: Describe the different deployment mode types (Client, Cluster, Local) in the Apache Spark™ environment.

    Domain 7: Using Pandas API on Spark

    Subdomain 7.1: Explain the advantages of using Pandas API on Spark.

    Subdomain 7.2: Create and invoke Pandas UDFs.
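
    A scalar Pandas UDF (subdomain 7.2) in miniature, assuming a local PySpark install with pandas and PyArrow available; the temperature conversion is an arbitrary example:

    ```python
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[2]").appName("pandas-udf-demo").getOrCreate()

    # A scalar Pandas UDF receives whole pandas Series batches (via Arrow), avoiding
    # the per-row serialization overhead of a plain Python UDF.
    @pandas_udf(DoubleType())
    def fahrenheit(celsius: pd.Series) -> pd.Series:
        return celsius * 9.0 / 5.0 + 32.0

    df = spark.createDataFrame([(0.0,), (100.0,)], ["c"])
    temps = [r.f for r in df.select(fahrenheit("c").alias("f")).collect()]
    print(temps)  # [32.0, 212.0]

    spark.stop()
    ```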

    Techniques & products

    Apache Spark
    Spark Architecture
    Cluster
    Driver node
    Worker nodes
    Executors
    CPU cores
    Memory
    DataFrame
    Dataset
    SparkSession
    Caching
    Storage levels
    Garbage collection
    Spark partitioning
    Shuffles
    Partitions
    Actions
    Transformations
    Lazy evaluation
    Spark Core
    Spark SQL
    Pandas API on Spark
    Structured Streaming
    MLlib
    JDBC
    ORC Files
    JSON Files
    CSV Files
    Text Files
    Delta Files
    Persistent tables
    Temporary views
    SQL syntax
    UDFs
    Stateful operators
    StateStores
    Broadcast variables
    Accumulators
    Broadcast joins
    Performance tuning
    Cluster utilization
    Repartitioning
    Coalescing
    Data skew
    Adaptive Query Execution (AQE)
    Logging
    Monitoring
    Driver logs
    Executor logs
    Micro-batch processing
    Exactly-once semantics
    Fault tolerance mechanisms
    Streaming DataFrames
    Streaming Datasets
    Output modes
    Output sinks
    Window operations
    Streaming Deduplication
    Watermark usage
    Spark Connect
    Client deployment mode
    Cluster deployment mode
    Local deployment mode
    Pandas UDF

    CertSafari is not affiliated with, endorsed by, or officially connected to Databricks, Inc. Full disclaimer