Free Practice Questions for Databricks Certified Associate Developer for Apache Spark Certification

    🔄 Last checked for updates: February 24th, 2026

    Study with 353 exam-style practice questions designed to help you prepare for the Databricks Certified Associate Developer for Apache Spark. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.

    Start Practicing

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about Databricks Certified Associate Developer for Apache Spark

    Official study guide: View

    Renewal: retake the full exam every 2 years

    Test aids: none

    Prerequisites: none required; 6 months of hands-on Apache Spark experience recommended

    Delivery method: online proctored or at a test center

    Time limit: 90 minutes

    Number of questions: 45

    Certification validity: 2 years

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Apache Spark Architecture and Components

    Subdomain 1.1: Identify the advantages and challenges of implementing Spark.

    Subdomain 1.2: Identify the role of core components of Apache Spark™'s architecture, including cluster, driver node, worker nodes/executors, CPU cores, and memory.

    Subdomain 1.3: Describe the architecture of Apache Spark™, including DataFrame and Dataset concepts, SparkSession lifecycle, caching, storage levels, and garbage collection.

    Subdomain 1.4: Explain the Apache Spark™ architecture execution hierarchy.

    Subdomain 1.5: Configure Spark partitioning in distributed data processing, including shuffles and partitions.

    Subdomain 1.6: Describe the execution patterns of the Apache Spark™ engine, including actions, transformations, and lazy evaluation.

    Subdomain 1.7: Identify the features of the Apache Spark modules, including Core, Spark SQL, DataFrames, Pandas API on Spark, Structured Streaming, and MLlib.
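
    The transformation/action split in subdomain 1.6 is worth seeing in code. A minimal sketch, assuming a local PySpark install (the app name and data sizes are arbitrary):

    ```python
    from pyspark.sql import SparkSession

    # Local session for illustration; real clusters run under a cluster manager.
    spark = SparkSession.builder.master("local[2]").appName("lazy-eval-demo").getOrCreate()

    df = spark.range(1_000_000)                     # transformation: builds a plan, computes nothing
    evens = df.filter(df.id % 2 == 0)               # still lazy
    doubled = evens.selectExpr("id * 2 AS doubled")  # still lazy

    # Only an action (count, collect, show, write, ...) triggers execution.
    n = doubled.count()
    print(n)  # 500000

    spark.stop()
    ```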

    Domain 2: Using Spark SQL

    Subdomain 2.1: Utilize common data sources such as JDBC, files, etc., to efficiently read from and write to Spark DataFrames using Spark SQL, including overwriting and partitioning by column.

    Subdomain 2.2: Execute SQL queries directly on files, including ORC Files, JSON Files, CSV Files, Text Files, and Delta Files, and understand the different save modes for outputting data in Spark SQL.

    Subdomain 2.3: Save data to persistent tables while applying sorting and partitioning to optimize data retrieval.

    Subdomain 2.4: Register DataFrames as temporary views in Spark SQL, allowing them to be queried with SQL syntax.
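
    Several of these Spark SQL skills fit in one short sketch, assuming a local PySpark install (the path and sample data are made up):

    ```python
    import os, tempfile
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("sql-demo").getOrCreate()

    # Write a small DataFrame out as Parquet; mode("overwrite") is one of the save modes.
    path = os.path.join(tempfile.mkdtemp(), "people")
    spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"]) \
         .write.mode("overwrite").parquet(path)

    # Query the files directly with the format.`path` syntax -- no table registration needed.
    direct = spark.sql(f"SELECT name FROM parquet.`{path}` WHERE id = 1").collect()

    # Or register a temporary view and query it with plain SQL syntax.
    spark.read.parquet(path).createOrReplaceTempView("people")
    names = [r.name for r in spark.sql("SELECT name FROM people ORDER BY id").collect()]
    print(names)  # ['Ada', 'Grace']

    spark.stop()
    ```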

    Domain 3: Developing Apache Spark™ DataFrame/Dataset API Applications

    Subdomain 3.1: Manipulate columns, rows, and table structures by adding, dropping, splitting, renaming column names, applying filters, and exploding arrays.

    Subdomain 3.2: Perform data deduplication and validation operations on DataFrames.

    Subdomain 3.3: Perform aggregate operations on DataFrames, such as count, approximate count distinct, mean, and summary.

    Subdomain 3.4: Manipulate and utilize the Date data type, such as converting a Unix epoch to a date string and extracting date components.

    Subdomain 3.5: Combine DataFrames with operations such as inner join, left join, broadcast join, joins on multiple keys, cross join, union, and union all.

    Subdomain 3.6: Manage input and output operations by writing, overwriting, and reading DataFrames with schemas.

    Subdomain 3.7: Perform operations on DataFrames such as sorting, iterating, printing schema, and conversion between DataFrame and sequence/list formats.

    Subdomain 3.8: Create and invoke user-defined functions with or without stateful operators, including StateStores.

    Subdomain 3.9: Describe different types of variables in Spark, including broadcast variables and accumulators.

    Subdomain 3.10: Describe the purpose and implementation of broadcast joins.
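
    Deduplication, a broadcast join, and aggregation (subdomains 3.2, 3.3, 3.5, 3.10) can be sketched together. A minimal example, assuming a local PySpark install and made-up sample data:

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.master("local[2]").appName("df-ops-demo").getOrCreate()

    orders = spark.createDataFrame(
        [(1, "a", 10.0), (1, "a", 10.0), (2, "b", 5.0)],  # note the duplicate row
        ["order_id", "sku", "price"])
    skus = spark.createDataFrame([("a", "apple"), ("b", "banana")], ["sku", "name"])

    deduped = orders.dropDuplicates()                # row-level deduplication
    joined = deduped.join(F.broadcast(skus), "sku")  # broadcast join hint; inner by default
    agg = joined.groupBy("name").agg(
        F.count("*").alias("n"), F.mean("price").alias("avg_price"))

    stats = {r["name"]: (r["n"], r["avg_price"]) for r in agg.collect()}
    print(stats)  # e.g. {'apple': (1, 10.0), 'banana': (1, 5.0)} -- row order may vary

    spark.stop()
    ```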

    Domain 4: Troubleshooting and Tuning Apache Spark DataFrame API Applications

    Subdomain 4.1: Implement performance tuning strategies and optimize cluster utilization, including partitioning, repartitioning, coalescing, identifying data skew, and reducing shuffling.

    Subdomain 4.2: Describe Adaptive Query Execution (AQE) and its benefits.

    Subdomain 4.3: Perform logging and monitoring of Spark applications: publish, customize, and analyze driver logs and executor logs to diagnose out-of-memory errors, cluster underutilization, etc.
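
    The repartition/coalesce distinction in subdomain 4.1 is easy to demonstrate. A minimal sketch, assuming a local PySpark install (AQE is on by default in recent Spark; the configs are shown explicitly for clarity):

    ```python
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.master("local[2]").appName("tuning-demo")
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
             .getOrCreate())

    # repartition(n) performs a full shuffle into an explicit number of partitions.
    df = spark.range(100_000).repartition(50)
    before = df.rdd.getNumPartitions()

    # coalesce(n) merges existing partitions via a narrow dependency -- no shuffle.
    narrowed = df.coalesce(5)
    after = narrowed.rdd.getNumPartitions()
    print(before, after)  # 50 5

    spark.stop()
    ```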

    Domain 5: Structured Streaming

    Subdomain 5.1: Explain the Structured Streaming engine in Spark, including its functions, programming model, micro-batch processing, exactly-once semantics, and fault tolerance mechanisms.

    Subdomain 5.2: Create and write Streaming DataFrames and Streaming Datasets, including the basic output modes and output sinks.

    Subdomain 5.3: Perform basic operations on Streaming DataFrames and Streaming Datasets, such as selection, projection, windowing, and aggregation.

    Subdomain 5.4: Perform Streaming Deduplication in Structured Streaming, both with and without watermark usage.
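
    Streaming deduplication with a watermark (subdomain 5.4) can be sketched without actually starting a query. A minimal example, assuming a local PySpark install; the rate source and the 10-minute watermark are arbitrary choices:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("stream-demo").getOrCreate()

    # The rate source emits (timestamp, value) rows -- handy for experiments.
    events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # The watermark bounds how late data may arrive, letting Spark prune dedup state
    # instead of keeping every key it has ever seen.
    deduped = (events
               .withWatermark("timestamp", "10 minutes")
               .dropDuplicates(["value"]))

    is_stream = deduped.isStreaming
    print(is_stream)  # True; run it with deduped.writeStream.format("console").start()

    spark.stop()
    ```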

    Domain 6: Using Spark Connect to deploy applications

    Subdomain 6.1: Describe the features of Spark Connect.

    Subdomain 6.2: Describe the different deployment mode types (Client, Cluster, Local) in the Apache Spark™ environment.

    Domain 7: Using Pandas API on Spark

    Subdomain 7.1: Explain the advantages of using Pandas API on Spark.

    Subdomain 7.2: Create and invoke Pandas UDFs.
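
    A scalar Pandas UDF (subdomain 7.2) in miniature, assuming a local PySpark install with pandas and PyArrow available; the temperature conversion is an arbitrary example:

    ```python
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[2]").appName("pandas-udf-demo").getOrCreate()

    # A scalar Pandas UDF receives whole pandas Series batches (via Arrow), avoiding
    # the per-row serialization overhead of a plain Python UDF.
    @pandas_udf(DoubleType())
    def fahrenheit(celsius: pd.Series) -> pd.Series:
        return celsius * 9.0 / 5.0 + 32.0

    df = spark.createDataFrame([(0.0,), (100.0,)], ["c"])
    temps = [r.f for r in df.select(fahrenheit("c").alias("f")).collect()]
    print(temps)  # [32.0, 212.0]

    spark.stop()
    ```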

    Techniques & products

    Apache Spark
    Spark Architecture
    Cluster
    Driver node
    Worker nodes
    Executors
    CPU cores
    Memory
    DataFrame
    Dataset
    SparkSession
    Caching
    Storage levels
    Garbage collection
    Spark partitioning
    Shuffles
    Partitions
    Actions
    Transformations
    Lazy evaluation
    Spark Core
    Spark SQL
    Pandas API on Spark
    Structured Streaming
    MLlib
    JDBC
    ORC Files
    JSON Files
    CSV Files
    Text Files
    Delta Files
    Persistent tables
    Temporary views
    SQL syntax
    UDFs
    Stateful operators
    StateStores
    Broadcast variables
    Accumulators
    Broadcast joins
    Performance tuning
    Cluster utilization
    Repartitioning
    Coalescing
    Data skew
    Adaptive Query Execution (AQE)
    Logging
    Monitoring
    Driver logs
    Executor logs
    Micro-batch processing
    Exactly-once semantics
    Fault tolerance mechanisms
    Streaming DataFrames
    Streaming Datasets
    Output modes
    Output sinks
    Window operations
    Streaming Deduplication
    Watermark usage
    Spark Connect
    Client deployment mode
    Cluster deployment mode
    Local deployment mode
    Pandas UDF

    CertSafari is not affiliated with, endorsed by, or officially connected to Databricks, Inc. Full disclaimer