Free Practice Questions for Databricks Certified Associate Developer for Apache Spark Certification
Study with 353 exam-style practice questions designed to help you prepare for the Databricks Certified Associate Developer for Apache Spark. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.
Start Practicing
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about Databricks Certified Associate Developer for Apache Spark
Recertification: retake the full exam every 2 years
Prerequisites: none required; 6 months of hands-on Apache Spark experience recommended
Delivery: online proctored or at a test center
Duration: 90 minutes
Number of questions: 45
Certification validity: 2 years
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Apache Spark Architecture and Components
Subdomain 1.1: Identify the advantages and challenges of implementing Spark.
Subdomain 1.2: Identify the role of core components of Apache Spark™'s architecture, including cluster, driver node, worker nodes/executors, CPU cores, and memory.
Subdomain 1.3: Describe the architecture of Apache Spark™, including DataFrame and Dataset concepts, SparkSession lifecycle, caching, storage levels, and garbage collection.
Subdomain 1.4: Explain the Apache Spark™ architecture execution hierarchy.
Subdomain 1.5: Configure Spark partitioning in distributed data processing, including shuffles and partitions.
Subdomain 1.6: Describe the execution patterns of the Apache Spark™ engine, including actions, transformations, and lazy evaluation.
Subdomain 1.7: Identify the features of the Apache Spark Modules, including Core, Spark SQL, DataFrames, Pandas API on Spark, Structured Streaming, and MLlib.
Domain 2: Using Spark SQL
Subdomain 2.1: Utilize common data sources such as JDBC, files, etc., to efficiently read from and write to Spark DataFrames using Spark SQL, including overwriting and partitioning by column.
Subdomain 2.2: Execute SQL queries directly on files, including ORC Files, JSON Files, CSV Files, Text Files, and Delta Files, and understand the different save modes for outputting data in Spark SQL.
Subdomain 2.3: Save data to persistent tables while applying sorting and partitioning to optimize data retrieval.
Subdomain 2.4: Register DataFrames as temporary views in Spark SQL, allowing them to be queried with SQL syntax.
Domain 3: Developing Apache Spark™ DataFrame/Dataset API Applications
Subdomain 3.1: Manipulate columns, rows, and table structures by adding, dropping, splitting, and renaming columns, applying filters, and exploding arrays.
Subdomain 3.2: Perform data deduplication and validation operations on DataFrames.
Subdomain 3.3: Perform aggregate operations on DataFrames such as count, approximate count distinct, mean, and summary.
Subdomain 3.4: Manipulate and utilize the Date data type, such as converting a Unix epoch to a date string and extracting date components.
Subdomain 3.5: Combine DataFrames with operations such as inner join, left join, broadcast join, multiple keys, cross join, union, and union all.
Subdomain 3.6: Manage input and output operations by writing, overwriting, and reading DataFrames with schemas.
Subdomain 3.7: Perform operations on DataFrames such as sorting, iterating, printing schema, and conversion between DataFrame and sequence/list formats.
Subdomain 3.8: Create and invoke user-defined functions with or without stateful operators, including StateStores.
Subdomain 3.9: Describe different types of variables in Spark, including broadcast variables and accumulators.
Subdomain 3.10: Describe the purpose and implementation of broadcast joins.
Domain 4: Troubleshooting and Tuning Apache Spark DataFrame API Applications
Subdomain 4.1: Implement performance tuning strategies and optimize cluster utilization, including partitioning, repartitioning, coalescing, identifying data skew, and reducing shuffling.
Subdomain 4.2: Describe Adaptive Query Execution (AQE) and its benefits.
Subdomain 4.3: Perform logging and monitoring of Spark applications - publish, customize, and analyze Driver logs and Executor logs to diagnose out-of-memory errors, cluster underutilization, etc.
Domain 5: Structured Streaming
Subdomain 5.1: Explain the Structured Streaming engine in Spark, including its functions, programming model, micro-batch processing, exactly-once semantics, and fault tolerance mechanisms.
Subdomain 5.2: Create and write Streaming DataFrames and Streaming Datasets, including the basic output modes and output sinks.
Subdomain 5.3: Perform basic operations on Streaming DataFrames and Streaming Datasets, such as selection, projection, window, and aggregation.
Subdomain 5.4: Perform Streaming Deduplication in Structured Streaming, both with and without watermark usage.
Domain 6: Using Spark Connect to deploy applications
Subdomain 6.1: Describe the features of Spark Connect.
Subdomain 6.2: Describe the different deployment mode types (Client, Cluster, Local) in the Apache Spark™ environment.
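The deployment modes above can be illustrated with `spark-submit` invocations. This is a command sketch, not runnable as-is: the hostnames, ports, and `app.py` are placeholders:

```shell
# Local mode: driver and executors run in a single JVM on this machine.
spark-submit --master "local[4]" app.py

# Client mode: the driver runs where spark-submit is invoked;
# executors run on the cluster.
spark-submit --master spark://cluster-host:7077 --deploy-mode client app.py

# Cluster mode: the driver itself is launched inside the cluster.
spark-submit --master spark://cluster-host:7077 --deploy-mode cluster app.py

# Spark Connect: a thin client talks to a remote Spark Connect server
# over gRPC instead of embedding a driver in-process.
spark-submit --remote "sc://connect-host:15002" app.py
```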
Domain 7: Using Pandas API on Spark
Subdomain 7.1: Explain the advantages of using Pandas API on Spark.
Subdomain 7.2: Create and invoke Pandas UDF.