Free Practice Questions for Databricks Certified Data Engineer Associate (May 4 onwards) Certification

    🔄 Last checked for updates April 15th, 2026

    Study with 360 exam-style practice questions designed to help you prepare for the Databricks Certified Data Engineer Associate (May 4 onwards). All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.

    Start Practicing

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Quiz History

    Exam Details

    Key information about Databricks Certified Data Engineer Associate (May 4 onwards)

    Official study guide

    View

    Question formats CertSafari offers
    • Multiple choice

    Renewal:

    Required every 2 years by taking the current live exam.

    Test aids:

    None allowed

    Prerequisites:

    None required; course attendance and six months of hands-on experience in Databricks are highly recommended.

    Delivery method:

    Online or test center

    Registration fee:

    USD 200

    Time limit:

    90 minutes

    Number of questions:

    45 scored multiple-choice questions

    Certification validity:

    2 years

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Databricks Intelligence Platform

    Subdomain 1.1: Understand the core components of the Databricks Data Intelligence Platform, such as its architecture, Delta Lake, and Unity Catalog.

    Subdomain 1.2: Understand Databricks Data Intelligence Platform’s compute services, including their characteristics, limitations, and cost models, and select the most suitable option for each workload use case.

    Domain 2: Data Ingestion and Loading

    Subdomain 2.1: Enable and detail data ingestion patterns, including batch, streaming, and incremental loading, and import data from sources such as local files, Lakeflow Connect standard connectors, and Lakeflow Connect managed connectors.

    Subdomain 2.2: Use the COPY INTO command to incrementally load files from cloud object storage (ADLS/S3/GCS) into Unity‑Catalog–governed tables.
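    For illustration, a minimal COPY INTO sketch; the catalog, schema, table, and bucket path below are hypothetical placeholders, and the statement assumes a Databricks SQL environment:

    ```sql
    -- Incrementally load new CSV files; files already loaded are skipped automatically.
    COPY INTO main.raw.sales
    FROM 's3://example-bucket/landing/sales/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true');
    ```

    COPY INTO tracks which files it has already ingested, so re-running the statement loads only new files.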

    Subdomain 2.3: Use Auto Loader with schema enforcement and schema evolution in batch modes (for example, directory listing or file notification) to land data into Unity‑Catalog–governed tables.
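    A hedged Auto Loader sketch; the paths and target table are placeholders, and the code assumes a Databricks runtime with Unity Catalog:

    ```python
    # Auto Loader: incremental file discovery with schema inference and evolution.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # evolve on new columns
        .load("s3://example-bucket/landing/orders/")
        .writeStream
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders")
        .trigger(availableNow=True)  # batch-style run: process available files, then stop
        .toTable("main.bronze.orders"))
    ```

    The `availableNow` trigger gives the batch-style behavior the exam guide describes: the stream drains everything currently available and stops, so the job can be scheduled like a batch load.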

    Subdomain 2.4: Configure Lakeflow Connect to reliably ingest data from diverse enterprise sources into Unity‑Catalog–governed tables.

    Subdomain 2.5: Use JDBC/ODBC or REST clients in notebooks to land data into cloud storage or directly into Unity‑Catalog–governed tables, usually orchestrated and scheduled with Lakeflow Jobs.

    Subdomain 2.6: Prioritize between Auto Loader, Lakeflow Connect (standard and managed connectors), partner connectors, and other ingestion methods based on technical requirements such as data volume, ingestion frequency, data types, and governance needs with Unity Catalog.

    Subdomain 2.7: Ingest semi-structured and unstructured data (for example, JSON and nested data) via Lakeflow Connect and other managed connectors into Unity‑Catalog–governed Delta tables.

    Domain 3: Data Transformation and Modeling

    Subdomain 3.1: Implement data cleaning by reading bronze tables with PySpark/SQL, cleaning nulls, standardizing data types, and writing to new silver tables.

    Subdomain 3.2: Combine DataFrames with operations such as inner join, left join, broadcast join, multiple keys, cross join, union, and union all.
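    The operations above can be sketched on toy DataFrames; the column and table names are illustrative, and the snippet assumes a running Spark session:

    ```python
    # Join and union variants on two small DataFrames.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    orders = spark.createDataFrame([(1, 100), (2, 200)], ["cust_id", "amount"])
    custs = spark.createDataFrame([(1, "Ada"), (3, "Grace")], ["cust_id", "name"])

    inner = orders.join(custs, on="cust_id", how="inner")    # matching keys only
    left = orders.join(custs, on="cust_id", how="left")      # keep all orders
    bcast = orders.join(broadcast(custs), on="cust_id")      # hint: broadcast the small side
    multi = orders.join(custs, on=["cust_id"], how="inner")  # list form for multiple keys
    crossed = orders.crossJoin(custs)                        # every pairing of rows
    all_rows = orders.union(orders)                          # UNION ALL semantics
    distinct_rows = all_rows.dropDuplicates()                # SQL UNION (distinct) equivalent
    ```

    Note that DataFrame `union` keeps duplicates (SQL UNION ALL); follow it with `dropDuplicates` to get SQL UNION behavior.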

    Subdomain 3.3: Manipulate columns, rows, and table structures by adding, dropping, splitting, renaming column names, applying filters, and exploding arrays.

    Subdomain 3.4: Perform data deduplication operations and aggregate operations on DataFrames, such as count, approximate count distinct, mean, and summary.
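    A short sketch of deduplication and the named aggregates; data and column names are made up, and a Spark session is assumed:

    ```python
    # Deduplicate on a key, then compute count, approximate distinct count, and mean.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(1, "a", 10.0), (1, "a", 10.0), (2, "b", 30.0)],
        ["order_id", "cust", "amount"],
    )

    deduped = df.dropDuplicates(["order_id"])  # keep one row per order_id
    deduped.agg(
        F.count("*").alias("rows"),
        F.approx_count_distinct("cust").alias("approx_custs"),
        F.mean("amount").alias("avg_amount"),
    ).show()
    deduped.summary("count", "mean", "min", "max").show()  # quick statistical profile
    ```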

    Subdomain 3.5: Understand the basic tuning parameters (spark.sql.shuffle.partitions, spark.default.parallelism, spark.executor/driver.memory, spark.sql.autoBroadcastJoinThreshold) and re-measure the performance.
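    A session-level tuning sketch; the values are illustrative, not recommendations, and `spark` is assumed to be an active SparkSession:

    ```python
    # Shuffle and broadcast settings can be changed at runtime on the session.
    spark.conf.set("spark.sql.shuffle.partitions", "64")
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(32 * 1024 * 1024))  # 32 MB

    # spark.default.parallelism, spark.executor.memory, and spark.driver.memory are
    # fixed at cluster startup (cluster/Spark config), not per session. After changing
    # any of these, re-run the workload and compare timings against the baseline.
    ```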

    Subdomain 3.6: Understand the difference between, and how to build, Gold layer objects such as materialized views, views, streaming tables, and tables for BI and analytics teams in Unity Catalog.
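    The contrast can be sketched in SQL; catalog, schema, and table names are placeholders, and the statements assume Databricks SQL with Unity Catalog:

    ```sql
    -- A plain view recomputes its logic on every query.
    CREATE OR REPLACE VIEW main.gold.daily_revenue_v AS
    SELECT order_date, SUM(amount) AS revenue
    FROM main.silver.orders
    GROUP BY order_date;

    -- A materialized view precomputes and stores results, refreshed incrementally.
    CREATE OR REPLACE MATERIALIZED VIEW main.gold.daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM main.silver.orders
    GROUP BY order_date;

    -- A streaming table ingests incrementally from a streaming source.
    CREATE OR REFRESH STREAMING TABLE main.gold.orders_live AS
    SELECT * FROM STREAM main.silver.orders;
    ```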

    Subdomain 3.7: Apply data quality checks and validation rules to ensure reliable Silver and Gold datasets.

    Domain 4: Working with Lakeflow Jobs

    Subdomain 4.1: Implement control flows (retries and conditional tasks such as branching and looping) using Lakeflow Jobs for pipeline orchestration.

    Subdomain 4.2: Configure common tasks (notebook, SQL query, dashboard, and pipeline tasks) and their dependencies using Lakeflow Jobs and its DAG‑based task graph.

    Subdomain 4.3: Implement job schedules using Lakeflow Jobs with an understanding of trigger types (scheduled, file arrival, and table update).

    Subdomain 4.4: Choose between time-based and data-driven triggers based on data availability and pipeline dependencies.

    Domain 5: Implementing CI/CD

    Subdomain 5.1: Manage your code development workflow within the Databricks workspace UI, including creating and switching between branches in Databricks Repos, committing and pushing changes, and creating pull requests using Databricks Git integration.

    Subdomain 5.2: Understand environment-specific configuration using Declarative Automation Bundle (formerly Databricks Asset Bundles) variables and overrides while promoting the same codebase across dev, test, and prod targets.

    Subdomain 5.3: Deploy Declarative Automation Bundles (formerly Databricks Asset Bundles) to package, configure, and promote Lakeflow Jobs, Lakeflow Spark Declarative Pipelines, and other workspace assets across dev, test, and prod environments.

    Subdomain 5.4: Understand the Databricks CLI to validate, deploy, and manage Declarative Automation Bundles (formerly Databricks Asset Bundles) and other workspace assets in automated CI/CD workflows.
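    A typical bundle lifecycle with the Databricks CLI looks like the following; the target name `dev` and job name `my_job` are placeholders:

    ```shell
    databricks bundle validate              # check the bundle configuration
    databricks bundle deploy -t dev         # deploy resources to the dev target
    databricks bundle run -t dev my_job     # trigger a deployed job
    databricks bundle destroy -t dev        # tear down the deployed resources
    ```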

    Domain 6: Troubleshooting, Monitoring, and Optimization

    Subdomain 6.1: Identify trends in job performance using the Lakeflow Jobs run history view to compare current execution times against historical baselines.

    Subdomain 6.2: Use the Lakeflow Jobs UI to monitor pipeline health by interpreting job statuses, viewing DAG‑based task graphs to spot upstream blockers, and tracking pipeline run times and failure rates.

    Subdomain 6.3: Identify common performance bottlenecks such as data skew, shuffling, and disk spilling by interpreting stage-level metrics in the Spark UI.

    Subdomain 6.4: Understand the features of Liquid Clustering and predictive optimization.

    Subdomain 6.5: Diagnose cluster startup failures, library conflicts, and out-of-memory issues.

    Domain 7: Governance and Security

    Subdomain 7.1: Differentiate between managed and external tables in Unity Catalog and perform basic operations (create, modify, delete, and convert between managed and external tables) on them.

    Subdomain 7.2: Configure access controls using the UI and SQL by applying GRANT, REVOKE, and DENY privileges to principals (users, groups, and service principals) at appropriate levels of the security hierarchy.
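    A minimal SQL sketch of the grant pattern; the catalog, schema, table, and group names are hypothetical:

    ```sql
    -- Privileges follow the hierarchy: catalog -> schema -> table.
    GRANT USE CATALOG ON CATALOG main TO `analysts`;
    GRANT USE SCHEMA ON SCHEMA main.gold TO `analysts`;
    GRANT SELECT ON TABLE main.gold.daily_revenue TO `analysts`;
    REVOKE SELECT ON TABLE main.gold.daily_revenue FROM `contractors`;
    ```

    A principal needs USE CATALOG and USE SCHEMA on the parents before a table-level SELECT grant takes effect.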

    Subdomain 7.3: Understand column-level masking and row-level security to restrict data visibility based on user groups.
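    A hedged sketch of both mechanisms in Unity Catalog SQL; the function, table, and group names are hypothetical:

    ```sql
    -- Column mask: members of `admins` see real emails, everyone else sees '***'.
    CREATE OR REPLACE FUNCTION main.gov.mask_email(email STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('admins') THEN email ELSE '***' END;

    ALTER TABLE main.silver.customers
      ALTER COLUMN email SET MASK main.gov.mask_email;

    -- Row filter: users see only rows whose region matches a group they belong to.
    CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member(region);

    ALTER TABLE main.silver.customers
      SET ROW FILTER main.gov.region_filter ON (region);
    ```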

    Subdomain 7.4: Understand Unity Catalog ABAC policies to centrally control row-level filtering and column masking for sensitive data.

    Techniques & products

    Databricks Data Intelligence Platform
    Delta Lake
    Unity Catalog
    Databricks compute services
    batch ingestion
    streaming ingestion
    incremental loading
    local files
    Lakeflow Connect
    COPY INTO command
    ADLS
    S3
    GCS
    Auto Loader
    schema enforcement
    schema evolution
    JDBC
    ODBC
    REST clients
    notebooks
    cloud storage
    Lakeflow Jobs
    partner connectors
    semi-structured data
    unstructured data
    JSON
    nested data
    PySpark
    SQL
    bronze tables
    silver tables
    DataFrames
    inner join
    left join
    broadcast join
    cross join
    union
    union all
    column manipulation
    row manipulation
    table structure manipulation
    data deduplication
    aggregate operations
    Spark tuning parameters
    spark.sql.shuffle.partitions
    spark.default.parallelism
    spark.executor/driver.memory
    spark.sql.autoBroadcastJoinThreshold
    Gold layer objects
    materialized views
    views
    streaming tables
    BI
    analytics
    data quality checks
    validation rules
    pipeline orchestration
    control flows
    retries
    conditional tasks
    branching
    looping
    DAG-based task graph
    job schedules
    scheduled triggers
    file arrival triggers
    table update triggers
    time-based triggers
    data-driven triggers
    Databricks workspace UI
    Databricks Repos
    Git integration
    pull requests
    Automation Bundle
    Databricks Asset Bundles
    Databricks CLI
    CI/CD workflows
    Lakeflow Spark Declarative Pipelines
    job performance monitoring
    Lakeflow Jobs run history
    Spark UI
    performance bottlenecks
    data skew
    shuffling
    disk spilling
    Liquid Clustering
    predictive optimization
    cluster startup failures
    library conflicts
    out-of-memory issues
    managed tables
    external tables
    access controls
    GRANT
    REVOKE
    DENY
    principals
    users
    groups
    service principals
    column-level masking
    row-level security
    ABAC policies

    CertSafari is not affiliated with, endorsed by, or officially connected to Databricks, Inc. Full disclaimer