Free Practice Questions for Databricks Certified Machine Learning Professional Certification

    🔄 Last checked for updates: February 16, 2026

    Study with 330 exam-style practice questions designed to help you prepare for the Databricks Certified Machine Learning Professional exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.

    Start Practicing

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about Databricks Certified Machine Learning Professional

    Official study guide: View

    Renewal: Required every two years by retaking the full exam

    Prerequisites: None required; course attendance and 1 year of hands-on experience in Databricks are highly recommended

    Delivery method: Online proctored

    Registration fee: USD 200

    Time limit: 120 minutes

    Number of questions: 59 scored multiple-choice questions

    Certification validity: 2 years

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Model Development Using Spark ML

    Subdomain 1.1: Model Development Using Spark ML

    This section covers the core concepts of using Spark ML for model development.

    - Identify when Spark ML is recommended based on the data, model, and use case requirements.
    - Construct an ML pipeline using Spark ML.
    - Apply the appropriate estimator and/or transformer given a use case.
    - Tune a Spark ML model using MLlib.
    - Evaluate a Spark ML model.
    - Score a Spark ML model for a batch or streaming use case.
    - Select a Spark ML model or a single-node model for inference based on the inference type: batch, real-time, or streaming.
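
    For example, a minimal Spark ML pipeline chains a transformer and an estimator, then fits and scores them as a unit. This is a sketch with hypothetical column names and toy data; on Databricks, the spark session is pre-created for you.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

    # Toy training data: two numeric features and a binary label.
    train_df = spark.createDataFrame(
        [(1.0, 0.5, 0), (2.0, 1.5, 1), (3.0, 2.5, 1), (0.5, 0.1, 0)],
        ["feature_a", "feature_b", "label"],
    )

    # A transformer (VectorAssembler) and an estimator (LogisticRegression) chained into one pipeline.
    assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    pipeline = Pipeline(stages=[assembler, lr])

    model = pipeline.fit(train_df)           # fits all stages in order
    predictions = model.transform(train_df)  # batch scoring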

    Subdomain 1.2: Scaling and Tuning

    This section focuses on techniques for scaling and tuning machine learning workloads in Databricks.

    - Scale distributed training pipelines using Spark ML and pandas Function APIs/UDFs.
    - Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow.
    - Perform distributed hyperparameter tuning using Ray.
    - Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments.
    - Evaluate and select appropriate parallelization strategies (model parallelism, data parallelism) for large-scale ML training.
    - Compare Ray and Spark for distributing ML training workloads.
    - Use the pandas Function API to parallelize group-specific model training and perform inference.
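
    As an illustration of the Optuna-plus-MLflow pattern (shown single-node for brevity; the model, dataset, and search space are placeholders), each trial can be logged as a nested MLflow run:

    import mlflow
    import optuna
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    def objective(trial):
        # Illustrative search space; tune whatever matters for your model.
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 50, 300),
            "max_depth": trial.suggest_int("max_depth", 2, 10),
        }
        with mlflow.start_run(nested=True):  # one child run per trial
            mlflow.log_params(params)
            score = cross_val_score(RandomForestRegressor(**params), X, y, cv=3).mean()
            mlflow.log_metric("cv_r2", score)
        return score

    with mlflow.start_run(run_name="optuna_tuning"):
        study = optuna.create_study(direction="maximize")
        study.optimize(objective, n_trials=20)
        mlflow.log_params({f"best_{k}": v for k, v in study.best_params.items()})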

    Subdomain 1.3: Advanced MLflow Usage

    This section covers advanced features and usage patterns of MLflow for experiment tracking and model management.

    - Utilize nested runs in MLflow for tracking complex experiments.
    - Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows.
    - Create custom model objects using real-time feature engineering.
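
    A minimal sketch of nested runs with programmatic logging; the parameter values and artifact contents here are illustrative:

    import mlflow

    with mlflow.start_run(run_name="tuning_parent"):
        mlflow.log_param("search_space", "learning_rate")
        for lr in (0.01, 0.1, 1.0):
            # Child runs nest under the parent for side-by-side comparison in the UI.
            with mlflow.start_run(run_name=f"lr={lr}", nested=True):
                mlflow.log_param("learning_rate", lr)
                val_loss = 1.0 / (1.0 + lr)  # placeholder metric for this sketch
                mlflow.log_metric("val_loss", val_loss)
        # Custom artifact (a JSON summary) logged on the parent run.
        mlflow.log_dict({"best_learning_rate": 1.0}, "tuning_summary.json")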

    Subdomain 1.4: Advanced Feature Store Concepts

    This section delves into advanced concepts for managing features using the Databricks Feature Store.

    - Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference.
    - Build automated pipelines for feature computation using the FeatureEngineeringClient.
    - Configure online tables for low-latency applications using the Databricks SDK.
    - Design scalable solutions for ingesting and processing streaming data to generate features in real time.
    - Develop on-demand features using feature serving for consistent use across training and production environments.
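
    For instance, point-in-time correctness can be requested through a timestamp lookup key when building a training set with the FeatureEngineeringClient. The table, column, and DataFrame names below are hypothetical:

    from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

    fe = FeatureEngineeringClient()

    # timestamp_lookup_key makes the lookup point-in-time correct: each training row
    # receives the feature values as of its own event timestamp, preventing leakage.
    lookups = [
        FeatureLookup(
            table_name="main.ml.customer_features",  # hypothetical feature table
            lookup_key="customer_id",
            timestamp_lookup_key="event_ts",
        )
    ]

    training_set = fe.create_training_set(
        df=labels_df,  # assumed Spark DataFrame with customer_id, event_ts, and label columns
        feature_lookups=lookups,
        label="label",
    )
    training_df = training_set.load_df()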

    Domain 2: MLOps Model Lifecycle Management

    Subdomain 2.1: MLOps Model Lifecycle Management

    This section provides an overview of the components and processes involved in managing the model lifecycle in an MLOps context.

    - Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy-code strategy.
    - Map Databricks features to activities of the model lifecycle management process.

    Subdomain 2.2: Validation Testing

    This section covers strategies and implementation of testing for machine learning systems.

    - Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs for given inputs.
    - Identify the types of testing performed (unit and integration) in each environment stage (dev, test, prod, etc.).
    - Design an integration test for machine learning systems that exercises the common pipeline stages: feature engineering, training, evaluation, deployment, and inference.
    - Compare the benefits and challenges of different approaches to organizing functions and unit tests.
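
    For example, a unit test for a hypothetical feature-engineering function, runnable with pytest:

    import pandas as pd

    def add_ratio_feature(df: pd.DataFrame) -> pd.DataFrame:
        """Hypothetical transformation under test: adds a ratio column."""
        out = df.copy()
        out["ratio"] = out["numerator"] / out["denominator"]
        return out

    def test_add_ratio_feature_computes_expected_values():
        df = pd.DataFrame({"numerator": [2.0, 9.0], "denominator": [4.0, 3.0]})
        result = add_ratio_feature(df)
        assert result["ratio"].tolist() == [0.5, 3.0]

    def test_add_ratio_feature_does_not_mutate_input():
        df = pd.DataFrame({"numerator": [1.0], "denominator": [2.0]})
        add_ratio_feature(df)
        assert "ratio" not in df.columns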

    Subdomain 2.3: Environment Architectures

    This section focuses on designing and managing environments for machine learning projects on Databricks.

    - Design and implement scalable Databricks environments for machine learning projects using best practices.
    - Define and configure Databricks ML assets using Databricks Asset Bundles (DABs): model serving endpoints, MLflow experiments, and registered ML models.

    Subdomain 2.4: Automated Retraining

    This section covers the implementation of automated workflows for model retraining.

    - Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts.
    - Develop a strategy for selecting top-performing models during automated retraining.
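
    One common pattern is for a drift alert to trigger the retraining pipeline as a Databricks job through the Databricks SDK. The job ID and threshold below are hypothetical:

    from databricks.sdk import WorkspaceClient

    RETRAIN_JOB_ID = 123456789  # hypothetical Jobs ID of the retraining pipeline
    DRIFT_THRESHOLD = 0.2       # hypothetical drift-metric threshold

    def maybe_retrain(drift_score: float) -> None:
        """Trigger the retraining job when observed drift exceeds the threshold."""
        if drift_score > DRIFT_THRESHOLD:
            WorkspaceClient().jobs.run_now(job_id=RETRAIN_JOB_ID)
            print(f"Drift {drift_score:.2f} > {DRIFT_THRESHOLD}: retraining triggered")

    maybe_retrain(drift_score=0.35)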

    Subdomain 2.5: Drift Detection and Lakehouse Monitoring

    This section details the use of Lakehouse Monitoring for detecting drift and monitoring model performance.

    - Apply statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data, and evaluate the significance of observed changes.
    - Identify the data table type and Lakehouse Monitoring feature that resolve a given use case, and explain why.
    - Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring.
    - Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
    - Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds.
    - Detect data drift by comparing current data distributions to a known baseline or between successive time windows.
    - Evaluate model performance trends over time using an inference table.
    - Define custom metrics in Lakehouse Monitoring metrics tables.
    - Evaluate metrics across different data granularities and feature slices.
    - Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage.
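
    As a baseline-versus-current illustration, the two-sample Kolmogorov-Smirnov test (one of the statistics surfaced in the Lakehouse Monitoring drift metrics table for numeric columns) can flag a shifted distribution; the data here is synthetic:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)
    baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # reference window
    current = rng.normal(loc=0.3, scale=1.0, size=5_000)   # shifted current window

    # Small p-value => the two samples likely come from different distributions.
    statistic, p_value = stats.ks_2samp(baseline, current)
    if p_value < 0.05:
        print(f"Drift detected: KS statistic={statistic:.3f}, p-value={p_value:.2e}")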

    Domain 3: Model Deployment

    Subdomain 3.1: Deployment Strategies

    This section covers different strategies for deploying machine learning models into production.

    - Compare deployment strategies (e.g., blue-green and canary) and evaluate their suitability for high-traffic applications.
    - Implement a model rollout strategy using Databricks Model Serving.
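
    A sketch of a canary-style rollout on a Databricks Model Serving endpoint via the MLflow Deployments SDK; the endpoint name, model name, and version numbers are hypothetical, and the config dictionary follows the serving endpoint config schema:

    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")

    # Canary rollout: keep 90% of traffic on the current version and route 10%
    # to the challenger version for evaluation.
    client.update_endpoint(
        endpoint="churn-endpoint",
        config={
            "served_entities": [
                {"name": "champion", "entity_name": "main.ml.churn_model",
                 "entity_version": "3", "workload_size": "Small",
                 "scale_to_zero_enabled": True},
                {"name": "challenger", "entity_name": "main.ml.churn_model",
                 "entity_version": "4", "workload_size": "Small",
                 "scale_to_zero_enabled": True},
            ],
            "traffic_config": {
                "routes": [
                    {"served_model_name": "champion", "traffic_percentage": 90},
                    {"served_model_name": "challenger", "traffic_percentage": 10},
                ]
            },
        },
    )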

    Subdomain 3.2: Custom Model Serving

    This section focuses on deploying and querying custom models using Databricks Model Serving.

    - Register a custom PyFunc model and log custom artifacts in Unity Catalog.
    - Query custom models via the REST API or the MLflow Deployments SDK.
    - Deploy custom model objects using the MLflow Deployments SDK, the REST API, or the user interface.
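
    For example, a toy custom PyFunc model registered to Unity Catalog (the three-level model name below is hypothetical):

    import mlflow
    import pandas as pd

    class CelsiusToFahrenheit(mlflow.pyfunc.PythonModel):
        """Toy custom model: converts a 'celsius' column to Fahrenheit."""

        def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
            return model_input["celsius"] * 9.0 / 5.0 + 32.0

    mlflow.set_registry_uri("databricks-uc")  # register the model in Unity Catalog

    with mlflow.start_run():
        mlflow.pyfunc.log_model(
            artifact_path="model",
            python_model=CelsiusToFahrenheit(),
            input_example=pd.DataFrame({"celsius": [100.0]}),
            registered_model_name="main.ml.celsius_converter",  # hypothetical catalog.schema.model
        )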

    Techniques & Products

    Spark ML
    MLlib
    pandas Function APIs
    UDFs
    Optuna
    MLflow
    Ray
    Databricks Feature Store
    FeatureEngineering Client
    Databricks SDK
    Unity Catalog
    Databricks Asset Bundles (DABs)
    Lakehouse Monitoring
    Databricks Model Serving
    REST API
    Python
    scikit-learn
    CI/CD
    MLOps
    Distributed training
    Hyperparameter tuning
    Model deployment strategies (blue-green, canary)
    Data drift detection
    Model performance monitoring
    Unit testing
    Integration testing
    Model lifecycle management

    CertSafari is not affiliated with, endorsed by, or officially connected to Databricks, Inc. Full disclaimer