Free Practice Questions for Databricks Certified Machine Learning Professional Certification
Study with 330 exam-style practice questions designed to help you prepare for the Databricks Certified Machine Learning Professional exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about Databricks Certified Machine Learning Professional
Recertification: Required every two years by taking the full exam
Prerequisites: None required; course attendance and one year of hands-on experience with Databricks are highly recommended
Delivery method: Online proctored
Cost: USD 200
Duration: 120 minutes
Questions: 59 scored multiple-choice questions
Validity period: 2 years
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Model Development Using Spark ML
Subdomain 1.1: Model Development Using Spark ML
This section covers the core concepts of using Spark ML for model development.
- Identify when Spark ML is recommended based on the data, model, and use-case requirements.
- Construct an ML pipeline using Spark ML.
- Apply the appropriate estimator and/or transformer for a given use case.
- Tune a Spark ML model using MLlib.
- Evaluate a Spark ML model.
- Score a Spark ML model for a batch or streaming use case.
- Select a Spark ML model or a single-node model for inference based on the use-case type: batch, real-time, or streaming.
Subdomain 1.2: Scaling and Tuning
This section focuses on techniques for scaling and tuning machine learning workloads in Databricks.
- Scale distributed training pipelines using Spark ML and pandas Function APIs/UDFs.
- Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow.
- Perform distributed hyperparameter tuning using Ray.
- Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments.
- Evaluate and select appropriate parallelization strategies (model parallelism, data parallelism) for large-scale ML training.
- Compare Ray and Spark for distributing ML training workloads.
- Use the pandas Function API to parallelize group-specific model training and perform inference.
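The per-group training pattern behind the pandas Function API can be sketched in plain pandas: the function below is the kind you would pass to `groupBy(...).applyInPandas(...)` on a Spark DataFrame, where Spark would call it once per group in parallel. The trivial "model" (a group mean) and the column names are illustrative only:

```python
import pandas as pd

def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit a toy per-group 'model' (here just the mean of y) and
    return one summary row per group, as applyInPandas expects."""
    coef = pdf["y"].mean()
    return pd.DataFrame({"group": [pdf["group"].iloc[0]], "coef": [coef]})

df = pd.DataFrame({"group": ["a", "a", "b", "b"], "y": [1.0, 3.0, 10.0, 20.0]})

# Locally we apply the function group by group; on Spark this loop is
# replaced by df.groupBy("group").applyInPandas(train_group, schema=...).
results = pd.concat(
    [train_group(g) for _, g in df.groupby("group")], ignore_index=True
)
```

The key design point: the function receives and returns pandas DataFrames, so single-node modeling code can be reused unchanged while Spark distributes the groups.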
Subdomain 1.3: Advanced MLflow Usage
This section covers advanced features and usage patterns of MLflow for experiment tracking and model management.
- Utilize nested runs in MLflow to track complex experiments.
- Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows.
- Create custom model objects that perform real-time feature engineering.
Subdomain 1.4: Advanced Feature Store Concepts
This section delves into advanced concepts for managing features using the Databricks Feature Store.
- Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference.
- Build automated pipelines for feature computation using the FeatureEngineeringClient.
- Configure online tables for low-latency applications using the Databricks SDK.
- Design scalable solutions for ingesting and processing streaming data to generate features in real time.
- Develop on-demand features with Feature Serving for consistent use across training and production environments.
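The point-in-time rule can be illustrated outside Databricks with `pandas.merge_asof`, which joins each label row to the latest feature value at or before its timestamp, so no future information leaks into training. Databricks feature lookups enforce the same rule via timestamp keys; the tables below are invented for the example:

```python
import pandas as pd

# Feature values as they became available over time.
features = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05"]),
    "feat": [1.0, 2.0, 3.0],
})

# Training labels with their own observation timestamps.
labels = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02", "2024-01-04"]),
    "label": [0, 1],
})

# "backward" picks the most recent feature at or before each label's ts,
# which is exactly the point-in-time-correct lookup.
train = pd.merge_asof(labels, features, on="ts", direction="backward")
```

A plain join on the nearest timestamp (or on the latest feature value) would let the 2024-01-02 label see the 2024-01-03 feature, which is the leakage this pattern prevents.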
Domain 2: MLOps Model Lifecycle Management
Subdomain 2.1: MLOps Model Lifecycle Management
This section provides an overview of the components and processes involved in managing the model lifecycle in an MLOps context.
- Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy-code strategy.
- Map Databricks features to activities in the model lifecycle management process.
Subdomain 2.2: Validation Testing
This section covers strategies and implementation of testing for machine learning systems.
- Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs for given inputs.
- Identify the types of testing performed (unit and integration) in each environment stage (dev, test, prod, etc.).
- Design an integration test for machine learning systems that covers the common pipelines: feature engineering, training, evaluation, deployment, and inference.
- Compare the benefits and challenges of different approaches to organizing functions and unit tests.
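A minimal sketch of the unit-testing idea above, assuming notebook logic is factored into plain Python functions that bare asserts (or pytest) can exercise; the helper function is hypothetical:

```python
def normalize(values):
    """Scale values to [0, 1]; the kind of small, pure helper
    worth extracting from a notebook so it can be unit-tested."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: avoid divide-by-zero
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize():
    # Expected outputs for specific inputs, including the edge case.
    assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
    assert normalize([3, 3]) == [0.0, 0.0]

test_normalize()  # under pytest this would be collected automatically
```

Keeping such functions in importable modules (rather than inline notebook cells) is what makes them testable in dev and CI before they reach prod.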
Subdomain 2.3: Environment Architectures
This section focuses on designing and managing environments for machine learning projects on Databricks.
- Design and implement scalable Databricks environments for machine learning projects using best practices.
- Define and configure Databricks ML assets using Databricks Asset Bundles (DABs): model serving endpoints, MLflow experiments, and registered ML models.
Subdomain 2.4: Automated Retraining
This section covers the implementation of automated workflows for model retraining.
- Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts.
- Develop a strategy for selecting top-performing models during automated retraining.
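One way to sketch the model-selection step during automated retraining: compare the incumbent champion against newly trained challengers on a validation metric and promote the best. The run records and metric name below are hypothetical:

```python
def select_champion(candidates, metric="val_auc"):
    """Return the candidate with the highest validation metric.
    In a real pipeline the candidates would come from MLflow runs."""
    return max(candidates, key=lambda c: c[metric])

runs = [
    {"model": "challenger_1", "val_auc": 0.91},
    {"model": "champion",     "val_auc": 0.89},  # current production model
    {"model": "challenger_2", "val_auc": 0.93},
]

best = select_champion(runs)  # the winner would then be registered/promoted
```

A production strategy usually adds guardrails on top of this comparison, e.g. requiring the challenger to beat the champion by a minimum margin before promotion.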
Subdomain 2.5: Drift Detection and Lakehouse Monitoring
This section details the use of Lakehouse Monitoring for detecting drift and monitoring model performance.
- Apply statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data, and evaluate the significance of observed changes.
- Identify the data table type and Lakehouse Monitoring feature that resolve a given use-case need, and explain why.
- Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring.
- Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
- Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds.
- Detect data drift by comparing current data distributions to a known baseline or between successive time windows.
- Evaluate model performance trends over time using an inference table.
- Define custom metrics in Lakehouse Monitoring metrics tables.
- Evaluate metrics at different data granularities and with feature slicing.
- Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage.
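As an illustration of the drift tests above, here is a self-contained two-sample Kolmogorov-Smirnov statistic, one of the statistics Lakehouse Monitoring reports for numerical columns in its drift metrics table (implemented from scratch here for clarity; the sample data is made up):

```python
def ks_statistic(baseline, current):
    """Two-sample KS statistic: the maximum vertical distance between
    the empirical CDFs of the baseline and current samples."""
    xs = sorted(set(baseline) | set(current))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(baseline, x) - ecdf(current, x)) for x in xs)

baseline = [1, 2, 3, 4, 5]   # e.g. a training-time reference window
shifted  = [4, 5, 6, 7, 8]   # e.g. the current serving window

stat = ks_statistic(baseline, shifted)  # large values signal drift
```

In practice you would compare the statistic (or its p-value) against a threshold and wire that comparison into an alert, which is what the drift metrics table plus an alerting rule gives you declaratively.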
Domain 3: Model Deployment
Subdomain 3.1: Deployment Strategies
This section covers different strategies for deploying machine learning models into production.
- Compare deployment strategies (e.g., blue-green and canary) and evaluate their suitability for high-traffic applications.
- Implement a model rollout strategy using Databricks Model Serving.
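A toy sketch of the canary idea: route a small percentage of requests to the challenger and the rest to the incumbent. In Databricks Model Serving this split is configured declaratively via traffic percentages on served model versions rather than hand-rolled like this; the sketch only shows the behavior:

```python
import random

def route(canary_pct, rng):
    """Pick which served model handles one request under a
    canary_pct / (100 - canary_pct) traffic split."""
    return "challenger" if rng.random() * 100 < canary_pct else "champion"

rng = random.Random(0)  # seeded for reproducibility in this sketch
sample = [route(10, rng) for _ in range(1000)]  # ~10% go to the challenger
```

The appeal for high-traffic applications is that a bad challenger only affects the canary slice, and the percentage can be ratcheted up gradually as monitoring stays healthy.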
Subdomain 3.2: Custom Model Serving
This section focuses on deploying and querying custom models using Databricks Model Serving.
- Register a custom PyFunc model and log custom artifacts in Unity Catalog.
- Query custom models via the REST API or the MLflow Deployments SDK.
- Deploy custom model objects using the MLflow Deployments SDK, the REST API, or the user interface.
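A hedged sketch of preparing a REST query for a serving endpoint: the body below uses the `dataframe_records` input format that Model Serving accepts, but the request is only built, not sent, and the URL and token in the comment are placeholders:

```python
import json

def build_invocation(rows):
    """Build the JSON body for a Model Serving invocation using the
    dataframe_records input format (a list of column->value dicts)."""
    return json.dumps({"dataframe_records": rows})

body = build_invocation([{"f1": 0.2, "f2": 0.9}])

# To send it, POST to
#   https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations
# with headers:
#   Authorization: Bearer <token>
#   Content-Type: application/json
```

The MLflow Deployments SDK wraps this same call, so the payload shape is the main thing to get right whichever client you use.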