Free Practice Questions for Databricks Certified Machine Learning Professional Certification
Study with 330 exam-style practice questions designed to help you prepare for the Databricks Certified Machine Learning Professional exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about Databricks Certified Machine Learning Professional
Recertification: Required every two years by taking the full exam
Prerequisites: None required; course attendance and one year of hands-on experience with Databricks are highly recommended
Delivery method: Online proctored
Cost: USD 200
Duration: 120 minutes
Questions: 59 scored multiple-choice questions
Validity period: 2 years
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Model Development Using Spark ML
Subdomain 1.1: Model Development Using Spark ML
This section covers the core concepts of using Spark ML for model development.
- Identify when Spark ML is recommended based on the data, model, and use-case requirements.
- Construct an ML pipeline using Spark ML.
- Apply the appropriate estimator and/or transformer for a given use case.
- Tune a Spark ML model using MLlib.
- Evaluate a Spark ML model.
- Score a Spark ML model for a batch or streaming use case.
- Select a Spark ML model or a single-node model for inference based on the use-case type: batch, real-time, or streaming.
Subdomain 1.2: Scaling and Tuning
This section focuses on techniques for scaling and tuning machine learning workloads in Databricks.
- Scale distributed training pipelines using Spark ML and pandas Function APIs/UDFs.
- Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow.
- Perform distributed hyperparameter tuning using Ray.
- Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments.
- Evaluate and select appropriate parallelization strategies (model parallelism, data parallelism) for large-scale ML training.
- Compare Ray and Spark for distributing ML training workloads.
- Use the pandas Function API to parallelize group-specific model training and perform inference.
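The per-group training pattern behind the pandas Function API can be sketched in plain pandas: the function below is the kind you would pass to `groupBy(...).applyInPandas(...)` on a Spark DataFrame, where Spark would call it once per group in parallel. The trivial "model" (a group mean) and the column names are illustrative only:

```python
import pandas as pd

def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit a toy per-group 'model' (here just the mean of y) and
    return one summary row per group, as applyInPandas expects."""
    coef = pdf["y"].mean()
    return pd.DataFrame({"group": [pdf["group"].iloc[0]], "coef": [coef]})

df = pd.DataFrame({"group": ["a", "a", "b", "b"], "y": [1.0, 3.0, 10.0, 20.0]})

# Locally we apply the function group by group; on Spark this loop is
# replaced by df.groupBy("group").applyInPandas(train_group, schema=...).
results = pd.concat(
    [train_group(g) for _, g in df.groupby("group")], ignore_index=True
)
```

The key design point: the function receives and returns pandas DataFrames, so single-node modeling code can be reused unchanged while Spark distributes the groups.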
Subdomain 1.3: Advanced MLflow Usage
This section covers advanced features and usage patterns of MLflow for experiment tracking and model management.
- Utilize nested runs in MLflow to track complex experiments.
- Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows.
- Create custom model objects that perform real-time feature engineering.
Subdomain 1.4: Advanced Feature Store Concepts
This section delves into advanced concepts for managing features using the Databricks Feature Store.
- Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference.
- Build automated pipelines for feature computation using the FeatureEngineeringClient.
- Configure online tables for low-latency applications using the Databricks SDK.
- Design scalable solutions for ingesting and processing streaming data to generate features in real time.
- Develop on-demand features with Feature Serving for consistent use across training and production environments.
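The point-in-time rule can be illustrated outside Databricks with `pandas.merge_asof`, which joins each label row to the latest feature value at or before its timestamp, so no future information leaks into training. Databricks feature lookups enforce the same rule via timestamp keys; the tables below are invented for the example:

```python
import pandas as pd

# Feature values as they became available over time.
features = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05"]),
    "feat": [1.0, 2.0, 3.0],
})

# Training labels with their own observation timestamps.
labels = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02", "2024-01-04"]),
    "label": [0, 1],
})

# "backward" picks the most recent feature at or before each label's ts,
# which is exactly the point-in-time-correct lookup.
train = pd.merge_asof(labels, features, on="ts", direction="backward")
```

A plain join on the nearest timestamp (or on the latest feature value) would let the 2024-01-02 label see the 2024-01-03 feature, which is the leakage this pattern prevents.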
Domain 2: MLOps Model Lifecycle Management
Subdomain 2.1: MLOps Model Lifecycle Management
This section provides an overview of the components and processes involved in managing the model lifecycle in an MLOps context.
- Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy-code strategy.
- Map Databricks features to activities in the model lifecycle management process.
Subdomain 2.2: Validation Testing
This section covers strategies and implementation of testing for machine learning systems.
- Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs for given inputs.
- Identify the types of testing performed (unit and integration) in each environment stage (dev, test, prod, etc.).
- Design an integration test for machine learning systems that covers the common pipelines: feature engineering, training, evaluation, deployment, and inference.
- Compare the benefits and challenges of different approaches to organizing functions and unit tests.
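A minimal sketch of the unit-testing idea above, assuming notebook logic is factored into plain Python functions that bare asserts (or pytest) can exercise; the helper function is hypothetical:

```python
def normalize(values):
    """Scale values to [0, 1]; the kind of small, pure helper
    worth extracting from a notebook so it can be unit-tested."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: avoid divide-by-zero
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize():
    # Expected outputs for specific inputs, including the edge case.
    assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
    assert normalize([3, 3]) == [0.0, 0.0]

test_normalize()  # under pytest this would be collected automatically
```

Keeping such functions in importable modules (rather than inline notebook cells) is what makes them testable in dev and CI before they reach prod.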
Subdomain 2.3: Environment Architectures
This section focuses on designing and managing environments for machine learning projects on Databricks.
- Design and implement scalable Databricks environments for machine learning projects using best practices.
- Define and configure Databricks ML assets using Databricks Asset Bundles (DABs): model serving endpoints, MLflow experiments, and registered ML models.
Subdomain 2.4: Automated Retraining
This section covers the implementation of automated workflows for model retraining.
- Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts.
- Develop a strategy for selecting top-performing models during automated retraining.
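One way to sketch the model-selection step during automated retraining: compare the incumbent champion against newly trained challengers on a validation metric and promote the best. The run records and metric name below are hypothetical:

```python
def select_champion(candidates, metric="val_auc"):
    """Return the candidate with the highest validation metric.
    In a real pipeline the candidates would come from MLflow runs."""
    return max(candidates, key=lambda c: c[metric])

runs = [
    {"model": "challenger_1", "val_auc": 0.91},
    {"model": "champion",     "val_auc": 0.89},  # current production model
    {"model": "challenger_2", "val_auc": 0.93},
]

best = select_champion(runs)  # the winner would then be registered/promoted
```

A production strategy usually adds guardrails on top of this comparison, e.g. requiring the challenger to beat the champion by a minimum margin before promotion.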
Subdomain 2.5: Drift Detection and Lakehouse Monitoring
This section details the use of Lakehouse Monitoring for detecting drift and monitoring model performance.
- Apply statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data, and evaluate the significance of observed changes.
- Identify the data table type and Lakehouse Monitoring feature that resolve a given use-case need, and explain why.
- Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring.
- Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
- Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds.
- Detect data drift by comparing current data distributions to a known baseline or between successive time windows.
- Evaluate model performance trends over time using an inference table.
- Define custom metrics in Lakehouse Monitoring metrics tables.
- Evaluate metrics at different data granularities and with feature slicing.
- Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage.
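As an illustration of the drift tests above, here is a self-contained two-sample Kolmogorov-Smirnov statistic, one of the statistics Lakehouse Monitoring reports for numerical columns in its drift metrics table (implemented from scratch here for clarity; the sample data is made up):

```python
def ks_statistic(baseline, current):
    """Two-sample KS statistic: the maximum vertical distance between
    the empirical CDFs of the baseline and current samples."""
    xs = sorted(set(baseline) | set(current))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(baseline, x) - ecdf(current, x)) for x in xs)

baseline = [1, 2, 3, 4, 5]   # e.g. a training-time reference window
shifted  = [4, 5, 6, 7, 8]   # e.g. the current serving window

stat = ks_statistic(baseline, shifted)  # large values signal drift
```

In practice you would compare the statistic (or its p-value) against a threshold and wire that comparison into an alert, which is what the drift metrics table plus an alerting rule gives you declaratively.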
Domain 3: Model Deployment
Subdomain 3.1: Deployment Strategies
This section covers different strategies for deploying machine learning models into production.
- Compare deployment strategies (e.g., blue-green and canary) and evaluate their suitability for high-traffic applications.
- Implement a model rollout strategy using Databricks Model Serving.
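A toy sketch of the canary idea: route a small percentage of requests to the challenger and the rest to the incumbent. In Databricks Model Serving this split is configured declaratively via traffic percentages on served model versions rather than hand-rolled like this; the sketch only shows the behavior:

```python
import random

def route(canary_pct, rng):
    """Pick which served model handles one request under a
    canary_pct / (100 - canary_pct) traffic split."""
    return "challenger" if rng.random() * 100 < canary_pct else "champion"

rng = random.Random(0)  # seeded for reproducibility in this sketch
sample = [route(10, rng) for _ in range(1000)]  # ~10% go to the challenger
```

The appeal for high-traffic applications is that a bad challenger only affects the canary slice, and the percentage can be ratcheted up gradually as monitoring stays healthy.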
Subdomain 3.2: Custom Model Serving
This section focuses on deploying and querying custom models using Databricks Model Serving.
- Register a custom PyFunc model and log custom artifacts in Unity Catalog.
- Query custom models via the REST API or the MLflow Deployments SDK.
- Deploy custom model objects using the MLflow Deployments SDK, the REST API, or the user interface.
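A hedged sketch of preparing a REST query for a serving endpoint: the body below uses the `dataframe_records` input format that Model Serving accepts, but the request is only built, not sent, and the URL and token in the comment are placeholders:

```python
import json

def build_invocation(rows):
    """Build the JSON body for a Model Serving invocation using the
    dataframe_records input format (a list of column->value dicts)."""
    return json.dumps({"dataframe_records": rows})

body = build_invocation([{"f1": 0.2, "f2": 0.9}])

# To send it, POST to
#   https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations
# with headers:
#   Authorization: Bearer <token>
#   Content-Type: application/json
```

The MLflow Deployments SDK wraps this same call, so the payload shape is the main thing to get right whichever client you use.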