Free Practice Questions for Databricks Certified Machine Learning Associate Certification
Study with 240 exam-style practice questions for the Databricks Certified Machine Learning Associate exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.
Start Practicing
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about the Databricks Certified Machine Learning Associate exam
Level: Associate (intermediate)
Recertification: Required every two years by retaking the full exam
Prerequisites: None required; course attendance and six months of hands-on experience performing the tasks in the exam outline are highly recommended, along with working knowledge of Python, scikit-learn, Spark ML, Unity Catalog, Delta Live Tables, and the Databricks ML documentation
Delivery: Online proctored
Intended audience: Individuals who use Databricks to perform basic machine learning tasks, including understanding and using Databricks ML capabilities such as AutoML, Unity Catalog, and MLflow; exploring data; performing feature engineering; building models (training, tuning, evaluation, and selection); and deploying ML models
Fee: $200
Duration: 90 minutes
Format: 48 scored multiple-choice or multiple-select questions
Certification validity: 2 years
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Databricks Machine Learning
Subdomain 1.1: Identify the best practices of an MLOps strategy
Subdomain 1.2: Identify the advantages of using ML runtimes
Subdomain 1.3: Identify how AutoML facilitates model and feature selection
Subdomain 1.4: Identify the advantages AutoML brings to the model development process
Subdomain 1.5: Identify the benefits of creating feature store tables at the account level in Unity Catalog vs. at the workspace level
Subdomain 1.6: Create a feature store table in Unity Catalog
Subdomain 1.7: Write data to a feature store table
Subdomain 1.8: Train a model with features from a feature store table
Subdomain 1.9: Score a model using features from a feature store table
Subdomain 1.10: Describe the differences between online and offline feature tables
Subdomain 1.11: Identify the best run using the MLflow Client API
Subdomain 1.12: Manually log metrics, artifacts, and models in an MLflow run
Subdomain 1.13: Identify information available in the MLflow UI
Subdomain 1.14: Register a model in the Unity Catalog registry using the MLflow Client API
Subdomain 1.15: Identify the benefits of registering models in the Unity Catalog registry over the workspace registry
Subdomain 1.16: Identify scenarios where promoting code is preferred over promoting models, and vice versa
Subdomain 1.17: Set or remove a tag for a model
Subdomain 1.18: Promote a challenger model to a champion model using aliases
Domain 2: Data Processing
Subdomain 2.1: Compute summary statistics on a Spark DataFrame using .summary() or dbutils data summaries
Subdomain 2.2: Remove outliers from a Spark DataFrame based on standard deviation or the IQR
Subdomain 2.3: Create visualizations for categorical or continuous features
Subdomain 2.4: Compare two categorical or two continuous features using the appropriate method
Subdomain 2.5: Compare and contrast imputing missing values with the mean, median, or mode
Subdomain 2.6: Impute missing values with the mode, mean, or median value
Subdomain 2.7: Use one-hot encoding for categorical features
Subdomain 2.8: Identify and explain the model types or datasets for which one-hot encoding is or is not appropriate
Subdomain 2.9: Identify scenarios where a log-scale transformation is appropriate
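The outline above refers to Spark DataFrames; as a self-contained study aid, here is the same sequence of steps sketched in pandas on a made-up dataset (column names and values are invented), with the Spark analogs noted in comments.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": np.append(rng.normal(100, 10, 97), [500.0, 600.0, np.nan]),
    "color": ["red", "blue", "green"] * 33 + [None],
})

# Summary statistics (Spark analog: df.summary(), or
# dbutils.data.summarize(df) in a Databricks notebook).
print(df["price"].describe())

# Remove outliers with the IQR rule: keep values inside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range | df["price"].isna()].copy()

# Impute missing values: median for the numeric column,
# mode for the categorical one.
clean["price"] = clean["price"].fillna(clean["price"].median())
clean["color"] = clean["color"].fillna(clean["color"].mode()[0])

# One-hot encode the categorical feature (Spark analog:
# StringIndexer + OneHotEncoder inside a Pipeline).
encoded = pd.get_dummies(clean, columns=["color"], prefix="color")
print(encoded.columns.tolist())
```

The 500 and 600 values fall outside the IQR fence and are dropped before imputation, which is why median imputation here is not dragged upward by them.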
Domain 3: Model Development
Subdomain 3.1: Use ML foundations to select the appropriate algorithm for a given model scenario
Subdomain 3.2: Identify methods to mitigate data imbalance in training data
Subdomain 3.3: Compare estimators and transformers
Subdomain 3.4: Develop a training pipeline
Subdomain 3.5: Use Hyperopt's fmin operation to tune a model's hyperparameters
Subdomain 3.6: Perform random, grid, or Bayesian search as a method for tuning hyperparameters
Subdomain 3.7: Parallelize single-node models for hyperparameter tuning
Subdomain 3.8: Describe the benefits and downsides of using cross-validation over a train-validation split
Subdomain 3.9: Perform cross-validation as part of model fitting
Subdomain 3.10: Identify the number of models trained in a combined grid-search and cross-validation process
Subdomain 3.11: Use common classification metrics: F1, log loss, ROC/AUC, etc.
Subdomain 3.12: Use common regression metrics: RMSE, MAE, R-squared, etc.
Subdomain 3.13: Choose the most appropriate metric for a given scenario objective
Subdomain 3.14: Identify the need to exponentiate log-transformed variables before calculating evaluation metrics or interpreting predictions
Subdomain 3.15: Assess the impact of model complexity and the bias-variance tradeoff on model performance
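A scikit-learn sketch (the exam also covers Spark ML and Hyperopt equivalents) tying together grid search, cross-validation, the model-count arithmetic of subdomain 3.10, and the classification metrics of subdomain 3.11; the dataset and parameter grid are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, log_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Model-count arithmetic: 3 candidate values x 5 folds = 15 fitted
# models, plus 1 final refit of the best candidate on the full
# training set.
grid = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), grid,
                      cv=5, scoring="roc_auc")
search.fit(X_tr, y_tr)
n_fits = len(search.cv_results_["params"]) * search.n_splits_
print(f"models trained during search: {n_fits} (+1 refit)")

# Evaluate the refit best model with common classification metrics.
proba = search.predict_proba(X_te)[:, 1]
pred = search.predict(X_te)
print("F1:", f1_score(y_te, pred))
print("log loss:", log_loss(y_te, proba))
print("ROC AUC:", roc_auc_score(y_te, proba))
```

The same counting rule generalizes: (number of hyperparameter combinations) x (number of folds), plus one refit if the search refits the winner.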
Domain 4: Model Deployment
Subdomain 4.1: Identify the differences and advantages of model serving approaches: batch, real-time, and streaming
Subdomain 4.2: Deploy a custom model to a model serving endpoint
Subdomain 4.3: Use pandas to perform batch inference
Subdomain 4.4: Identify how streaming inference is performed with Delta Live Tables
Subdomain 4.5: Deploy and query a model for real-time inference
Subdomain 4.6: Split data between endpoints for real-time inference
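For subdomain 4.3, a minimal batch-inference sketch: a small scikit-learn model stands in for one loaded from the model registry, and a pandas DataFrame is scored in one call. The registry URI and endpoint name in the comments are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Stand-in for a registered model; on Databricks you would load it with
# mlflow.pyfunc.load_model("models:/main.ml.demo_model@champion").
data = load_diabetes(as_frame=True)
model = Ridge().fit(data.data, data.target)

# Batch inference with pandas: score a whole DataFrame at once and
# keep predictions alongside the input features.
batch = data.data.head(10)
scored = batch.assign(prediction=model.predict(batch))
print(scored[["prediction"]])

# Real-time inference instead sends JSON to a serving endpoint,
# e.g. (hypothetical endpoint name):
#   POST /serving-endpoints/demo-endpoint/invocations
#   {"dataframe_split": {"columns": [...], "data": [[...]]}}
```

Batch scoring amortizes model-loading cost over many rows, which is the key trade-off against the per-request latency focus of real-time serving.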