Free Practice Questions for Databricks Certified Machine Learning Associate Certification

    🔄 Last checked for updates February 16th, 2026

    Study with 240 exam-style practice questions designed to help you prepare for the Databricks Certified Machine Learning Associate exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.

    Start Practicing

    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about Databricks Certified Machine Learning Associate

    Official study guide:

    View

    Level:

    Associate (intermediate)

    Renewal:

    Recertification is required every two years by retaking the full exam.

    Prerequisites:

    None required. Course attendance and six months of hands-on experience performing the tasks listed in the exam outline are highly recommended, along with working knowledge of Python, scikit-learn, SparkML, Unity Catalog, Delta Live Tables, and the Databricks ML documentation.

    Delivery method:

    Online proctored

    Target audience:

    Individuals who use Databricks to perform basic machine learning tasks: understanding and using Databricks ML capabilities such as AutoML, Unity Catalog, and MLflow; exploring data; performing feature engineering; building models (training, tuning, evaluation, selection); and deploying ML models.

    Registration fee:

    $200

    Time limit:

    90 minutes

    Number of questions:

    48 scored multiple-choice or multiple-selection questions

    Certification validity:

    2 years

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Databricks Machine Learning

    Subdomain 1.1: Identify the best practices of an MLOps strategy

    Subdomain 1.2: Identify the advantages of using ML runtimes

    Subdomain 1.3: Identify how AutoML facilitates model/feature selection

    Subdomain 1.4: Identify the advantages AutoML brings to the model development process

    Subdomain 1.5: Identify the benefits of creating feature store tables at the account level in Unity Catalog vs. at the workspace level

    Subdomain 1.6: Create a feature store table in Unity Catalog

    Subdomain 1.7: Write data to a feature store table

    Subdomain 1.8: Train a model with features from a feature store table

    Subdomain 1.9: Score a model using features from a feature store table

    Subdomain 1.10: Describe the differences between online and offline feature tables

    Subdomain 1.11: Identify the best run using the MLflow Client API

    Subdomain 1.12: Manually log metrics, artifacts, and models in an MLflow Run

    Subdomain 1.13: Identify information available in the MLflow UI

    Subdomain 1.14: Register a model using the MLflow Client API in the Unity Catalog registry

    Subdomain 1.15: Identify benefits of registering models in the Unity Catalog registry over the workspace registry

    Subdomain 1.16: Identify scenarios where promoting code is preferred over promoting models and vice versa

    Subdomain 1.17: Set or remove a tag for a model

    Subdomain 1.18: Promote a challenger model to a champion model using aliases

    Domain 2: Data Processing

    Subdomain 2.1: Compute summary statistics on a Spark DataFrame using .summary() or dbutils data summaries
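    Spark's df.summary() returns count, mean, stddev, min, quartiles, and max per column. As a local sketch (no Spark cluster needed), pandas describe() produces the analogous statistics; the data here is made up for illustration.

```python
import pandas as pd

# Toy column with one extreme value, to make the summary interesting.
df = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 100.0]})
stats = df["price"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(stats["count"])  # 5.0
print(stats["mean"])   # 29.2
```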

    Subdomain 2.2: Remove outliers from a Spark DataFrame based on standard deviation or IQR
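    A minimal IQR-based outlier-removal sketch on a local pandas DataFrame; on a Spark DataFrame the same bounds are typically computed with approxQuantile and applied in a filter(). The data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 100.0]})

# Compute the interquartile range and the standard 1.5 * IQR fences.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only rows inside the fences; 100.0 is dropped as an outlier.
clean = df[df["price"].between(lower, upper)]
print(clean["price"].tolist())  # [10.0, 12.0, 11.0, 13.0]
```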

    Subdomain 2.3: Create visualizations for categorical or continuous features

    Subdomain 2.4: Compare two categorical or two continuous features using the appropriate method

    Subdomain 2.5: Compare and contrast imputing missing values with the mean, median, or mode value

    Subdomain 2.6: Impute missing values with the mode, mean, or median value
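    The imputation trade-off in Subdomains 2.5 and 2.6 can be sketched with pandas: mean suits roughly symmetric numeric columns, median is robust to outliers and skew, and mode is the usual choice for categorical columns. The toy data is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age":   [20.0, 30.0, None, 90.0],   # 90.0 skews the mean upward
    "color": ["red", "red", None, "blue"],
})

df["age_mean"]   = df["age"].fillna(df["age"].mean())        # (20+30+90)/3
df["age_median"] = df["age"].fillna(df["age"].median())      # 30.0, robust to 90.0
df["color_mode"] = df["color"].fillna(df["color"].mode()[0]) # "red"
```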

    Subdomain 2.7: Use one-hot encoding for categorical features

    Subdomain 2.8: Identify and explain the model types or data sets for which one-hot encoding is or is not appropriate
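    A one-hot encoding sketch with pandas; in SparkML the analogue is StringIndexer followed by OneHotEncoder. One-hot encoding suits linear models, while tree ensembles often handle indexed categories directly, and high-cardinality columns can explode the feature space.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"]})

# One indicator column per category, named by value (sorted alphabetically).
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())  # ['color_blue', 'color_red']
```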

    Subdomain 2.9: Identify scenarios where log scale transformation is appropriate
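    Log-scale transformation typically helps with right-skewed, strictly positive targets such as prices or counts. A minimal NumPy sketch, with log1p chosen so that zeros are handled safely:

```python
import numpy as np

# Values spanning several orders of magnitude: a classic log-transform case.
prices = np.array([10.0, 100.0, 1000.0, 10000.0])

logged = np.log1p(prices)    # compress the scale before modeling
restored = np.expm1(logged)  # exact inverse, back to original units
```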

    Domain 3: Model Development

    Subdomain 3.1: Use ML foundations to select the appropriate algorithm for a given model scenario

    Subdomain 3.2: Identify methods to mitigate data imbalance in training data

    Subdomain 3.3: Compare estimators and transformers

    Subdomain 3.4: Develop a training pipeline

    Subdomain 3.5: Use Hyperopt's fmin operation to tune a model's hyperparameters

    Subdomain 3.6: Perform random, grid, or Bayesian search as a method for tuning hyperparameters
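    A minimal grid-search sketch over a toy objective, using only the standard library. Hyperopt's fmin follows the same pattern (a search space plus an objective to minimize) but samples the space adaptively with TPE/Bayesian search instead of exhaustively; the objective and parameter names here are invented.

```python
from itertools import product

def objective(lr, depth):
    # Stand-in for a validation loss returned by a trained model;
    # minimized at lr=0.1, depth=5 by construction.
    return (lr - 0.1) ** 2 + (depth - 5) ** 2

grid = {"lr": [0.01, 0.1, 1.0], "depth": [3, 5, 7]}

# Exhaustively try every combination and keep the one with lowest loss.
best = min(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=lambda params: objective(**params),
)
print(best)  # {'lr': 0.1, 'depth': 5}
```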

    Subdomain 3.7: Parallelize single node models for hyperparameter tuning

    Subdomain 3.8: Describe the benefits and downsides of using cross-validation over a train-validation split

    Subdomain 3.9: Perform cross-validation as a part of model fitting
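    A cross-validation sketch with scikit-learn on synthetic data: compared to a single train-validation split, the score estimate is more robust, at the cost of training k models instead of one. SparkML's CrossValidator plays the same role on Spark; the dataset here is generated, not real.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# 5-fold cross-validation: fits 5 models, returns 5 held-out accuracy scores.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(len(scores))  # 5
```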

    Subdomain 3.10: Identify the number of models being trained in conjunction with a grid-search and cross-validation process
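    The counting rule behind this subdomain is simple arithmetic: (number of hyperparameter combinations) times (number of folds), plus one final refit on the full training set if the best setting is retrained. A worked example with invented grid sizes:

```python
# e.g. 3 learning rates x 4 max depths, evaluated with 5-fold CV
n_values_per_param = [3, 4]
k_folds = 5

combinations = 1
for n in n_values_per_param:
    combinations *= n                        # 3 * 4 = 12 combinations

models_trained = combinations * k_folds      # 12 * 5 = 60 models in the search
total_with_refit = models_trained + 1        # 61 if the best model is refit
print(models_trained, total_with_refit)      # 60 61
```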

    Subdomain 3.11: Use common classification metrics: F1, Log Loss, ROC/AUC, etc.

    Subdomain 3.12: Use common regression metrics: RMSE, MAE, R-squared, etc.
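    The metrics named in Subdomains 3.11 and 3.12 can be computed in a few lines with scikit-learn; the labels and predictions below are tiny made-up arrays, just enough to exercise each call.

```python
import numpy as np
from sklearn.metrics import (f1_score, log_loss, roc_auc_score,
                             mean_absolute_error, mean_squared_error, r2_score)

# Classification: F1 needs hard labels; log loss and ROC/AUC use probabilities.
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
y_pred = (y_prob >= 0.5).astype(int)

f1  = f1_score(y_true, y_pred)
ll  = log_loss(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)  # 0.75: 3 of 4 pos/neg pairs ranked correctly

# Regression: errors in the target's own units, plus explained variance.
y_reg_true = np.array([3.0, 5.0, 2.0])
y_reg_pred = np.array([2.5, 5.0, 2.5])
mae  = mean_absolute_error(y_reg_true, y_reg_pred)
rmse = np.sqrt(mean_squared_error(y_reg_true, y_reg_pred))
r2   = r2_score(y_reg_true, y_reg_pred)
```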

    Subdomain 3.13: Choose the most appropriate metric for a given scenario objective

    Subdomain 3.14: Identify the need to exponentiate log-transformed variables before calculating evaluation metrics or interpreting predictions
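    The point of Subdomain 3.14 in code: if the target was log-transformed for training, the model's outputs are on the log scale, and the transform must be inverted (expm1 for log1p) before metrics are computed in the original units. The values are fabricated for illustration.

```python
import numpy as np

y_true = np.array([100.0, 1000.0])
# Pretend the model predicts on the log1p scale (as it was trained).
log_preds = np.array([np.log1p(110.0), np.log1p(900.0)])

# Wrong: error on the log scale, not interpretable in dollars/units.
log_scale_mae = np.mean(np.abs(np.log1p(y_true) - log_preds))

# Right: invert the transform first, then evaluate in original units.
preds = np.expm1(log_preds)
mae = np.mean(np.abs(y_true - preds))  # (10 + 100) / 2 = 55.0
```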

    Subdomain 3.15: Assess the impact of model complexity and the bias-variance tradeoff on model performance

    Domain 4: Model Deployment

    Subdomain 4.1: Identify the differences and advantages of model serving approaches: batch, real-time, and streaming

    Subdomain 4.2: Deploy a custom model to a model endpoint

    Subdomain 4.3: Use pandas to perform batch inference
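    A batch-inference sketch: score a pandas DataFrame with a fitted model and attach the predictions as a new column. On Databricks the same pattern typically uses mlflow.pyfunc.load_model, with a Spark pandas UDF for large tables; the training data here is a toy y = 2x relationship.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Fit a trivial model on made-up data (exactly y = 2x).
train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})
model = LinearRegression().fit(train[["x"]], train["y"])

# Batch inference: score a whole DataFrame at once.
batch = pd.DataFrame({"x": [4.0, 5.0]})
batch["prediction"] = model.predict(batch[["x"]])
print(batch["prediction"].tolist())  # approximately [8.0, 10.0]
```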

    Subdomain 4.4: Identify how streaming inference is performed with Delta Live Tables

    Subdomain 4.5: Deploy and query a model for real-time inference

    Subdomain 4.6: Split data between endpoints for real-time inference

    Techniques & products

    Databricks
    Machine Learning
    MLOps
    ML runtimes
    AutoML
    Unity Catalog
    Feature Store
    MLflow
    MLflow Client API
    MLflow UI
    Model Registry
    Spark DataFrame
    dbutils data summaries
    Outlier removal
    Standard deviation
    IQR
    Data visualization
    Missing value imputation
    Mean imputation
    Median imputation
    Mode imputation
    One-hot encoding
    Log scale transformation
    Algorithm selection
    Data imbalance mitigation
    Estimators
    Transformers
    Training pipeline
    Hyperopt
    fmin operation
    Hyperparameter tuning
    Random search
    Grid search
    Bayesian search
    Cross-validation
    Train-validation split
    Classification metrics
    F1 score
    Log Loss
    ROC/AUC
    Regression metrics
    RMSE
    MAE
    R-squared
    Model complexity
    Bias-variance tradeoff
    Model serving
    Batch inference
    Realtime inference
    Streaming inference
    Model endpoints
    Pandas
    Delta Live Tables
    Python
    scikit-learn
    SparkML

    CertSafari is not affiliated with, endorsed by, or officially connected to Databricks Inc. Full disclaimer