Free Practice Questions for Databricks Certified Machine Learning Associate Certification
Study with 240 exam-style practice questions for the Databricks Certified Machine Learning Associate exam. All questions are aligned with the latest exam guide and include detailed explanations to help you master the material.
Start Practicing
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about the Databricks Certified Machine Learning Associate exam
Level: Associate (intermediate)
Recertification: Required every two years by retaking the full exam
Prerequisites: None required; course attendance and six months of hands-on experience performing the tasks in the exam outline are highly recommended, along with working knowledge of Python, scikit-learn, Spark ML, Unity Catalog, Delta Live Tables, and the Databricks ML documentation
Delivery: Online proctored
Intended audience: Individuals who use Databricks to perform basic machine learning tasks, including understanding and using Databricks ML capabilities such as AutoML, Unity Catalog, and MLflow; exploring data; performing feature engineering; building models (training, tuning, evaluation, and selection); and deploying ML models
Fee: $200
Duration: 90 minutes
Format: 48 scored multiple-choice or multiple-select questions
Certification validity: 2 years
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Databricks Machine Learning
Subdomain 1.1: Identify the best practices of an MLOps strategy
Subdomain 1.2: Identify the advantages of using ML runtimes
Subdomain 1.3: Identify how AutoML facilitates model and feature selection
Subdomain 1.4: Identify the advantages AutoML brings to the model development process
Subdomain 1.5: Identify the benefits of creating feature store tables at the account level in Unity Catalog vs. at the workspace level
Subdomain 1.6: Create a feature store table in Unity Catalog
Subdomain 1.7: Write data to a feature store table
Subdomain 1.8: Train a model with features from a feature store table
Subdomain 1.9: Score a model using features from a feature store table
Subdomain 1.10: Describe the differences between online and offline feature tables
Subdomain 1.11: Identify the best run using the MLflow Client API
Subdomain 1.12: Manually log metrics, artifacts, and models in an MLflow run
Subdomain 1.13: Identify information available in the MLflow UI
Subdomain 1.14: Register a model in the Unity Catalog registry using the MLflow Client API
Subdomain 1.15: Identify the benefits of registering models in the Unity Catalog registry over the workspace registry
Subdomain 1.16: Identify scenarios where promoting code is preferred over promoting models, and vice versa
Subdomain 1.17: Set or remove a tag for a model
Subdomain 1.18: Promote a challenger model to a champion model using aliases
Domain 2: Data Processing
Subdomain 2.1: Compute summary statistics on a Spark DataFrame using .summary() or dbutils data summaries
Subdomain 2.2: Remove outliers from a Spark DataFrame based on standard deviation or the IQR
Subdomain 2.3: Create visualizations for categorical or continuous features
Subdomain 2.4: Compare two categorical or two continuous features using the appropriate method
Subdomain 2.5: Compare and contrast imputing missing values with the mean, median, or mode
Subdomain 2.6: Impute missing values with the mode, mean, or median value
Subdomain 2.7: Use one-hot encoding for categorical features
Subdomain 2.8: Identify and explain the model types or datasets for which one-hot encoding is or is not appropriate
Subdomain 2.9: Identify scenarios where a log-scale transformation is appropriate
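The outline above refers to Spark DataFrames; as a self-contained study aid, here is the same sequence of steps sketched in pandas on a made-up dataset (column names and values are invented), with the Spark analogs noted in comments.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": np.append(rng.normal(100, 10, 97), [500.0, 600.0, np.nan]),
    "color": ["red", "blue", "green"] * 33 + [None],
})

# Summary statistics (Spark analog: df.summary(), or
# dbutils.data.summarize(df) in a Databricks notebook).
print(df["price"].describe())

# Remove outliers with the IQR rule: keep values inside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range | df["price"].isna()].copy()

# Impute missing values: median for the numeric column,
# mode for the categorical one.
clean["price"] = clean["price"].fillna(clean["price"].median())
clean["color"] = clean["color"].fillna(clean["color"].mode()[0])

# One-hot encode the categorical feature (Spark analog:
# StringIndexer + OneHotEncoder inside a Pipeline).
encoded = pd.get_dummies(clean, columns=["color"], prefix="color")
print(encoded.columns.tolist())
```

The 500 and 600 values fall outside the IQR fence and are dropped before imputation, which is why median imputation here is not dragged upward by them.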
Domain 3: Model Development
Subdomain 3.1: Use ML foundations to select the appropriate algorithm for a given model scenario
Subdomain 3.2: Identify methods to mitigate data imbalance in training data
Subdomain 3.3: Compare estimators and transformers
Subdomain 3.4: Develop a training pipeline
Subdomain 3.5: Use Hyperopt's fmin operation to tune a model's hyperparameters
Subdomain 3.6: Perform random, grid, or Bayesian search as a method for tuning hyperparameters
Subdomain 3.7: Parallelize single-node models for hyperparameter tuning
Subdomain 3.8: Describe the benefits and downsides of using cross-validation over a train-validation split
Subdomain 3.9: Perform cross-validation as part of model fitting
Subdomain 3.10: Identify the number of models trained in a combined grid-search and cross-validation process
Subdomain 3.11: Use common classification metrics: F1, log loss, ROC/AUC, etc.
Subdomain 3.12: Use common regression metrics: RMSE, MAE, R-squared, etc.
Subdomain 3.13: Choose the most appropriate metric for a given scenario objective
Subdomain 3.14: Identify the need to exponentiate log-transformed variables before calculating evaluation metrics or interpreting predictions
Subdomain 3.15: Assess the impact of model complexity and the bias-variance tradeoff on model performance
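A scikit-learn sketch (the exam also covers Spark ML and Hyperopt equivalents) tying together grid search, cross-validation, the model-count arithmetic of subdomain 3.10, and the classification metrics of subdomain 3.11; the dataset and parameter grid are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, log_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Model-count arithmetic: 3 candidate values x 5 folds = 15 fitted
# models, plus 1 final refit of the best candidate on the full
# training set.
grid = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), grid,
                      cv=5, scoring="roc_auc")
search.fit(X_tr, y_tr)
n_fits = len(search.cv_results_["params"]) * search.n_splits_
print(f"models trained during search: {n_fits} (+1 refit)")

# Evaluate the refit best model with common classification metrics.
proba = search.predict_proba(X_te)[:, 1]
pred = search.predict(X_te)
print("F1:", f1_score(y_te, pred))
print("log loss:", log_loss(y_te, proba))
print("ROC AUC:", roc_auc_score(y_te, proba))
```

The same counting rule generalizes: (number of hyperparameter combinations) x (number of folds), plus one refit if the search refits the winner.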
Domain 4: Model Deployment
Subdomain 4.1: Identify the differences and advantages of model serving approaches: batch, real-time, and streaming
Subdomain 4.2: Deploy a custom model to a model serving endpoint
Subdomain 4.3: Use pandas to perform batch inference
Subdomain 4.4: Identify how streaming inference is performed with Delta Live Tables
Subdomain 4.5: Deploy and query a model for real-time inference
Subdomain 4.6: Split data between endpoints for real-time inference
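For subdomain 4.3, a minimal batch-inference sketch: a small scikit-learn model stands in for one loaded from the model registry, and a pandas DataFrame is scored in one call. The registry URI and endpoint name in the comments are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Stand-in for a registered model; on Databricks you would load it with
# mlflow.pyfunc.load_model("models:/main.ml.demo_model@champion").
data = load_diabetes(as_frame=True)
model = Ridge().fit(data.data, data.target)

# Batch inference with pandas: score a whole DataFrame at once and
# keep predictions alongside the input features.
batch = data.data.head(10)
scored = batch.assign(prediction=model.predict(batch))
print(scored[["prediction"]])

# Real-time inference instead sends JSON to a serving endpoint,
# e.g. (hypothetical endpoint name):
#   POST /serving-endpoints/demo-endpoint/invocations
#   {"dataframe_split": {"columns": [...], "data": [[...]]}}
```

Batch scoring amortizes model-loading cost over many rows, which is the key trade-off against the per-request latency focus of real-time serving.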