Free Practice Questions for AWS Certified Data Engineer - Associate (DEA-C01) Certification
Study with 390 exam-style practice questions designed to help you prepare for the AWS Certified Data Engineer - Associate (DEA-C01) exam.
Start Practicing
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about AWS Certified Data Engineer - Associate (DEA-C01)
Level: Associate (intermediate)
Question types: Multiple choice, multiple response
Passing score: 720 out of 1,000
Recommended experience: 2–3 years in data engineering, 1–2 years of hands-on experience with AWS services
Questions: 50 scored questions
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Data Ingestion and Transformation
Subdomain 1.1: Perform data ingestion
• Skill 1.1.1: Read data from streaming sources (for example, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka [Amazon MSK], Amazon DynamoDB Streams, AWS Database Migration Service [AWS DMS], AWS Glue, Amazon Redshift).
• Skill 1.1.2: Read data from batch sources (for example, Amazon S3, AWS Glue, Amazon EMR, AWS DMS, Amazon Redshift, AWS Lambda, Amazon AppFlow).
• Skill 1.1.3: Implement appropriate configuration options for batch ingestion.
• Skill 1.1.4: Consume data APIs.
• Skill 1.1.5: Set up schedulers by using Amazon EventBridge, Apache Airflow, or time-based schedules for jobs and crawlers.
• Skill 1.1.6: Set up event triggers (for example, Amazon S3 Event Notifications, EventBridge).
• Skill 1.1.7: Call a Lambda function from Kinesis.
• Skill 1.1.8: Create allowlists for IP addresses to allow connections to data sources.
• Skill 1.1.9: Implement throttling and overcome rate limits (for example, DynamoDB, Amazon RDS, Kinesis).
• Skill 1.1.10: Manage fan-in and fan-out for streaming data distribution.
• Skill 1.1.11: Describe replayability of data ingestion pipelines.
• Skill 1.1.12: Define stateful and stateless data transactions.
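Skill 1.1.7 (calling a Lambda function from Kinesis) is worth practicing hands-on. A minimal sketch of the handler side, assuming the standard Kinesis event source mapping shape (record payloads arrive base64-encoded under Records[*].kinesis.data); the sample event is hand-built so this runs locally with no AWS calls:

```python
import base64
import json

def handler(event, context):
    """Minimal Lambda handler for a Kinesis event source mapping.

    Kinesis delivers record payloads base64-encoded; the handler
    decodes each one and parses it as JSON.
    """
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(json.loads(payload))
    return results

# Local smoke test with a hand-built event (no AWS calls).
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"sensor": "a1", "value": 42}).encode()).decode()}}
    ]
}
print(handler(sample_event, None))
```

In a real deployment the event source mapping (batch size, starting position, retry behavior) is configured on the Lambda side, not in the handler.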
Subdomain 1.2: Transform and process data
• Skill 1.2.1: Optimize container usage for performance needs (for example, Amazon Elastic Kubernetes Service [Amazon EKS], Amazon Elastic Container Service [Amazon ECS]).
• Skill 1.2.2: Connect to different data sources (for example, Java Database Connectivity [JDBC], Open Database Connectivity [ODBC]).
• Skill 1.2.3: Integrate data from multiple sources.
• Skill 1.2.4: Optimize costs while processing data.
• Skill 1.2.5: Implement data transformation services based on requirements (for example, Amazon EMR, AWS Glue, Lambda, Amazon Redshift).
• Skill 1.2.6: Transform data between formats (for example, from .csv to Apache Parquet).
• Skill 1.2.7: Troubleshoot and debug common transformation failures and performance issues.
• Skill 1.2.8: Create data APIs to make data available to other systems by using AWS services.
• Skill 1.2.9: Define volume, velocity, and variety of data (for example, structured data, unstructured data).
• Skill 1.2.10: Integrate large language models (LLMs) for data processing.
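Skill 1.2.6 asks you to convert row-oriented formats such as .csv into columnar formats such as Apache Parquet (in practice via AWS Glue or a library like pyarrow). A standard-library-only sketch of the row-to-columnar pivot that sits at the heart of that conversion, using made-up sample data:

```python
import csv
import io

def rows_to_columns(csv_text):
    """Pivot row-oriented CSV into a column-oriented dict of lists,
    the core idea behind columnar formats such as Parquet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

data = "id,city\n1,Seattle\n2,Boston\n"
print(rows_to_columns(data))
```

Columnar layout is what lets engines like Athena and Redshift Spectrum read only the columns a query touches, which is why this conversion shows up so often in exam scenarios about scan cost.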
Subdomain 1.3: Orchestrate data pipelines
• Skill 1.3.1: Use orchestration services to build workflows for data ETL pipelines (for example, Lambda, EventBridge, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions, AWS Glue workflows).
• Skill 1.3.2: Build data pipelines for performance, availability, scalability, resiliency, and fault tolerance.
• Skill 1.3.3: Implement and maintain serverless workflows.
• Skill 1.3.4: Use notification services to send alerts (for example, Amazon Simple Notification Service [Amazon SNS], Amazon Simple Queue Service [Amazon SQS]).
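For skill 1.3.1, Step Functions workflows are written in Amazon States Language (ASL). A minimal sketch of an ETL workflow definition, built as a plain dict (the state name and Lambda ARN are placeholders, and a real pipeline would chain more states):

```python
import json

# Minimal Amazon States Language (ASL) workflow: one Lambda task
# with exponential-backoff retry, then the execution ends.
definition = {
    "Comment": "Minimal ETL step with retry",
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "End": True,
        }
    },
}
print(json.dumps(definition, indent=2))
```

Built-in Retry/Catch handling like this is what ASL offers for the fault-tolerance requirement in skill 1.3.2, without retry code inside the Lambda function itself.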
Subdomain 1.4: Apply programming concepts
• Skill 1.4.1: Optimize code to reduce runtime for data ingestion and transformation.
• Skill 1.4.2: Configure Lambda functions to meet concurrency and performance needs.
• Skill 1.4.3: Use programming languages and frameworks for data engineering (for example, Python, SQL, Scala, R, Java, Bash, PowerShell).
• Skill 1.4.4: Use software engineering best practices for data engineering (for example, version control, testing, logging, monitoring).
• Skill 1.4.5: Use Infrastructure as Code (IaC) to deploy data engineering solutions.
• Skill 1.4.6: Use the AWS Serverless Application Model (AWS SAM) to package and deploy serverless data pipelines (for example, Lambda functions, Step Functions, DynamoDB tables).
• Skill 1.4.7: Use and mount storage volumes from within Lambda functions.
• Skill 1.4.8: Use infrastructure as code (IaC) for repeatable resource deployment (for example, AWS CloudFormation and AWS Cloud Development Kit [AWS CDK]).
• Skill 1.4.9: Describe continuous integration and continuous delivery (CI/CD) (implementation, testing, and deployment of data pipelines).
• Skill 1.4.10: Define distributed computing.
• Skill 1.4.11: Describe data structures and algorithms (for example, graph data structures and tree data structures).
Domain 2: Data Store Management
Subdomain 2.1: Choose a data store
• Skill 2.1.1: Implement the appropriate storage services for specific cost and performance requirements (for example, Amazon Redshift, Amazon EMR, AWS Lake Formation, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka [Amazon MSK]).
• Skill 2.1.2: Configure the appropriate storage services for specific access patterns and requirements (for example, Amazon Redshift, Amazon EMR, Lake Formation, Amazon RDS, DynamoDB).
• Skill 2.1.3: Apply storage services to appropriate use cases (for example, using indexing algorithms like Hierarchical Navigable Small Worlds [HNSW] with Amazon Aurora PostgreSQL and using Amazon MemoryDB for fast key/value pair access).
• Skill 2.1.4: Integrate migration tools into data processing systems (for example, AWS Transfer Family).
• Skill 2.1.5: Implement data migration or remote access methods (for example, Amazon Redshift federated queries, Amazon Redshift materialized views, Amazon Redshift Spectrum).
• Skill 2.1.6: Manage locks to prevent access to data (for example, Amazon Redshift, Amazon RDS).
• Skill 2.1.7: Manage open table formats (for example, Apache Iceberg).
• Skill 2.1.8: Describe vector index types (for example, HNSW, IVF).
Subdomain 2.2: Understand data cataloging systems
• Skill 2.2.1: Use data catalogs to consume data from the data's source.
• Skill 2.2.2: Build and reference a technical data catalog (for example, AWS Glue Data Catalog, Apache Hive metastore).
• Skill 2.2.3: Discover schemas and use AWS Glue crawlers to populate data catalogs.
• Skill 2.2.4: Synchronize partitions with a data catalog.
• Skill 2.2.5: Create new source or target connections for cataloging (for example, AWS Glue).
• Skill 2.2.6: Create and manage business data catalogs (for example, Amazon SageMaker Catalog).
Subdomain 2.3: Manage the lifecycle of data
• Skill 2.3.1: Perform load and unload operations to move data between Amazon S3 and Amazon Redshift.
• Skill 2.3.2: Manage S3 Lifecycle policies to change the storage tier of S3 data.
• Skill 2.3.3: Expire data when it reaches a specific age by using S3 Lifecycle policies.
• Skill 2.3.4: Manage S3 versioning and DynamoDB TTL.
• Skill 2.3.5: Delete data to meet business and legal requirements.
• Skill 2.3.6: Protect data with appropriate resiliency and availability.
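Skills 2.3.2 and 2.3.3 come down to writing S3 Lifecycle rules. A sketch of the configuration shape that boto3's put_bucket_lifecycle_configuration accepts, built as a plain dict so it runs without AWS access (the prefix and day counts are illustrative):

```python
# Illustrative S3 Lifecycle configuration: transition objects under
# logs/ to S3 Glacier after 90 days and expire them after 365 days.
# This is the LifecycleConfiguration dict you would pass to boto3's
# put_bucket_lifecycle_configuration; no AWS call is made here.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle["Rules"][0]
print(rule["Transitions"][0]["StorageClass"], rule["Expiration"]["Days"])
```

Exam questions in this area often hinge on the transition day counts being valid (expiration must come after the last transition) and on choosing the cheapest storage class that still meets the retrieval requirement.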
Subdomain 2.4: Design data models and schema evolution
• Skill 2.4.1: Design schemas for Amazon Redshift, DynamoDB, and Lake Formation.
• Skill 2.4.2: Address changes to the characteristics of data.
• Skill 2.4.3: Perform schema conversion (for example, by using the AWS Schema Conversion Tool [AWS SCT] and AWS Database Migration Service [AWS DMS] Schema Conversion).
• Skill 2.4.4: Establish data lineage by using AWS tools (for example, Amazon SageMaker ML Lineage Tracking and Amazon SageMaker Catalog).
• Skill 2.4.5: Describe best practices for indexing, partitioning strategies, compression, and other data optimization techniques.
• Skill 2.4.6: Describe vectorization concepts (for example, Amazon Bedrock knowledge base).
Domain 3: Data Operations and Support
Subdomain 3.1: Automate data processing by using AWS services
• Skill 3.1.1: Orchestrate data pipelines (for example, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions).
• Skill 3.1.2: Troubleshoot Amazon managed workflows.
• Skill 3.1.3: Call SDKs to access Amazon features from code.
• Skill 3.1.4: Use the features of AWS services to process data (for example, Amazon EMR, Amazon Redshift, AWS Glue).
• Skill 3.1.5: Consume and maintain data APIs.
• Skill 3.1.6: Prepare data for transformation (for example, AWS Glue DataBrew and Amazon SageMaker Unified Studio).
• Skill 3.1.7: Query data (for example, Amazon Athena).
• Skill 3.1.8: Use AWS Lambda to automate data processing.
• Skill 3.1.9: Manage events and schedulers (for example, Amazon EventBridge).
Subdomain 3.2: Analyze data by using AWS services
• Skill 3.2.1: Visualize data by using AWS services and tools (for example, DataBrew, Amazon QuickSight).
• Skill 3.2.2: Verify and clean data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler).
• Skill 3.2.3: Use SQL in Amazon Redshift and Athena to query data or to create views.
• Skill 3.2.4: Use Athena notebooks that use Apache Spark to explore data.
• Skill 3.2.5: Describe tradeoffs between provisioned services and serverless services.
• Skill 3.2.6: Define data aggregation, rolling average, grouping, and pivoting.
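Skills 3.2.3 and 3.2.6 are plain SQL: aggregation and grouping work the same way in Athena and Redshift as in any SQL engine. A runnable sketch using SQLite as a local stand-in (the table and numbers are made up):

```python
import sqlite3

# GROUP BY aggregation, as you would run it in Athena or Redshift;
# SQLite stands in here so the example runs locally with no AWS setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0)],
)
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 2, 200.0), ('west', 1, 50.0)]
conn.close()
```

Rolling averages use the same idea with a window function (AVG(...) OVER (ORDER BY ... ROWS BETWEEN ...)), which both Athena and Redshift support.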
Subdomain 3.3: Maintain and monitor data pipelines
• Skill 3.3.1: Extract logs for audits.
• Skill 3.3.2: Deploy logging and monitoring solutions to facilitate auditing and traceability.
• Skill 3.3.3: Use notifications during monitoring to send alerts.
• Skill 3.3.4: Troubleshoot performance issues.
• Skill 3.3.5: Use AWS CloudTrail to track API calls.
• Skill 3.3.6: Troubleshoot and maintain pipelines (for example, AWS Glue, Amazon EMR).
• Skill 3.3.7: Use Amazon CloudWatch Logs to log application data (with a focus on configuration and automation).
• Skill 3.3.8: Analyze logs with AWS services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights, big data application logs).
Subdomain 3.4: Ensure data quality
• Skill 3.4.1: Run data quality checks while processing the data (for example, checking for empty fields).
• Skill 3.4.2: Define data quality rules (for example, DataBrew).
• Skill 3.4.3: Investigate data consistency (for example, DataBrew).
• Skill 3.4.4: Describe data sampling techniques.
• Skill 3.4.5: Implement data skew mechanisms.
Domain 4: Data Security and Governance
Subdomain 4.1: Apply authentication mechanisms
• Skill 4.1.1: Update VPC security groups.
• Skill 4.1.2: Create and update AWS Identity and Access Management (IAM) groups, roles, endpoints, and services.
• Skill 4.1.3: Create and rotate credentials for password management (for example, AWS Secrets Manager).
• Skill 4.1.4: Set up IAM roles for access (for example, AWS Lambda, Amazon API Gateway, AWS CLI, AWS CloudFormation).
• Skill 4.1.5: Apply IAM policies to roles, endpoints, and services (for example, S3 Access Points, AWS PrivateLink).
• Skill 4.1.6: Describe the differences between managed services and unmanaged services.
• Skill 4.1.7: Use domains, domain units, and projects for SageMaker Unified Studio.
Subdomain 4.2: Apply authorization mechanisms
• Skill 4.2.1: Create custom IAM policies when a managed policy does not meet the needs.
• Skill 4.2.2: Store application and database credentials (for example, Secrets Manager, AWS Systems Manager Parameter Store).
• Skill 4.2.3: Provide database users, groups, and roles access and authority in a database (for example, for Amazon Redshift).
• Skill 4.2.4: Manage permissions through AWS Lake Formation (for Amazon Redshift, Amazon EMR, Amazon Athena, and Amazon S3).
• Skill 4.2.5: Apply authorization methods that address business needs (role-based, tag-based, and attribute-based).
• Skill 4.2.6: Construct custom policies that meet the principle of least privilege.
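Skills 4.2.1 and 4.2.6 both come down to writing IAM JSON policy documents by hand. A least-privilege sketch, built as a plain dict (the bucket name and prefix are illustrative): read access is scoped to one S3 prefix, and the ListBucket statement is conditioned on that same prefix.

```python
import json

# Least-privilege IAM policy: read-only access to a single S3 prefix.
# The bucket name and prefix are illustrative placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadRawZoneOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-data-lake/raw/*"],
        },
        {
            "Sid": "ListRawZoneOnly",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-data-lake"],
            "Condition": {"StringLike": {"s3:prefix": ["raw/*"]}},
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Note the exam-relevant detail: s3:GetObject applies to object ARNs (bucket/key), while s3:ListBucket applies to the bucket ARN itself, which is why the two statements have different Resource entries.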
Subdomain 4.3: Ensure data encryption and masking
• Skill 4.3.1: Apply data masking and anonymization according to compliance laws or company policies.
• Skill 4.3.2: Use encryption keys to encrypt or decrypt data (for example, AWS Key Management Service [AWS KMS]).
• Skill 4.3.3: Configure encryption across AWS account boundaries.
• Skill 4.3.4: Enable encryption in transit or before transit for data.
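For skill 4.3.1, a common masking pattern is pseudonymization: replace the identifying part of a value with a deterministic digest so records can still be joined on it. A minimal stdlib sketch for email-style PII (real deployments would mask in the pipeline, for example with a Glue transform, and often add a secret salt so digests cannot be brute-forced):

```python
import hashlib

def mask_email(email: str) -> str:
    """Pseudonymize an email address: keep the domain for analytics,
    replace the local part with a truncated SHA-256 digest.
    Deterministic, so the same input always masks to the same value."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{digest}@{domain}"

print(mask_email("jane.doe@example.com"))
```

Determinism is the design choice to notice: it preserves joinability across datasets, at the cost of being reversible by dictionary attack unless a salt is added.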
Subdomain 4.4: Prepare logs for audit
• Skill 4.4.1: Use AWS CloudTrail to track API calls.
• Skill 4.4.2: Use Amazon CloudWatch Logs to store application logs.
• Skill 4.4.3: Use AWS CloudTrail Lake for centralized logging queries.
• Skill 4.4.4: Analyze logs by using AWS services (for example, Athena, CloudWatch Logs Insights, Amazon OpenSearch Service).
• Skill 4.4.5: Integrate various AWS services to perform logging (for example, Amazon EMR in cases of large volumes of log data).
Subdomain 4.5: Understand data privacy and governance
• Skill 4.5.1: Grant permissions for data sharing (for example, data sharing for Amazon Redshift).
• Skill 4.5.2: Implement PII identification (for example, Amazon Macie with Lake Formation).
• Skill 4.5.3: Implement data privacy strategies to prevent backups or replications of data to disallowed AWS Regions.
• Skill 4.5.4: View configuration changes that have occurred in an account (for example, AWS Config).
• Skill 4.5.5: Maintain data sovereignty.
• Skill 4.5.6: Manage data access through Amazon SageMaker Catalog projects.
• Skill 4.5.7: Describe data governance frameworks and data sharing patterns.
Techniques & products