Free Practice Questions for AWS Certified Data Engineer - Associate (DEA-C01) Certification

    🔄 Last checked for updates February 16th, 2026

    Study with 390 exam-style practice questions designed to help you prepare for the AWS Certified Data Engineer - Associate (DEA-C01).


    Random Questions

    Practice with randomly mixed questions from all topics

    Question Mix: All Topics
    Format: Random Order

    Domain Mode

    Practice questions from a specific topic area

    Exam Information

    Exam Details

    Key information about AWS Certified Data Engineer - Associate (DEA-C01)

    Official study guide:

    View

    Level:

    Associate (intermediate)

    Exam format:

    Multiple choice, multiple response

    Passing score:

    720 out of 1,000

    Target audience:

    2–3 years of experience in data engineering, 1–2 years of hands-on experience with AWS services

    Number of questions:

    50 scored questions

    Exam Topics & Skills Assessed

    Skills measured (from the official study guide)

    Domain 1: Data Ingestion and Transformation

    Subdomain 1.1: Perform data ingestion

    • Skill 1.1.1: Read data from streaming sources (for example, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka [Amazon MSK], Amazon DynamoDB Streams, AWS Database Migration Service [AWS DMS], AWS Glue, Amazon Redshift).
    • Skill 1.1.2: Read data from batch sources (for example, Amazon S3, AWS Glue, Amazon EMR, AWS DMS, Amazon Redshift, AWS Lambda, Amazon AppFlow).
    • Skill 1.1.3: Implement appropriate configuration options for batch ingestion.
    • Skill 1.1.4: Consume data APIs.
    • Skill 1.1.5: Set up schedulers by using Amazon EventBridge, Apache Airflow, or time-based schedules for jobs and crawlers.
    • Skill 1.1.6: Set up event triggers (for example, Amazon S3 Event Notifications, EventBridge).
    • Skill 1.1.7: Call a Lambda function from Kinesis.
    • Skill 1.1.8: Create allowlists for IP addresses to allow connections to data sources.
    • Skill 1.1.9: Implement throttling and overcome rate limits (for example, DynamoDB, Amazon RDS, Kinesis).
    • Skill 1.1.10: Manage fan-in and fan-out for streaming data distribution.
    • Skill 1.1.11: Describe replayability of data ingestion pipelines.
    • Skill 1.1.12: Define stateful and stateless data transactions.
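
    Skill 1.1.7 (calling a Lambda function from Kinesis) is worth seeing concretely. The sketch below, which assumes the documented Kinesis event structure that Lambda receives from an event source mapping, shows the core of such a handler: Kinesis delivers each record's payload base64-encoded, so the handler decodes and parses it. The `sensor`/`value` fields are illustrative only.

```python
import base64
import json

def handler(event, context):
    """Minimal Lambda handler for a Kinesis event source mapping.

    Kinesis record payloads arrive base64-encoded under
    event["Records"][i]["kinesis"]["data"]; decode each and parse as JSON.
    """
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(json.loads(payload))
    return results

# A hand-built event in the shape Lambda receives from Kinesis:
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"sensor": "t1", "value": 21.5}).encode()).decode()}}
    ]
}
print(handler(sample_event, None))  # → [{'sensor': 't1', 'value': 21.5}]
```

    In a real deployment the event source mapping (batch size, starting position, retry behavior) is configured on the Lambda side, not in the handler code.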

    Subdomain 1.2: Transform and process data

    • Skill 1.2.1: Optimize container usage for performance needs (for example, Amazon Elastic Kubernetes Service [Amazon EKS], Amazon Elastic Container Service [Amazon ECS]).
    • Skill 1.2.2: Connect to different data sources (for example, Java Database Connectivity [JDBC], Open Database Connectivity [ODBC]).
    • Skill 1.2.3: Integrate data from multiple sources.
    • Skill 1.2.4: Optimize costs while processing data.
    • Skill 1.2.5: Implement data transformation services based on requirements (for example, Amazon EMR, AWS Glue, Lambda, Amazon Redshift).
    • Skill 1.2.6: Transform data between formats (for example, from .csv to Apache Parquet).
    • Skill 1.2.7: Troubleshoot and debug common transformation failures and performance issues.
    • Skill 1.2.8: Create data APIs to make data available to other systems by using AWS services.
    • Skill 1.2.9: Define volume, velocity, and variety of data (for example, structured data, unstructured data).
    • Skill 1.2.10: Integrate large language models (LLMs) for data processing.
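
    Skill 1.2.6 (transforming between formats) is usually done with AWS Glue or pyarrow for CSV-to-Parquet. As a dependency-free sketch of the same idea, here is a standard-library-only conversion from CSV text to JSON Lines; the column names are illustrative:

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text to JSON Lines, one JSON object per row.

    A stand-in for format conversion; a production pipeline targeting
    Parquet would use AWS Glue or pyarrow instead of the stdlib.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

raw = "id,city\n1,Berlin\n2,Osaka"
print(csv_to_jsonl(raw))
```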

    Subdomain 1.3: Orchestrate data pipelines

    • Skill 1.3.1: Use orchestration services to build workflows for data ETL pipelines (for example, Lambda, EventBridge, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions, AWS Glue workflows).
    • Skill 1.3.2: Build data pipelines for performance, availability, scalability, resiliency, and fault tolerance.
    • Skill 1.3.3: Implement and maintain serverless workflows.
    • Skill 1.3.4: Use notification services to send alerts (for example, Amazon Simple Notification Service [Amazon SNS], Amazon Simple Queue Service [Amazon SQS]).
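
    For Skill 1.3.1, Step Functions workflows are defined in the Amazon States Language (ASL). Below is a minimal, hypothetical ASL definition written as a Python dict: two sketched Lambda tasks chained together, with a retry for transient failures (relevant to Skill 1.3.2's fault tolerance). The function ARNs and account ID are placeholders.

```python
# A minimal Amazon States Language (ASL) definition as a Python dict.
# Resource ARNs are placeholders, not real functions.
state_machine_definition = {
    "Comment": "Sketch of a two-step ETL workflow",
    "StartAt": "ExtractAndTransform",
    "States": {
        "ExtractAndTransform": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            # Retry transient task failures with exponential backoff.
            "Retry": [
                {"ErrorEquals": ["States.TaskFailed"],
                 "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2.0}
            ],
            "Next": "LoadToWarehouse",
        },
        "LoadToWarehouse": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load",
            "End": True,
        },
    },
}
```

    In practice this JSON document is what you pass to `create_state_machine`; the retry block is where resiliency policy lives, rather than inside the Lambda code itself.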

    Subdomain 1.4: Apply programming concepts

    • Skill 1.4.1: Optimize code to reduce runtime for data ingestion and transformation.
    • Skill 1.4.2: Configure Lambda functions to meet concurrency and performance needs.
    • Skill 1.4.3: Use programming languages and frameworks for data engineering (for example, Python, SQL, Scala, R, Java, Bash, PowerShell).
    • Skill 1.4.4: Use software engineering best practices for data engineering (for example, version control, testing, logging, monitoring).
    • Skill 1.4.5: Use Infrastructure as Code (IaC) to deploy data engineering solutions.
    • Skill 1.4.6: Use the AWS Serverless Application Model (AWS SAM) to package and deploy serverless data pipelines (for example, Lambda functions, Step Functions, DynamoDB tables).
    • Skill 1.4.7: Use and mount storage volumes from within Lambda functions.
    • Skill 1.4.8: Use infrastructure as code (IaC) for repeatable resource deployment (for example, AWS CloudFormation and AWS Cloud Development Kit [AWS CDK]).
    • Skill 1.4.9: Describe continuous integration and continuous delivery (CI/CD) (implementation, testing, and deployment of data pipelines).
    • Skill 1.4.10: Define distributed computing.
    • Skill 1.4.11: Describe data structures and algorithms (for example, graph data structures and tree data structures).

    Domain 2: Data Store Management

    Subdomain 2.1: Choose a data store

    • Skill 2.1.1: Implement the appropriate storage services for specific cost and performance requirements (for example, Amazon Redshift, Amazon EMR, AWS Lake Formation, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka [Amazon MSK]).
    • Skill 2.1.2: Configure the appropriate storage services for specific access patterns and requirements (for example, Amazon Redshift, Amazon EMR, Lake Formation, Amazon RDS, DynamoDB).
    • Skill 2.1.3: Apply storage services to appropriate use cases (for example, using indexing algorithms like Hierarchical Navigable Small Worlds [HNSW] with Amazon Aurora PostgreSQL and using Amazon MemoryDB for fast key/value pair access).
    • Skill 2.1.4: Integrate migration tools into data processing systems (for example, AWS Transfer Family).
    • Skill 2.1.5: Implement data migration or remote access methods (for example, Amazon Redshift federated queries, Amazon Redshift materialized views, Amazon Redshift Spectrum).
    • Skill 2.1.6: Manage locks to prevent access to data (for example, Amazon Redshift, Amazon RDS).
    • Skill 2.1.7: Manage open table formats (for example, Apache Iceberg).
    • Skill 2.1.8: Describe vector index types (for example, HNSW, IVF).

    Subdomain 2.2: Understand data cataloging systems

    • Skill 2.2.1: Use data catalogs to consume data from the data's source.
    • Skill 2.2.2: Build and reference a technical data catalog (for example, AWS Glue Data Catalog, Apache Hive metastore).
    • Skill 2.2.3: Discover schemas and use AWS Glue crawlers to populate data catalogs.
    • Skill 2.2.4: Synchronize partitions with a data catalog.
    • Skill 2.2.5: Create new source or target connections for cataloging (for example, AWS Glue).
    • Skill 2.2.6: Create and manage business data catalogs (for example, Amazon SageMaker Catalog).

    Subdomain 2.3: Manage the lifecycle of data

    • Skill 2.3.1: Perform load and unload operations to move data between Amazon S3 and Amazon Redshift.
    • Skill 2.3.2: Manage S3 Lifecycle policies to change the storage tier of S3 data.
    • Skill 2.3.3: Expire data when it reaches a specific age by using S3 Lifecycle policies.
    • Skill 2.3.4: Manage S3 versioning and DynamoDB TTL.
    • Skill 2.3.5: Delete data to meet business and legal requirements.
    • Skill 2.3.6: Protect data with appropriate resiliency and availability.
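
    Skills 2.3.2 and 2.3.3 come down to writing an S3 Lifecycle configuration. The sketch below shows one in the structure that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, prefix, and day counts are illustrative choices, not recommendations.

```python
# One lifecycle rule: transition objects under logs/ to Glacier after
# 90 days (tiering, Skill 2.3.2), then expire them at 365 days
# (age-based deletion, Skill 2.3.3).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-then-expire",        # illustrative rule name
            "Filter": {"Prefix": "logs/"},   # applies to this prefix only
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"}
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Applying it requires credentials and a real bucket, e.g.:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",
#     LifecycleConfiguration=lifecycle_configuration)
```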

    Subdomain 2.4: Design data models and schema evolution

    • Skill 2.4.1: Design schemas for Amazon Redshift, DynamoDB, and Lake Formation.
    • Skill 2.4.2: Address changes to the characteristics of data.
    • Skill 2.4.3: Perform schema conversion (for example, by using the AWS Schema Conversion Tool [AWS SCT] and AWS Database Migration Service [AWS DMS] Schema Conversion).
    • Skill 2.4.4: Establish data lineage by using AWS tools (for example, Amazon SageMaker ML Lineage Tracking and Amazon SageMaker Catalog).
    • Skill 2.4.5: Describe best practices for indexing, partitioning strategies, compression, and other data optimization techniques.
    • Skill 2.4.6: Describe vectorization concepts (for example, Amazon Bedrock knowledge base).

    Domain 3: Data Operations and Support

    Subdomain 3.1: Automate data processing by using AWS services

    • Skill 3.1.1: Orchestrate data pipelines (for example, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions).
    • Skill 3.1.2: Troubleshoot Amazon managed workflows.
    • Skill 3.1.3: Call SDKs to access Amazon features from code.
    • Skill 3.1.4: Use the features of AWS services to process data (for example, Amazon EMR, Amazon Redshift, AWS Glue).
    • Skill 3.1.5: Consume and maintain data APIs.
    • Skill 3.1.6: Prepare data for transformation (for example, AWS Glue DataBrew and Amazon SageMaker Unified Studio).
    • Skill 3.1.7: Query data (for example, Amazon Athena).
    • Skill 3.1.8: Use AWS Lambda to automate data processing.
    • Skill 3.1.9: Manage events and schedulers (for example, Amazon EventBridge).

    Subdomain 3.2: Analyze data by using AWS services

    • Skill 3.2.1: Visualize data by using AWS services and tools (for example, DataBrew, Amazon QuickSight).
    • Skill 3.2.2: Verify and clean data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler).
    • Skill 3.2.3: Use SQL in Amazon Redshift and Athena to query data or to create views.
    • Skill 3.2.4: Use Athena notebooks that use Apache Spark to explore data.
    • Skill 3.2.5: Describe tradeoffs between provisioned services and serverless services.
    • Skill 3.2.6: Define data aggregation, rolling average, grouping, and pivoting.
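
    For Skill 3.2.6, it helps to be able to compute a rolling average by hand. A minimal sketch: one average is emitted per input value once the window fills. Athena and Redshift express the same idea in SQL with a window function such as `AVG(...) OVER (ROWS BETWEEN n PRECEDING AND CURRENT ROW)`.

```python
from collections import deque

def rolling_average(values, window):
    """Rolling (moving) average over a fixed-size window.

    Emits nothing until the window is full, then one average
    per subsequent value.
    """
    buf = deque(maxlen=window)   # oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

print(rolling_average([10, 20, 30, 40, 50], window=3))  # → [20.0, 30.0, 40.0]
```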

    Subdomain 3.3: Maintain and monitor data pipelines

    • Skill 3.3.1: Extract logs for audits.
    • Skill 3.3.2: Deploy logging and monitoring solutions to facilitate auditing and traceability.
    • Skill 3.3.3: Use notifications during monitoring to send alerts.
    • Skill 3.3.4: Troubleshoot performance issues.
    • Skill 3.3.5: Use AWS CloudTrail to track API calls.
    • Skill 3.3.6: Troubleshoot and maintain pipelines (for example, AWS Glue, Amazon EMR).
    • Skill 3.3.7: Use Amazon CloudWatch Logs to log application data (with a focus on configuration and automation).
    • Skill 3.3.8: Analyze logs with AWS services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights, big data application logs).

    Subdomain 3.4: Ensure data quality

    • Skill 3.4.1: Run data quality checks while processing the data (for example, checking for empty fields).
    • Skill 3.4.2: Define data quality rules (for example, DataBrew).
    • Skill 3.4.3: Investigate data consistency (for example, DataBrew).
    • Skill 3.4.4: Describe data sampling techniques.
    • Skill 3.4.5: Implement data skew mechanisms.
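
    Skill 3.4.1's "checking for empty fields" is simple to sketch in plain Python. The function below flags records whose required fields are missing or empty; field names are illustrative. Managed alternatives are AWS Glue Data Quality or DataBrew rule sets.

```python
def find_quality_violations(records, required_fields):
    """Return (record_index, field_name) pairs for records whose
    required fields are missing or empty."""
    violations = []
    for i, rec in enumerate(records):
        for field in required_fields:
            value = rec.get(field)
            if value is None or value == "":
                violations.append((i, field))
    return violations

rows = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "", "amount": 5.0},   # empty required field
    {"order_id": "A3"},                # missing required field
]
print(find_quality_violations(rows, ["order_id", "amount"]))
# → [(1, 'order_id'), (2, 'amount')]
```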

    Domain 4: Data Security and Governance

    Subdomain 4.1: Apply authentication mechanisms

    • Skill 4.1.1: Update VPC security groups.
    • Skill 4.1.2: Create and update AWS Identity and Access Management (IAM) groups, roles, endpoints, and services.
    • Skill 4.1.3: Create and rotate credentials for password management (for example, AWS Secrets Manager).
    • Skill 4.1.4: Set up IAM roles for access (for example, AWS Lambda, Amazon API Gateway, AWS CLI, AWS CloudFormation).
    • Skill 4.1.5: Apply IAM policies to roles, endpoints, and services (for example, S3 Access Points, AWS PrivateLink).
    • Skill 4.1.6: Describe the differences between managed services and unmanaged services.
    • Skill 4.1.7: Use domain, domain units, and projects for SageMaker Unified Studio.

    Subdomain 4.2: Apply authorization mechanisms

    • Skill 4.2.1: Create custom IAM policies when a managed policy does not meet the needs.
    • Skill 4.2.2: Store application and database credentials (for example, Secrets Manager, AWS Systems Manager Parameter Store).
    • Skill 4.2.3: Provide database users, groups, and roles access and authority in a database (for example, for Amazon Redshift).
    • Skill 4.2.4: Manage permissions through AWS Lake Formation (for Amazon Redshift, Amazon EMR, Amazon Athena, and Amazon S3).
    • Skill 4.2.5: Apply authorization methods that address business needs (role-based, tag-based, and attribute-based).
    • Skill 4.2.6: Construct custom policies that meet the principle of least privilege.
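
    Skills 4.2.1 and 4.2.6 center on writing custom IAM policy documents. Below is one hypothetical least-privilege example as a Python dict: read-only access to a single S3 prefix instead of a broad managed policy. The bucket name and prefix are placeholders; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM policy grammar.

```python
import json

# Least privilege: allow only object reads, only under raw/ in one
# (hypothetical) bucket, rather than s3:* on all resources.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadRawZoneOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data-lake/raw/*",
        }
    ],
}

print(json.dumps(least_privilege_policy, indent=2))
```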

    Subdomain 4.3: Ensure data encryption and masking

    • Skill 4.3.1: Apply data masking and anonymization according to compliance laws or company policies.
    • Skill 4.3.2: Use encryption keys to encrypt or decrypt data (for example, AWS Key Management Service [AWS KMS]).
    • Skill 4.3.3: Configure encryption across AWS account boundaries.
    • Skill 4.3.4: Enable encryption in transit or before transit for data.
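
    One simple masking tactic for Skill 4.3.1, sketched below: keep only the first character of an email's local part and the domain. This is an illustrative helper, not a prescribed AWS approach; production systems often use Glue transforms or Amazon Macie findings to locate the fields to mask.

```python
def mask_email(email: str) -> str:
    """Mask the local part of an email address, keeping the first
    character and the full domain."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"          # not an email; redact entirely
    return local[:1] + "***@" + domain

print(mask_email("jane.doe@example.com"))  # → j***@example.com
```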

    Subdomain 4.4: Prepare logs for audit

    • Skill 4.4.1: Use AWS CloudTrail to track API calls.
    • Skill 4.4.2: Use Amazon CloudWatch Logs to store application logs.
    • Skill 4.4.3: Use AWS CloudTrail Lake for centralized logging queries.
    • Skill 4.4.4: Analyze logs by using AWS services (for example, Athena, CloudWatch Logs Insights, Amazon OpenSearch Service).
    • Skill 4.4.5: Integrate various AWS services to perform logging (for example, Amazon EMR in cases of large volumes of log data).

    Subdomain 4.5: Understand data privacy and governance

    • Skill 4.5.1: Grant permissions for data sharing (for example, data sharing for Amazon Redshift).
    • Skill 4.5.2: Implement PII identification (for example, Amazon Macie with Lake Formation).
    • Skill 4.5.3: Implement data privacy strategies to prevent backups or replications of data to disallowed AWS Regions.
    • Skill 4.5.4: View configuration changes that have occurred in an account (for example, AWS Config).
    • Skill 4.5.5: Maintain data sovereignty.
    • Skill 4.5.6: Manage data access through Amazon SageMaker Catalog projects.
    • Skill 4.5.7: Describe governance data framework and data sharing patterns.

    Techniques & products

    Amazon Athena
    Amazon EMR
    AWS Glue
    AWS Glue DataBrew
    AWS Lake Formation
    Amazon Kinesis Data Firehose
    Amazon Kinesis Data Streams
    Amazon Managed Service for Apache Flink
    Amazon Managed Streaming for Apache Kafka (Amazon MSK)
    Amazon OpenSearch Service
    Amazon QuickSight
    Amazon SageMaker AI
    Amazon AppFlow
    Amazon EventBridge
    Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
    Amazon Simple Notification Service (Amazon SNS)
    Amazon Simple Queue Service (Amazon SQS)
    AWS Step Functions
    AWS Budgets
    AWS Cost Explorer
    AWS Batch
    Amazon EC2
    AWS Lambda
    AWS Serverless Application Model (AWS SAM)
    Amazon Elastic Container Registry (Amazon ECR)
    Amazon Elastic Container Service (Amazon ECS)
    Amazon Elastic Kubernetes Service (Amazon EKS)
    Amazon DocumentDB (with MongoDB compatibility)
    Amazon DynamoDB
    Amazon Keyspaces (for Apache Cassandra)
    Amazon MemoryDB for Redis
    Amazon Neptune
    Amazon RDS
    Amazon Aurora
    Amazon Redshift
    AWS CLI
    AWS CloudFormation
    AWS Cloud Development Kit (AWS CDK)
    AWS CodeBuild
    AWS CodeDeploy
    AWS CodePipeline
    Amazon Q
    Amazon API Gateway
    Amazon Bedrock
    Amazon Kendra
    AWS CloudTrail
    Amazon CloudWatch
    Amazon CloudWatch Logs
    AWS Config
    Amazon Managed Grafana
    AWS Systems Manager
    AWS Well-Architected Tool
    AWS Data Exchange
    AWS Application Discovery Service
    AWS Application Migration Service
    AWS Database Migration Service (AWS DMS)
    AWS DataSync
    AWS Snow Family
    AWS Transfer Family
    Amazon CloudFront
    AWS PrivateLink
    Amazon Route 53
    Amazon VPC
    AWS Identity and Access Management (IAM)
    AWS Key Management Service (AWS KMS)
    Amazon Macie
    AWS Secrets Manager
    AWS Shield
    AWS WAF
    AWS Backup
    Amazon Elastic Block Store (Amazon EBS)
    Amazon Elastic File System (Amazon EFS)
    Amazon S3
    Amazon S3 Tables
    Amazon S3 Glacier
    Apache Airflow
    Apache Hive metastore
    Apache Iceberg
    Java Database Connectivity (JDBC)
    Open Database Connectivity (ODBC)
    Python
    SQL
    Scala
    R
    Java
    Bash
    PowerShell
    Git commands
    Infrastructure as Code (IaC)
    Continuous Integration and Continuous Delivery (CI/CD)
    Distributed computing
    Graph data structures
    Tree data structures
    Hierarchical Navigable Small Worlds (HNSW)
    Inverted File Index (IVF)
    Jupyter Notebooks
    Apache Spark
    Personally Identifiable Information (PII) identification
    Data sovereignty
    Extract, Transform, Load (ETL) pipelines
    Data APIs
    Large Language Models (LLMs)
    Vectorization concepts
    AWS Schema Conversion Tool (AWS SCT)
    Amazon SageMaker ML Lineage Tracking
    Amazon SageMaker Data Wrangler
    Amazon SageMaker Unified Studio

    CertSafari is not affiliated with, endorsed by, or officially connected to Amazon Web Services, Inc. Full disclaimer