Free Practice Questions for AWS Certified Data Engineer - Associate (DEA-C01) Certification
Study with 390 exam-style practice questions designed to help you prepare for the AWS Certified Data Engineer - Associate (DEA-C01) exam.
Start Practicing
Random Questions
Practice with randomly mixed questions from all topics
Domain Mode
Practice questions from a specific topic area
Exam Information
Exam Details
Key information about AWS Certified Data Engineer - Associate (DEA-C01)
Level: Associate (intermediate)
Question types: Multiple choice, multiple response
Passing score: 720 out of 1,000
Recommended experience: 2–3 years in data engineering, 1–2 years of hands-on experience with AWS services
Questions: 50 scored questions
Exam Topics & Skills Assessed
Skills measured (from the official study guide)
Domain 1: Data Ingestion and Transformation
Subdomain 1.1: Perform data ingestion
• Skill 1.1.1: Read data from streaming sources (for example, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka [Amazon MSK], Amazon DynamoDB Streams, AWS Database Migration Service [AWS DMS], AWS Glue, Amazon Redshift).
• Skill 1.1.2: Read data from batch sources (for example, Amazon S3, AWS Glue, Amazon EMR, AWS DMS, Amazon Redshift, AWS Lambda, Amazon AppFlow).
• Skill 1.1.3: Implement appropriate configuration options for batch ingestion.
• Skill 1.1.4: Consume data APIs.
• Skill 1.1.5: Set up schedulers by using Amazon EventBridge, Apache Airflow, or time-based schedules for jobs and crawlers.
• Skill 1.1.6: Set up event triggers (for example, Amazon S3 Event Notifications, EventBridge).
• Skill 1.1.7: Call a Lambda function from Kinesis.
• Skill 1.1.8: Create allowlists for IP addresses to allow connections to data sources.
• Skill 1.1.9: Implement throttling and overcome rate limits (for example, DynamoDB, Amazon RDS, Kinesis).
• Skill 1.1.10: Manage fan-in and fan-out for streaming data distribution.
• Skill 1.1.11: Describe replayability of data ingestion pipelines.
• Skill 1.1.12: Define stateful and stateless data transactions.
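Skill 1.1.7 (calling a Lambda function from Kinesis) is worth practicing hands-on. A minimal sketch of the handler side, assuming the standard Kinesis event source mapping shape (record payloads arrive base64-encoded under Records[*].kinesis.data); the sample event is hand-built so this runs locally with no AWS calls:

```python
import base64
import json

def handler(event, context):
    """Minimal Lambda handler for a Kinesis event source mapping.

    Kinesis delivers record payloads base64-encoded; the handler
    decodes each one and parses it as JSON.
    """
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(json.loads(payload))
    return results

# Local smoke test with a hand-built event (no AWS calls).
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"sensor": "a1", "value": 42}).encode()).decode()}}
    ]
}
print(handler(sample_event, None))
```

In a real deployment the event source mapping (batch size, starting position, retry behavior) is configured on the Lambda side, not in the handler.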
Subdomain 1.2: Transform and process data
• Skill 1.2.1: Optimize container usage for performance needs (for example, Amazon Elastic Kubernetes Service [Amazon EKS], Amazon Elastic Container Service [Amazon ECS]).
• Skill 1.2.2: Connect to different data sources (for example, Java Database Connectivity [JDBC], Open Database Connectivity [ODBC]).
• Skill 1.2.3: Integrate data from multiple sources.
• Skill 1.2.4: Optimize costs while processing data.
• Skill 1.2.5: Implement data transformation services based on requirements (for example, Amazon EMR, AWS Glue, Lambda, Amazon Redshift).
• Skill 1.2.6: Transform data between formats (for example, from .csv to Apache Parquet).
• Skill 1.2.7: Troubleshoot and debug common transformation failures and performance issues.
• Skill 1.2.8: Create data APIs to make data available to other systems by using AWS services.
• Skill 1.2.9: Define volume, velocity, and variety of data (for example, structured data, unstructured data).
• Skill 1.2.10: Integrate large language models (LLMs) for data processing.
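Skill 1.2.6 asks you to convert row-oriented formats such as .csv into columnar formats such as Apache Parquet (in practice via AWS Glue or a library like pyarrow). A standard-library-only sketch of the row-to-columnar pivot that sits at the heart of that conversion, using made-up sample data:

```python
import csv
import io

def rows_to_columns(csv_text):
    """Pivot row-oriented CSV into a column-oriented dict of lists,
    the core idea behind columnar formats such as Parquet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

data = "id,city\n1,Seattle\n2,Boston\n"
print(rows_to_columns(data))
```

Columnar layout is what lets engines like Athena and Redshift Spectrum read only the columns a query touches, which is why this conversion shows up so often in exam scenarios about scan cost.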
Subdomain 1.3: Orchestrate data pipelines
• Skill 1.3.1: Use orchestration services to build workflows for data ETL pipelines (for example, Lambda, EventBridge, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions, AWS Glue workflows).
• Skill 1.3.2: Build data pipelines for performance, availability, scalability, resiliency, and fault tolerance.
• Skill 1.3.3: Implement and maintain serverless workflows.
• Skill 1.3.4: Use notification services to send alerts (for example, Amazon Simple Notification Service [Amazon SNS], Amazon Simple Queue Service [Amazon SQS]).
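For skill 1.3.1, Step Functions workflows are written in Amazon States Language (ASL). A minimal sketch of an ETL workflow definition, built as a plain dict (the state name and Lambda ARN are placeholders, and a real pipeline would chain more states):

```python
import json

# Minimal Amazon States Language (ASL) workflow: one Lambda task
# with exponential-backoff retry, then the execution ends.
definition = {
    "Comment": "Minimal ETL step with retry",
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "End": True,
        }
    },
}
print(json.dumps(definition, indent=2))
```

Built-in Retry/Catch handling like this is what ASL offers for the fault-tolerance requirement in skill 1.3.2, without retry code inside the Lambda function itself.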
Subdomain 1.4: Apply programming concepts
• Skill 1.4.1: Optimize code to reduce runtime for data ingestion and transformation.
• Skill 1.4.2: Configure Lambda functions to meet concurrency and performance needs.
• Skill 1.4.3: Use programming languages and frameworks for data engineering (for example, Python, SQL, Scala, R, Java, Bash, PowerShell).
• Skill 1.4.4: Use software engineering best practices for data engineering (for example, version control, testing, logging, monitoring).
• Skill 1.4.5: Use Infrastructure as Code (IaC) to deploy data engineering solutions.
• Skill 1.4.6: Use the AWS Serverless Application Model (AWS SAM) to package and deploy serverless data pipelines (for example, Lambda functions, Step Functions, DynamoDB tables).
• Skill 1.4.7: Use and mount storage volumes from within Lambda functions.
• Skill 1.4.8: Use infrastructure as code (IaC) for repeatable resource deployment (for example, AWS CloudFormation and AWS Cloud Development Kit [AWS CDK]).
• Skill 1.4.9: Describe continuous integration and continuous delivery (CI/CD) (implementation, testing, and deployment of data pipelines).
• Skill 1.4.10: Define distributed computing.
• Skill 1.4.11: Describe data structures and algorithms (for example, graph data structures and tree data structures).
Domain 2: Data Store Management
Subdomain 2.1: Choose a data store
• Skill 2.1.1: Implement the appropriate storage services for specific cost and performance requirements (for example, Amazon Redshift, Amazon EMR, AWS Lake Formation, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka [Amazon MSK]).
• Skill 2.1.2: Configure the appropriate storage services for specific access patterns and requirements (for example, Amazon Redshift, Amazon EMR, Lake Formation, Amazon RDS, DynamoDB).
• Skill 2.1.3: Apply storage services to appropriate use cases (for example, using indexing algorithms like Hierarchical Navigable Small Worlds [HNSW] with Amazon Aurora PostgreSQL and using Amazon MemoryDB for fast key/value pair access).
• Skill 2.1.4: Integrate migration tools into data processing systems (for example, AWS Transfer Family).
• Skill 2.1.5: Implement data migration or remote access methods (for example, Amazon Redshift federated queries, Amazon Redshift materialized views, Amazon Redshift Spectrum).
• Skill 2.1.6: Manage locks to prevent access to data (for example, Amazon Redshift, Amazon RDS).
• Skill 2.1.7: Manage open table formats (for example, Apache Iceberg).
• Skill 2.1.8: Describe vector index types (for example, HNSW, IVF).
Subdomain 2.2: Understand data cataloging systems
• Skill 2.2.1: Use data catalogs to consume data from the data's source.
• Skill 2.2.2: Build and reference a technical data catalog (for example, AWS Glue Data Catalog, Apache Hive metastore).
• Skill 2.2.3: Discover schemas and use AWS Glue crawlers to populate data catalogs.
• Skill 2.2.4: Synchronize partitions with a data catalog.
• Skill 2.2.5: Create new source or target connections for cataloging (for example, AWS Glue).
• Skill 2.2.6: Create and manage business data catalogs (for example, Amazon SageMaker Catalog).
Subdomain 2.3: Manage the lifecycle of data
• Skill 2.3.1: Perform load and unload operations to move data between Amazon S3 and Amazon Redshift.
• Skill 2.3.2: Manage S3 Lifecycle policies to change the storage tier of S3 data.
• Skill 2.3.3: Expire data when it reaches a specific age by using S3 Lifecycle policies.
• Skill 2.3.4: Manage S3 versioning and DynamoDB TTL.
• Skill 2.3.5: Delete data to meet business and legal requirements.
• Skill 2.3.6: Protect data with appropriate resiliency and availability.
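Skills 2.3.2 and 2.3.3 come down to writing S3 Lifecycle rules. A sketch of the configuration shape that boto3's put_bucket_lifecycle_configuration accepts, built as a plain dict so it runs without AWS access (the prefix and day counts are illustrative):

```python
# Illustrative S3 Lifecycle configuration: transition objects under
# logs/ to S3 Glacier after 90 days and expire them after 365 days.
# This is the LifecycleConfiguration dict you would pass to boto3's
# put_bucket_lifecycle_configuration; no AWS call is made here.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle["Rules"][0]
print(rule["Transitions"][0]["StorageClass"], rule["Expiration"]["Days"])
```

Exam questions in this area often hinge on the transition day counts being valid (expiration must come after the last transition) and on choosing the cheapest storage class that still meets the retrieval requirement.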
Subdomain 2.4: Design data models and schema evolution
• Skill 2.4.1: Design schemas for Amazon Redshift, DynamoDB, and Lake Formation.
• Skill 2.4.2: Address changes to the characteristics of data.
• Skill 2.4.3: Perform schema conversion (for example, by using the AWS Schema Conversion Tool [AWS SCT] and AWS Database Migration Service [AWS DMS] Schema Conversion).
• Skill 2.4.4: Establish data lineage by using AWS tools (for example, Amazon SageMaker ML Lineage Tracking and Amazon SageMaker Catalog).
• Skill 2.4.5: Describe best practices for indexing, partitioning strategies, compression, and other data optimization techniques.
• Skill 2.4.6: Describe vectorization concepts (for example, Amazon Bedrock knowledge base).
Domain 3: Data Operations and Support
Subdomain 3.1: Automate data processing by using AWS services
• Skill 3.1.1: Orchestrate data pipelines (for example, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions).
• Skill 3.1.2: Troubleshoot Amazon managed workflows.
• Skill 3.1.3: Call SDKs to access Amazon features from code.
• Skill 3.1.4: Use the features of AWS services to process data (for example, Amazon EMR, Amazon Redshift, AWS Glue).
• Skill 3.1.5: Consume and maintain data APIs.
• Skill 3.1.6: Prepare data for transformation (for example, AWS Glue DataBrew and Amazon SageMaker Unified Studio).
• Skill 3.1.7: Query data (for example, Amazon Athena).
• Skill 3.1.8: Use AWS Lambda to automate data processing.
• Skill 3.1.9: Manage events and schedulers (for example, Amazon EventBridge).
Subdomain 3.2: Analyze data by using AWS services
• Skill 3.2.1: Visualize data by using AWS services and tools (for example, DataBrew, Amazon QuickSight).
• Skill 3.2.2: Verify and clean data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler).
• Skill 3.2.3: Use SQL in Amazon Redshift and Athena to query data or to create views.
• Skill 3.2.4: Use Athena notebooks that use Apache Spark to explore data.
• Skill 3.2.5: Describe tradeoffs between provisioned services and serverless services.
• Skill 3.2.6: Define data aggregation, rolling average, grouping, and pivoting.
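Skills 3.2.3 and 3.2.6 are plain SQL: aggregation and grouping work the same way in Athena and Redshift as in any SQL engine. A runnable sketch using SQLite as a local stand-in (the table and numbers are made up):

```python
import sqlite3

# GROUP BY aggregation, as you would run it in Athena or Redshift;
# SQLite stands in here so the example runs locally with no AWS setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0)],
)
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 2, 200.0), ('west', 1, 50.0)]
conn.close()
```

Rolling averages use the same idea with a window function (AVG(...) OVER (ORDER BY ... ROWS BETWEEN ...)), which both Athena and Redshift support.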
Subdomain 3.3: Maintain and monitor data pipelines
• Skill 3.3.1: Extract logs for audits.
• Skill 3.3.2: Deploy logging and monitoring solutions to facilitate auditing and traceability.
• Skill 3.3.3: Use notifications during monitoring to send alerts.
• Skill 3.3.4: Troubleshoot performance issues.
• Skill 3.3.5: Use AWS CloudTrail to track API calls.
• Skill 3.3.6: Troubleshoot and maintain pipelines (for example, AWS Glue, Amazon EMR).
• Skill 3.3.7: Use Amazon CloudWatch Logs to log application data (with a focus on configuration and automation).
• Skill 3.3.8: Analyze logs with AWS services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights, big data application logs).
Subdomain 3.4: Ensure data quality
• Skill 3.4.1: Run data quality checks while processing the data (for example, checking for empty fields).
• Skill 3.4.2: Define data quality rules (for example, DataBrew).
• Skill 3.4.3: Investigate data consistency (for example, DataBrew).
• Skill 3.4.4: Describe data sampling techniques.
• Skill 3.4.5: Implement data skew mechanisms.
Domain 4: Data Security and Governance
Subdomain 4.1: Apply authentication mechanisms
• Skill 4.1.1: Update VPC security groups.
• Skill 4.1.2: Create and update AWS Identity and Access Management (IAM) groups, roles, endpoints, and services.
• Skill 4.1.3: Create and rotate credentials for password management (for example, AWS Secrets Manager).
• Skill 4.1.4: Set up IAM roles for access (for example, AWS Lambda, Amazon API Gateway, AWS CLI, AWS CloudFormation).
• Skill 4.1.5: Apply IAM policies to roles, endpoints, and services (for example, S3 Access Points, AWS PrivateLink).
• Skill 4.1.6: Describe the differences between managed services and unmanaged services.
• Skill 4.1.7: Use domains, domain units, and projects for SageMaker Unified Studio.
Subdomain 4.2: Apply authorization mechanisms
• Skill 4.2.1: Create custom IAM policies when a managed policy does not meet the needs.
• Skill 4.2.2: Store application and database credentials (for example, Secrets Manager, AWS Systems Manager Parameter Store).
• Skill 4.2.3: Provide database users, groups, and roles access and authority in a database (for example, for Amazon Redshift).
• Skill 4.2.4: Manage permissions through AWS Lake Formation (for Amazon Redshift, Amazon EMR, Amazon Athena, and Amazon S3).
• Skill 4.2.5: Apply authorization methods that address business needs (role-based, tag-based, and attribute-based).
• Skill 4.2.6: Construct custom policies that meet the principle of least privilege.
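Skills 4.2.1 and 4.2.6 both come down to writing IAM JSON policy documents by hand. A least-privilege sketch, built as a plain dict (the bucket name and prefix are illustrative): read access is scoped to one S3 prefix, and the ListBucket statement is conditioned on that same prefix.

```python
import json

# Least-privilege IAM policy: read-only access to a single S3 prefix.
# The bucket name and prefix are illustrative placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadRawZoneOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-data-lake/raw/*"],
        },
        {
            "Sid": "ListRawZoneOnly",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-data-lake"],
            "Condition": {"StringLike": {"s3:prefix": ["raw/*"]}},
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Note the exam-relevant detail: s3:GetObject applies to object ARNs (bucket/key), while s3:ListBucket applies to the bucket ARN itself, which is why the two statements have different Resource entries.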
Subdomain 4.3: Ensure data encryption and masking
• Skill 4.3.1: Apply data masking and anonymization according to compliance laws or company policies.
• Skill 4.3.2: Use encryption keys to encrypt or decrypt data (for example, AWS Key Management Service [AWS KMS]).
• Skill 4.3.3: Configure encryption across AWS account boundaries.
• Skill 4.3.4: Enable encryption in transit or before transit for data.
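For skill 4.3.1, a common masking pattern is pseudonymization: replace the identifying part of a value with a deterministic digest so records can still be joined on it. A minimal stdlib sketch for email-style PII (real deployments would mask in the pipeline, for example with a Glue transform, and often add a secret salt so digests cannot be brute-forced):

```python
import hashlib

def mask_email(email: str) -> str:
    """Pseudonymize an email address: keep the domain for analytics,
    replace the local part with a truncated SHA-256 digest.
    Deterministic, so the same input always masks to the same value."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{digest}@{domain}"

print(mask_email("jane.doe@example.com"))
```

Determinism is the design choice to notice: it preserves joinability across datasets, at the cost of being reversible by dictionary attack unless a salt is added.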
Subdomain 4.4: Prepare logs for audit
• Skill 4.4.1: Use AWS CloudTrail to track API calls.
• Skill 4.4.2: Use Amazon CloudWatch Logs to store application logs.
• Skill 4.4.3: Use AWS CloudTrail Lake for centralized logging queries.
• Skill 4.4.4: Analyze logs by using AWS services (for example, Athena, CloudWatch Logs Insights, Amazon OpenSearch Service).
• Skill 4.4.5: Integrate various AWS services to perform logging (for example, Amazon EMR in cases of large volumes of log data).
Subdomain 4.5: Understand data privacy and governance
• Skill 4.5.1: Grant permissions for data sharing (for example, data sharing for Amazon Redshift).
• Skill 4.5.2: Implement PII identification (for example, Amazon Macie with Lake Formation).
• Skill 4.5.3: Implement data privacy strategies to prevent backups or replications of data to disallowed AWS Regions.
• Skill 4.5.4: View configuration changes that have occurred in an account (for example, AWS Config).
• Skill 4.5.5: Maintain data sovereignty.
• Skill 4.5.6: Manage data access through Amazon SageMaker Catalog projects.
• Skill 4.5.7: Describe data governance frameworks and data sharing patterns.
Techniques & products