An organization currently runs a large Hadoop environment in their data center and is in the process of creating an alternative Hadoop environment on AWS, using Amazon EMR. They generate around 20 TB of data on a monthly basis. Also on a monthly basis, files need to be grouped and copied to Amazon S3 to be used by the Amazon EMR environment. They have multiple S3 buckets across AWS accounts to which the data needs to be copied. There is a 10 Gbps AWS Direct Connect connection between their data center and AWS, and the network team has agreed to allocate 50% of the AWS Direct Connect bandwidth to data transfer. The data transfer cannot take more than two days. What would be the MOST efficient approach to transfer data to AWS on a monthly basis?
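Before choosing a transfer approach, it helps to confirm that the allocated bandwidth can actually move the data inside the window. A minimal back-of-envelope check (assuming decimal TB and ideal link utilization, which real transfers will not achieve):

```shell
# Can 20 TB move over 50% of a 10 Gbps Direct Connect link within 2 days?
DATA_BITS=$((20 * 1000 * 1000 * 1000 * 1000 * 8))   # 20 TB expressed in bits
LINK_BPS=$((10 * 1000 * 1000 * 1000 / 2))           # 50% of 10 Gbps
SECONDS_NEEDED=$((DATA_BITS / LINK_BPS))
echo "Transfer time at full allocated bandwidth: ${SECONDS_NEEDED} s (~$((SECONDS_NEEDED / 3600)) h)"
# roughly 9 hours, comfortably inside the 48-hour limit even with protocol overhead
```

Because the window is generous relative to the link, network-based transfer over Direct Connect is viable; the remaining question is only how to parallelize and route the copies efficiently.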
An organization is developing a mobile social application and needs to collect logs from all devices on which it is installed. The organization is evaluating Amazon Kinesis Data Streams to push logs and Amazon EMR to process data. They want to store data on HDFS using the default replication factor to replicate data among the cluster, but they are concerned about the durability of the data. Currently, they are producing 300 GB of raw data daily, with additional spikes during special events. They will need to scale out the Amazon EMR cluster to match the increase in streamed data. Which solution prevents data loss and matches compute demand?
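When sizing the cluster, note that the default HDFS replication factor of 3 triples the raw storage footprint. A quick sketch of the baseline daily requirement (spike headroom not included):

```shell
# Daily HDFS capacity needed for 300 GB/day of raw data
# at the default HDFS replication factor of 3.
RAW_GB_PER_DAY=300
REPLICATION=3
HDFS_GB_PER_DAY=$((RAW_GB_PER_DAY * REPLICATION))
echo "HDFS capacity needed per day: ${HDFS_GB_PER_DAY} GB"
# 900 GB/day before accounting for event-driven spikes
```

Replication protects against the loss of individual nodes, but not against termination of the whole cluster, which is why durability beyond HDFS is a common concern in EMR designs.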
An administrator needs to manage a large catalog of items from various external sellers. The administrator needs to determine whether the items should be identified as minimally dangerous, dangerous, or highly dangerous based on their textual descriptions. The administrator already has some items with the danger attribute, but receives hundreds of new item descriptions every day without such classification. The administrator has a system that captures dangerous goods reports from the customer support team or from user feedback. What is a cost-effective architecture to solve this issue?
A company receives data sets coming from external providers on Amazon S3. Data sets from different providers are dependent on one another. Data sets will arrive at different times and in no particular order. A data architect needs to design a solution that enables the company to do the following:
✑ Rapidly perform cross-data-set analysis as soon as the data becomes available
✑ Manage dependencies between data sets that arrive at different times
Which architecture strategy offers a scalable and cost-effective solution that meets these requirements?
An advertising organization uses an application to process a stream of events that are received from clients in multiple unstructured formats. The application does the following:
✑ Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.
✑ Stores the unstructured raw events from the log files on local hard drives that are rotated and uploaded to Amazon S3.
The organization wants to extract campaign performance reporting using an existing Amazon Redshift cluster. Which solution will provide the performance data with the LEAST number of operations?
An Amazon Redshift database is encrypted using AWS KMS. A data engineer needs to use the AWS CLI to create a KMS-encrypted snapshot of the database in another AWS Region. Which three steps should the data engineer take to accomplish this task? (Choose three.)
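For orientation, one plausible AWS CLI sequence for cross-Region copy of KMS-encrypted Redshift snapshots is sketched below. All identifiers (cluster name, grant name, key ARN, Region names) are hypothetical placeholders; the general pattern is that the destination Region needs a snapshot copy grant for its own KMS key before cross-Region copying can be enabled on the cluster.

```shell
# 1. In the destination Region, create a snapshot copy grant that lets
#    Redshift use a KMS key from that Region to encrypt copied snapshots.
aws redshift create-snapshot-copy-grant \
    --region us-west-2 \
    --snapshot-copy-grant-name my-copy-grant \
    --kms-key-id arn:aws:kms:us-west-2:123456789012:key/EXAMPLE-KEY-ID

# 2. In the source Region, enable cross-Region snapshot copy on the
#    cluster, referencing the destination Region and the grant.
aws redshift enable-snapshot-copy \
    --region us-east-1 \
    --cluster-identifier my-redshift-cluster \
    --destination-region us-west-2 \
    --snapshot-copy-grant-name my-copy-grant

# 3. Create a manual snapshot; with copy enabled, it is replicated to
#    the destination Region encrypted under the destination-Region key.
aws redshift create-cluster-snapshot \
    --region us-east-1 \
    --cluster-identifier my-redshift-cluster \
    --snapshot-identifier my-cluster-snapshot
```

These commands require valid AWS credentials and an existing cluster, so the block is a non-runnable configuration sketch rather than a tested script.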