amazon AWS_CERTIFIED_BIG_DATA_SPECIALTY

Exam contains 85 questions

Question 7 🔥

An organization is setting up a data catalog and metadata management environment for its numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog. How can this be accomplished?

A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.

B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.

C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.

D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.
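To make option C concrete, the sketch below assembles the arguments for a scheduled AWS Glue crawler that covers one S3 path and one JDBC source. The crawler name, IAM role ARN, database name, bucket path, connection name, and schedule are all hypothetical placeholders. The dictionary keys follow the parameters of boto3's `glue.create_crawler`, and the actual call is left commented out so the block runs without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path,
                          jdbc_connection, jdbc_path, cron):
    """Assemble keyword arguments for boto3's glue.create_crawler (not sent here)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # CSV files on S3
            "S3Targets": [{"Path": s3_path}],
            # RDS or Redshift reached through a Glue connection
            "JdbcTargets": [{"ConnectionName": jdbc_connection, "Path": jdbc_path}],
        },
        # Glue schedules use a six-field cron expression wrapped in cron(...)
        "Schedule": f"cron({cron})",
    }


# All values below are placeholders chosen for illustration.
request = build_crawler_request(
    name="nightly-catalog-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_catalog_db",
    s3_path="s3://example-bucket/csv/",
    jdbc_connection="rds-orders-connection",
    jdbc_path="orders/%",
    cron="0 2 * * ? *",  # every day at 02:00 UTC
)

# To submit the request (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_crawler(**request)
print(request["Schedule"])
```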

Question 8 🔥

Company A operates in Country X. Company A maintains a large dataset of historical purchase orders that contains personal data of its customers in the form of full names and telephone numbers. The dataset consists of 5 text files, 1 TB each. Currently the dataset resides on-premises due to legal requirements of storing personal data in-country. The research and development department needs to run a clustering algorithm on the dataset and wants to use the Elastic MapReduce service in the closest AWS region. Due to geographic distance, the minimum latency between the on-premises system and the closest AWS region is 200 ms. Which option allows Company A to do clustering in the AWS Cloud and meet the legal requirement of maintaining personal data in-country?

A. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.

B. Establish a Direct Connect link between the on-premises system and the AWS region to reduce latency. Have the EMR cluster read the data directly from the on-premises storage system over Direct Connect.

C. Encrypt the data files according to encryption standards of Country X and store them on AWS region in Amazon S3. Have the EMR cluster read the dataset using EMRFS.

D. Use AWS Import/Export Snowball device to securely transfer the data to the AWS region and copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.

Question 9 🔥

An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3. The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster. Which solution will expose the Hive metastore with the LEAST operational effort?

A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.

B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.

C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to derby.

D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.
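For orientation, options B and D both come down to a `hive-site` configuration classification supplied when an EMR cluster is launched. The sketch below builds both as plain Python structures so they can be compared side by side. The JDBC URL, user name, and password are hypothetical placeholders, and the driver class shown is the MariaDB driver that EMR documentation uses for external MySQL metastores; treat the details as an illustration rather than a verified deployment.

```python
import json

# Client factory class that points Hive on EMR at the AWS Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_configuration():
    """hive-site classification for using the Glue Data Catalog (option D)."""
    return [{
        "Classification": "hive-site",
        "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
    }]


def rds_metastore_configuration(jdbc_url, user, password):
    """hive-site classification for an external MySQL metastore on RDS (option B)."""
    return [{
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL": jdbc_url,
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": user,
            "javax.jdo.option.ConnectionPassword": password,
        },
    }]


glue_config = glue_metastore_configuration()
rds_config = rds_metastore_configuration(
    "jdbc:mysql://example-host:3306/hive?createDatabaseIfNotExist=true",  # placeholder
    "hive_user",       # placeholder
    "example-secret",  # placeholder
)
print(json.dumps(glue_config, indent=2))
```

The Glue variant carries a single property and no database to provision, which is the operational difference the question asks the reader to weigh.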

Question 10 🔥

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)

A. When the tables are highly denormalized and do NOT participate in frequent joins.

B. When data must be grouped based on a specific key on a defined slice.

C. When data transfer between nodes must be eliminated.

D. When a new table has been loaded and it is unclear how it will be joined to dimension tables.
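As background for the options above, the toy simulation below contrasts the two placement ideas behind Redshift's EVEN and KEY distribution styles: round-robin assignment versus hashing one column. It is only a sketch. The slice count, the sample rows, and the use of `zlib.crc32` as the hash are assumptions for illustration, and Redshift's internal hashing is not reproduced here.

```python
from collections import Counter
import zlib


def even_distribution(rows, n_slices):
    """Round-robin placement: row i goes to slice i mod n_slices."""
    return Counter(i % n_slices for i, _ in enumerate(rows))


def key_distribution(rows, n_slices, key):
    """Hash placement: rows with equal key values land on the same slice."""
    return Counter(
        zlib.crc32(str(row[key]).encode()) % n_slices for row in rows
    )


# 1000 sample rows with a heavily skewed 'country' column (900 "X", 100 "Y").
rows = [{"order_id": i, "country": "X" if i % 10 else "Y"} for i in range(1000)]

even = even_distribution(rows, 4)
keyed = key_distribution(rows, 4, "country")

print("EVEN:", dict(even))
print("KEY on country:", dict(keyed))
```

Round-robin placement balances storage regardless of the data, while hashing co-locates matching key values at the risk of skew when the key has few distinct values.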

Question 11 🔥

An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring. What should be done to ensure that all data is processed? (Choose two.)

A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.

D. Increase the number of vCores allocated for the Lambda function.

E. Increase the number of shards on the Amazon Kinesis stream.
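The figures in the question are easier to reason about as per-second rates. The short calculation below converts the daily volumes stated above, assuming records arrive evenly across the day (an assumption; real device traffic may be bursty).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_second(records_per_day):
    """Average rate, assuming records are spread evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


produced_low = records_per_second(10_000_000)
produced_high = records_per_second(12_000_000)
processed = records_per_second(450_000)
fraction_processed = 450_000 / 12_000_000

print(f"{produced_low:.1f}")        # 115.7
print(f"{produced_high:.1f}")       # 138.9
print(f"{processed:.1f}")           # 5.2
print(f"{fraction_processed:.2%}")  # 3.75%
```

So the consumer keeps up with roughly 5 records per second against an arrival rate above 115 per second, which is the gap the answer options need to close.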

Question 12 🔥

An Amazon Redshift database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region. Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.

B. Copy the existing KMS key to the destination region.

C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.

D. In the source region, enable cross-region replication and specify the name of the copy grant created.

E. In the destination region, enable cross-region replication and specify the name of the copy grant created.

F. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region.
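The API operations named in the options can be sketched as a planned sequence of boto3 Redshift calls. The block below only builds the call descriptions and does not contact AWS. The cluster identifier, region names, grant name, key ID, and retention period are hypothetical placeholders, and the region assigned to each call reflects one reading of how cross-region snapshot copy is set up, so check it against the Redshift documentation before relying on it.

```python
def snapshot_copy_calls(cluster_id, source_region, dest_region,
                        grant_name, dest_kms_key_id, retention_days=7):
    """Describe the boto3 Redshift calls for a KMS-encrypted cross-region copy."""
    return [
        {
            # The grant lets Redshift use a KMS key for the copied snapshots.
            "region": dest_region,
            "operation": "create_snapshot_copy_grant",
            "kwargs": {
                "SnapshotCopyGrantName": grant_name,
                "KmsKeyId": dest_kms_key_id,
            },
        },
        {
            # Snapshot copy is switched on for the cluster, naming the grant.
            "region": source_region,
            "operation": "enable_snapshot_copy",
            "kwargs": {
                "ClusterIdentifier": cluster_id,
                "DestinationRegion": dest_region,
                "RetentionPeriod": retention_days,
                "SnapshotCopyGrantName": grant_name,
            },
        },
    ]


# Placeholder values for illustration only.
calls = snapshot_copy_calls(
    cluster_id="example-cluster",
    source_region="us-east-1",
    dest_region="us-west-2",
    grant_name="example-copy-grant",
    dest_kms_key_id="example-key-id",
)

# To execute (requires boto3 and AWS credentials):
# import boto3
# for call in calls:
#     client = boto3.client("redshift", region_name=call["region"])
#     getattr(client, call["operation"])(**call["kwargs"])
for call in calls:
    print(call["region"], call["operation"])
```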
