AWS_CERTIFIED_DATA_ANALYTICS_SPECIALTY questions • Exam prepare

amazon AWS_CERTIFIED_DATA_ANALYTICS_SPECIALTY

Exam contains 164 questions

Page 11 of 28

Question 61 🔥

A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service. The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance is significantly reduced. The data analyst investigated the metrics and determined that Kinesis is throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture. Which actions should the data analyst take to resolve this issue? (Choose two.)

A. Increase the Kinesis Data Streams retention period to reduce throttling.

B. Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.

C. Increase the number of shards in the stream using the UpdateShardCount API. (Highly voted)

D. Choose partition keys in a way that results in a uniform record distribution across shards. (Highly voted)

E. Customize the application code to include retry logic to improve performance.
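
Options C and D both concern how records spread across shards. Kinesis Data Streams maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below simulates that mapping locally, assuming the shards split the hash space evenly; the function names and the device keys are hypothetical and no AWS call is made.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer falls in [0, 2**128)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed evenly split) hash range holds the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Number of records that land on each shard, in shard-index order."""
    counts = Counter(shard_for_key(key, shard_count) for key in keys)
    return [counts.get(index, 0) for index in range(shard_count)]


# One shared partition key: every record lands on the same shard.
hot = distribution(["device-type-A"] * 10_000, 4)

# One partition key per device: records spread across all shards.
spread = distribution([f"device-{i}" for i in range(10_000)], 4)

print("single key:   ", hot)
print("per-device key:", spread)
```

With a single key, adding shards does not help because one shard still receives every write; with high-cardinality keys, each added shard takes a share of the traffic.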

Question 62 🔥

A smart home automation company must efficiently ingest and process messages from various connected devices and sensors. The majority of these messages are comprised of a large number of small files. These messages are ingested using Amazon Kinesis Data Streams and sent to Amazon S3 using a Kinesis data stream consumer application. The Amazon S3 message data is then passed through a processing pipeline built on Amazon EMR running scheduled PySpark jobs. The data platform team manages data processing and is concerned about the efficiency and cost of downstream data processing. They want to continue to use PySpark. Which solution improves the efficiency of the data processing jobs and is well architected?

A. Send the sensor and devices data directly to a Kinesis Data Firehose delivery stream to send the data to Amazon S3 with Apache Parquet record format conversion enabled. Use Amazon EMR running PySpark to process the data in Amazon S3.

B. Set up an AWS Lambda function with a Python runtime environment. Process individual Kinesis data stream messages from the connected devices and sensors using Lambda.

C. Launch an Amazon Redshift cluster. Copy the collected data from Amazon S3 to Amazon Redshift and move the data processing jobs from Amazon EMR to Amazon Redshift.

D. Set up AWS Glue Python jobs to merge the small data files in Amazon S3 into larger files and transform them to Apache Parquet format. Migrate the downstream PySpark jobs from Amazon EMR to AWS Glue. (Highly voted)

Question 63 🔥

A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift. Which combination of steps would meet these requirements? (Choose two.)

A. Use the COPY command with the manifest file to load data into Amazon Redshift. (Highly voted)

B. Use S3DistCp to load files into Amazon Redshift.

C. Use temporary staging tables during the loading process. (Highly voted)

D. Use the UNLOAD command to upload data into Amazon Redshift.

E. Use Amazon Redshift Spectrum to query files from Amazon S3.
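
Option A refers to a COPY manifest: a JSON file in Amazon S3 that lists exactly which objects to load. As a rough illustration, the sketch below builds a manifest document and the matching COPY statement as strings. The bucket, table, and IAM role names are placeholders, the helper names are hypothetical, and nothing is sent to Redshift or S3.

```python
import json


def build_manifest(s3_urls):
    """Return a COPY manifest (JSON text) that marks every listed file as mandatory."""
    entries = [{"url": url, "mandatory": True} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_sql(table, manifest_url, iam_role_arn):
    """Return a COPY statement that loads the files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Placeholder object names, used only for illustration.
files = [
    "s3://example-etl-bucket/trades/part-0000.txt",
    "s3://example-etl-bucket/trades/part-0001.txt",
]
manifest_json = build_manifest(files)
copy_sql = build_copy_sql(
    "trades_staging",
    "s3://example-etl-bucket/manifests/trades.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)

print(manifest_json)
print(copy_sql)
```

The statement omits data format options, so it assumes the COPY default of pipe-delimited text; a real load would add whichever format parameters match the files.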

Question 64 🔥

A university intends to use Amazon Kinesis Data Firehose to collect JSON-formatted batches of water quality readings in Amazon S3. The readings are from 50 sensors scattered across a local lake. Students will query the stored data using Amazon Athena to observe changes in a captured metric over time, such as water temperature or acidity. Interest has grown in the study, prompting the university to reconsider how data will be stored. Which data format and partitioning choices will MOST significantly reduce costs? (Choose two.)

A. Store the data in Apache Avro format using Snappy compression.

B. Partition the data by year, month, and day. (Highly voted)

C. Store the data in Apache ORC format using no compression.

D. Store the data in Apache Parquet format using Snappy compression. (Highly voted)

E. Partition the data by sensor, year, month, and day.
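
Options B and E differ only in whether the sensor is part of the partition path. To make the difference concrete, the sketch below builds Hive-style S3 prefixes for both layouts and counts how many distinct partitions one year of data from 50 sensors would create. The prefix layout and function names are assumptions for illustration; the sketch counts prefixes only and does not measure Athena cost.

```python
from datetime import date, timedelta


def date_prefix(day: date) -> str:
    """Hive-style prefix for a year/month/day layout (option B)."""
    return f"year={day:%Y}/month={day:%m}/day={day:%d}/"


def sensor_prefix(sensor_id: int, day: date) -> str:
    """Hive-style prefix for a sensor/year/month/day layout (option E)."""
    return f"sensor={sensor_id:02d}/" + date_prefix(day)


# One non-leap year of daily data from 50 sensors.
days = [date(2023, 1, 1) + timedelta(days=offset) for offset in range(365)]
by_date = {date_prefix(day) for day in days}
by_sensor = {sensor_prefix(sensor, day) for sensor in range(1, 51) for day in days}

print(date_prefix(date(2023, 1, 5)))
print(len(by_date), "partitions by date")
print(len(by_sensor), "partitions by sensor and date")
```

Partitioning by sensor multiplies the partition count by 50, so each partition holds a fiftieth of a day's readings, while a query that tracks one metric over time across the lake still has to read every sensor's partitions.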

Question 65 🔥

A healthcare company uses AWS data and analytics tools to collect, ingest, and store electronic health record (EHR) data about its patients. The raw EHR data is stored in Amazon S3 in JSON format partitioned by hour, day, and year and is updated every hour. The company wants to maintain the data catalog and metadata in an AWS Glue Data Catalog to be able to access the data using Amazon Athena or Amazon Redshift Spectrum for analytics.

When defining tables in the Data Catalog, the company has the following requirements:

✑ Choose the catalog table name and do not rely on the catalog table naming algorithm.
✑ Keep the table updated with new partitions loaded in the respective S3 bucket prefixes.

Which solution meets these requirements with minimal effort?

A. Run an AWS Glue crawler that connects to one or more data stores, determines the data structures, and writes tables in the Data Catalog.

B. Use the AWS Glue console to manually create a table in the Data Catalog and schedule an AWS Lambda function to update the table partitions hourly.

C. Use the AWS Glue API CreateTable operation to create a table in the Data Catalog. Create an AWS Glue crawler and specify the table as the source. (Highly voted)

D. Create an Apache Hive catalog in Amazon EMR with the table schema definition in Amazon S3, and update the table partition with a scheduled job. Migrate the Hive catalog to the Data Catalog.
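
Option C relies on the CreateTable operation, which lets the caller pick the table name instead of accepting one generated by a crawler. The sketch below only assembles a `TableInput` structure of the kind that operation accepts for a JSON table in Amazon S3. The table, database, bucket, column, and partition key names are all hypothetical, and the partition key order is an assumption about the bucket layout rather than something the question states.

```python
def build_table_input(table_name, s3_location, partition_names):
    """Assemble a Glue TableInput dict for an external JSON table (illustrative only)."""
    return {
        "Name": table_name,  # the name is chosen by the caller, not generated
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "json"},
        "PartitionKeys": [{"Name": name, "Type": "string"} for name in partition_names],
        "StorageDescriptor": {
            # Hypothetical column; a real table would list the EHR fields.
            "Columns": [{"Name": "patient_id", "Type": "string"}],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
        },
    }


table_input = build_table_input(
    "ehr_raw",                       # hypothetical table name
    "s3://example-ehr-bucket/raw/",  # hypothetical bucket prefix
    ["year", "day", "hour"],         # assumed partition key order
)

# With boto3 installed and credentials configured, the dict would be passed as:
# boto3.client("glue").create_table(DatabaseName="ehr_db", TableInput=table_input)
print(table_input["Name"], [key["Name"] for key in table_input["PartitionKeys"]])
```

A crawler that uses the existing catalog table as its source can then add new partitions on a schedule without renaming the table.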

Question 66 🔥

A financial services company needs to aggregate daily stock trade data from the exchanges into a data store. The company requires that data be streamed directly into the data store, but also occasionally allows data to be modified using SQL. The solution should integrate complex, analytic queries running with minimal latency. The solution must provide a business intelligence dashboard that enables viewing of the top contributors to anomalies in stock prices. Which solution meets the company's requirements?

A. Use Amazon Kinesis Data Firehose to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.

B. Use Amazon Kinesis Data Streams to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.

C. Use Amazon Kinesis Data Firehose to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard. (Highly voted)

D. Use Amazon Kinesis Data Streams to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
