A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide?
A web-hosting company is building a web analytics tool to capture clickstream data from all of the websites hosted within its platform and to provide near-real-time business intelligence. This entire system is built on AWS services. The web-hosting company is interested in using Amazon Kinesis to collect this data and perform sliding window analytics. What is the most reliable and fault-tolerant technique to get each website to send data to Amazon Kinesis with every click?
An organization would like to run analytics on their Elastic Load Balancing logs stored in Amazon S3 and join this data with other tables in Amazon S3. The users are currently using a BI tool connecting with JDBC and would like to keep using this BI tool. Which solution would result in the LEAST operational overhead?
A customer has an Amazon S3 bucket. Objects are uploaded simultaneously by a cluster of servers from multiple streams of data. The customer maintains a catalog of objects uploaded in Amazon S3 using an Amazon DynamoDB table. This catalog has the following fields: StreamName, TimeStamp, and ServerName, from which ObjectName can be obtained. The customer needs to define the catalog to support querying for a given stream or server within a defined time range. Which DynamoDB table schema is most efficient to support these queries?
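The two access patterns in the question above can be made concrete with a small in-memory model. This is only an illustrative sketch of the required queries, not a DynamoDB schema or an answer: the names `CatalogEntry`, `query_by_stream`, and `query_by_server` are hypothetical, and storing the timestamp as epoch seconds is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row; field names mirror the question, types are assumed."""
    stream_name: str
    timestamp: int  # assumption: epoch seconds
    server_name: str


def query_by_stream(catalog: Iterable[CatalogEntry], stream_name: str,
                    start: int, end: int) -> List[CatalogEntry]:
    """Entries for one stream with start <= timestamp <= end, oldest first."""
    hits = [e for e in catalog
            if e.stream_name == stream_name and start <= e.timestamp <= end]
    return sorted(hits, key=lambda e: e.timestamp)


def query_by_server(catalog: Iterable[CatalogEntry], server_name: str,
                    start: int, end: int) -> List[CatalogEntry]:
    """Entries for one server with start <= timestamp <= end, oldest first."""
    hits = [e for e in catalog
            if e.server_name == server_name and start <= e.timestamp <= end]
    return sorted(hits, key=lambda e: e.timestamp)


# Hypothetical sample data for illustration only.
catalog = [
    CatalogEntry("orders", 100, "srv-a"),
    CatalogEntry("orders", 200, "srv-b"),
    CatalogEntry("clicks", 150, "srv-a"),
    CatalogEntry("orders", 900, "srv-a"),
]
stream_hits = query_by_stream(catalog, "orders", 50, 250)
server_hits = query_by_server(catalog, "srv-a", 50, 250)
```

Both functions filter on one equality attribute plus a time range. The in-memory version scans every entry, which is the cost an efficient table design would need to avoid.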
An organization has added a clickstream to their website to analyze traffic. The website is sending each page request with the PutRecord API call to an Amazon Kinesis stream by using the page name as the partition key. During peak spikes in website traffic, a support engineer notices many ProvisionedThroughputExceededException events in the application logs. What should be done to resolve the issue in the MOST cost-effective way?
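As background for the scenario above: Kinesis maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below imitates that mapping locally, assuming the hash space is split evenly across shards. The function names, page names, and shard count are hypothetical, and nothing here calls the Kinesis API.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Index of the shard that would own this key, assuming evenly split ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * num_shards // HASH_SPACE


def shard_load(partition_keys: Iterable[str], num_shards: int) -> Counter:
    """Count how many records each shard would receive."""
    return Counter(shard_for_key(key, num_shards) for key in partition_keys)


# Hypothetical traffic: one popular page dominates during a spike.
traffic = ["/home"] * 900 + ["/about"] * 100
load = shard_load(traffic, num_shards=4)
```

Because the mapping depends only on the key, every request for the same page name lands on the same shard, so `load` has at most two non-empty shards here regardless of how many shards the stream has.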
An Amazon Redshift database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS-encrypted snapshot of the database in another AWS Region. Which three steps should the data engineer take to accomplish this task? (Choose three.)