You are given 10, 000, 000 user profile pages of an online dating site in XML files, and they are stored in HDFS. You are assigned to divide the users into groups based on the content of their profiles. You have been instructed to try K-means clustering on this data. How should you proceed?
The Marketing department of your company wishes to track opinion on a new product that was recently introduced. Marketing would like to know how many positive and negative reviews are appearing over a given period and potentially retrieve each review for more in-depth insight.They have identified several popular product review blogs that historically have published thousands of user reviews of your companys products.You have been asked to provide the desired analysis. You examine the RSS feeds for each blog and determine which fields are relevant. You then craft a regular expression to match your new products name and extract the relevant text from each matching review.What is the next step you should take?
Which word or phrase completes the statement? A Data Scientist would consider that a RDBMS is to a Table as R is to a ______________ .
Which word or phrase completes the statement? Unix is to bash as Hadoop is to:
A call center for a large electronics company handles an average of 35, 000 support calls a day. The head of the call center would like to optimize the staffing of the call center during the rollout of a new product due to recent customer complaints of long wait times. You have been asked to create a model to optimize call center costs and customer wait times.The goals for this project include:1. Relative to the release of a product, how does the call volume change over time?2. How to best optimize staffing based on the call volume for the newly released product, relative to old products.3. Historically, what time of day does the call center need to be most heavily staffed?4. Determine the frequency of calls by both product type and customer language.Which goals are suitable to be completed with MapReduce?
You are using MADlib for Linear Regression analysis. Which value does the statement return?SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;