The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in their massively parallel database. Which tool should they use to export the structured data from Hadoop?
When would you prefer a Naive Bayes model to a logistic regression model for classification?
Before you build an ARMA model, how can you tell if your time series is weakly stationary?
What is an example of a null hypothesis?
You have fit a decision tree classifier using 12 input variables. The resulting tree used 7 of the 12 variables, and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the model is 0.85. What is your evaluation of this model?
You are using MADlib for Linear Regression analysis. Which value does the statement return?SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;