A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
Hyperparameter 1: [2, 5, 10]
Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
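The arithmetic behind this grid can be sketched as follows. This is a minimal illustration, not a statement of the intended answer: the names `grid` and `num_folds` are hypothetical, and the count assumes one model fit per combination per fold, with no final refit on the full training set included.

```python
from itertools import product

# Hypothetical names; the values come from the question's grid.
grid = {
    "hyperparameter_1": [2, 5, 10],
    "hyperparameter_2": [50, 100],
}
num_folds = 3

# Every distinct pairing of the two hyperparameter values.
combinations = list(product(*grid.values()))
num_combinations = len(combinations)

# Assumption: each combination is fit once per fold, and these fits
# do not depend on one another.
total_fits = num_combinations * num_folds

print(num_combinations)  # 6
print(total_fits)        # 18
```

How many of these fits actually run at the same time depends on the cluster and on the tuning tool's parallelism setting; the sketch only counts the independent fits.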
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository. Which of the following explanations justifies this suggestion?
A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df. They are using the following code block to evaluate the model:
regression_evaluator.setMetricName("rmse").evaluate(preds_df)
Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?
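To make the scale issue concrete, here is a small pure-Python sketch that computes RMSE on the log scale and again after exponentiating both columns. It does not use Spark's evaluator; the `rmse` helper and the sample prices are hypothetical, and it assumes the label is the natural log of price.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical labels and predictions, both stored as log(price).
log_actual = [math.log(100.0), math.log(200.0), math.log(400.0)]
log_pred = [math.log(110.0), math.log(180.0), math.log(400.0)]

# RMSE computed directly on the stored values is in log units.
rmse_log_scale = rmse(log_actual, log_pred)

# Exponentiating both columns first (assumes natural log) gives an
# RMSE expressed in the same units as price.
price_actual = [math.exp(v) for v in log_actual]
price_pred = [math.exp(v) for v in log_pred]
rmse_price_scale = rmse(price_actual, price_pred)

print(rmse_log_scale, rmse_price_scale)
```

The two numbers are not interchangeable: the first is an error in log units, while the second is an error in price units.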
A data scientist is working with a feature set with the following schema: The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature. Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?
A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0. Which of the following code blocks will accomplish this task?
A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process. Which change could the data scientist make to improve their model accuracy over the course of their tuning process?