A data scientist is attempting to tune a scikit-learn logistic regression model named logistic. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation. They attempt to run the following code block, but it does not accomplish the desired task: [code block not included] Which of the following changes can the data scientist make to accomplish the task?
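For reference, scikit-learn's RandomizedSearchCV draws a fixed number (n_iter) of hyperparameter combinations at random from a search space, whereas GridSearchCV evaluates every combination in a grid. The sketch below is illustrative only: the synthetic dataset from make_classification and the two searched hyperparameters (C and class_weight) are assumptions, not taken from the question.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Assumed synthetic data; the question's dataset is not shown.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logistic = LogisticRegression(max_iter=1000)

# Search space for two hyperparameters (illustrative choice):
# a continuous log-uniform distribution for C and a discrete list for class_weight.
param_distributions = {
    "C": loguniform(1e-3, 1e2),
    "class_weight": [None, "balanced"],
}

# Each of the n_iter evaluations uses a randomly sampled combination.
search = RandomizedSearchCV(
    logistic,
    param_distributions=param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Using a distribution rather than a fixed list for C lets every evaluation try a fresh value instead of cycling through preset grid points.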
Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?
Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
A data scientist has written a data cleaning notebook that uses the pandas library, but a colleague has suggested refactoring the notebook so that it scales to big data. Which of the following approaches lets the data scientist refactor the notebook for big data in the least amount of time?
A data scientist has defined a pandas UDF named predict to parallelize the inference process for a single-node model: [code block not included] They have written the following incomplete code block to use predict to score each record of the Spark DataFrame spark_df: [code block not included] Which of the following lines of code can be used to complete the code block so that it accomplishes the task?
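The contract of a Series-to-Series pandas UDF can be seen without a Spark cluster: Spark hands the function one batch of rows at a time, with each input column arriving as a pandas Series, and expects one output value per input row. The sketch below simulates that batching locally with plain pandas. The toy model_predict function, the columns x1 and x2, and the batch size are hypothetical stand-ins, not the question's model or data. In PySpark, a function like predict would be wrapped with pyspark.sql.functions.pandas_udf and applied to columns, for example via select or withColumn.

```python
import pandas as pd


# Hypothetical stand-in for a trained single-node model:
# predicts 1 when the sum of the features is positive, else 0.
def model_predict(features: pd.DataFrame) -> pd.Series:
    return (features.sum(axis=1) > 0).astype(int)


# Body of a Series-to-Series pandas UDF: each argument is one column of a
# batch as a pandas Series; the result has one prediction per row.
def predict(*cols: pd.Series) -> pd.Series:
    features = pd.concat(cols, axis=1)
    return model_predict(features)


# Hypothetical data standing in for spark_df.
df = pd.DataFrame(
    {"x1": [1.0, -2.0, 3.0, -4.0, 0.5], "x2": [0.5, 1.0, -5.0, 5.0, 1.0]}
)

# Local simulation of Spark feeding the UDF in batches of rows.
batch_size = 2
outputs = []
for start in range(0, len(df), batch_size):
    batch = df.iloc[start:start + batch_size]
    outputs.append(predict(*(batch[c] for c in batch.columns)))

# Batch outputs keep the original row index, so they align on assignment.
df["prediction"] = pd.concat(outputs)
print(df)
```

Because the model only ever sees a pandas DataFrame per batch, an existing single-node model can score a large table without being rewritten for Spark.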
A data scientist is using Spark ML to engineer features for an exploratory machine learning project. They decide to standardize their features using the following code block: [code block not included] Upon code review, a colleague expresses concern that the features are standardized before the data is split into a training set and a test set. Which of the following changes can the data scientist make to address the concern?
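The colleague's concern is data leakage: if the mean and standard deviation are computed over the full dataset, test-set rows influence the transformation applied to the training data. In Spark ML the usual remedy is to split first (for example with DataFrame.randomSplit), fit StandardScaler on the training DataFrame only, and use the fitted model to transform both splits. The sketch below shows the same idea in plain Python on assumed synthetic data, so it does not reproduce Spark's implementation, only the ordering of split, fit, and transform.

```python
import random
import statistics

# Assumed synthetic feature values; not the question's data.
random.seed(42)
values = [random.gauss(50.0, 10.0) for _ in range(100)]

# 1. Split BEFORE computing any statistics.
random.shuffle(values)
train, test = values[:80], values[80:]

# 2. "Fit" the scaler on the training split only (sample standard deviation).
mean = statistics.mean(train)
std = statistics.stdev(train)


# 3. Transform both splits with the training statistics.
def standardize(xs):
    return [(x - mean) / std for x in xs]


train_scaled = standardize(train)
test_scaled = standardize(test)  # test rows never influence mean or std
```

The standardized training split has mean 0 and standard deviation 1 by construction, while the test split generally will not; that gap is expected and mirrors what a model sees on genuinely unseen data.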