A data scientist is using Spark SQL to import their data into a machine learning pipeline. Once the data is imported, the data scientist performs machine learning tasks using Spark ML. Which of the following compute tools is best suited for this use case?
A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df. batch_df has the following schema: customer_id STRING. The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri: In which situation will the machine learning engineer's code block perform the desired inference?
Which of the following evaluation metrics is not suitable to evaluate runs in AutoML experiments for regression problems?
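As background for this question, regression metrics score continuous predictions against continuous targets. The following is a minimal plain-Python sketch of four metrics commonly used for regression (MSE, RMSE, MAE, and R²). The function name regression_metrics and the sample values are hypothetical and chosen only for illustration; this is not AutoML's implementation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics for two equal-length sequences.

    Hypothetical helper for illustration only.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error.
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 compares the residual sum of squares with the total variance
    # of the targets around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up targets and predictions.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

All four quantities are defined on numeric differences between prediction and target, which is why they apply to regression rather than to class labels.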
A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column's median value. They have developed the following code block to accomplish this task: The code block is not accomplishing the task. Which reason describes why the code block is not accomplishing the imputation task?
A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable. They have developed this code block to accomplish this task: The code block is returning an error. Which of the following adjustments does the data scientist need to make to accomplish this task?
A data scientist wants to explore the Spark DataFrame spark_df. The data scientist wants the exploration to include visual histograms displaying the distribution of numeric features. Which of the following lines of code can the data scientist run to accomplish the task?