Create Next App

databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Page 8 of 13

Question 43 🔥

A data scientist is using Spark SQL to import their data into a machine learning pipeline. Once the data is imported, the data scientist performs machine learning tasks using Spark ML.Which of the following compute tools is best suited for this use case?

Which database solution meets these requirements?

A. Single Node cluster

Highly voted

B. Standard cluster

Highly voted

C. SQL Warehouse

Highly voted

D. None of these compute tools support this task

Highly voted

Discussion of the question

Question 44 🔥

A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df. batch_df has the following schema: customer_id STRINGThe machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:In which situation will the machine learning engineer’s code block perform the desired inference?

Which database solution meets these requirements?

A. When the Feature Store feature set was logged with the model at model_uri

Highly voted

B. When all of the features used by the model at model_uri are in a Spark DataFrame in the PySpark

Highly voted

C. When the model at model_uri only uses customer_id as a feature

Highly voted

D. This code block will not perform the desired inference in any situation.

Highly voted

E. When all of the features used by the model at model_uri are in a single Feature Store table

Highly voted

Discussion of the question

Question 45 🔥

Which of the following evaluation metrics is not suitable to evaluate runs in AutoML experiments for regression problems?

Which database solution meets these requirements?

A. F1

Highly voted

B. R-squared

Highly voted

C. MAE

Highly voted

D. MSE

Highly voted

Discussion of the question

Question 46 🔥

A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column’s median value.They have developed the following code block to accomplish this task:The code block is not accomplishing the task.Which reasons describes why the code block is not accomplishing the imputation task?

Which database solution meets these requirements?

A. It does not impute both the training and test data sets.

Highly voted

B. The inputCols and outputCols need to be exactly the same.

Highly voted

C. The fit method needs to be called instead of transform.

Highly voted

D. It does not fit the imputer on the data to create an ImputerModel.

Highly voted

Discussion of the question

Question 47 🔥

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.They have developed this code block to accomplish this task:The code block is returning an error.Which of the following adjustments does the data scientist need to make to accomplish this task?

Which database solution meets these requirements?

A. They need to specify the method parameter to the OneHotEncoder.

Highly voted

B. They need to remove the line with the fit operation.

Highly voted

C. They need to use StringIndexer prior to one-hot encoding the features.

Highly voted

D. They need to use VectorAssembler prior to one-hot encoding the features.

Highly voted

Discussion of the question

Question 48 🔥

A data scientist is wanting to explore the Spark DataFrame spark_df. The data scientist wants visual histograms displaying the distribution of numeric features to be included in the exploration.Which of the following lines of code can the data scientist run to accomplish the task?

Which database solution meets these requirements?

A. spark_df.describe()

Highly voted

B. dbutils.data(spark_df).summarize()

Highly voted

C. This task cannot be accomplished in a single line of code.

Highly voted

D. spark_df.summary()

Highly voted

E. dbutils.data.summarize (spark_df)

Highly voted

Discussion of the question

Ready to Pass Your Certification Test

databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Lorem ipsum dolor sit amet consectetur. Eget sed turpis aenean sit aenean. Integer at nam ullamcorper a.

Company

Product

Resources

Follow us