databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Question 49 🔥

A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:

Hyperparameter 1: [2, 5, 10]
Hyperparameter 2: [50, 100]

Which of the following represents the number of machine learning models that can be trained in parallel during this process?

A. 3

B. 5

C. 6

D. 18
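The counts behind this question can be worked out by enumerating the grid. The following is a minimal sketch in plain Python, not Spark ML or scikit-learn code, and the variable names are illustrative. It assumes each hyperparameter combination is fit once per fold and that no fit depends on another, so the total number of fits is an upper bound on what could run concurrently (any final refit on the full training set is not counted).

```python
from itertools import product

# Hyperparameter grid and fold count taken from the question text.
grid = {
    "hyperparameter_1": [2, 5, 10],
    "hyperparameter_2": [50, 100],
}
n_folds = 3

# Every combination of one value per hyperparameter.
combinations = list(product(*grid.values()))
n_combinations = len(combinations)

# Each combination is trained once per cross-validation fold.
total_fits = n_combinations * n_folds

print(n_combinations)  # 6
print(total_fits)      # 18
```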

Question 50 🔥

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository. Which of the following explanations justifies this suggestion?

A. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.

B. One-hot encoding is dependent on the target variable’s values which differ for each application.

C. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.

D. One-hot encoding is not a common strategy for representing categorical feature variables numerically.

Question 51 🔥

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df. They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

A. They should exponentiate the computed RMSE value

B. They should take the log of the predictions before computing the RMSE

C. They should evaluate the MSE of the log predictions to compute the RMSE

D. They should exponentiate the predictions before computing the RMSE
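Because the label is log(price), an RMSE computed on the model's raw output is expressed in log units rather than price units. The sketch below uses plain Python with made-up prices (the values and the rmse helper are hypothetical, not from the question) to compare the log-scale RMSE with the RMSE obtained after mapping both predictions and labels back to the price scale. It also shows that exponentiating the log-scale RMSE gives a different number.

```python
import math

def rmse(predictions, labels):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices, used only for illustration.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 420.0]

# What the model works with: log(price) labels and log-scale predictions.
log_labels = [math.log(v) for v in actual_prices]
log_preds = [math.log(v) for v in predicted_prices]

# RMSE in log units (what evaluating the raw output would report).
rmse_log_scale = rmse(log_preds, log_labels)

# RMSE in price units: exponentiate both columns first, then evaluate.
rmse_price_scale = rmse(
    [math.exp(p) for p in log_preds],
    [math.exp(y) for y in log_labels],
)

# Exponentiating the log-scale RMSE is not the same quantity.
exp_of_log_rmse = math.exp(rmse_log_scale)
```

In Spark, the equivalent step would be to transform the prediction and label columns of preds_df before calling the evaluator.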

Question 52 🔥

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature. Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

Question 53 🔥

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0. Which of the following code blocks will accomplish this task?

A. spark_df.loc[:,spark_df["discount"] <= 0]

B. spark_df[spark_df["discount"] <= 0]

C. spark_df.filter(col("discount") <= 0)

D. spark_df.loc[spark_df["discount"] <= 0, :]

Question 54 🔥

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process. Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

A. Change the number of compute nodes to be half or less than half of the number of evaluations.

B. Change the number of compute nodes and the number of evaluations to be much larger but equal.

C. Change the iterative optimization algorithm used to facilitate the tuning process.

D. Change the number of compute nodes to be double or more than double the number of evaluations.
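The trade-off between parallelism and an iterative optimizer can be illustrated with simple arithmetic. The sketch below is a simplified model, not the behavior of any particular tuning library. It assumes evaluations run in fully synchronous batches, one per compute node, and that the optimizer can only use results from batches that have already finished. Both helper functions are hypothetical.

```python
import math

def sequential_rounds(n_evaluations, parallelism):
    """Number of synchronous batches needed to run all evaluations."""
    return math.ceil(n_evaluations / parallelism)

def informed_evaluations(n_evaluations, parallelism):
    """Evaluations that start after at least one earlier batch has finished."""
    first_batch = min(parallelism, n_evaluations)
    return n_evaluations - first_batch

# Scenario from the question: 8 evaluations on 8 nodes.
print(sequential_rounds(8, 8), informed_evaluations(8, 8))  # 1 0

# Fewer nodes than evaluations leaves room for later batches to use earlier results.
print(sequential_rounds(8, 4), informed_evaluations(8, 4))  # 2 4
print(sequential_rounds(8, 1), informed_evaluations(8, 1))  # 8 7
```

Under these assumptions, running all eight evaluations at once means no evaluation can benefit from any earlier result.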
