databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Question 7 🔥

In which of the following situations is it preferable to impute missing feature values with their median value over the mean value?

A. When the features are of the categorical type

B. When the features are of the boolean type

C. When the features contain a lot of extreme outliers

D. When the features contain no outliers

E. When the features contain no missing values
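
To make the difference between the two imputation strategies concrete, here is a minimal sketch using only the Python standard library. The `prices` list and its values are hypothetical and chosen so that a single extreme value dominates the mean; it is an illustration, not part of the exam material.

```python
import statistics

# Hypothetical feature column with one extreme outlier and one missing value (None).
prices = [10.0, 12.0, 11.0, 13.0, None, 1000.0]

# Compute the fill values from the observed (non-missing) entries only.
observed = [p for p in prices if p is not None]
mean_fill = statistics.mean(observed)      # pulled far upward by the outlier
median_fill = statistics.median(observed)  # stays close to the typical values

# Replace the missing entry with each candidate fill value.
imputed_mean = [mean_fill if p is None else p for p in prices]
imputed_median = [median_fill if p is None else p for p in prices]

print("mean fill:", mean_fill)
print("median fill:", median_fill)
```

With these made-up numbers, the mean-based fill lands far from every typical observation, while the median-based fill sits among them.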

Question 8 🔥

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0. Which of the following code blocks will accomplish this task?

A. spark_df[spark_df["price"] > 0]

B. spark_df.filter(col("price") > 0)

C. SELECT * FROM spark_df WHERE price > 0

D. spark_df.loc[spark_df["price"] > 0,:]

E. spark_df.loc[:,spark_df["price"] > 0]

Question 9 🔥

A machine learning engineer is trying to scale a machine learning pipeline named pipeline that contains multiple feature engineering stages and a modeling stage. As part of the cross-validation process, they are using the following code block:

A colleague suggests that the code block can be changed to speed up the tuning process by passing the model object to the estimator parameter and then placing the updated cv object as the final stage of the pipeline in place of the original model.

Which of the following is a negative consequence of the approach suggested by the colleague?

A. The model will take longer to train for each unique combination of hyperparameter values

B. The feature engineering stages will be computed using validation data

C. The cross-validation process will no longer be parallelizable

D. The cross-validation process will no longer be reproducible

E. The model will be refit one more time per cross-validation fold

Question 10 🔥

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

A. Leave-one-out encoding

B. Target encoding

C. One-hot encoding

D. Categorical embeddings

E. String indexing
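
As an illustration of what "a series of binary indicator feature variables" looks like in practice, here is a small plain-Python sketch. The helper `one_hot_encode` and the `colors` data are hypothetical names introduced for this example only; real projects would typically rely on a library encoder instead.

```python
def one_hot_encode(values):
    """Return (categories, rows): each row has one 0/1 indicator per category."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each input value becomes a row with exactly one indicator set to 1, in the column that corresponds to its category.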

Question 11 🔥

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult. Which of the following describes why?

A. Gradient boosting is not a linear algebra-based algorithm which is required for parallelization.

B. Gradient boosting requires access to all data at once which cannot happen during parallelization.

C. Gradient boosting calculates gradients in evaluation metrics using all cores which prevents parallelization.

D. Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.

E. Gradient boosting uses decision trees in each iteration which cannot be parallelized.
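
To show how the rounds of a boosted ensemble relate to one another, here is a toy sketch of gradient boosting for squared error, written in plain Python. The functions `fit_stump` and `boost`, the learning rate, and the data are all assumptions made for illustration; this is not how any particular library implements boosting.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error on the residuals."""
    best = None
    # Try each distinct x (except the largest) as a split so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Run n_rounds of boosting and return final predictions and per-round losses."""
    predictions = [0.0] * len(xs)
    losses = []
    for _ in range(n_rounds):
        # The residuals are computed from the ensemble built so far, so this
        # round cannot start until the previous round has finished.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)))
    return predictions, losses


# Hypothetical one-dimensional regression data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
predictions, losses = boost(xs, ys)
print("first-round loss:", losses[0])
print("last-round loss:", losses[-1])
```

In this sketch the loop over rounds is inherently ordered, whereas the work inside a single round (such as scanning candidate thresholds) has no such dependency.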

Question 12 🔥

A data scientist is attempting to tune a logistic regression model named logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation.

They attempt to run the following code block, but it does not accomplish the desired task:

Which of the following changes can the data scientist make to accomplish the task?

A. Replace the GridSearchCV operation with RandomizedSearchCV

B. Replace the GridSearchCV operation with cross_validate

C. Replace the GridSearchCV operation with ParameterGrid

D. Replace the random_state=0 argument with random_state=1

E. Replace the penalty=['l2', 'l1'] argument with penalty=uniform('l2', 'l1')
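
The contrast between exhaustively enumerating a search space and randomly sampling from it can be sketched without any machine learning library. The example below uses only the Python standard library and does not call scikit-learn; the hyperparameter names, the range for `C`, and the number of draws are assumptions chosen for illustration.

```python
import itertools
import random

# Hypothetical search space for two hyperparameters of a logistic regression.
penalties = ["l1", "l2"]
c_low, c_high = 0.01, 10.0

# Grid-style search: every combination of a fixed list of values is evaluated.
grid_candidates = list(itertools.product(penalties, [0.01, 0.1, 1.0, 10.0]))

# Random-style search: each evaluation draws one value per hyperparameter.
rng = random.Random(0)  # seeded so the draws are repeatable
random_candidates = [
    (rng.choice(penalties), rng.uniform(c_low, c_high))
    for _ in range(5)
]

print("grid candidates:", len(grid_candidates))
for penalty, c_value in random_candidates:
    print("sampled:", penalty, c_value)
```

The grid always produces the same fixed set of combinations, while the sampled candidates depend on the seed and on how many draws are requested.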
