databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Question 55

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2. Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

A. A holdout set is not necessary when using a train-validation split

B. Reproducibility is achievable when using a train-validation split

C. Fewer hyperparameter values need to be tested when using a train-validation split

D. Bias is avoidable when using a train-validation split

E. Fewer models need to be trained when using a train-validation split
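
The training cost of the two validation strategies can be compared by counting model fits. The sketch below is a plain-Python illustration, not Spark code; the helper `count_model_fits` and the parameter grid are hypothetical, and the count ignores the final refit on the full training set that many libraries perform after tuning.

```python
from itertools import product


def count_model_fits(param_grid, n_folds=None):
    """Count the training runs needed to evaluate every grid combination once.

    n_folds=None stands for a single train-validation split.
    """
    n_combinations = len(list(product(*param_grid.values())))
    fits_per_combination = 1 if n_folds is None else n_folds
    return n_combinations * fits_per_combination


# Hypothetical grid: 3 depths x 2 tree counts = 6 combinations.
param_grid = {"max_depth": [3, 5, 7], "num_trees": [50, 100]}

split_fits = count_model_fits(param_grid)          # one fit per combination
cv_fits = count_model_fits(param_grid, n_folds=5)  # five fits per combination

print(split_fits, cv_fits)  # 6 30
```

The number of hyperparameter combinations is the same in both cases; only the number of fits per combination changes.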

Question 56

Which of the following hyperparameter optimization methods automatically makes informed selections of hyperparameter values based on previous trials for each iterative model evaluation?

A. Random Search

B. Halving Random Search

C. Tree of Parzen Estimators

D. Grid Search

Question 57

A team is developing guidelines on when to use various evaluation metrics for classification problems. The team needs to provide input on when to use the F1 score over accuracy. Which of the following suggestions should the team include in their guidelines?

A. The F1 score should be utilized over accuracy when the number of actual positive cases is identical to the number of actual negative cases.

B. The F1 score should be utilized over accuracy when there are greater than two classes in the target variable.

C. The F1 score should be utilized over accuracy when there is significant imbalance between positive and negative classes and avoiding false negatives is a priority.

D. The F1 score should be utilized over accuracy when identifying true positives and true negatives are equally important to the business problem.
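
How accuracy and the F1 score can diverge is easiest to see on a small worked example. The sketch below uses plain Python and made-up labels (5 positives, 95 negatives); the function names and both prediction lists are hypothetical, and F1 is defined as 0 when there are no true positives.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention used here when no positive is found
    return 2 * tp / (2 * tp + fp + fn)


# Made-up imbalanced data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Predicts negative for everything and so misses every positive case.
always_negative = [0] * 100

# Finds 3 of the 5 positives at the cost of 5 false alarms.
catches_some = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print(accuracy(y_true, catches_some), f1_score(y_true, catches_some))
```

On this data the always-negative predictor has the higher accuracy (0.95 versus 0.93) while its F1 score is 0, because every positive case becomes a false negative.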

Question 58

A data scientist has developed a random forest regressor rfr and included it as the final stage in a Spark ML Pipeline, pipeline. They then set up a cross-validation process with pipeline as the estimator in the following code block:

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?

A. The process will have a longer runtime because all stages of pipeline need to be refit or retransformed with each model

B. The process will leak data from the training set to the test set during the evaluation phase

C. The process will be unable to parallelize tuning due to the distributed nature of pipeline

D. The process will leak data prep information from the validation sets to the training sets for each model
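
The difference between the two setups can be illustrated by counting how often a preprocessing stage is fitted. The sketch below is a plain-Python toy, not Spark code: `CountingScaler`, `run_cross_validation`, the data, and the parameter values are all hypothetical, and model training itself is left out.

```python
class CountingScaler:
    """Hypothetical stand-in for a preprocessing stage; counts fit() calls."""

    def __init__(self):
        self.fit_calls = 0

    def fit(self, values):
        self.fit_calls += 1
        mean = sum(values) / len(values)
        # Return a transform that centers data on the mean seen during fit.
        return lambda xs: [x - mean for x in xs]


def run_cross_validation(values, n_folds, param_values, scaler, scaler_inside_cv):
    """Loop over folds and parameter values; model training is omitted."""
    if not scaler_inside_cv:
        # Preprocessing is fitted once on all rows, before any fold is held out.
        values = scaler.fit(values)(values)
    folds = [values[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        for _ in param_values:
            if scaler_inside_cv:
                # The whole pipeline is the estimator: every stage is refit here.
                scaler.fit(train)
            # Fitting and scoring the model for this parameter value would go here.


data = [float(x) for x in range(30)]
max_depths = [2, 4, 6, 8]  # hypothetical parameter values

pipeline_scaler = CountingScaler()
run_cross_validation(data, 3, max_depths, pipeline_scaler, scaler_inside_cv=True)

model_only_scaler = CountingScaler()
run_cross_validation(data, 3, max_depths, model_only_scaler, scaler_inside_cv=False)

print(pipeline_scaler.fit_calls, model_only_scaler.fit_calls)  # 12 1
```

Refitting the preprocessing inside every fold costs more compute, but it also means the held-out rows never contribute to the fitted preprocessing statistics.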

Question 59

A data scientist has written a feature engineering notebook that utilizes the pandas library. As the size of the data processed by the notebook increases, the notebook's runtime is increasing drastically. Which of the following tools can the data scientist use to spend the least amount of time refactoring their notebook to scale with big data?

A. PySpark DataFrame API

B. pandas API on Spark

C. Spark SQL

D. Feature Store

Question 60

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE
actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?
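
As a reminder of what the metric measures, the sketch below computes RMSE by hand in plain Python on made-up rows that mirror the prediction and actual columns. It is not one of the answer choices and not Spark code; the function name and the sample values are hypothetical. In Spark ML the same quantity is typically obtained from `RegressionEvaluator` in `pyspark.ml.evaluation`, configured with `predictionCol="prediction"`, `labelCol="actual"`, and `metricName="rmse"`, by calling `evaluate(preds_df)`.

```python
import math


def root_mean_squared_error(rows):
    """Square root of the mean squared difference between prediction and actual."""
    squared_errors = [(row["prediction"] - row["actual"]) ** 2 for row in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up rows with the same column names as preds_df.
sample_rows = [
    {"prediction": 3.0, "actual": 1.0},
    {"prediction": 5.0, "actual": 7.0},
    {"prediction": 2.0, "actual": 4.0},
    {"prediction": 8.0, "actual": 6.0},
]

rmse = root_mean_squared_error(sample_rows)
print(rmse)  # 2.0
```

Every error in the sample has magnitude 2, so the mean squared error is 4 and its square root is 2.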
