Databricks Certified Machine Learning Associate

Exam contains 73 questions

Question 31

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables. Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

A. Logistic regression

B. Spark ML cannot distribute linear regression training

C. Iterative optimization

D. Least-squares method

E. Singular value decomposition

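
For intuition on how a linear regression fit can be spread across data partitions, here is a minimal pure-Python sketch of iterative optimization: each partition computes a partial gradient, the partial gradients are summed, and the weights are updated. This is not Spark ML's code. It uses plain batch gradient descent for readability, whereas Spark ML's large-scale solver is, as far as its documentation describes, a quasi-Newton method (L-BFGS). The function names, the toy data, and the learning rate are all assumptions made for illustration.

```python
from typing import List, Tuple

Row = Tuple[List[float], float]  # (feature vector, label)


def partition_gradient(rows: List[Row], weights: List[float]) -> List[float]:
    """Sum of squared-error gradients over the rows of one partition."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def fit_linear_regression(
    partitions: List[List[Row]],
    n_features: int,
    learning_rate: float = 0.05,  # hypothetical value chosen for this toy data
    n_iters: int = 2000,
) -> List[float]:
    """Batch gradient descent where each step aggregates per-partition gradients."""
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    for _ in range(n_iters):
        # "Map": every partition computes its gradient independently.
        partials = [partition_gradient(p, weights) for p in partitions]
        # "Reduce": the partial gradients are summed into one global gradient.
        total = [sum(g[j] for g in partials) for j in range(n_features)]
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, total)]
    return weights


# Toy data following y = 1 + 2x; the first feature is a constant 1.0 (intercept).
partitions = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
weights = fit_linear_regression(partitions, n_features=2)
```

On this toy data the weights converge toward an intercept of 1 and a slope of 2. Only the small gradient vectors need to be combined at each step, which is why this style of training scales with the number of rows.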

Question 32

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

B. pandas API on Spark DataFrames are more performant than Spark DataFrames

C. pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

D. pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

E. pandas API on Spark DataFrames are unrelated to Spark DataFrames


Question 33

A data scientist is using MLflow to track their machine learning experiment. As a part of each of their MLflow runs, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values. All parent and child runs are being manually started with mlflow.start_run. Which of the following approaches can the data scientist use to accomplish this MLflow run organization?

A. They can turn on Databricks Autologging

B. They can specify nested=True when starting the child run for each unique combination of hyperparameter values

C. They can start each child run inside the parent run's indented code block using mlflow.start_run()

D. They can start each child run with the same experiment ID as the parent run

E. They can specify nested=True when starting the parent run for the tuning process


Question 34

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?

A. Open the MLmodel artifact in the MLflow run page

B. Click the “Models” link in the row corresponding to the run in the MLflow experiment page

C. Click the “Source” link in the row corresponding to the run in the MLflow experiment page

D. Click the “Start Time” link in the row corresponding to the run in the MLflow experiment page

Question 35

A data scientist is developing a machine learning pipeline using AutoML on Databricks Machine Learning. Which of the following steps will the data scientist need to perform outside of their AutoML experiment?

A. Model tuning

B. Model evaluation

C. Model deployment

D. Exploratory data analysis

Question 36

A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run. Which of the following approaches can the team use to identify which task is the cause of the failure?

A. Run each notebook interactively

B. Review the matrix view in the Job’s runs

C. Migrate the Job to a Delta Live Tables pipeline

D. Change each Task’s setting to use a dedicated cluster
