databricks CERTIFIED_MACHINE_LEARNING_ASSOCIATE

Exam contains 73 questions

Question 25

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE
actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared error of the model according to the data in preds_df and assign it to the rmse variable?
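
For reference, the metric itself can be sketched in a few lines. This is a minimal illustration, not one of the exam's answer choices: `rmse` and `sample_rows` are hypothetical names, and `rmse_with_spark` assumes PySpark is installed and that `preds_df` has the `prediction` and `actual` columns described in the question. It relies on `RegressionEvaluator` from `pyspark.ml.evaluation`, which accepts `metricName="rmse"`.

```python
import math


def rmse(rows):
    """Root mean-squared error over (prediction, actual) pairs."""
    squared_errors = [(prediction - actual) ** 2 for prediction, actual in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_with_spark(preds_df):
    """Same metric computed by Spark ML on a DataFrame (assumes PySpark)."""
    # Imported lazily so the pure-Python part runs without a Spark installation.
    from pyspark.ml.evaluation import RegressionEvaluator

    evaluator = RegressionEvaluator(
        predictionCol="prediction",  # column names taken from the question's schema
        labelCol="actual",
        metricName="rmse",
    )
    return evaluator.evaluate(preds_df)


# Hypothetical sample rows standing in for preds_df.
sample_rows = [(3.0, 2.0), (5.0, 5.0), (7.0, 9.0)]
rmse_value = rmse(sample_rows)
```

On a cluster, `rmse = rmse_with_spark(preds_df)` would assign the evaluator's result to the `rmse` variable named in the question.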

Question 26

A machine learning engineer wants to parallelize the training of group-specific models using the Pandas Function API. They have developed the train_model function, and they want to apply it to each group of DataFrame df.

They have written the following incomplete code block:

Which of the following pieces of code can be used to fill in the above blank to complete the task?
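
The incomplete code block is not preserved in this copy, so the following is only a sketch of the general pattern. In the Pandas Function API, a function is applied to each group of a Spark DataFrame with `applyInPandas` on the grouped DataFrame. The column names `group_id`, `x`, and `y`, the `RESULT_SCHEMA` string, and the body of `train_model` are assumptions made for illustration; the "model" here is just a least-squares slope.

```python
import pandas as pd

# Hypothetical output schema, given as a DDL string accepted by applyInPandas.
RESULT_SCHEMA = "group_id string, n_rows long, slope double"


def train_model(group_pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit a one-feature least-squares slope for a single group."""
    x = group_pdf["x"]
    y = group_pdf["y"]
    x_centered = x - x.mean()
    slope = (x_centered * (y - y.mean())).sum() / (x_centered ** 2).sum()
    # Return one summary row per group, matching RESULT_SCHEMA.
    return pd.DataFrame(
        {
            "group_id": [group_pdf["group_id"].iloc[0]],
            "n_rows": [len(group_pdf)],
            "slope": [float(slope)],
        }
    )


def train_per_group(df):
    """Run train_model on every group of a Spark DataFrame in parallel."""
    # Each group arrives in train_model as a pandas DataFrame.
    return df.groupBy("group_id").applyInPandas(train_model, schema=RESULT_SCHEMA)
```

Because `train_model` takes and returns plain pandas DataFrames, it can be checked locally on a small pandas sample before being handed to Spark.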

Question 27

Which of the following statements describes a Spark ML estimator?

A. An estimator is a hyperparameter grid that can be used to train a model

B. An estimator chains multiple algorithms together to specify an ML workflow

C. An estimator is a trained ML model which turns a DataFrame with features into a DataFrame with predictions

D. An estimator is an algorithm which can be fit on a DataFrame to produce a Transformer

E. An estimator is an evaluation tool to assess the quality of a model
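
The estimator and transformer roles can be illustrated with a toy, pure-Python pair. This is not Spark's API: `MeanCenterer` and `MeanCentererModel` are hypothetical classes that only mimic the `fit()` and `transform()` contract. In Spark ML itself, for example, `LinearRegression` is an estimator whose `fit()` returns a `LinearRegressionModel`, which is a transformer.

```python
class MeanCenterer:
    """Toy estimator: learns a parameter (the mean) from data."""

    def fit(self, values):
        mean = sum(values) / len(values)
        # fit() does not change the data; it returns a fitted transformer.
        return MeanCentererModel(mean)


class MeanCentererModel:
    """Toy transformer: applies the learned mean to new data."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, values):
        return [value - self.mean for value in values]


estimator = MeanCenterer()
model = estimator.fit([1.0, 2.0, 3.0])   # learns mean = 2.0
centered = model.transform([4.0, 5.0])   # subtracts the learned mean
```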

Question 28

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

A. import pyspark.pandas as ps
   df = ps.DataFrame(spark_df)

B. import pyspark.pandas as ps
   df = ps.to_pandas(spark_df)

C. spark_df.to_sql()

D. import pandas as pd
   df = pd.DataFrame(spark_df)

E. spark_df.to_pandas()

Question 29

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

A. MLflow Experiment Tracking

B. Spark ML

C. Autoscaling clusters

D. Hyperopt

E. Delta Lake
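
As background on one of the listed tools, Hyperopt can distribute tuning trials across a Spark cluster through its `SparkTrials` class. The sketch below assumes the `hyperopt` and `pyspark` packages are installed and a Spark session is available. The `objective` function, the search space, and the `tune_in_parallel` helper are hypothetical stand-ins; a real objective would train a single-node model and return its validation loss.

```python
def objective(params):
    """Hypothetical stand-in for training a single-node model.

    Returns the loss to minimize; here a simple quadratic with its minimum at x = 3.
    """
    return (params["x"] - 3.0) ** 2


def tune_in_parallel(max_evals=32, parallelism=4):
    """Run the search with trials evaluated on Spark workers."""
    # Imported lazily so the objective can be tested without Hyperopt or Spark.
    from hyperopt import SparkTrials, fmin, hp, tpe

    search_space = {"x": hp.uniform("x", -10.0, 10.0)}
    spark_trials = SparkTrials(parallelism=parallelism)
    # fmin returns a dict with the best value found for each hyperparameter.
    return fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=spark_trials,
    )
```

Each trial still trains on a single node; only the evaluation of different hyperparameter settings is spread across the cluster.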

Question 30

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?

A. Open the MLmodel artifact in the MLflow run page

B. Click the “Models” link in the row corresponding to the run in the MLflow experiment page

C. Click the “Source” link in the row corresponding to the run in the MLflow experiment page

D. Click the “Start Time” link in the row corresponding to the run in the MLflow experiment page
