Databricks Certified Machine Learning Associate

Exam contains 73 questions

Question 13 🔥

A data scientist is attempting to tune a logistic regression model logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation.

They attempt to run the following code block, but it does not accomplish the desired task:

Which of the following changes can the data scientist make to accomplish the task?

A. Replace the GridSearchCV operation with RandomizedSearchCV

B. Replace the GridSearchCV operation with cross_validate

C. Replace the GridSearchCV operation with ParameterGrid

D. Replace the random_state=0 argument with random_state=1

E. Replace the penalty=['l2', 'l1'] argument with penalty=uniform('l2', 'l1')
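The code block the question refers to is not reproduced on this page, so the following is only a minimal sketch of what a randomized hyperparameter search looks like in scikit-learn. The synthetic dataset, the liblinear solver, the search space (C and penalty), and n_iter=10 are all assumptions chosen for illustration, not the question's original code.

```python
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Assumed toy data; the question does not say which dataset is used.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# liblinear is chosen here because it supports both l1 and l2 penalties.
logistic = LogisticRegression(solver="liblinear", random_state=0)

# Search space: a continuous distribution for C, a list of choices for penalty.
distributions = {
    "C": uniform(loc=0, scale=4),
    "penalty": ["l2", "l1"],
}

# Each of the n_iter evaluations draws one random value per hyperparameter.
search = RandomizedSearchCV(
    logistic, distributions, n_iter=10, cv=3, random_state=0
)
search.fit(X, y)
best_params = search.best_params_
```

In contrast, GridSearchCV expects a grid of explicit candidate values and evaluates every combination, so it cannot sample from a distribution such as uniform.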

Question 14 🔥

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

A. MLflow Experiment Tracking

B. Spark ML

C. Autoscaling clusters

D. Hyperopt

E. Delta Lake

Question 15 🔥

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

B. pandas API on Spark DataFrames are more performant than Spark DataFrames

C. pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

D. pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

E. pandas API on Spark DataFrames are unrelated to Spark DataFrames

Question 16 🔥

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data.

Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

A. They can refactor their notebook to process the data in parallel.

B. They can refactor their notebook to use the PySpark DataFrame API.

C. They can refactor their notebook to use the Scala Dataset API.

D. They can refactor their notebook to use Spark SQL.

E. They can refactor their notebook to utilize the pandas API on Spark.

Question 17 🔥

A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?

A. predict(*spark_df.columns)

B. mapInPandas(predict)

C. predict(Iterator(spark_df))

D. mapInPandas(predict(spark_df.columns))

E. predict(spark_df.columns)
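Neither the definition of predict nor the incomplete code block is reproduced on this page. As background, the sketch below shows the function shape that PySpark's DataFrame.mapInPandas expects: a function that receives an iterator of pandas DataFrames and yields pandas DataFrames. The body of predict here is a hypothetical stand-in (a row sum instead of a real model), and the batches are built locally with pandas so the example runs without a Spark cluster.

```python
from typing import Iterator

import pandas as pd


def predict(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Hypothetical scoring function in the iterator-of-DataFrames shape."""
    for pdf in batches:
        # Stand-in for model.predict(pdf): sum the feature columns per row.
        yield pd.DataFrame({"prediction": pdf.sum(axis=1)})


# Local simulation: Spark would pass one pandas DataFrame per data batch.
batches = [
    pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}),
    pd.DataFrame({"a": [5.0], "b": [6.0]}),
]
result = pd.concat(predict(iter(batches)), ignore_index=True)

# On a cluster, a function of this shape would be applied with, for example:
#   spark_df.mapInPandas(predict, schema="prediction double")
```

Because the function consumes an iterator, a model can be loaded once before the loop and reused for every batch rather than being reloaded per record.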

Question 18 🔥

A data scientist is using Spark ML to engineer features for an exploratory machine learning project.

They decide they want to standardize their features using the following code block:

Upon code review, a colleague expressed concern with the features being standardized prior to splitting the data into a training set and a test set.

Which of the following changes can the data scientist make to address the concern?

A. Utilize the MinMaxScaler object to standardize the training data according to global minimum and maximum values

B. Utilize the MinMaxScaler object to standardize the test data according to global minimum and maximum values

C. Utilize a cross-validation process rather than a train-test split process to remove the need for standardizing data

D. Utilize the Pipeline API to standardize the training data according to the test data's summary statistics

E. Utilize the Pipeline API to standardize the test data according to the training data's summary statistics
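The colleague's concern is data leakage: if the scaler's statistics are computed before the split, information from the test rows influences how the training rows are transformed. The sketch below illustrates the principle in plain Python rather than Spark ML; the helper names fit_scaler and transform and the sample values are hypothetical, and the question's original code block is not reproduced here.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn summary statistics (mean, population std) from one dataset."""
    return mean(values), pstdev(values)


def transform(values, mu, sigma):
    """Standardize values using previously learned statistics."""
    return [(v - mu) / sigma for v in values]


# Split first (assumed toy values), then learn statistics from training data only.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 11.0]

mu, sigma = fit_scaler(train)

# Both sets are transformed with the training statistics, so nothing about
# the test set leaks into the fitted scaler.
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
```

In Spark ML the same pattern is usually expressed by fitting a Pipeline on the training DataFrame and calling transform on the test DataFrame with the fitted model.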
