Summer Sale Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: v4s65

MLS-C01 Exam Dumps - AWS Certified Machine Learning - Specialty

Go to page:
Question # 9

A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.

The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist needs to reduce the number of false negatives.

Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model? (Choose two.)

A.

Change the XGBoost eval_metric parameter to optimize based on Root Mean Square Error (RMSE).

B.

Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.

C.

Increase the XGBoost max_depth parameter because the model is currently underfitting the data.

D.

Change the XGBoost eval_metric parameter to optimize based on Area Under the ROC Curve (AUC).

E.

Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.

Full Access
Question # 10

A data scientist wants to use Amazon Forecast to build a forecasting model for inventory demand for a retail company. The company has provided a dataset of historic inventory demand for its products as a .csv file stored in an Amazon S3 bucket. The table below shows a sample of the dataset.

How should the data scientist transform the data?

A.

Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3.

B.

Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset. Upload both datasets as tables in Amazon Aurora.

C.

Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset. Upload them directly to Forecast from a local machine.

D.

Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format. Upload the dataset in this format to Amazon S3.

Full Access
Question # 11

A machine learning (ML) engineer is using Amazon SageMaker automatic model tuning (AMT) to optimize a model's hyperparameters. The ML engineer notices that the tuning jobs take a long time to run. The tuning jobs continue even when the jobs are not significantly improving against the objective metric.

The ML engineer needs the training jobs to optimize the hyperparameters more quickly. How should the ML engineer configure the SageMaker AMT data types to meet these requirements?

A.

Set Strategy to the Bayesian value.

B.

Set RetryStrategy to a value of 1.

C.

Set ParameterRanges to the narrow range inferred from previous hyperparameter jobs.

D.

Set TrainingJobEarlyStoppingType to the AUTO value.

Full Access
Question # 12

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.

Which prior probability distribution should the ML Specialist use for this variable?

A.

Poisson distribution ,

B.

Uniform distribution

C.

Normal distribution

D.

Binomial distribution

Full Access
Question # 13

A trucking company is collecting live image data from its fleet of trucks across the globe. The data is growing rapidly and approximately 100 GB of new data is generated every day. The company wants to explore machine learning uses cases while ensuring the data is only accessible to specific IAM users.

Which storage option provides the most processing flexibility and will allow access control with IAM?

A.

Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users.

B.

Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies.

C.

Setup up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies.

D.

Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users.

Full Access
Question # 14

A Data Scientist is training a multilayer perception (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes within the dataset, but it does not achieve and acceptable ecall metric. The Data Scientist has already tried varying the number and size of the MLP’s hidden layers,

which has not significantly improved the results. A solution to improve recall must be implemented as quickly as possible.

Which techniques should be used to meet these requirements?

A.

Gather more data using Amazon Mechanical Turk and then retrain

B.

Train an anomaly detection model instead of an MLP

C.

Train an XGBoost model instead of an MLP

D.

Add class weights to the MLP’s loss function and then retrain

Full Access
Question # 15

A medical device company is building a machine learning (ML) model to predict the likelihood of device recall based on customer data that the company collects from a plain text survey. One of the survey questions asks which medications the customer is taking. The data for this field contains the names of medications that customers enter manually. Customers misspell some of the medication names. The column that contains the medication name data gives a categorical feature with high cardinality but redundancy.

What is the MOST effective way to encode this categorical feature into a numeric feature?

A.

Spell check the column. Use Amazon SageMaker one-hot encoding on the column to transform a categorical feature to a numerical feature.

B.

Fix the spelling in the column by using char-RNN. Use Amazon SageMaker Data Wrangler one-hot encoding to transform a categorical feature to a numerical feature.

C.

Use Amazon SageMaker Data Wrangler similarity encoding on the column to create embeddings Of vectors Of real numbers.

D.

Use Amazon SageMaker Data Wrangler ordinal encoding on the column to encode categories into an integer between O and the total number Of categories in the column.

Full Access
Question # 16

A law firm handles thousands of contracts every day. Every contract must be signed. Currently, a lawyer manually checks all contracts for signatures.

The law firm is developing a machine learning (ML) solution to automate signature detection for each contract. The ML solution must also provide a confidence score for each contract page.

Which Amazon Textract API action can the law firm use to generate a confidence score for each page of each contract?

A.

Use the AnalyzeDocument API action. Set the FeatureTypes parameter to SIGNATURES. Return the confidence scores for each page.

B.

Use the Prediction API call on the documents. Return the signatures and confidence scores for each page.

C.

Use the StartDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page.

D.

Use the GetDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page

Full Access
Go to page: