
NCP-AIO Exam Dumps - NVIDIA AI Operations

Question # 9

You are deploying an AI workload on a Kubernetes cluster that requires access to GPUs for training deep learning models. However, the pods are not able to detect the GPUs on the nodes.

What would be the first step to troubleshoot this issue?

A.

Verify that the NVIDIA GPU Operator is installed and running on the cluster.

B.

Ensure that all pods are using the latest version of TensorFlow or PyTorch.

C.

Check if the nodes have sufficient memory allocated for AI workloads.

D.

Increase the number of CPU cores allocated to each pod to ensure better resource utilization.
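The check described in option A could be sketched roughly as follows. This is only an illustration: the namespace name `gpu-operator` is an assumption (it depends on how the operator was installed), and `check_gpu_operator` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: the GPU Operator lives in a namespace called "gpu-operator".
# Override NAMESPACE if your cluster uses a different one.
NAMESPACE="${NAMESPACE:-gpu-operator}"

# Hypothetical helper: list operator pods and the GPU resources nodes advertise.
check_gpu_operator() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found; run this from a host with cluster access"
    return 1
  fi
  # Operator pods should normally be Running or Completed.
  kubectl get pods -n "$NAMESPACE"
  # Nodes that expose GPUs list an nvidia.com/gpu resource.
  kubectl describe nodes | grep -i 'nvidia.com/gpu'
}

check_gpu_operator || true
```

If no `nvidia.com/gpu` resource shows up on any node, pods have nothing to request, regardless of which framework version they run.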

Question # 10

You are managing multiple edge AI deployments using NVIDIA Fleet Command. You need to ensure that each AI application running on the same GPU is isolated from others to prevent interference.

Which feature of Fleet Command should you use to achieve this?

A.

Remote Console

B.

Secure NFS support

C.

Multi-Instance GPU (MIG) support

D.

Over-the-air updates

Question # 11

A cloud engineer is looking to deploy a digital fingerprinting pipeline using NVIDIA Morpheus and the NVIDIA AI Enterprise Virtual Machine Image (VMI).

Where would the cloud engineer find the VMI?

A.

GitHub and Docker Hub

B.

Azure, Google, and Amazon marketplaces

C.

NVIDIA NGC

D.

Developer Forums

Question # 12

You have noticed that users can access all GPUs on a node even when they request only one GPU in their job script using --gres=gpu:1. This is causing resource contention and inefficient GPU usage.

What configuration change would you make to restrict users’ access to only their allocated GPUs?

A.

Increase the memory allocation per job to limit access to other resources on the node.

B.

Enable cgroup enforcement in cgroup.conf by setting ConstrainDevices=yes.

C.

Set a higher priority for jobs requesting fewer GPUs, so they finish faster and free up resources sooner.

D.

Modify the job script to include additional resource requests for CPU cores alongside GPUs.
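For reference, the setting named in option B might appear in a configuration like the sketch below. It writes example files to a scratch directory (the `CONF_DIR` path is an assumption for illustration), not to the live Slurm configuration. The `TaskPlugin` and `GresTypes` lines are the slurm.conf settings that device constraint usually relies on; a real site would also need a matching gres.conf.

```shell
#!/bin/sh
# Sketch only: write example config fragments to a scratch directory,
# not to the real Slurm configuration directory.
CONF_DIR="${CONF_DIR:-./slurm-example}"
mkdir -p "$CONF_DIR"

# cgroup.conf: restrict each job to the devices (including GPUs) it was allocated.
cat > "$CONF_DIR/cgroup.conf" <<'EOF'
ConstrainDevices=yes
EOF

# slurm.conf fragment: the cgroup task plugin enforces cgroup.conf for job
# tasks, and GresTypes lets Slurm track GPUs as a generic resource.
cat > "$CONF_DIR/slurm.conf.fragment" <<'EOF'
TaskPlugin=task/cgroup
GresTypes=gpu
EOF

cat "$CONF_DIR/cgroup.conf" "$CONF_DIR/slurm.conf.fragment"
```

With device constraint active, a job submitted with --gres=gpu:1 should see only its one allocated GPU device rather than every GPU on the node.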

Question # 13

What must be done before installing new versions of DOCA drivers on a BlueField DPU?

A.

Uninstall any previous versions of DOCA drivers.

B.

Re-flash the firmware every time.

C.

Disable network interfaces during installation.

D.

Reboot the host system.

Question # 14

An administrator needs to submit a script named “my_script.sh” to Slurm and specify a custom output file named “output.txt” for storing the job's standard output and error.

Which ‘sbatch’ option should be used?

A.

-o output.txt

B.

-e output.txt

C.

-output-output output.txt
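As a small illustration of the behavior being asked about: in Slurm, -o (long form --output) names the file for standard output, and when no separate -e file is given, standard error is written to that same file. The sketch below creates a throwaway my_script.sh and submits it if sbatch is available. The local fallback is an assumption added for machines without Slurm and only mimics the combined-output behavior.

```shell
#!/bin/sh
# Create a trivial job script that writes to both stdout and stderr.
cat > my_script.sh <<'EOF'
#!/bin/sh
echo "to stdout"
echo "to stderr" >&2
EOF

if command -v sbatch >/dev/null 2>&1; then
  # With only -o given, Slurm sends both stdout and stderr to output.txt.
  sbatch -o output.txt my_script.sh
else
  # Fallback (not Slurm): mimic the combined output locally.
  sh my_script.sh > output.txt 2>&1
fi
```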

Question # 15

A system administrator notices that jobs are failing intermittently on Base Command Manager due to incorrect GPU configurations in Slurm. The administrator needs to ensure that jobs utilize GPUs correctly.

How should they troubleshoot this issue?

A.

Increase the number of GPUs requested in the job script to avoid using unconfigured GPUs.

B.

Check if MIG (Multi-Instance GPU) mode has been enabled incorrectly and reconfigure Slurm accordingly.

C.

Verify that non-MIG GPUs are automatically configured in Slurm when detected, and adjust configurations if needed.

D.

Ensure that GPU resource limits have been correctly defined in Slurm’s configuration file for each job type.
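One way to gather the facts this question turns on is to compare what the driver reports with what Slurm has recorded for the node. The sketch below is a minimal example under assumptions: `check_mig` is a hypothetical helper name, and it assumes the Slurm node name matches the short hostname.

```shell
#!/bin/sh
# Hypothetical helper: show MIG mode per GPU and the GRES Slurm knows about.
check_mig() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # mig.mode.current reports Enabled, Disabled, or N/A for each GPU.
    nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv
  else
    echo "nvidia-smi not found on this host"
  fi
  if command -v scontrol >/dev/null 2>&1; then
    # Assumption: the Slurm node name equals the short hostname.
    scontrol show node "$(hostname -s)" | grep -i 'gres'
  else
    echo "scontrol not found on this host"
  fi
  return 0
}

check_mig
```

A mismatch between the two outputs (for example, MIG enabled on a GPU that Slurm still lists as a whole device) points at the configuration to correct.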

Question # 16

A system administrator wants to run these two commands in Base Command Manager.

main showprofile

device status apc01

What command should the system administrator use from the management node system shell?

A.

cmsh -c "main showprofile; device status apc01"

B.

cmsh -p "main showprofile; device status apc01"

C.

system -c "main showprofile; device status apc01"

D.

cmsh-system -c "main showprofile; device status apc01"
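For context, cmsh accepts a -c option that takes a quoted string of commands separated by semicolons and runs them non-interactively from the system shell. A minimal sketch, with a message printed instead when cmsh is not installed (the `CMDS` variable name is just for illustration):

```shell
#!/bin/sh
# The two commands from the question, joined with a semicolon.
CMDS='main showprofile; device status apc01'

if command -v cmsh >/dev/null 2>&1; then
  # Run both commands in one non-interactive cmsh invocation.
  cmsh -c "$CMDS"
else
  echo "cmsh not found; on the head node this would run: cmsh -c \"$CMDS\""
fi
```

Note the plain double quotes: curly quotes copied from a web page would be passed to the shell as literal characters and break the command.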
