Hi Everyone,
I have a server with the following configuration:
- Host OS: VMware ESXi 6.5
- VM: Ubuntu 20.04.3 LTS
- CUDA version: 11.4
- Driver version: 470.223.02
- GPU: Tesla T4, configured as passthrough
For about the last 4 months, the operating system has occasionally reported that the GPU card cannot be detected, which causes an application error. After restarting the VM, everything works normally again.
However, the failure recurs at random every few weeks.
I checked and found this warning on the VMs that use the GPU card:
- Warning: Failed to initialize NVML: Driver/library version mismatch
Warning in the server hardware management application:
- Accelerator in Slot 1 has OS driver missing or not in persistent mode so power sensor is unknown
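In case it helps with diagnosis, below is a minimal sketch of a check that could be run inside the VM the next time the "Driver/library version mismatch" warning appears. It compares the version of the loaded NVIDIA kernel module with the version of the installed NVML library files. The file locations (`/proc/driver/nvidia/version` and the Ubuntu x86_64 library directory), the version-line format, and all function names are assumptions for illustration and may need adjusting for other setups.

```python
import glob
import re
from pathlib import Path

# Assumed locations for Ubuntu x86_64; adjust if the driver is installed elsewhere.
PROC_VERSION = Path("/proc/driver/nvidia/version")
NVML_GLOB = "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*"


def parse_kernel_module_version(text):
    """Return the version found after 'Kernel Module' in the proc file text, or None.

    Assumes a line shaped like '... Kernel Module  470.223.02  ...'.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def parse_library_version(filename):
    """Return the version suffix of a name like libnvidia-ml.so.470.223.02, or None.

    Plain symlink names such as libnvidia-ml.so.1 carry no full version and give None.
    """
    match = re.search(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$", filename)
    return match.group(1) if match else None


def installed_library_versions(pattern=NVML_GLOB):
    """Return the sorted list of NVML library versions found on disk."""
    versions = {parse_library_version(path) for path in glob.glob(pattern)}
    versions.discard(None)
    return sorted(versions)


def main():
    if not PROC_VERSION.exists():
        print("NVIDIA kernel module does not appear to be loaded")
        return
    kernel = parse_kernel_module_version(PROC_VERSION.read_text())
    libraries = installed_library_versions()
    print(f"Loaded kernel module version: {kernel}")
    print(f"Installed NVML library versions: {libraries}")
    if kernel is not None and libraries and kernel not in libraries:
        print("The loaded module and the installed library differ")


if __name__ == "__main__":
    main()
```

If the two versions differ at the time of the failure, that would at least narrow down where the mismatch comes from. I can post the output here when it happens again.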
Could you please help me identify what error the server is experiencing with the GPU card and how to resolve it?
Many thanks