About:
RedLegg occasionally communicates vulnerabilities disclosed outside the usual release schedule to provide additional value to our customers. These emergency bulletins describe vulnerabilities or threats that we classify at the highest severity level and that warrant out-of-band emergency patching or mitigation.
VULNERABILITIES
NVIDIA Container Toolkit Remote Code Execution Vulnerability
Identifier: CVE-2024-0132
CVSS Score: 9.0 (Critical)
Update Guide: Visit the NVIDIA Security Bulletin at https://www.nvidia.com/security for more information on patches and remediation. You can also refer to the detailed advisory by NVIDIA at https://nvidia.custhelp.com/app/answers/detail/a_id/5582 for further instructions.
Description: CVE-2024-0132 is a critical vulnerability affecting the NVIDIA Container Toolkit and NVIDIA GPU Operator, impacting cloud and on-premises environments where GPUs are used for AI applications. The vulnerability is a Time-of-Check to Time-of-Use (TOCTOU) flaw that allows an attacker to escape a container and gain unauthorized access to the host machine. With that access, an attacker can execute arbitrary code, escalate privileges, cause a denial of service (DoS), disclose sensitive information, and tamper with data.
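To make the TOCTOU class of flaw concrete, the following is a minimal, generic sketch of a check-then-use race. It is not the NVIDIA code path, whose internals this bulletin does not describe; the function names, file names, and the simulated "attacker" callback are all hypothetical and exist only for illustration. It assumes a platform where unprivileged symlink creation is permitted (for example, Linux).

```python
import os
import tempfile


def read_if_not_symlink(path, between_check_and_use=None):
    """Naive check-then-use: validate the path first, open it afterwards."""
    # Time of check: refuse to follow symlinks.
    if os.path.islink(path):
        raise PermissionError("symlinks are not allowed")
    # Race window: another actor may change what `path` refers to.
    # The callback is a hypothetical stand-in for a concurrent attacker.
    if between_check_and_use is not None:
        between_check_and_use()
    # Time of use: the path is resolved again and may now be a different object.
    with open(path) as handle:
        return handle.read()


def demo():
    """Show that the check passes but the read returns the protected file."""
    with tempfile.TemporaryDirectory() as workdir:
        secret = os.path.join(workdir, "host_secret.txt")
        candidate = os.path.join(workdir, "container_file.txt")
        with open(secret, "w") as handle:
            handle.write("host-only data")
        with open(candidate, "w") as handle:
            handle.write("harmless data")

        def swap():
            # Simulated attacker: replace the checked file with a symlink.
            os.remove(candidate)
            os.symlink(secret, candidate)

        return read_if_not_symlink(candidate, between_check_and_use=swap)


if __name__ == "__main__":
    print(demo())
```

The check and the use refer to the same path but not necessarily the same object, which is why the general remedy for this class of bug is to validate and use a single already-opened handle rather than re-resolving a name.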
Affected Versions:
NVIDIA Container Toolkit: All versions up to and including v1.16.1.
NVIDIA GPU Operator: All versions up to and including 24.6.1.
Impact and Exploitation: An attacker can exploit this vulnerability by manipulating container runtime resources, escaping the container environment, and gaining control over the host system. This is particularly risky in shared cloud environments, such as Kubernetes clusters, where an attacker could leverage shared GPUs to access and compromise multiple workloads.
Mitigation and Patching Instructions:
Identify Affected Systems:
Determine whether you are running NVIDIA Container Toolkit v1.16.1 or earlier, or NVIDIA GPU Operator v24.6.1 or earlier.
Systems utilizing GPUs for containerized workloads should be immediately checked.
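The version comparison in this step can be sketched in a few lines of Python. This is an illustrative helper, not an NVIDIA or RedLegg tool: the component keys and function names are hypothetical, and the fixed versions are taken from this bulletin. Versions are compared numerically, so that 1.9.0 is correctly treated as older than 1.16.2.

```python
import re

# First fixed releases, as stated in this bulletin.
# The dictionary keys are hypothetical labels chosen for this sketch.
FIXED_VERSIONS = {
    "container-toolkit": (1, 16, 2),
    "gpu-operator": (24, 6, 2),
}


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a string such as 'v1.16.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(component, installed):
    """Return True if the installed version is older than the first fixed release."""
    return parse_version(installed) < FIXED_VERSIONS[component]


if __name__ == "__main__":
    for component, version in [
        ("container-toolkit", "v1.16.1"),
        ("container-toolkit", "1.16.2"),
        ("gpu-operator", "v24.6.1"),
    ]:
        print(component, version, "affected" if is_affected(component, version) else "patched")
```

An inventory export or CMDB report could feed version strings into `is_affected` to produce a list of hosts that need patching.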
Download Updated Version:
Go to the NVIDIA Support Advisory at https://nvidia.custhelp.com/app/answers/detail/a_id/5582 to download patches.
Ensure that the NVIDIA Container Toolkit is updated to v1.16.2 or later.
Update the NVIDIA GPU Operator to v24.6.2 or later for Kubernetes or similar environments.
Back Up System Configuration:
Before applying patches, back up all critical data, configurations, and container settings so that a rollback is possible if issues arise.
Apply the Patches for Container Toolkit and GPU Operator:
For Container Toolkit: Replace the existing binaries with the updated package. Restart any containers or orchestration services (e.g., Kubernetes) to apply the changes.
For GPU Operator: Update the Kubernetes deployment using Helm or other deployment methods to ensure that all nodes are running the patched GPU Operator version.
Restart and Verify Services:
After applying updates, restart all affected services and containers to ensure the changes are active.
Verify that the installed versions are NVIDIA Container Toolkit v1.16.2 or later and NVIDIA GPU Operator v24.6.2 or later.
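As one possible way to automate the Container Toolkit part of this verification, the sketch below asks the `nvidia-ctk` command-line tool for its version and compares it with the fixed release named in this bulletin. It assumes that `nvidia-ctk` is on the PATH and that `nvidia-ctk --version` prints a line containing a MAJOR.MINOR.PATCH version number; confirm the exact output format on your own systems. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# First fixed Container Toolkit release, as stated in this bulletin.
MIN_TOOLKIT_VERSION = (1, 16, 2)


def version_from_output(output):
    """Return the first MAJOR.MINOR.PATCH triple found in the text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_toolkit_version():
    """Return the version reported by nvidia-ctk, or None if it is unavailable."""
    # Assumption: the toolkit CLI is installed as `nvidia-ctk` on this host.
    if shutil.which("nvidia-ctk") is None:
        return None
    result = subprocess.run(
        ["nvidia-ctk", "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return version_from_output(result.stdout + result.stderr)


def is_patched(version):
    """Return True only when a version was found and it is at least the fixed release."""
    return version is not None and version >= MIN_TOOLKIT_VERSION


if __name__ == "__main__":
    found = installed_toolkit_version()
    print("installed:", found, "patched:", is_patched(found))
```

A missing or unparseable version is reported as not patched, so that hosts where the check cannot run are reviewed manually rather than silently passed.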
Restrict Access to GPU Resources:
Restrict container access to GPU resources to trusted users and systems only.
Implement Network Access Controls:
Use firewalls and network policies to limit access to your containers and GPU resources and to reduce exposure to external exploitation.
For comprehensive instructions and updates on the vulnerability and patching process, please refer to the full NVIDIA advisory:
https://nvidia.custhelp.com/app/answers/detail/a_id/5582
Further Details:
National Vulnerability Database: https://nvd.nist.gov/vuln/detail/CVE-2024-0132
NVIDIA Security Bulletin: https://www.nvidia.com/security
Following these steps mitigates CVE-2024-0132 and reduces the risk of unauthorized access to, and exploitation of, your systems.