A company is looking for a Senior Site Reliability Engineer, BCM - DGX Cloud.
Key Responsibilities
Contribute to deployments and daily operations of large-scale next-generation GPU platforms
Handle incidents in GPU clusters, bridging the gap between cluster operations and development
Validate complex cluster configurations for performance, scalability, and resilience
Required Qualifications
Bachelor's degree in Computer Science or a related field, or equivalent experience
8+ years of experience in site reliability engineering and/or software development roles
Fluency in Python
In-depth knowledge of Linux and networking
Senior Site Reliability Engineer • San Bernardino, California, United States