A company is looking for a Systems Software Engineer, AI Infrastructure.
Key Responsibilities
Develop and maintain large-scale systems for AI infrastructure, ensuring reliability and scalability
Implement SRE fundamentals, design automation tools, and optimize performance
Build observability tools and frameworks, and lead incident response protocols to enhance system resilience
Required Qualifications
Degree in Computer Science or a related field (or equivalent experience), with 8+ years in Software Development, SRE, or Production Engineering
Proficiency in Python and at least one other programming language (C/C++, Go, Perl, Ruby)
Expertise in systems engineering within Linux or Windows environments and cloud platforms (AWS, OCI, Azure, GCP)
Strong understanding of SRE principles and Infrastructure as Code tools (e.g., Terraform CDK)
Hands-on experience with observability platforms and CI/CD systems (e.g., GitLab)
Senior Site Reliability Engineer • Jamaica, New York, United States