Job Title
Responsibilities:
- Set technical strategy and oversee development of high-scale, reliable infrastructure systems
- Collaborate across teams and companies to deeply understand infrastructure, operations, and capacity needs, and identify potential solutions to support frontier LLM serving
- Create clarity for the team and stakeholders in an ambiguous and evolving environment
- Take an inclusive approach to hiring and coaching top technical talent, and support a high-performing team
- Design and run processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice
You may be a good fit if you:
- Have 10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management
- Have 3+ years of engineering management experience
- Have experience recruiting, scaling, and retaining engineering talent in a high-growth environment
- Have experience scaling resources and operations to accommodate rapid growth
- Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development
- Excel at building strong relationships with stakeholders at all levels and across companies
- Enjoy working in a fast-paced, early-stage environment and are comfortable adapting priorities as driven by the rapidly evolving AI space
- Have excellent written and verbal communication skills and are comfortable with a high degree of collaboration with both internal and external engineers and product managers
- Have demonstrated success building a culture of belonging and engineering excellence
- Are motivated by developing AI responsibly and safely
Strong candidates may also have experience with:
- Machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL
- Deployment and capacity management automation
- Security and privacy best practices
The expected base compensation for this position is below. Our total compensation package for full-time employees includes equity and benefits, and may include incentive compensation.
Annual Salary: $320,000 - $485,000 USD