DevOps Engineer
U.S. GenAI startup, Cambridge Office
Full-Time Employment. We are committed to building a transformative AI platform that revolutionizes software development. Our goal is to enable you to have a long, impactful career with us, with opportunities for advancement. If you want a role where you can shape the future of AI-powered infrastructure, read on!
About Us
We are a Boston, MA-based Generative AI start-up on a mission to automate custom software creation to unlock the next industrial revolution. We're building an AI-powered platform capable of autonomously generating enterprise-grade software, powered by thousands of cooperative AI agents working in concert.
We're backed by multiple tier 1 investors, our founders succeeded with their previous start-up, and we hold dozens of Generative AI patents.
Location: 1 Kendall Square, Cambridge, MA (in-person role)
About the Role
We're looking for an exceptional DevOps Engineer to architect and maintain the infrastructure that powers our revolutionary AI agent ecosystem. You'll be instrumental in building scalable, resilient systems that support both our cutting-edge AI platform and modern applications. This role offers the unique opportunity to work at the intersection of traditional DevOps and emerging AI infrastructure, creating systems that enable thousands of AI agents to collaborate seamlessly.
As our DevOps Engineer, you'll take ownership of our entire infrastructure stack, from Kubernetes orchestration to AI agent deployment pipelines. You'll work directly with our engineering teams to ensure our platform can scale to support enterprise customers while maintaining the performance and reliability they demand.
What Success Looks Like
Architect and implement robust Kubernetes infrastructure that scales effortlessly to support our growing AI agent ecosystem
Create sophisticated CI/CD pipelines that enable rapid, reliable deployment of both traditional services and AI agents
Develop Python-based automation to eliminate manual tasks and accelerate development velocity
Design monitoring and observability systems for deep insights into both infrastructure and AI agent performance
Optimize cloud infrastructure for cost-efficiency while maintaining enterprise-grade reliability
Collaborate effectively with development teams to improve developer experience and productivity
Proactively identify and resolve infrastructure bottlenecks before they impact customers
Establish infrastructure best practices to support rapid growth
Build systems that handle the unique challenges of AI workloads at scale
Maintain 99.9%+ uptime for critical production services
Areas of Ownership
Core Infrastructure:
Kubernetes cluster design, deployment, and management for AI and application workloads
Infrastructure as Code using Terraform for multi-cloud environments
Container orchestration and optimization for AI agent deployment
Network architecture and security for distributed systems
Automation & Tooling:
Python-based automation scripts for infrastructure management
Helm chart development and maintenance for application deployment
CI/CD pipeline design using modern DevOps tools
Developer productivity tooling and automation
Monitoring & Reliability:
Comprehensive monitoring, alerting, and tracing systems
Performance optimization for AI workloads
Incident response and disaster recovery planning
Cost optimization and resource management
AI Infrastructure (Unique to Us):
Infrastructure for AI agent orchestration and management
MLOps pipeline integration
Scalable systems for handling AI model deployment
Resource optimization for GPU/compute-intensive workloads
Required Technical Experience
5-8 years of DevOps/infrastructure experience
Expert-level Python proficiency for automation and scripting
Deep Kubernetes expertise: deployment, scaling, troubleshooting, and optimization
Strong experience with Helm for application package management
Proven track record designing and implementing CI/CD pipelines
Hands-on experience with major cloud platforms (AWS, Azure, or GCP)
Terraform expertise for Infrastructure as Code
Strong Linux administration and containerization (Docker) skills
Experience with monitoring tools (Prometheus, Grafana, ELK stack)
Understanding of microservices architecture and distributed systems
Ways to Stand Out
CKA (Certified Kubernetes Administrator) or CKAD certification
Experience with MLOps tools (MLflow, Kubeflow, Ray, etc.)
Knowledge of AI/ML infrastructure requirements and optimization
Experience with GPU orchestration and management
API gateway and service mesh implementation (Istio, Linkerd)
GitOps experience (ArgoCD, Flux)
Experience scaling infrastructure for high-growth startups
Contributions to open-source infrastructure projects
Experience with multi-region, highly available deployments
Background in security and compliance (SOC2, HIPAA)
You'll Get
Competitive salary
Comprehensive health, dental, and vision insurance
401(k) with company match
Flexible PTO policy
$5,000 annual professional development budget
Latest hardware and software tools
The opportunity to shape infrastructure for the future of software development
Work with cutting-edge AI technology and world-class engineers
Modern office in Cambridge's innovation hub
Regular team events and activities
The chance to solve novel infrastructure challenges at the intersection of DevOps and AI
Culture
Who we are: Our founding team consists of a serial Gen AI inventor and a successful serial entrepreneur. We work hard, maintain a curious mindset, and believe in a low-ego, high-output approach.
We move fast. Time is our most precious asset. We make decisions quickly and iterate rapidly, believing that a good decision today beats a perfect decision next week.
We have a Championship Mindset. We operate like a professional team, winning together by maintaining high standards, supporting each other, and staying laser-focused on our mission.
We have a Passion for Invention. As technologists pushing the boundaries of what's possible with AI, we thrive on solving problems that haven't been solved before.
What We Ask of You
This role requires someone who thrives in ambiguity and loves tackling unprecedented challenges. You'll be building infrastructure for a type of platform that has never existed before: one where thousands of AI agents collaborate to write software. This means being comfortable with rapid change, continuous learning, and creative problem-solving.
You should be excited about working independently while collaborating in-person with our team at our Cambridge headquarters. The ability to communicate complex technical concepts clearly and work effectively with both technical and non-technical stakeholders is essential.
To Apply
Apply with your resume and a brief note about:
Your most challenging infrastructure project and how you solved it
Why you're excited about building infrastructure for AI-powered software development
Interview Process
Here's what you can expect:
Initial screening call (30 minutes)
Technical discussion with our team (45 minutes)
Deep-dive system design session (60 minutes)
Final conversation with leadership (45 minutes)
Offer discussion
We are an equal opportunity employer committed to building a diverse and inclusive team.