Talent.com
Site Reliability Engineer
Cimulate • Boston, MA, United States
1 day ago
Job type
  • Full-time
Job description

The Role

Cimulate is seeking a skilled Site Reliability Engineer to join our dynamic team as we revolutionize the future of commerce through intelligent, AI-driven systems. In this pivotal role, you’ll own the reliability, availability, and performance of our SaaS production environment: monitoring critical systems, managing deployments, and ensuring seamless operations for our customers. You’ll manage production support processes, deployments (including model releases), and incident response, with an opportunity to grow the role into managing vendor partners for 24/7 follow-the-sun coverage. This position combines hands-on technical problem-solving with process ownership and operational leadership.

Your work will directly contribute to the stability and scalability of Cimulate’s AI platform, supporting our mission to help businesses operate and engage more intelligently.

Responsibilities

  • Ensure reliability, availability, and performance of SaaS production systems and AI pipelines.
  • Monitor production environments, deployed models, and data pipelines; respond rapidly to incidents and service disruptions.
  • Manage deployments, configuration changes, and release processes (e.g., model and service rollouts).
  • Maintain and enhance observability, monitoring, and alerting systems (e.g., Grafana, Prometheus, ELK).
  • Lead incident response, postmortems, and continuous improvement of operational processes and playbooks.
  • Partner with DevOps and engineering teams to improve scalability, fault tolerance, and automation.
  • Track and improve reliability metrics (SLAs, SLOs, SLIs).
  • Create and maintain clear technical documentation, including runbooks and escalation paths.
  • Participate in on-call rotation and drive improvements in incident management and response.
  • Grow into managing vendor teams providing 24/7 L1 operational coverage.

Requirements

  • Proven experience in monitoring and supporting production systems, preferably in a SaaS or multi-tenant environment.
  • Strong knowledge of Linux systems and scripting (Python, Bash, or Go).
  • Hands-on experience with cloud platforms (GCP preferred; AWS/Azure also valuable) and container orchestration (Kubernetes, Docker).
  • Familiarity with Infrastructure-as-Code (IaC) tools such as Terraform or Pulumi.
  • Understanding of networking, databases, and performance tuning.
  • Experience with observability, monitoring, and logging tools (Grafana, Prometheus, ELK, etc.).
  • Proficiency with Git, version control workflows, and CI/CD pipelines.
  • Strong analytical, debugging, and problem-solving skills.
  • Excellent communication and collaboration abilities, including with non-technical stakeholders.
  • Calm under pressure and effective in incident management situations.
  • Growth mindset with the ambition to build and lead scalable 24/7 production operations.

Nice to haves

  • Experience working with security, compliance, or audit frameworks.
  • Exposure to AI/ML pipelines or data-driven systems.
  • Prior experience managing offshore or vendor-based support teams.

Why Join Cimulate?

  • Work with a passionate and collaborative founding team
  • Make a real impact at an early-stage startup with high-growth potential
  • Help redefine the future of online shopping and personalization
  • Competitive compensation, equity, and benefits