Site Reliability Engineer (SRE) - Engineering Productivity
No longer accepting applications
Arista Networks • Nashua, Massachusetts, United States
30+ days ago
Job type
  • Full-time
Job description

Company Description

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and software-defined networking to provide our clients with a competitive edge in an increasingly interconnected world. Our solutions are designed to not only meet the current demands of the digital landscape but to also anticipate and adapt to future challenges.

At Arista, we value the diversity of thought and perspectives that each employee brings to the table. We believe that fostering an inclusive environment, where individuals from various backgrounds and experiences feel welcome, is essential for driving creativity and innovation.

Our commitment to excellence has earned us several prestigious awards, such as Best Engineering Team, Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest standards of quality and performance in everything we do.

Job Description

Who You’ll Work With

Arista Networks is looking for world-class Site Reliability Engineers passionate about driving systems reliability and scalability to provide the best possible development experience for our 2000+ person engineering team. You will be part of a fast-paced, high-caliber team building the internal systems and infrastructure used to build the routing and switching products driving the industry's largest data center networks.

Arista’s Software Engineering team runs at a scale rarely found: TBs of source control, 60GB work trees with 1000s of developer branches in flight at any given time, over 400K daily build / test jobs, and over 150 homegrown and cloud-native services running on a 60-node Kubernetes cluster. Operating these systems takes vigilance, responsiveness to alerts, and a steady stream of updates and bug fixes, both to keep things running smoothly and efficiently and to increase our ability to monitor, understand, and visualize them. The SRE role will cover all aspects of our software development infrastructure and may include monitoring, responding to, and enhancing alerts; working to unify and standardize our alerts; fine-tuning code for scalability and performance; debugging problems; and adding new features. You will own your projects from definition to deployment and customer interactions, and you will be responsible for the quality of everything you deliver.

Working in Engineering Productivity (EngProd), you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Gerrit, MySQL, Elasticsearch, Google Cloud, and Redis, as well as internal systems that we’ve built from the ground up to automate CI / CD, testing, analysis, and visualization.

What You’ll Do

  • Keep the production status green at all times
  • Proactively monitor, respond to, and enhance alerts
  • Build automated responses to the most common alerts or work with the rest of the EngProd team to build them
  • Create and maintain the incident response runbooks working with the service dev teams
  • Debug and resolve issues impacting developer user experience and infrastructure stability
  • Develop patterns to support system reliability and socialize them within the EngProd team
  • Review and contribute to the specifications and implementations written by other team members
  • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure, and provide fixes for those problems
  • Provide support for our tools and infrastructure to Arista’s development team

Qualifications

  • At least BS Computer Science or Engineering + 5 years’ experience, MS Computer Science or Engineering + 3 years’ experience, or equivalent work experience.
  • Knowledge of one or more of Go, Python, JavaScript, or shell scripting.
  • Knowledge of Linux (or UNIX).
  • Experience operating and managing software systems at scale
  • Strong understanding of the fundamentals of storage and networking
  • Comfortable with Ansible and GitOps
  • Applied understanding of software engineering principles.
  • Strong problem solving and software troubleshooting skills.
  • Ability to design a solution and implement features independently. Ability to work in small teams.

Additional Information

Arista Networks is an equal opportunity employer. Arista makes all hiring and employment-related decisions in a non-discriminatory manner without regard to race, color, religion, sex, sexual orientation, gender identity, national origin or any other factor determined to be unlawful under applicable federal, state, or local law. All your information will be kept confidential according to EEO guidelines.
