Talent.com
Senior Customer Reliability Engineer (CRE)

ZipRecruiter • Austin, TX, US
2 days ago
Job type
  • Full-time
Job description

Overview

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and software-defined networking to provide our clients with a competitive edge in an increasingly interconnected world. Our solutions are designed to not only meet the current demands of the digital landscape but to also anticipate and adapt to future challenges.

At Arista, we value the diversity of thought and perspectives that each employee brings to the table. We believe that fostering an inclusive environment, where individuals from various backgrounds and experiences feel welcome, is essential for driving creativity and innovation.

Our commitment to excellence has earned us several prestigious awards, such as Best Engineering Team and Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest standards of quality and performance in everything we do.

The Opportunity

This is not a traditional operations role. You will inherit a set of critical, manual, hands-on operational responsibilities essential to our customers' success. We need you to help lead the effort to systematically dismantle this operational burden through automation, tooling, and systems. You will work with a collaborative team of excellent engineers, and with a direct counterpart, on both the manual toil and the systems we need to engineer.

The short-term needs are: manual deployments, reactive troubleshooting, and on-call escalations. But we need you to help us build a system where programmatic solutions have replaced human intervention. You must have the pragmatism to manage the current reality and the systematic impatience to build its replacement.

Success in this role requires a dual mindset. You must be a skilled incident leader who can stabilize a crisis and a deliberate systems architect who can prevent the next one. You will work closely with our internal tools, platform, and product engineering teams to channel your direct operational knowledge into durable, long-term solutions.

What You'll Do

Your work will follow a deliberate trajectory from reactive execution to proactive design.

  • Phase 1: Stabilize and Map (First 3-6 Months). You will embed with the team, taking ownership of the existing operational workload alongside product engineers and the other customer SRE, who covers the India time zone. This includes customer deployments, upgrades, and incident response. Your initial goal is to achieve stability while mapping the landscape of our operational toil.
  • Phase 2: Automate and Influence (Months 6-18). Armed with your map of toil, you will begin to automate. You will write code, build tooling, and deploy declarative infrastructure to eliminate the most critical operational burdens. For larger projects, you will act as a primary stakeholder, providing clear requirements to our internal tooling and platform teams and ensuring their solutions meet the operational need. Your success will be measured by a demonstrable reduction in overall support effort: fewer pages, fewer support escalations, and fewer manual tasks.
  • Phase 3: Architect and Evangelize (Year 2+). With the most acute operational pains addressed, your focus will shift to architectural concerns. You will define and implement Service Level Objectives (SLOs), influence the design of new products for operability, and help instill SRE principles throughout the engineering organization.

Qualifications

  • DevOps and SRE Proficiency
  • You must have a strong background in Site Reliability Engineering or a closely related DevOps function. You must also have a strong command of Linux systems administration and an understanding of networking fundamentals (TCP/IP, DNS, routing).

  • Customer-Facing Experience
  • You must have experience working directly with external customers to solve difficult technical problems. Your communication must be clear, empathetic, and precise.

  • Cloud Infrastructure Expertise
  • You need production experience with a major cloud provider, preferably AWS. You should be proficient in its core concepts and services (VPC, EC2, IAM, S3) and have experience building and managing infrastructure as code with tools like Terraform.

  • Monitoring and Observability
  • You will be responsible for both building and using our observability stack. This requires hands-on experience instrumenting applications and managing the telemetry pipelines for metrics, logs, and traces. A core part of the role is then applying this data to debug complex production incidents, understand system behavior, and define SLOs.

  • Automation and Software Development
  • You must be proficient in writing code to automate operational tasks. Expertise in a high-level language like Python or Go is required, as are strong shell scripting skills (e.g., Bash).

Skills

  • Proficiency with Kafka, Postgres, nginx, systemd, etc. is a plus
  • We use this software extensively in the product in customer environments. Experience here is not required but it is a plus.

  • Proficiency in Nix and NixOS is a plus
  • We use Nix/NixOS extensively so knowing them helps, but they will not play a large role in your initial responsibilities. We'll train you on the job if you've never used Nix before.

  • Exposure to or proficiency in functional programming and paradigms is a plus
  • We value functional programming-oriented principles (compositionality, immutability, etc.). You are not required to know functional programming, but some exposure is a plus, as is a willingness to learn.

This is a hybrid work environment where office presence may be required 1-2 days a week.

Additional Information

Arista Networks is an equal opportunity employer. Arista makes all hiring and employment-related decisions in a non-discriminatory manner without regard to race, color, national origin, religion, sex, familial status, disability, age, or any other factor determined to be unlawful under applicable federal, state, or local law. All your information will be kept confidential according to EEO guidelines.
