Talent.com
Network Reliability Operations Engineer

TEKsystems, Santa Clara, CA, United States
1 day ago
Job type
  • Full-time
Job description
  • Description
  • Summary

    NVIDIA is looking for a Network Reliability and Operations (NRO) Engineer to support and maintain our cloud network infrastructure. This network serves NVIDIA's entire software stack, from graphics drivers to autonomous vehicles and artificial intelligence.

    In this role, the NRO Engineer will remediate critical alerts within defined SLAs, provide the initial line of triage for network incidents, and work with internal customers on network-related issues. They will also engage with external vendors to remediate issues such as circuit outages, and participate in project-related work such as network device upgrades and link capacity augmentations. An ideal candidate will possess a wide range of skills, including alert monitoring and resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, and network protocol knowledge in large multi-vendor infrastructures.

    What You Will Be Doing:

    Monitor and troubleshoot the entire NVIDIA network stack within our cloud and on-premises network infrastructures, which include intra-DC, inter-DC, and CSP environments.

    Network Reliability Operations experience

    • Knowledge of large-scale IP networking technologies and protocols such as MP-BGP, VRF, VXLAN, EVPN, IPsec, and DNS
    • Ability to multi-task in an interrupt-driven environment
    • Familiarity with Arista, Fortinet, and Juniper
    • Strong track record of alert response and resolution, within defined SLAs
    • Excellent verbal and written communication skills
    • Experience with high-performance networking and network optimization in highly available, large-scale, multi-site, international environments
    • Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing network infrastructure
    • 4+ years of experience in network operations
    • BS Degree or equivalent combination of education, technical training, and work experience
    • Skills
    • BGP, VXLAN, Alert Management

    • Additional Skills & Qualifications
    • Ways To Stand Out From The Crowd:

    • Working knowledge of Mellanox / Cumulus OS
    • Ability to write and understand Python / Shell scripts and programs for automation, tools, frameworks, dashboards, alarms
    • Passionate about innovating and investing in groundbreaking technologies
    • Experience Level
    • Entry Level

    • Job Type & Location
    • This is a Contract position based out of Santa Clara, CA.

    • Pay and Benefits
    • The pay rate for this position is $65.00/hr.

      Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

    • Medical, dental & vision
    • Critical Illness, Accident, and Hospital
    • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
    • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
    • Short and long-term disability
    • Health Spending Account (HSA)
    • Transportation benefits
    • Employee Assistance Program
    • Time Off / Leave (PTO, Vacation or Sick Leave)
    • Workplace Type
    • This is a fully remote position.

    • Application Deadline
    • This position is anticipated to close on Nov 19, 2025.

      About TEKsystems:

      We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

      The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

      About TEKsystems and TEKsystems Global Services

      We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
