Distinguished Software Engineer, Reliability Infra

Next Matter, Mountain View, CA, United States
18 hours ago
Job type
  • Full-time
Job description

Overview

LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We are committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun where everyone can succeed.

Job Details

At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location for this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

This role will be based in Sunnyvale, CA or San Francisco, CA.

Responsibilities

  • Serve as a senior technical leader driving the long-term reliability and observability strategy across LinkedIn’s infrastructure
  • Re-architect LinkedIn’s backend systems to enable granular failure domains and reduce the blast radius of incidents
  • Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers
  • Partner closely with many different types of engineers to raise the bar for operational excellence and incident response
  • Define and build frameworks to improve monitoring, alerting, and observability across hundreds of services and systems
  • Define and own the roadmap for bringing observability to critical user journeys in LinkedIn’s products, to help capture and improve the experience of LinkedIn’s members and customers
  • Spearhead a multi-year initiative to transition LinkedIn’s infrastructure to a regionalized model with localized failover, enhancing both scalability and availability
  • Lead technical discussions on the future of Engineering at LinkedIn, including what the function should evolve into over the next 3-5 years
  • Deliver key insights and executive-level reporting across cross-functional engineering teams to enable the right business decisions around improving the quality and reliability of our services and products
  • Act as a force multiplier by mentoring engineers, influencing technical direction across orgs, and contributing deeply to culture, hiring, and technical excellence
  • Lead incident response and post-incident reviews to identify root causes and implement preventive measures
  • Develop and maintain incident management processes and procedures to ensure timely resolution of issues and minimize impact on customers

Qualifications

Basic Qualifications

  • 15+ years of software engineering experience
  • 8+ years focused on infrastructure, reliability-focused engineering, or distributed systems

Preferred Qualifications

  • Hands-on experience with large-scale incident response, root cause analysis, and resiliency engineering
  • Strong communication and cross-functional collaboration skills, with experience influencing across multiple orgs and leadership levels
  • Proven success designing and leading architectural transformations at internet-scale companies
  • Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design
  • Experience with multi-region architecture, capacity planning, and failover strategies in large-scale cloud or hybrid environments
  • Background in CI/CD, platform reliability, and automation of ops-heavy systems
  • Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana) and service mesh architecture
  • Track record of setting long-term technical strategy and driving systemic improvements in availability and performance
  • Previous experience in a Distinguished Engineer or equivalent role at a high-growth or web-scale technology company

Suggested Skills

  • Site Reliability Engineering (SRE)
  • Leadership
  • Large scale infrastructure

Compensation and Benefits

    LinkedIn is committed to fair and equitable compensation practices. The pay range for this role is $238,000 to $390,000. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to skill set, depth of experience, certifications, and specific work location. This may be different in other locations due to differences in the cost of labor. The total compensation package for this position may also include annual performance bonus, stock, benefits and/or other applicable incentive compensation plans. For more information, visit https://careers.linkedin.com/benefits

    Additional Information

    Equal Opportunity Statement

    We seek candidates with a wide range of perspectives and backgrounds and we are proud to be an equal opportunity employer. LinkedIn considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.

    LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. Our goal is to foster an inclusive and accessible workplace where everyone has the opportunity to be successful.

    If you need a reasonable accommodation to search for a job opening, apply for a position, or participate in the interview process, connect with us at accommodations@linkedin.com and describe the specific accommodation requested for a disability-related limitation.

    A request for an accommodation will be responded to within three business days. However, non-disability related requests, such as following up on an application, will not receive a response.

    LinkedIn will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by LinkedIn, or (c) consistent with LinkedIn’s legal duty to furnish information.

    San Francisco Fair Chance Ordinance

    Pursuant to the San Francisco Fair Chance Ordinance, LinkedIn will consider for employment qualified applicants with arrest and conviction records.

    Pay Transparency Policy Statement

    As a federal contractor, LinkedIn follows the Pay Transparency and non-discrimination provisions described at this link: https://lnkd.in/paytransparency.

    Global Data Privacy Notice for Job Candidates

    Please follow this link to access the document that provides transparency around the way in which LinkedIn handles personal data of employees and job applicants: https://legal.linkedin.com/candidate-portal.
