Linux Site Reliability Engineer

SpaceX
Redmond, Washington, United States
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.

LINUX SITE RELIABILITY ENGINEER

SpaceX is looking for an experienced engineer with deep working knowledge of Kubernetes and related container technologies. This employee will be a member of the Information Technology Linux Infrastructure team and will provide expertise in Kubernetes design, maintenance, scaling, and optimization in support of critical business functions. The ideal candidate will be flexible and will flourish in a fast-paced and challenging environment. They should be a self-starter, be self-motivated, and possess the ingenuity to excel in this position.

RESPONSIBILITIES:

  • Install, manage, scale and optimize Kubernetes and RKE clusters using Ansible, Terraform and adjacent technologies in production environments.
  • Work closely with other SpaceX engineers to gather requirements, research, evaluate, design, plan, deploy, and support software platforms and related technologies running in Kubernetes within a world-class environment that meets the needs of the demanding SpaceX engineering teams. Build highly resilient, high-performance, scalable, and robust systems.
  • Exercise a high degree of personal responsibility for the processes, systems, and tools you create and manage, all in support of the goal of making humanity an interplanetary species.
  • Recommend, justify, and implement improvements using an accepted change control methodology.
  • Work within a diverse group to design and deliver creative solutions and resolve problems in a timely and proactive manner by interacting with internal business units.
  • Define, document and follow standards and best practices for systems design, testing, and implementation.
  • Foster an environment of collaboration and cross-training, upskilling the team in Kubernetes expertise and ensuring peers are developed into capable engineers.
  • Drive scripting, self-service, and automation to reduce administrative overhead and toil.
  • Participate in on-call rotation to handle urgent after-hours work when necessary.

BASIC QUALIFICATIONS:

  • Bachelor’s degree in Computer Science or a STEM discipline and 3+ years of systems engineering experience; OR 5+ years of systems engineering experience in lieu of a degree.
  • Experience deploying and supporting Linux servers in physical and virtualized environments (e.g. VMware) via automation.
  • Experience with the Linux shell as well as configuring and extending Linux instances (e.g. kernel modules, cgroups, PKI, iptables, interfaces).
  • Experience supporting and scaling containerized applications in Linux environments.
  • Experience using automation frameworks (e.g. Ansible, Terraform) to manage provisioning and post-provisioning lifecycles of infrastructure and Kubernetes installations.

PREFERRED SKILLS AND EXPERIENCE:

  • Expertise in creating repeatable, reliable, scalable systems architectures, with high availability, fault tolerance, performance tuning, monitoring, and statistics/metrics collection.
  • Expertise in source code version control tools such as Git and Subversion and collaborating on source code via Pull Requests and other Git-based workflows.
  • Strong understanding of Linux container runtimes.
  • Experience implementing configuration management, provisioning, and workflow automation solutions via Infrastructure as Code, CI/CD, and GitOps (e.g. Ansible, AWX/Tower, Vagrant, Puppet, Redfish, Jenkins, cloud-init, ArgoCD, etc.).
  • Experience writing test automation to ensure backwards compatibility of feature and change development for automation processes and Kubernetes deployments.
  • Experience with programming and scripting languages such as Python and Golang to develop software solutions and integrate with external systems to implement automation against RESTful API services.
  • Experience installing, configuring, and troubleshooting Kubernetes internals, CNI, CRI, and CSI plugins (e.g. Docker, CRI-O, Ceph, Cilium), load balancing (e.g. MetalLB), service mesh (e.g. Istio), and software-defined storage (e.g. rook-ceph) in cloud or on-premises environments.
  • Experience developing solutions using Kubernetes patterns to extend system functionality and solve custom use cases (e.g. webhooks, controllers, operators, sidecars).
  • Experience implementing proactive alerting and monitoring workflows and dashboards for Linux systems and Kubernetes deployments using Prometheus, Grafana, InfluxDB, or similar technologies.
  • Experience with dynamic system configuration templating using Jinja, Jsonnet, YAML and Helm.

ADDITIONAL REQUIREMENTS:

  • Must be willing to work extended hours and weekends as needed.

COMPENSATION AND BENEFITS:

Pay Range:

Site Reliability Engineer: $140,000.00-$170,000.00/year

Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.

Base salary is just one part of your total rewards package at SpaceX. You may also be eligible for long-term incentives, in the form of company stock, stock options, or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short and long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation and will be eligible for 10 or more paid holidays per year. Employees in Washington State accrue paid sick time in compliance with state and federal law. Company shuttles are offered to employees for roundtrip travel from select Seattle locations to the SpaceX Redmond office Monday to Friday.

ITAR REQUIREMENTS:

  • To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.
  • SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin / ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.

Applicants wishing to view a copy of SpaceX’s Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process, should reach out to EEOCompliance@spacex.com.
