Talent.com
Senior Site Reliability Engineer

Mango, Inc.
Los Angeles, California, United States
23 hours ago
Job type
  • Full-time
Job description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on-premise instruments, data systems, and machine learning pipelines. This role combines systems-level engineering with software craftsmanship and requires a deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go-to expert for diagnosing performance issues in our on-prem systems, from kernel-level I/O bottlenecks to distributed service latency, and for building robust automation that keeps our systems consistent and observable.

Key Responsibilities
  • Infrastructure Design & Reliability: Design, deploy, and maintain our on-premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high-throughput data processing clusters. Implement fault-tolerant systems with reproducible deployments and clear observability.
  • Performance & Systems Analysis: Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in-house application-level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.
  • Automation & Tooling: Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.
  • Collaboration: Work closely with our software and hardware teams to co-design systems that meet the needs of high-resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.
  • Observability & Incident Response: Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post-mortem efforts with a focus on learning and prevention.
  • Documentation & Communication: Produce clear documentation and communicate findings effectively to the broader team, from network topology diagrams to kernel tuning rationales.

General Qualifications
  • Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
  • Hands-on experience designing and managing on-premise servers, storage arrays, or HPC clusters.
  • Comfort with automation and software development (Python, Go, Bash, or similar).
  • Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.
  • Proven track record of improving system reliability, throughput, and maintainability in a fast-paced environment.
  • Excellent written and verbal communication skills for cross-disciplinary collaboration.
  • Self-driven, curious, and motivated by understanding systems deeply rather than just maintaining them.

Bonus Qualities (Not Required)
  • 5-10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
  • Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).
  • Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
  • Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
  • Experience supporting data-heavy scientific or ML workloads.
  • Demonstrated technical leadership, including mentoring others in debugging, reliability, or performance analysis.

