Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

Epoch Biodesign
San Francisco, CA, United States
4 days ago
Job type
  • Full-time
Job description

About the Role

We are looking for a highly skilled engineer with deep expertise in building and operating observability platforms at scale. You will design, develop, and run Crusoe’s next-generation observability stack, enabling engineers to understand the internal state of distributed systems through metrics, logs, and traces. Your work will ensure reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.

What You’ll Be Working On

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments

Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization

Extending monitoring and alerting with Prometheus, Alertmanager, Thanos / Cortex, Grafana, and OpenTelemetry

Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK / OpenSearch stacks

Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs

Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams

Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)

Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI / ML, HPC clusters, GPU infrastructure)

Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls

Partnering with engineering teams to embed observability into applications, services, and infrastructure

Mentoring engineers and shaping Crusoe’s observability strategy and technical roadmap

What You’ll Bring to the Team

7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems

Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK / OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)

Strong programming skills in Go or Python for automation, operators, and custom integrations

Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments

Proven ability to design, optimize, and scale telemetry pipelines handling high-cardinality, high-throughput data

Solid understanding of distributed systems, performance engineering, and debugging complex workloads

Familiarity with service meshes, networking, and workload instrumentation (Envoy, Istio, OpenTelemetry SDKs)

Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices

Bonus Points

Contributions to open-source observability projects (Prometheus, OpenTelemetry, Grafana, Loki, etc.)

Experience supporting AI / ML or GPU-heavy environments with high observability demands

Knowledge of event-driven or streaming systems (Kafka, NATS, Pulsar) used in telemetry pipelines

Experience implementing cost optimization strategies for large-scale observability platforms

Background in incident response, chaos engineering, and reliability practices

Benefits

Industry-competitive pay

Restricted Stock Units in a fast-growing, well-funded technology company

Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

Employer contributions to HSA accounts

Paid Parental Leave

Paid life insurance, short-term and long-term disability

Teladoc

401(k) with a 100% match up to 4% of salary

Generous paid time off and holiday schedule

Cell phone reimbursement

Tuition reimbursement

Subscription to the Calm app

MetLife Legal

Company-paid commuter benefit: $300 per month

Compensation

Compensation will be paid in the range of $166,000 to $201,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex / gender, sexual preference / orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
