Talent.com
Senior+ Software Engineer - Cloud Availability Platform Engineering (Observability)

Crusoe Energy Systems LLC, San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About the Role

We are looking for a highly skilled engineer with deep expertise in building and operating observability platforms at scale. You will design, develop, and run Crusoe’s next-generation observability stack, enabling engineers to understand the internal state of distributed systems through metrics, logs, and traces. Your work will ensure reliability, performance, and actionable insights across Crusoe’s global infrastructure and cloud platform.

What You’ll Be Working On

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments

Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization

Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry

Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks

Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs

Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams

Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)

Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)

Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls

Partnering with engineering teams to embed observability into applications, services, and infrastructure

Mentoring engineers and shaping Crusoe’s observability strategy and technical roadmap

What You’ll Bring to the Team

7+ years of experience in infrastructure or platform engineering, with a focus on observability and monitoring systems

Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)

Strong programming skills in Go or Python for automation, operators, and custom integrations

Experience running observability platforms on Kubernetes and operating them at scale across multi-datacenter environments

Proven ability to design, optimize, and scale telemetry pipelines that handle high-cardinality, high-throughput data

Solid understanding of distributed systems, performance engineering, and debugging complex workloads

Familiarity with service meshes, networking, and workload instrumentation (Envoy, Istio, OpenTelemetry SDKs)

Strong collaboration skills and the ability to influence engineering teams to adopt observability best practices

Bonus Points

Contributions to open source observability projects (Prometheus, OpenTelemetry, Grafana, Loki, etc.)

Experience supporting AI/ML or GPU-heavy environments with high observability demands

Knowledge of event-driven or streaming systems (Kafka, NATS, Pulsar) used in telemetry pipelines

Experience implementing cost optimization strategies for large-scale observability platforms

Background in incident response, chaos engineering, and reliability practices

Benefits

Industry competitive pay

Restricted Stock Units in a fast-growing, well-funded technology company

Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

Employer contributions to Health Savings Accounts (HSAs)

Paid parental leave

Paid life insurance, short-term and long-term disability

Teladoc

401(k) with a 100% match up to 4% of salary

Generous paid time off and holiday schedule

Cell phone reimbursement

Tuition reimbursement

Subscription to the Calm app

MetLife Legal

Company-paid commuter benefit: $300 per month

Compensation

Compensation will be paid in the range of $166,000 to $201,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
