Site Reliability Engineer

Fractal
CA, United States
Full-time

Responsibilities:

Monitor system uptime and availability, ensuring functional and performance SLAs are met.
Respond to alerts from all critical infrastructure and resolve environment issues.
Participate in analyzing incident trends and identifying root causes of issues.
Triage problems for critical services and build automation to prevent recurrence.
Influence and create new designs, architectures, standards, and methods for supporting the platform.
Understand C3 deployment automation flows, upgrade them as needed, and effectively troubleshoot issues with system updates and upgrades.
Participate in the on-call rotation.
Work cross-functionally with Services and Engineering teams.

Qualifications:

Demonstrated understanding of deploying, managing, and operating scalable and fault-tolerant Linux / Kubernetes / JVM-based infrastructure in AWS, GCP, and other public clouds.
Expertise in Linux operating systems, networking, and database concepts.
Experience deploying, upgrading, and troubleshooting Kubernetes clusters and workloads.
Experience with Cassandra (or another NoSQL alternative).
Expertise in cloud providers such as Amazon Web Services, Azure, and GCP.
Experience with configuration management systems such as Puppet.
Experience using Bash or Python to automate and monitor systems.
Experience with infrastructure-as-code (IaC) tools such as Ansible or Terraform.
Excellent problem-solving, critical thinking, and communication skills.
Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.
BS or MS in Computer Science or a related field, or equivalent professional experience.
