Talent.com
Reliability Engineer, AI & Data Platforms
Apple • Sunnyvale, CA, United States
2 days ago
Job type
  • Full-time
Job description

Role Number: 200624750-3956

Summary

Join the AI and Data Platforms team at Apple, where we build and manage cloud-based data platforms handling petabytes of data at scale. We are looking for a passionate and independent Software Engineer specializing in reliability engineering for data platforms, with a strong understanding of data and ML systems. If you thrive in a fast-paced environment, love crafting solutions that don't yet exist, and possess excellent communication skills to collaborate across diverse teams, we invite you to contribute to Apple’s high standards in an exciting and dynamic setting.

Description

As part of our team, you will be responsible for developing and operating our big data platform using open source or other solutions to support critical applications such as analytics, reporting, and AI / ML apps. This includes optimizing performance and cost, automating operations, and identifying and resolving production errors and issues to ensure the best data platform experience.

Minimum Qualifications

3+ years of professional software engineering experience with large-scale big data platforms, including strong programming skills in Java, Scala, Python, or Go.

Proven expertise in designing, building, and operating large-scale distributed data processing systems with a strong focus on Apache Spark.

Hands-on experience with table formats and data lake technologies such as Apache Iceberg, ensuring scalability, reliability, and optimized query performance.

Skilled at coding for distributed systems and developing resilient data pipelines.

Strong background in incident management, including troubleshooting, root cause analysis, and performance optimization in complex production environments.

Proficient with Unix / Linux systems and command-line tools for debugging and operational support.

Preferred Qualifications

Expertise in designing, building, and operating critical, large-scale distributed systems with a focus on low latency, fault-tolerance, and high availability.

Experience with contribution to Open Source projects is a plus.

Experience with multiple public cloud infrastructures, managing multi-tenant Kubernetes clusters at scale, and debugging Kubernetes / Spark issues.

Experience with workflow and data pipeline orchestration tools (e.g., Airflow, dbt).

Understanding of data modeling and data warehousing concepts.

Familiarity with the AI / ML stack, including GPUs, MLflow, or Large Language Models (LLMs).

A learning attitude to continuously improve oneself, the team, and the organization.

Solid understanding of software engineering best practices, including the full development lifecycle, secure coding, and experience building reusable frameworks or libraries.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $220,900, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
