Talent.com
Package Reliability Engineer
Celestial AI • Santa Clara, CA, US
30+ days ago
Job type
  • Full-time
Job description

About Celestial AI

As Generative AI continues to advance, the performance drivers for data center infrastructure are shifting from systems-on-chip (SOCs) to systems of chips. In the era of Accelerated Computing, data center bottlenecks are no longer limited to compute performance, but rather the system's interconnect bandwidth, memory bandwidth, and memory capacity. Celestial AI's Photonic Fabric™ is the next-generation interconnect technology that delivers a tenfold increase in performance and energy efficiency compared to competing solutions.

The Photonic Fabric™ is available to our customers in multiple technology offerings, including optical interface chiplets, optical interposers, and Optical Multi-chip Interconnect Bridges (OMIB). This allows customers to easily incorporate high bandwidth, low power, and low latency optical interfaces into their AI accelerators and GPUs. The technology is fully compatible with both protocol and physical layers, including standard 2.5D packaging processes. This seamless integration enables XPUs to utilize optical interconnects for both compute-to-compute and compute-to-memory fabrics, achieving bandwidths in the tens of terabits per second with nanosecond latencies.

This innovation empowers hyperscalers to enhance the efficiency and cost-effectiveness of AI processing by optimizing the XPUs required for training and inference, while significantly reducing the TCO2 impact. To bolster customer collaborations, Celestial AI is developing a Photonic Fabric ecosystem consisting of tier-1 partnerships that include custom silicon / ASIC design, system integrators, HBM memory, assembly, and packaging suppliers.

ABOUT THE ROLE

We are seeking an experienced Package Reliability Engineer with expertise in 2.5D / 3D advanced packaging. The ideal candidate will have a strong background in physics of failure, materials science, and experience working closely with OSATs to drive package reliability improvements. This role requires collaboration with external assembly and test partners, internal design, process, and failure analysis teams, and suppliers to ensure the reliability and manufacturability of cutting-edge semiconductor packages.

ESSENTIAL DUTIES AND RESPONSIBILITIES

  • Reliability Analysis & Risk Assessment:
      • Conduct physics of failure (PoF)-based reliability modeling for 2.5D / 3D advanced packaging.
      • Assess package reliability risks from thermal, mechanical, and electrical stressors.
      • Define and execute stress test plans (e.g., thermal cycling, humidity, electromigration) to validate package robustness.
  • OSAT Management & Collaboration:
      • Work closely with OSAT partners to drive package reliability improvements, process optimizations, and yield enhancements.
      • Define reliability requirements, review test methodologies, and ensure OSAT compliance with JEDEC and industry standards.
      • Monitor and evaluate OSAT performance in executing reliability qualifications and failure analysis.
      • Support supplier audits and technical reviews to assess manufacturing capabilities and reliability processes.
  • Material Characterization & Selection:
      • Evaluate and select materials (substrates, dielectrics, adhesives, underfills) for optimal reliability.
      • Analyze CTE mismatches, warpage, delamination, and interfacial adhesion issues.
      • Work with material suppliers and OSATs to qualify new materials for advanced packaging applications.
  • Failure Analysis & Root Cause Identification:
      • Lead failure mode and effects analysis (FMEA) and model-based problem solving (MBPS), and determine root causes of package failures using techniques such as FIB, X-ray, SEM, and TEM.
      • Identify and mitigate interfacial failures, cracking, voiding, electromigration, and stress-induced damage.
      • Drive OSATs and internal teams to implement corrective and preventive actions (CAPA).
  • Process & Design Collaboration:
      • Work cross-functionally with internal design, process, and manufacturing teams to define assembly test vehicles and optimize package architectures.
      • Develop and refine design guidelines, process improvements, and reliability best practices.
      • Stay up to date with industry standards (JEDEC, IPC, IEEE, etc.) and implement best practices in package reliability.

QUALIFICATIONS

  • Education: Master's or Ph.D. in Materials Science, Mechanical Engineering, Electrical Engineering, Applied Physics, or a related field.
  • Experience: 5-10 years of hands-on experience in 2.5D / 3D advanced packaging reliability.
  • Technical Expertise:
      • Deep understanding of physics of failure (PoF) methodologies for package reliability.
      • Strong knowledge of materials science, particularly in interconnects, substrates, and interfaces.
      • Proficiency in stress modeling tools (ANSYS, Abaqus, COMSOL, etc.) for thermo-mechanical analysis.
      • Experience with failure analysis techniques such as C-SAM, X-ray CT, SEM, TEM, FIB, and EBSD.
  • OSAT Collaboration Experience:
      • Proven track record of working with and driving OSAT partners for package reliability, yield, and continuous quality improvements.
      • Experience managing OSAT qualifications, failure analysis, and corrective actions.
      • Familiarity with supplier engagement, reliability testing at OSATs, and package process flows.
  • Industry Knowledge: Familiarity with JEDEC, IPC, IEEE, and MIL-STD reliability standards.
  • Soft Skills: Strong analytical, problem-solving, and cross-functional collaboration skills.

PREFERRED QUALIFICATIONS

  • Experience in heterogeneous integration, fan-out packaging, and chiplet architectures.
  • Knowledge of electrical reliability mechanisms (e.g., electromigration, time-dependent dielectric breakdown).
  • Expertise in AI-driven reliability modeling or machine learning for failure prediction.

LOCATION: Santa Clara, CA

For California Location:

As an early-stage startup, we offer an extremely attractive total compensation package inclusive of a competitive base salary, bonus, and a generous grant of our valuable early-stage equity. The target base salary for this role is approximately $185,000.00 - $225,000.00. The base salary offered may be slightly higher or lower than the target base salary, based on the final scope as determined by the depth of experience and skills demonstrated by the candidate in the interviews.

We offer great benefits (health, vision, dental, and life insurance) and a collaborative, continuous-learning work environment where you will get a chance to work with smart and dedicated people engaged in developing the next-generation architecture for high-performance computing.

Celestial AI Inc. is proud to be an equal opportunity workplace and is an affirmative action employer.
