Talent.com
Machine Learning Infrastructure Engineer - SIML, ISE
Apple Inc. • Cupertino, CA, United States
11 hours ago
Job type
  • Full-time
Job description

Cupertino, California, United States Software and Services

Are you passionate about Generative AI? Are you interested in working on groundbreaking generative modeling technologies to enrich billions of people? We are the Intelligence System Experience (ISE) team within Apple’s software organization. The team operates at the intersection of multimodal machine learning and system experiences. Our multidisciplinary ML teams focus on a broad spectrum of areas, including Visual Generative Foundation Models, Multimodal Understanding, Visual Understanding of People, Text, Handwriting, and Scenes, Personalization, Knowledge Extraction, Conversation Analysis, Behavioral Modeling for Proactive Suggestions, and Privacy-Preserving Learning. These innovations form the foundation of the seamless, intelligent experiences our users enjoy every day.

We are seeking an ML Infrastructure Engineer to design, optimize, and scale the systems that power large-scale model training across the organization. This role sits at the intersection of high-performance computing, machine learning, and infrastructure engineering, delivering the core capabilities teams rely on to iterate quickly and reliably.

Description

The ideal candidate brings strong software engineering fundamentals, deep familiarity with distributed training, and a passion for building infrastructure that is efficient, observable, and easy for ML practitioners to use. You’ll work closely with model developers and platform teams to ensure training workflows are fast, reliable, and cost-effective—while also supporting users operationally to keep them unblocked and productive.

  • Build and maintain distributed training infrastructure.
  • Optimize training performance through profiling, parallelization strategies, and hardware-aware tuning.
  • Develop reliable pipelines for data loading, checkpointing, logging, and monitoring to support high-throughput training jobs.
  • Collaborate directly with ML engineers to understand scaling bottlenecks and design solutions that improve both training speed and resource efficiency.
  • Create and maintain tooling that simplifies how users configure, launch, and debug distributed training jobs.
  • Implement strong observability across training workflows—telemetry, dashboards, alerts, and diagnostics.
  • Support training workloads, investigate failures, triage performance regressions, and gather real feedback from users.

Minimum Qualifications

  • Bachelor's or Master's degree in Computer Science or a related technical field, or equivalent practical experience.
  • 3+ years of experience in software development, with strong Python skills and familiarity with systems programming concepts.
  • Hands-on experience with ML training frameworks (e.g., PyTorch Distributed, DeepSpeed, JAX, TensorFlow).
  • Knowledge of distributed systems, parallel computing, and accelerator architectures (GPU / TPU).
  • Experience debugging performance and reliability issues in complex, large-scale systems.

Preferred Qualifications

  • Strong communication skills and the ability to collaborate with ML practitioners and infra teams.

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs.

Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan.

You’ll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition.

Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.

