Talent.com
Sr. Staff Software Engineer, AI Infra
Linkedin • Mountain View, California, United States
30+ days ago
Job type
  • Full-time
Job description

Company Description

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We're also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that's built on trust, care, inclusion, and fun, where everyone can succeed.

Job Description

At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

Join us to push the boundaries of scaling large models together. The team is responsible for scaling LinkedIn's AI model training, feature engineering, and serving, with models of hundreds of billions of parameters and large-scale feature engineering infra for all AI use cases, from recommendation models and large language models to computer vision models. We optimize performance across algorithms, AI frameworks, data infra, compute software, and hardware to harness the power of our GPU fleet of thousands of the latest GPU cards. The team also works closely with the open source community and includes many open source committers (TensorFlow, Horovod, Ray, vLLM, Hugging Face, DeepSpeed, etc.). Additionally, the team focuses on technologies like LLMs, GNNs, Incremental Learning, Online Learning, and serving performance optimizations across billions of user queries.

Model Training Infrastructure : As an engineer on the AI Training Infra team, you will play a crucial role in building the next-gen training infrastructure to power AI use cases. You will design and implement high-performance data I/O, work with open source teams to identify and resolve issues in popular libraries like Hugging Face, Horovod, and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and optimize deep learning training, and provide advanced support for internal AI teams in areas like model parallelism, tensor parallelism, ZeRO++, etc. Finally, you will assist in and guide the development of containerized pipeline orchestration infrastructure, including developing and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep learning frameworks and their companion libraries, such as TensorFlow, PyTorch, DeepSpeed, GNNs, Flash Attention, PyTorch Lightning, and more.

Model Serving Infrastructure : This team builds low-latency, high-performance applications serving very large and complex models across LLM and Personalization models. As an engineer, you will build compute-efficient infra on top of native cloud, enable GPU-based inference for a large variety of use cases, apply CUDA-level optimizations for high performance, and enable on-device and online training. Challenges include scale (10s of thousands of QPS, multiple terabytes of data, billions of model parameters), agility (experiment with hundreds of new ML models per quarter using thousands of features), and enabling GPU inference at scale.

As a Sr. Staff Software Engineer, you will have first-hand opportunities to advance one of the most scalable AI platforms in the world. At the same time, you will work together with our talented teams of researchers and engineers to build your career and your personal brand in the AI industry.

Responsibilities :

  • Owning the technical strategy for broad or complex requirements with insightful and forward-looking approaches that go beyond the direct team and solve large open-ended problems.
  • Designing, implementing, and optimizing the performance of large-scale distributed serving or training for personalized recommendation as well as large language models.
  • Improving the observability and understandability of various systems with a focus on improving developer productivity and system sustenance.
  • Mentoring other engineers, defining our challenging technical culture, and helping to build a fast-growing team.
  • Working closely with the open-source community to participate in and influence cutting-edge open-source projects (e.g., vLLM, PyTorch, GNNs, DeepSpeed, Hugging Face, etc.).
  • Functioning as the tech lead for several concurrent key initiatives in AI Infrastructure and defining the future of AI Platforms.

Qualifications

Basic Qualifications :

  • BS / BA in Computer Science or related technical field or equivalent technical experience
  • 5+ years of industry experience in software design, development, and algorithm related solutions
  • 5+ years of experience programming in object-oriented languages such as Python, C++, Java, Go, Rust, Scala
  • 2+ years of experience as an architect or in a technical leadership position
  • 5+ years of industry experience leading or building deep learning systems
  • Hands-on experience developing distributed systems or other large-scale systems
Preferred Qualifications :

  • MS or PhD in Computer Science or related technical discipline.
  • 10+ years of experience in software design, development, and algorithm related solutions with at least 5 years of experience in a technical leadership position
  • 10+ years of experience in an object-oriented programming language such as Python, C++, Java, Go, Rust, Scala
  • 5+ years of experience with large-scale distributed systems and client-server architectures
  • Experience building ML applications, LLM serving, GPU serving.
  • Co-author or maintainer of any open-source projects
  • Expertise in machine learning infrastructure, including technologies like MLflow, Kubeflow, and large-scale distributed systems
  • Expertise in deep learning frameworks and tensor libraries like PyTorch, TensorFlow, JAX / Flax
Suggested Skills :

  • ML Algorithm Development
  • Machine Learning and Deep Learning
  • Information retrieval / recommendation systems
  • Technical leadership
LinkedIn is committed to fair and equitable compensation practices. The pay range for this role is $180,000 to $300,000. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to skill set, depth of experience, certifications, and specific work location. This may be different in other locations due to differences in the cost of labor.

The total compensation package for this position may also include annual performance bonus, stock, benefits and/or other applicable incentive compensation plans. For more information, visit https://careers.linkedin.com/benefits.

    Additional Information

    Equal Opportunity Statement

    We seek candidates with a wide range of perspectives and backgrounds and we are proud to be an equal opportunity employer. LinkedIn considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.

    LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. Our goal is to foster an inclusive and accessible workplace where everyone has the opportunity to be successful.

    If you need a reasonable accommodation to search for a job opening, apply for a position, or participate in the interview process, connect with us at [email protected] and describe the specific accommodation requested for a disability-related limitation.

    Reasonable accommodations are modifications or adjustments to the application or hiring process that would enable you to fully participate in that process. Examples of reasonable accommodations include but are not limited to :

  • Documents in alternate formats or read aloud to you
  • Having interviews in an accessible location
  • Being accompanied by a service dog
  • Having a sign language interpreter present for the interview
A request for an accommodation will be responded to within three business days. However, non-disability related requests, such as following up on an application, will not receive a response.

    LinkedIn will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by LinkedIn, or (c) consistent with LinkedIn's legal duty to furnish information.

    San Francisco Fair Chance Ordinance

    Pursuant to the San Francisco Fair Chance Ordinance, LinkedIn will consider for employment qualified applicants with arrest and conviction records.

    Pay Transparency Policy Statement

    As a federal contractor, LinkedIn follows the Pay Transparency and non-discrimination provisions described at this link : https://lnkd.in/paytransparency.

    Global Data Privacy Notice for Job Candidates

    Please follow this link to access the document that provides transparency around the way in which LinkedIn handles personal data of employees and job applicants : https://legal.linkedin.com/candidate-portal.
