Location is San Francisco, Mountain View or Seattle
Job / Team Description:
The R&D Operations Organization is seeking a Senior Technical Program Manager (TPM) with a background in distributed AI, resource management, forecasting, capacity and strategic planning to join our team.
This role involves supporting platform operations, handling customer escalations, and monitoring cluster health.
Additionally, you'll ensure optimal compute resource allocation aligned with product, sales, and research priorities and drive decisions with data, analysis, and reporting in GenAI.
This job is fast-paced and cross-functional, and it requires strong communication, prioritization, and organization skills. It also requires a deep understanding of stakeholder management and the ability to navigate an environment of passionate people intent on delivering valuable products.
The impact you will have:
Act as a single point of contact for escalations from sales and global support teams, and help resolve billing and support issues.
Drive innovation and the delivery of high-quality products by ensuring that AI teams have the GPU and other compute resources they need.
Improve product margins by leading strategic initiatives to optimize GPU utilization and procurement.
Establish and maintain effective communication with technical and non-technical stakeholders and customers, including regular project updates, status reports, and presentations.
Deliver step-change improvements in compute management, efficiency, and scalability by identifying and implementing process improvements.
Ensure strategic alignment across Sales, Global Support, and Engineering
What we look for
Bachelor’s degree in Computer Science (or a related field) and 5+ years of professional experience in technical program management, distributed platforms, resource management, execution, and strategic planning.
Proven track record of driving cross-functional teams to deliver complex technical projects on time and with high quality.
Excellent communication, negotiation, and analytical skills, with the ability to document standard operating procedures and processes.
Advanced working knowledge of SQL, with the ability to build and maintain analytics that track, forecast, and visualize consumption through ad hoc SQL, reports, and dashboards.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Self-motivated and able to work independently, as well as in a team environment.
Good working knowledge of GPU technology and its applications in generative AI and machine learning is preferred.
Familiarity with big data technologies such as Apache Spark, Delta Lake, and MLflow is a plus.
Experience with compute capacity management, as well as financial analysis or sales / deal desk quoting, is a plus.
Preferred Qualifications:
Master's or advanced technical degree
Experience with Machine Learning / AI Products and Platforms
Experience with large scale software and cloud native infrastructure
Previous hardware or software development experience is a plus