Talent.com
AI Infrastructure Solution Architect, Principal

Jobgether • CA, US
6 days ago
Job type
  • Full-time
  • Remote
Job description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Infrastructure Solution Architect, Principal in California (USA).

As an AI Infrastructure Solution Architect, Principal, you will play a key leadership role in shaping next-generation AI infrastructure and deployment frameworks. You’ll design full-stack reference architectures that enable scalable, observable, and efficient AI inference environments, both in the cloud and on-premises. Working closely with technical teams and ecosystem partners, you’ll integrate advanced automation, orchestration, and observability capabilities to ensure high performance and reliability across large-scale AI workloads. This position is ideal for an expert who thrives on technical complexity, enjoys building end-to-end solutions, and is motivated to push the boundaries of AI infrastructure innovation.

Accountabilities

  • Design and implement comprehensive AI infrastructure reference solutions covering compute, storage, networking, and orchestration layers.
  • Develop infrastructure-as-code templates and automation tools (Ansible, Terraform, Helm) for efficient provisioning and management of AI clusters.
  • Integrate AI workloads into Kubernetes-based environments to support model deployment, scaling, and fault tolerance.
  • Build telemetry, observability, and monitoring frameworks using Prometheus, Grafana, and OpenTelemetry to ensure real-time system insights.
  • Create and maintain dashboards, health-check systems, and performance metrics for operational visibility and optimization.
  • Collaborate with internal performance and software teams to validate infrastructure against real-world workloads and industry benchmarks.
  • Work directly with customers, OEMs, and ISVs to customize and deploy AI infrastructure solutions tailored to specific environments.
  • Document and publish deployment guides, best practices, and performance tuning recommendations for broad ecosystem adoption.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • 10+ years of experience in infrastructure architecture, systems engineering, DevOps, or AI platform management.
  • Proven expertise in designing and managing AI or high-performance computing infrastructures at scale.
  • Strong experience with scripting and automation tools such as Python, Bash, Ansible, Terraform, and Helm.
  • Deep understanding of Kubernetes, containerization, and orchestration technologies (Ray, KServe, Kubeflow).
  • Familiarity with model serving platforms like Triton Inference Server and Ray Serve.
  • Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry).
  • Strong system debugging, troubleshooting, and incident response capabilities.
  • Excellent collaboration, communication, and stakeholder engagement skills.
  • Experience with GPU-based or custom AI accelerator environments is a strong plus.

Benefits

  • Competitive compensation ranging from USD 175,000 to 260,000, plus bonus and equity opportunities.
  • Comprehensive medical, dental, and vision coverage.
  • 401(k) retirement plan with employer contribution.
  • Hybrid work setup in Santa Clara, CA (3–5 days onsite per week).
  • Flexible work schedule and wellness-oriented benefits.
  • Educational and professional development programs.
  • Inclusive and collaborative work culture that values respect, humility, and innovation.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.

🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.

📊 It compares your profile to the job’s core requirements and success indicators to determine your match score.

🎯 Based on this analysis, we automatically shortlist the top 3 candidates who best fit the role.

🧠 When necessary, our human team performs an additional manual review to ensure no exceptional profile is missed.

The process is transparent, skills-based, and bias-free, focusing entirely on your fit for the role. Once the shortlist is completed, it is shared directly with the company owning the job opening. The final decision and next steps (such as interviews or further assessments) are handled by their internal hiring team.

Thank you for your interest!