AI System Solution Architect
Cango Inc. • San Francisco, CA, United States
14 hours ago
Job type
Full-time
Job description
Join Cango Inc. as a Senior Solutions Architect focusing on LLM and diffusion model inference on large-scale GPU clusters.
Responsibilities
Design the end-to-end technical architecture for LLM and diffusion model inference on large-scale GPU clusters.
Develop innovative solutions in KV Cache management, distributed scheduling, pipelining / batching strategies, memory allocation, and P2P / IB communication.
Architect a multi-tenant serving framework that balances throughput, latency, and cost.
Define product positioning and differentiation based on industry trends and company strategy.
Develop technical evolution plans (e.g., token streaming like vLLM, syntax parsing like SGLang, Diffusion acceleration).
Align closely with internal GPU infrastructure and business teams to ensure timely product delivery.
Lead performance engineering efforts, including NCCL tuning, NUMA binding, and CUDA kernel optimization.
Drive cross-team collaboration (GPU kernel, compiler, distributed system, frontend APIs) to ensure system stability and scalability.
Organize benchmarking and performance testing against industry leaders (vLLM, SGLang, TensorRT, etc.).
Guide the engineering team on implementation strategies, experimental methodologies, and optimization pathways.
Engage with open-source communities and contribute core components to enhance technical influence.
Communicate directly with North America-based clients to understand their needs for AI inference, training, and deployment.
Translate customer needs into internal implementation plans and coordinate across operations, engineering, and delivery teams.
Qualifications
5+ years of experience in computing infrastructure, GPU cloud, or large-scale cloud computing in the U.S., with a deep understanding of the North American tech ecosystem.
Master’s or Ph.D. in Computer Science, Electrical Engineering, or related fields preferred.
5+ years of hands‑on experience in deep learning systems or GPU optimization, including leading the design of at least one large‑scale AI inference or training system.
Proficiency with PyTorch, CUDA, NCCL, Triton, TensorRT, MPI / IB / RDMA, etc.
Deep understanding of projects like vLLM, SGLang, DeepSpeed, FasterTransformer.
Practical experience in LLM inference optimization (e.g., KV Cache, P2P vs CPU routing, batching strategies).
Ability to integrate system‑level optimization with product usability (API and Serving layers).
Strong architectural thinking and cross‑functional communication skills to translate complexity into clear product roadmaps.
Preferred
Open‑source contributions (e.g., to vLLM, DeepSpeed, Ray, Triton‑Server, SGLang, etc.).
Experience launching GPU cloud or AI infrastructure products (e.g., RunPod, Lambda, Modal, SageMaker).
Familiarity with emerging LLM inference trends such as speculative decoding, continuous batching, and streaming inference.
What We Offer
Hands‑on opportunity to manage and optimize GPU clusters at multi‑thousand‑card scale, operating at the forefront of global compute infrastructure.
Strategic partner role in both product architecture and business decisions alongside the core leadership team.
Key role in building the next‑generation GPU‑based AI inference infrastructure.
High degree of autonomy in product and architectural decisions.
Competitive compensation package with equity incentives.
Global team and access to cross‑regional GPU cluster resources.