Job Title : Staff Machine Learning Engineer, LLM Fine-Tuning (Verilog / RTL Applications)
Level : Staff
Location : San Jose, CA (USA)
Cloud : AWS (primary; Bedrock + SageMaker)
Why this role exists :
We're building privacy-preserving LLM capabilities that help hardware design teams reason over Verilog / SystemVerilog and RTL artifacts, supporting code generation, refactoring, lint explanation, constraint translation, and spec-to-RTL assistance. We're looking for a Staff-level engineer to technically lead a small, high-leverage team that fine-tunes and productizes LLMs for these workflows in a strict enterprise data-privacy environment.
You don't need to be a Verilog / RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL / EDA fluency is a strong plus.
What you'll do (Responsibilities) :
- Own the technical roadmap for Verilog / RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.
- Lead a hands-on team of applied scientists / engineers : set direction, unblock technically, review designs / code, and raise the bar on experimentation velocity and reliability.
- Fine-tune and customize models using state-of-the-art techniques (LoRA / QLoRA, PEFT, instruction tuning, preference optimization / RLAIF) with robust HDL-specific evals :
  - Compile / lint / simulate-based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does-it-synthesize" checks.
- Design privacy-first ML pipelines on AWS :
  - Training / customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS with KServe / Triton / DJL Serving) for bespoke training and serving needs.
  - Artifacts in S3 encrypted with KMS customer-managed keys (CMKs); isolated VPC subnets and PrivateLink (including Bedrock VPC endpoints); least-privilege IAM; CloudTrail auditing; and Secrets Manager for credentials.
  - Enforce encryption in transit and at rest, data minimization, and no public egress for customer / RTL corpora.
- Stand up dependable model serving : Bedrock model invocation where it fits, and / or low-latency self-hosted inference (vLLM / TensorRT-LLM), autoscaling, and canary / blue-green rollouts.
- Build an evaluation culture : automatic regression suites that run HDL compilers / simulators, measure behavioral fidelity, and detect hallucinations / constraint violations; maintain model cards and experiment tracking (MLflow / Weights & Biases).
- Partner deeply with hardware design, CAD / EDA, Security, and Legal to source / prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
- Drive productization : integrate LLMs with internal developer tools (IDEs / plugins, code review bots, CI), retrieval (RAG) over internal HDL repos / specs, and safe tool use / function calling.
- Mentor & uplevel : coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.
What you'll bring (Minimum qualifications) :
- 10+ years of total engineering experience, with 5+ years in ML / AI or large-scale distributed systems and 3+ years working directly with transformers / LLMs.
- Proven track record of shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.
- Deep hands-on skill with PyTorch, Hugging Face Transformers / PEFT / TRL, distributed training (DeepSpeed / FSDP), parameter-efficient and quantized fine-tuning (LoRA / QLoRA), and constrained / grammar-guided decoding.
- AWS expertise to design and defend secure enterprise deployments, including :
  - Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)
  - SageMaker (Training, Inference, Pipelines), S3, EC2 / EKS / ECR, VPC / Subnets / Security Groups, IAM, KMS, PrivateLink, CloudWatch / CloudTrail, Step Functions, Batch, Secrets Manager.
- Strong software engineering fundamentals : testing, CI / CD, observability, performance tuning. Python is required (Go / Java / C++ a bonus).
- Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication with both executives and engineers.
Nice to have (Preferred qualifications) :
- Familiarity with Verilog / SystemVerilog / RTL workflows : lint, synthesis, timing closure, simulation, formal verification, testbenches, and EDA tools (Synopsys / Cadence / Mentor).
- Experience integrating static analysis / AST-aware tokenization for code models, or grammar-constrained decoding.
- RAG at scale over code / specs (vector stores, chunking strategies); tool use / function calling for code transformation.
- Inference optimization : TensorRT-LLM, KV-cache optimization, speculative decoding; throughput / latency trade-offs at batch and token levels.
- Model governance / safety in the enterprise : model cards, red-teaming, secure eval data handling; exposure to SOC 2 / ISO 27001 / NIST frameworks.
- Data anonymization, DLP scanning, and code de-identification to protect IP.