Machine Learning Engineer, LLM Fine‑Tuning
We are actively hiring for a Machine Learning Engineer focused on LLM fine‑tuning for Verilog / RTL applications.
Location: San Jose, CA (Onsite)
Skills: LLM fine‑tuning, Verilog / RTL, AWS, Bedrock, SageMaker
Responsibilities
- Own the technical roadmap for Verilog / RTL‑focused LLM capabilities—from model selection and adaptation to evaluation, deployment, and continuous improvement.
- Lead a hands‑on team of applied scientists / engineers: set direction, unblock technically, review designs / code, and raise the bar on experimentation velocity and reliability.
- Fine‑tune and customize models using state‑of‑the‑art techniques (LoRA / QLoRA, PEFT, instruction tuning, preference optimization / RLAIF) with robust HDL‑specific evals: compile‑ / lint‑ / simulate‑based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does‑it‑synthesize" checks (see the LoRA and pass@k sketches after this list).
- Design privacy‑first ML pipelines on AWS (see the S3 / KMS sketch after this list):
  - Training / customization and hosting using Amazon Bedrock and SageMaker (or EKS + KServe / Triton / DJL) for bespoke training needs.
  - Artifacts in S3 with KMS CMKs; isolated VPC subnets and PrivateLink (including Bedrock VPC endpoints); IAM least privilege; CloudTrail auditing; and Secrets Manager for credentials.
  - Enforce encryption in transit and at rest, data minimization, and no public egress for customer / RTL corpora.
- Stand up dependable model serving: Bedrock model invocation where it fits, and / or low‑latency self‑hosted inference (vLLM / TensorRT‑LLM), autoscaling, and canary / blue‑green rollouts (see the vLLM sketch after this list).
- Build an evaluation culture: automatic regression suites that run HDL compilers / simulators, measure behavioral fidelity, and detect hallucinations / constraint violations; model cards and experiment tracking (MLflow / Weights & Biases).
- Partner deeply with hardware design, CAD / EDA, Security, and Legal to source and prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
- Drive productization: integrate LLMs with internal developer tools (IDEs / plug‑ins, code review bots, CI), retrieval (RAG) over internal HDL repos / specs, and safe tool‑use / function‑calling.
- Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure‑by‑default systems.
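For illustration only, a minimal sketch of the kind of LoRA adaptation the fine‑tuning bullet describes, using Hugging Face Transformers and PEFT; the base model, target modules, and hyperparameters below are placeholders, not a prescribed recipe.

```python
# Minimal LoRA sketch (placeholders throughout, not a prescribed recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "codellama/CodeLlama-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Low-rank adapters on the attention projections; rank/alpha are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights train

# Training then proceeds with a standard Trainer / TRL SFT loop over an
# instruction-formatted Verilog corpus (not shown here).
```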
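Also for illustration, a sketch of the compile‑gate plus pass@k style of eval mentioned above. It assumes Icarus Verilog (iverilog) is on PATH as a stand‑in compile check; swap in whatever lint / simulation flow you actually use. The estimator is the standard unbiased pass@k formula.

```python
# Sketch of an HDL compile gate plus the standard unbiased pass@k estimator.
# Assumes Icarus Verilog (iverilog) is installed; replace with your own flow.
import math
import pathlib
import subprocess
import tempfile

def compiles(verilog_src: str) -> bool:
    """Crude 'does it compile' gate: True if iverilog accepts the code."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "dut.v"
        src.write_text(verilog_src)
        result = subprocess.run(
            ["iverilog", "-o", str(pathlib.Path(tmp) / "a.out"), str(src)],
            capture_output=True,
        )
        return result.returncode == 0

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k) for n samples, c passing."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Usage: generate n completions per problem, count how many pass the gate,
# then report pass@k per problem and average across the suite.
samples = ["module t; endmodule", "module broken("]  # illustrative completions
c = sum(compiles(s) for s in samples)
print(pass_at_k(n=len(samples), c=c, k=1))
```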
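A small sketch of the "encrypted artifacts in S3 under a customer‑managed key" piece of the pipeline, using boto3; the bucket name, object key, and KMS alias are placeholders for whatever the account actually provisions.

```python
# Sketch: upload a training artifact with SSE-KMS under a customer-managed key.
# Bucket name, key alias, and object key are placeholders.
import boto3

s3 = boto3.client("s3")
with open("train.jsonl", "rb") as f:
    s3.put_object(
        Bucket="example-rtl-corpus",               # placeholder bucket
        Key="datasets/verilog/train.jsonl",
        Body=f,
        ServerSideEncryption="aws:kms",            # encrypt at rest with KMS
        SSEKMSKeyId="alias/example-llm-training",  # placeholder CMK alias
    )
```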
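Finally, a sketch of the self‑hosted inference path the serving bullet mentions, using vLLM; the model name, prompt, and sampling settings are illustrative only.

```python
# Sketch: batch sampling from a self-hosted model with vLLM (illustrative values).
from vllm import LLM, SamplingParams

llm = LLM(model="codellama/CodeLlama-7b-hf")  # placeholder model
params = SamplingParams(temperature=0.2, max_tokens=256, n=4)

prompt = "// 4-bit synchronous up-counter with async reset\nmodule counter"
outputs = llm.generate([prompt], params)
for completion in outputs[0].outputs:  # n sampled completions for the prompt
    print(completion.text)
```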
Qualifications
- 10+ years of total engineering experience, with 5+ years in ML / AI or large‑scale distributed systems and 3+ years working directly with transformers / LLMs.
- Proven track record shipping LLM‑powered features in production and leading ambiguous, cross‑functional initiatives at Staff level.
- Deep hands‑on skill with PyTorch, Hugging Face Transformers / PEFT / TRL, distributed training (DeepSpeed / FSDP), quantization‑aware fine‑tuning (LoRA / QLoRA), and constrained / grammar‑guided decoding.
- AWS expertise to design and defend secure enterprise deployments: Bedrock, SageMaker, S3, EC2 / EKS / ECR, VPC / Subnets / Security Groups, IAM, KMS, PrivateLink, CloudWatch / CloudTrail, Step Functions, Batch, Secrets Manager.
- Strong software engineering fundamentals: testing, CI / CD, observability, performance tuning; Python is a must (bonus for Go / Java / C++).
- Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Engineering and Information Technology
Industries
IT Services and IT Consulting