Research Intern - Audio-Visual VoiceAI (Open Source)

WhissleAI · Fremont, CA, US
Posted 14 hours ago
Contract type
  • Full-time
Job description

We're looking for a Research Intern to join WhissleAI and help advance our open-source work at the intersection of speech, vision, and structured understanding. This work is inspired by projects such as:

  • https://music.whissle.ai/
  • advanced speech recognition (asr.whissle.ai)
  • recent multimodal alignment research (for example, https://aclanthology.org/2025.emnlp-main.845.pdf)

You'll develop audio-visual foundation models that connect voice, context, and environment, enabling systems that can listen, see, and act coherently in real time. Most of this work is open source and contributes directly to the broader research community.

Ideal candidate

  • Undergrad, Master's, or PhD student in CS, AI, or related field
  • Prior research experience (conference or workshop publications are a plus)
  • Strong background in one or more of: multimodal learning, audio-visual representation learning, speech modeling, or self-supervised methods
  • Experience with PyTorch, Hugging Face, or similar frameworks

What you'll do

  • Prototype and evaluate audio-visual alignment models
  • Extend our open-source ASR and meta-speech pipelines
  • Collaborate on papers, demos, and real-time VoiceAI applications

Location: Remote
Type: Paid internship / research collaboration