Overview
Advisor, Federated Learning Data Scientist
The Advisor, Federated Learning Data Scientist plays an essential leadership role, responsible for identifying, assessing, and implementing cutting-edge algorithmic solutions that leverage diverse datasets while ensuring data privacy and security for our partners. This position requires comprehensive knowledge of small molecule drug development, ADME/Tox, antibody engineering, and/or genetic medicine, combined with expertise in data science and statistical analysis to develop sophisticated models using federated learning. The role focuses on building large-scale, pre-trained models in a decentralized, privacy-preserving manner, and will be instrumental in advancing Lilly's pipeline by designing critical algorithms and workflows that expedite the creation of transformative therapies. The ideal candidate will pioneer the development of semi-supervised foundation models that can learn from vast, distributed datasets without centralizing sensitive information.
Key Responsibilities
- Foundation Model Architecture: Design and develop novel deep learning architectures (e.g., Transformer, Graph Neural Network-based) for large-scale, federated pre-training on unlabeled or partially labeled data distributed across multiple sources.
- Semi-Supervised & Self-Supervised Learning: Implement and advance state-of-the-art semi-supervised and self-supervised learning algorithms (e.g., contrastive learning, masked auto-encoding) tailored to the challenges of federated learning, such as communication bottlenecks and data heterogeneity.
- Federated Optimization & Aggregation: Develop robust and communication-efficient federated aggregation strategies (e.g., FedAvg, FedProx, SCAFFOLD) that are stable for large, complex models and can handle non-IID data.
- Downstream Task Adaptation: Create protocols for fine-tuning and adapting the pre-trained federated foundation models for downstream tasks, ensuring knowledge transfer while maintaining privacy.
- Data Curation & Simulation: Collaborate with data engineering teams to establish pipelines for accessing and simulating distributed datasets. Develop high-fidelity simulation environments to test, debug, and benchmark federated pre-training strategies before real-world deployment.
- Scalability and Performance: Profile, analyze, and optimize computational performance (memory, latency, communication cost) of federated training and inference to ensure scalability to many clients and large datasets.
- Scientific Dissemination: Author high-impact research papers for publication in top-tier ML conferences and relevant journals. Prepare and deliver presentations to internal and external audiences.
- Code & Model Governance: Write clean, reproducible code. Contribute to internal libraries and ML platforms. Implement version control for data, code, and models to ensure robust research.
- Cross-Functional Collaboration: Work with software engineers, MLOps, privacy experts, and domain scientists to translate research concepts into practical solutions.
- Literature Review & Innovation: Maintain a current understanding of the latest advances in federated learning and related fields to drive innovation and contribute to the team's research strategy.
Basic Qualifications
- PhD in a data science field such as Biostatistics, Statistics, Machine Learning, Computational Biology, Computational Chemistry, Physics, Applied Mathematics, or related field from an accredited college or university.
- Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development.
Additional Preferences
- Experience in developing statistical and machine learning models for complex endpoints.
- Broad understanding of emerging scientific and technical breakthroughs.
- Exceptional interpersonal and communication skills, with a keen ability to understand and navigate complex relationships.
- Strong problem-solving, analytical, and project management skills.
- Highly self-motivated and organized.
- Ability to influence across disciplines and levels.
- Learning Agility: adapt to changing circumstances and apply learnings to new situations.
- Portfolio Mindset: align program decisions with overall goals.
- Independent, self-starter, able to work without supervision.
Site-based role in Indianapolis (preferred) or San Diego, San Francisco, or Boston; relocation is provided.
Lilly is dedicated to helping individuals with disabilities participate in the workforce. If you require accommodation to submit a resume, please complete the accommodation request form. Lilly is an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status. Our ERGs support networks across many affinity groups.
Actual compensation will depend on education, experience, skills, and location. The anticipated wage ranges from $142,500 to $228,800 for this role. Full-time employees are eligible for a company bonus and a comprehensive benefits program.
#WeAreLilly
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Human Resources
Industries
Internet News