Director, Data Engineering

Biohub is a 501(c)(3) biomedical research organization building the first large-scale scientific initiative combining frontier AI with frontier biology to solve disease. We build the technology to help scientists around the world use AI-powered biology to study how cells operate, organize, and work as part of systems to understand why disease happens and how to correct it.

The Team

Our AI research team sits at the heart of our mission to unlock new dimensions of biological understanding. You will leverage state-of-the-art AI to accelerate discovery and drive transformative insights in biology — developing novel AI models purpose-built for biological research, engineering robust systems that enable breakthrough science at unprecedented scale, and translating these advances into practical tools that empower researchers worldwide.

Our approach is comprehensive and integrated, bringing together world-class AI model development, exceptional engineering talent, high-quality biological data, powerful computing infrastructure, and strategic partnerships. Success requires excellence across five interconnected pillars:

  1. Training frontier AI models specifically for biology.
  2. Building engineering systems that maximize research velocity and efficiency.
  3. Executing a sophisticated data strategy that fuels AI development.
  4. Operating a world-class AI compute platform.
  5. Creating impactful products that transform AI capabilities into accessible scientific tools.

The Opportunity

This role will lead Data Engineering, building the infrastructure that makes biological foundation models possible. You will oversee the ingestion of data from public repositories and internal generation projects, transforming heterogeneous biological formats into AI-ready datasets.

This is a player-coach role. You will spend meaningful time on technical leadership—architecture decisions, code review, and unblocking hard problems—while also building and managing a high-performing team. We value small, high-functioning teams, use AI tools aggressively, and care deeply about code quality and operational reliability.

What You'll Do

  • Team Leadership: Lead a team of data engineers, setting technical direction and ensuring the delivery of reliable, scalable data infrastructure.
  • Architecture: Drive architecture decisions for petabyte-scale genomic and imaging data pipelines spanning cloud and on-prem environments.
  • Operational Excellence: Build a culture of 99%+ pipeline reliability and strong observability.
  • Talent Development: Recruit, develop, and retain exceptional engineers who combine infrastructure experience with biological intuition.
  • Cross-functional Partnership: Collaborate with AI Research, Data Science, and Scientific Data Strategy to translate requirements into engineering priorities.

What You'll Bring

  • Leadership: 10+ years of experience leading data engineering/infrastructure teams, with 5+ years as a people manager.
  • Technical Scale: Proven track record of building AI training data pipelines at petabyte scale.
  • Foundations: Strong ability to go deep on architecture, review code, and solve complex technical bottlenecks.
  • Culture: Ability to attract top-tier talent and nurture a high-ownership environment.
  • Adaptability: Comfort with ambiguity and setting direction as requirements evolve.
  • Nice to have: Experience with biology, bioinformatics, or life sciences data.

Compensation & Logistics

  • Pay Range: $323,000 - $444,400 (Redwood City, CA or New York, NY).
  • Hybrid Work: At least 60% of the working month onsite (approx. 3 days/week).

Benefits

  • Generous 401(k) employer match.
  • Paid volunteer time off.
  • Family-forming benefits funding.
  • Relocation support.

Biohub

Job Type: Permanent
Location: New York, NY (Hybrid); Redwood City, CA (Hybrid)
Date posted: March 31, 2026