Senior MLOps Engineer - Deep Genomics
Deep Genomics is at the forefront of using artificial intelligence to transform drug discovery. Our proprietary AI platform decodes the complexity of RNA biology to identify novel drug targets, mechanisms, and therapeutics inaccessible through traditional methods. With expertise spanning machine learning, bioinformatics, data science, engineering, and drug development, our multidisciplinary team in Toronto and Cambridge, MA is revolutionizing how new medicines are created.
The Opportunity
Join us in building the future of AI-driven drug discovery. In this role, you will own and evolve the infrastructure that powers our ML pipelines—from cloud environments and CI/CD systems to workflow orchestration and model deployment. You will work closely with ML scientists, bioinformaticians, and software engineers to keep our platform reliable, reproducible, and scalable.
Key Responsibilities
- Infrastructure: Maintain and improve cloud infrastructure (GCP) using Infrastructure-as-Code tools (Terraform).
- Governance: Manage IAM, RBAC, and permission policies across cloud environments.
- CI/CD: Own and evolve CI/CD pipelines (CircleCI, GitHub Actions) and enforce best practices across them.
- Orchestration: Administer and support workflow orchestration platforms (e.g., Seqera/Nextflow, Argo, Kubeflow).
- MLOps: Operate and configure ML experiment tracking and registry tooling (e.g., W&B, MLflow).
- Containerization: Build and maintain containerized environments (Docker) and manage Kubernetes clusters.
- Hardware: Manage GPU resources—provisioning, scheduling, and debugging hardware and driver issues.
- Tooling: Write and maintain Python tooling, scripts, and integrations that support ML infrastructure.
- Deployment: Deploy ML models to production environments and monitor their performance.
Qualifications
Basic Qualifications
- Experience: 4+ years operating production infrastructure.
- Cloud & IaC: Proficiency with cloud platforms (GCP preferred; AWS/Azure acceptable) and Infrastructure-as-Code (Terraform).
- Containers: Extensive hands-on experience with Kubernetes and Docker.
- CI/CD: Solid background in CI/CD systems such as CircleCI or GitHub Actions.
- GPU Management: Experience provisioning and debugging GPU compute, including driver management.
- Programming: Strong Python programming skills; familiarity with package and environment management (e.g., pip, conda, pixi).
- Soft Skills: Self-motivated problem solver with excellent communication skills.
Preferred Qualifications
- ML Familiarity: Understanding of ML frameworks (e.g., PyTorch), workflows (training/inference), and the model lifecycle.
- MLOps Tooling: Familiarity with tools like W&B, Ray, or Vertex AI, and distributed compute patterns.
- Kubernetes: Knowledge of K8s CRDs and batch/gang schedulers (e.g., Volcano, Kueue).
- Data: Experience with large-scale datasets, storage, and versioning.
- Domain Knowledge: Interest or experience in biology and/or machine learning science.
- Environment: Experience working directly with scientists in an interdisciplinary setting, and familiarity with data compliance (HIPAA, SOC 2).
- Startup: Previous startup experience.
What We Offer
- Impact: A collaborative and innovative environment at the frontier of computational biology, machine learning, and drug discovery.
- Compensation: Highly competitive compensation, including meaningful stock ownership.
- Benefits: Comprehensive health, vision, and dental coverage for employees and their families; employee and family assistance programs.
- Flexibility: Flexible hours, extended long weekends, holiday shutdown, and unlimited personal days.
- Family Support: Maternity and parental leave top-up coverage, as well as new parent paid time off.
- Growth: Dedicated learning and development budget, plus regular "lunch and learns."
- Locations: Modern facilities in the heart of Toronto (an epicenter of ML/AI research) and Kendall Square, Cambridge, MA (a global center of biotech).