Senior Data Engineer
Altis Labs
About Altis Labs
Altis Labs is the computational imaging company accelerating clinical trials with AI. We are on a mission to help get the most effective novel treatments to patients sooner.
Top 20 biopharma sponsors like AstraZeneca, Johnson & Johnson, and Bayer Pharmaceuticals use our AI models trained on the industry's largest cancer imaging database to measure treatment effect with greater confidence. Our fully automated AI models predict efficacy from clinical trial imaging data so that sponsors can optimize trial design and accelerate development of their most promising drugs.
Founded in 2019, Altis is a venture-backed AI company headquartered in Toronto. We are actively growing our team in Canada and the US across functional areas.
About the Position
We are looking for a Senior Data Engineer to design, build, and operate robust data systems that power machine-learning and analytics workflows in a medical and medical-imaging context. This role sits at the intersection of data engineering, ML infrastructure, and cloud platforms, and will work across multiple cloud environments and compute backends.
You will be responsible for building scalable, reliable pipelines for medical imaging data (e.g., DICOM) and associated clinical metadata, supporting downstream ML research, model training, and production deployment. You should be comfortable operating in complex environments and collaborating closely with ML scientists, MLOps engineers, and software engineers.
This is a senior, hands-on role with significant architectural responsibility.
Responsibilities & Expectations:
Data Architecture & Pipelines
- Design, build, and maintain scalable data pipelines for large-scale medical imaging and clinical datasets
- Ingest, validate, normalize, and version DICOM and related medical imaging formats
- Build pipelines that support batch workloads
- Ensure data quality, lineage, reproducibility, and auditability across systems
Medical & Imaging Context
- Work with medical imaging data (CT, PET-CT, MR, etc.) and associated metadata
- Understand healthcare data constraints, including PHI handling, privacy, and compliance considerations
- Collaborate with ML teams on feature extraction pipelines (e.g., volumes, segmentations, derived imaging features)
Cloud & Infrastructure
- Operate across multi-cloud environments (e.g., AWS, GCP) and compute backends such as Slurm, including hybrid or on-prem components
- Deploy and manage data workloads on Kubernetes (writing Terraform and Helm charts or manifests where appropriate)
- Work with containerized pipelines and distributed compute (e.g., GPU-enabled workloads)
- Optimize cost, performance, and reliability across cloud platforms
Collaboration & Ownership
- Partner closely with ML scientists, MLOps, and software engineers to support end-to-end workflows
- Own systems in production: monitoring, alerting, debugging, and incident response
- Contribute to technical direction, architecture decisions, and best practices
- Mentor junior engineers and raise the overall data engineering bar
Qualifications:
- 6+ years of experience in data engineering or related roles
- Strong experience building production-grade data pipelines at scale
- Excellent Python skills (and comfort with performance-critical data workloads)
- Deep understanding of data modeling, distributed systems, and ETL/ELT patterns
- Hands-on experience with medical imaging data, especially DICOM
- Familiarity with medical or healthcare data contexts (clinical metadata, imaging workflows, regulatory constraints)
- Experience working in regulated or privacy-sensitive environments is strongly preferred
- Experience operating data systems in at least one major cloud provider (AWS, GCP, Azure)
- Working knowledge of Kubernetes (deploying workloads, debugging pods, managing resources)
- Experience with containerized pipelines (Docker)
Nice to have:
- Experience supporting ML or AI workloads, especially in imaging or healthcare
- Familiarity with MLOps concepts (data versioning, experiment tracking, reproducibility)
- Experience with GPU-backed workloads and high-performance compute
- Knowledge of federated learning or distributed data architectures
- Prior experience in startups or fast-moving, ambiguous environments
Benefits:
- Competitive pay and generous equity participation
- Coverage for medical, vision, and dental insurance
- 4 weeks of vacation per year
