Arian Amani

Machine Learning Scientist

AI VIVO

Wellcome Sanger Institute

Biography

I build AI systems that model how cells respond to drugs and perturbations — bridging deep generative models, single-cell biology, and production ML to accelerate therapeutic discovery.

At AI VIVO, I develop models for molecular generation (SMILES, ADMET), structure-based design (AutoDock Vina, Boltz-2), and virtual-cell perturbation prediction and generation, and I build multimodal pipelines at scale. As a Data Scientist at the Wellcome Sanger Institute (Lotfollahi Lab), I co-first-authored CellDISECT (bioRxiv 2025; under review at Nature Methods) and co-authored SP-FM (arXiv 2026); I also fine-tune single-cell foundation models for perturbation prediction and help maintain CPA for the community.

Interests
  • Deep Generative Models
  • Causal Representation Learning
  • Flow Matching
  • Single-Cell Perturbation
  • Virtual Cells
  • Drug Discovery
Education
  • Applied Computer Science & Artificial Intelligence, 2026

    Sapienza University of Rome

  • BSc in Computer Science, 2023

    Amirkabir University of Technology

Current Work

  • CellDISECT (bioRxiv, 2025; under review at Nature Methods): Causal VAE for covariate disentanglement, counterfactual perturbation prediction, and cell-type discovery from large-scale scRNA-seq.
  • SP-FM (arXiv:2601.11827, 2026): Shortest-path flow matching with mixture-conditioned bases for OOD generalization to unseen biological conditions.
  • Why Perturbation Prediction Needs Much Better Metrics (Medium, Mar 2026): Essay on evaluation metrics, mean-predictor baselines, and benchmarking in computational biology.
  • AI VIVO: Machine learning for de novo molecular generation, genetic and chemical perturbation prediction (virtual cells), and target discovery.

Work Experience

AI VIVO
Machine Learning Scientist
December 2024 – Present (1 yr 4 mos), Cambridge, United Kingdom (remote)
  • Develop deep learning and generative models for drug discovery using transformer and flow matching architectures
  • Deploy and scale ML pipelines on GCP using PyTorch Lightning and Docker
  • Design multi-modal ML pipelines integrating molecular structure and biological assay data
  • Maintain scalable pipelines using PyTorch, Lightning, RDKit, and HuggingFace
  • Apply computational chemistry tools such as BioSolveIT, AutoDock Vina, and Boltz-2

Wellcome Sanger Institute
Data Scientist
November 2022 – Present (3 yrs 5 mos), Hinxton, United Kingdom (remote)
  • Lotfollahi Lab: co–first author of CellDISECT — causal VAE for disentangled single-cell representations and in silico perturbation prediction (2M+ cells, 100+ counterfactual conditions); manuscript under review at Nature Methods
  • Fine-tuning of ~300M-parameter single-cell transformer foundation models (e.g. scFoundation, Geneformer, scGPT) with LoRA and prompt tuning on 1500+ perturbation conditions
  • Lead maintainer for CPA (Compositional Perturbation Autoencoder), including high-volume code review and releases for the single-cell community
  • Collaboration across a large PhD/postdoc team on preprints, open-source tools, and PyTorch-based single-cell genomics methods

Virasad
Computer Vision Engineer
January 2022 – May 2022 (5 mos), Tehran, Iran

Responsibilities include:

  • Delivered solutions with >95% accuracy on tasks with limited data (15 images per class)
  • Spearheaded development on 5 diverse projects meeting client requirements
  • Led individual projects, enhancing development pipelines

Projects

CellDISECT
CellDISECT (Cell DISentangled Experts for Covariate counTerfactuals) is a causal generative model for disentangled single-cell representations and counterfactual perturbation prediction. Preprint on bioRxiv; under review at Nature Methods.
CPA (Compositional Perturbation Autoencoder)
CPA is a deep generative framework for learning the effects of perturbations at the single-cell level. It makes out-of-distribution (OOD) predictions for unseen drug combinations, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.

Recent Publications

(2026). Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions. arXiv.


(2025). Integrating multi-covariate disentanglement with counterfactual analysis on synthetic data enables cell type discovery and counterfactual predictions. bioRxiv.


Teaching Experience

Teaching Assistant
Sharif University of Technology
September 2022 – August 2023 (1 yr), Tehran, Iran
  • Machine Learning for Bioinformatics (Graduate Course) | Spring 2023
    • Prepared teaching material on CNNs & AutoEncoders, designed assignments, and coordinated class contests.
  • Introduction to Machine Learning | Fall 2022
    • Designed and graded assignments for a class of 150 students and conducted a workshop on Variational AutoEncoders.

Teaching Assistant
Amirkabir University of Technology
September 2021 – March 2022 (7 mos), Tehran, Iran
  • Introduction to Image Processing and Neural Networks | Fall 2021
    • Conducted workshops and lectures on OpenCV and Deep Learning for a class of 80 students.
  • Advanced Programming with C++ | Spring 2022
    • Designed assignments and projects for a class of 90 students and evaluated student submissions.

Accomplishments

Coursera
Deep Learning Specialization
  • Neural Networks and Deep Learning
  • Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
  • Structuring Machine Learning Projects
  • Convolutional Neural Networks
  • Sequence Models