Helmholtz Munich · Institute of Computational Biology

Hi, I'm Raphaël.
I work on
AI for biology.

More precisely, I'm an aspiring computational-biology researcher interested in generative modeling and, among other things, one stubborn problem: how to keep our models from falling apart in biological contexts they haven't seen. Right now, I'm doing my Master's thesis at Helmholtz Munich in the Fabian Theis Lab.


Research interests

I'm interested in problems that move us towards virtual experimental biology.

The dream is models good enough that a lot of experimental work could start in silico: faster wet-lab campaigns, AI in the loop when deciding which experiments to run, and maybe a bit more scientific discovery along the way. The virtual cell is one avenue toward that.

What bugs me most is how badly our models do out of distribution. Put them in a context or condition they weren't trained on and they often fail, in ways I'd argue are worse than how we humans cope. Faced with something new, we lean on what we already know, reason about how it's similar to or different from the situation in front of us, and we're often roughly right. We're far from perfect: sometimes we grab the wrong prior and do worse than chance. Still, the gap between that and how a model behaves in an unfamiliar biological context feels like something worth trying to close.

I also suspect AI-assisted experimental design is where progress is likeliest in the next few years, before the harder generalization questions are solved, given the amount of effort being put into it. So I'm trying to keep one foot in each.

The virtual cell & out-of-distribution: models that hold up in unfamiliar biological contexts

AI-assisted experimental design: helping navigate the design space

Past work

A few things I've worked on, each a different angle on getting structure out of biological or clinical data.

AI in the Life Sciences · Elsevier · 2025 · Master 1 project

Rubrice R., Guéneau V., Briandet R., Cornuéjols A., Guigue V. (2025). A machine-learning framework for the prediction and analysis of bacterial antagonism in biofilms using morphological descriptors. AI in the Life Sciences, Elsevier.

Predicting bacterial antagonism from biofilm morphology

Can you tell whether a beneficial Bacillus strain will out-compete a pathogen from the morphology of single-species biofilms alone, without exhaustive co-culture screening? Working with the applied-maths team at AgroParisTech and Paris-Saclay (Antoine Cornuéjols, Vincent Guigue) and the biofilm team at INRAE's Micalis Institute (Romain Briandet, Virgile Guéneau), we built a first framework for it. My part was a homology-controlled cross-validation scheme so the model couldn't cheat on closely related strains; together we used SHAP and permutation importance to see which morphological features actually drive exclusion.
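To make the idea of a homology-controlled split concrete, here is a minimal sketch, not the code from the paper. It assumes each strain already carries a homology-cluster label; the labels, the toy data and the function name `grouped_folds` are made up for illustration, and the greedy balancing is just one simple way to fill the folds. The only property that matters is that a cluster never appears on both sides of a split.

```python
from collections import defaultdict


def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs where no group is split across both.

    groups[i] is the (hypothetical) homology-cluster label of sample i.
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if n_folds > len(members):
        raise ValueError("n_folds cannot exceed the number of groups")

    # Greedy balancing: place the largest groups first, each one into
    # whichever fold is currently the smallest.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(members, key=lambda k: (-len(members[k]), str(k))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])

    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, test


# Toy example: 8 strains in 4 made-up homology clusters.
strain_groups = ["A", "A", "A", "B", "B", "C", "C", "D"]
splits = list(grouped_folds(strain_groups, n_folds=2))
for train, test in splits:
    train_clusters = {strain_groups[i] for i in train}
    test_clusters = {strain_groups[i] for i in test}
    # The overlap is always empty: closely related strains stay together.
    print(sorted(test_clusters), "held out; overlap:", train_clusters & test_clusters)
```

In practice, scikit-learn's `GroupKFold` gives the same guarantee when the cluster labels are passed as `groups`.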

Paper · Code · Dataset

Dassault Systèmes · 2025 · 6-month internship

Aziliz Cottin, Marine Zulian · Clinical Decision Team

Explainable clustering for clinical data
Helping clinicians read what a clustering result actually means.

A six-month research internship on making clustering results interpretable: when patients fall into subgroups, finding ways to explain what actually separates them, with a focus on small cohorts where standard methods struggle. This one is proprietary and under NDA, so I can't say more, sorry :(

Owkin & Servier · 2025 · ★ AI Methodology Award

Jhonathan Felix-Ramos, Adrien Bouchet, Timothée Sanchez, Jean Radig

Extending THREADS for glioblastoma
A multimodal idea that taught us how hard the problem is.

We proposed extending THREADS, a multimodal foundation-model approach, to bring together the many modalities in the MOSAIC glioblastoma dataset. In the short time we had after the event to actually finish the project, our results were honestly quite poor, and that was the instructive part: integrating that many modalities across different biological scales, in a very low-N setting with noisy signals, is genuinely hard. The heads converge at different rates, the modalities carry different noise levels and structures, and naive integration just doesn't hold up.

Link

AI4AS Conference · ETH Zurich · 2025 · IODAA · Oral presentation

Tran G., Genin R., Lauront A., Petitet M., Rubrice R., Heuzé V., Guigue V., Cornuéjols A.

Predicting forage nutritional value with AI
Estimating energy and protein from chemical composition and textual data.

Can AI accurately predict the energy and protein content of forages from chemical descriptors and natural-language metadata, without expensive in vivo trials? Working with AgroParisTech (Antoine Cornuéjols, Vincent Guigue) and animal nutrition specialists (Véronique Heuzé, Gilles Tran), we built models combining chemical composition data with textual descriptions of forage samples. The project was selected for an oral presentation at the 1st EAAP Conference on Artificial Intelligence in Animal Science (AI4AS), ETH Zurich, June 2025.

Conference

IRCAN, Nice · 2024 · 3-month internship · Bachelor project

Sophie Lanciano, Gaël Cristofari · Cristofari Lab, IRCAN

TELLAM: reading the repeats genomes usually discard
An RNA-seq pipeline for retrotransposon expression in senescent cells.

A bioinformatics pipeline to quantify retrotransposon expression in senescent cells: the repetitive-element signal that standard RNA-seq workflows often discard, but that may matter for how cells age.

Distinctions

PR[AI]RIE-PSAI Institute Scholar, Paris School of AI scholarship

2025

AI Methodology Award, Owkin & Servier glioblastoma hackathon

2025

Regional Excellence Scholarship, Guadeloupe Region

2019

French (native) · English, C2 proficient

Trajectory

From wet to dry lab.

Each step pulled me toward more ML without ever leaving the biology behind. Last year focused on the maths side, and I think that mix is worth it.

2020-2022

Prépa BCPST · Lycée Masséna, Nice

Two years of France's competitive science track (mathematics, physics, chemistry, biology) before the grandes écoles entrance exams.

2022-2026

Engineering degree, life sciences · AgroParisTech

The national institute for life sciences and industries. I specialized in health engineering, and met biostatistics and machine learning for the first time.

2024-2025

MSc Computational Biology · Université Paris-Saclay

Highest Honors. Where bioinformatics merged with modeling: learning theory, graph theory, deep learning, gene-expression analysis.

2025-2026

Master MVA · ENS Paris-Saclay & Université Paris-Cité

Applied Maths and Computer Science. One of France's most demanding programs. Health track.

2026

Master's thesis · Helmholtz Munich

In the Theis lab at the Institute of Computational Biology, supervised by Alessandro Palma, Guy Wolf, Lorenzo Consoli and Fabian Theis. To name a few aspects, the project combines reinforcement learning, bioagents, generative modeling and Perturb-seq data.

2027 →

Incoming PhD, applied mathematics · Université de Montréal & ?

Officially admitted at Université de Montréal (Guy Wolf, Mila); the rest is still being worked out.

Now

More science ahead.

Finishing my thesis at Helmholtz to conclude the MVA, then between Munich and Montréal for the PhD. Always glad to talk single-cell, generative models, or AI for biology.

Email raphael [dot] rubrice [at] ens-paris-saclay [dot] fr

GitHub @raphaelrubrice

Last Publication AI in the Life Sciences, 2025

LinkedIn Raphaël Rubrice

Google Scholar Raphaël Rubrice
