Science recap of 2024
Early in 2024, we posed a central question at TrueYouOmics (TYO): Can we use multi-omics in blood to predict whether someone will develop cancer (or even die) within the next few years?
In other words, we wanted to see if we could estimate someone’s risk for future diseases.
Below is the path we followed to answer this question.
TYO founding team at the BioTechX conference.
1. Read prior work
Our initial step was to delve into the latest academic research addressing similar questions.
We found several strong studies based on UK Biobank (UKB) data showing that the risk of conditions such as dementia, cancer, and cardiovascular events can be predicted years in advance from DNA and protein profiles in blood samples. However, the reported accuracies were around 80%, leaving room for improvement.
Moreover, we found no commercial product offering a comparable service, nor any clear evidence of one under development.
Drawing on our extensive experience in biotech and pharma, and seeing opportunities to refine and advance these academic methods, we decided to take up the challenge and put our ideas into action.
2. Request data access
To reproduce and potentially outperform existing research, we requested access to UKB data. This process took almost 16 weeks and required extensive preparation and training.
The power of the UKB dataset lies in the fact that participants were recruited 10–15 years ago, at which time they provided blood samples, and they have been followed ever since.
Because we know which of the 50 thousand participants developed new diseases or died over time, we can correlate specific DNA mutations and protein patterns in those samples with later disease onset or death. That is precisely the question we set out to answer at the beginning of the year.
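To make the idea of linking a baseline blood sample to a later event concrete, here is a minimal sketch of how a binary outcome label could be derived from follow-up records. The participant IDs, dates, and the function name `event_within_window` are hypothetical and only illustrate the principle; this is not the UKB data format or our actual pipeline.

```python
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25


def event_within_window(baseline: date, event: Optional[date], window_years: float) -> int:
    """Return 1 if the event happened after baseline and within the window, else 0."""
    if event is None:
        return 0  # no event recorded during follow-up
    elapsed_years = (event - baseline).days / DAYS_PER_YEAR
    return int(0 < elapsed_years <= window_years)


# Hypothetical participants: (id, baseline blood draw, first diagnosis or None)
cohort = [
    ("P1", date(2008, 3, 1), date(2012, 6, 15)),  # diagnosed about 4 years later
    ("P2", date(2009, 7, 20), None),              # no diagnosis recorded
    ("P3", date(2007, 1, 10), date(2019, 2, 1)),  # diagnosed about 12 years later
]

labels = {pid: event_within_window(b, e, window_years=10) for pid, b, e in cohort}
print(labels)
```

This sketch ignores censoring (for example, participants who leave the study before the window ends), which a real analysis would need to handle.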
3. Assemble the team
In parallel, we outlined the skills required to address our main question.
With help from CSEM, we assembled a stellar team:
Carine Poussin (Head of AI for Life Sciences, CSEM)
Lucas Witter (Data Scientist, CSEM)
Jonas Meirer (Data Scientist, CSEM)
Andres Lanzos (Co-Founder and CSO, TYO)
Kevin Yar (Co-Founder and CEO, TYO)
We chose to work under the Scrum framework and use GitLab for version control and collaboration.
4. Code first version
While waiting for UKB approval, we began coding our initial models using artificial data to simulate future disease outcomes (e.g., the likelihood of developing cancer in five years).
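As an illustration of what artificial data for this purpose can look like, the sketch below draws random features and simulates a binary outcome from a logistic model. The function name `simulate_cohort`, the effect sizes, and the intercept are assumptions chosen for the example; they do not describe the data we actually generated.

```python
import math
import random


def simulate_cohort(n, n_features=5, seed=0):
    """Generate artificial feature vectors and binary outcomes.

    Outcomes are drawn from a logistic model of the features, so the true
    signal is known in advance and a pipeline can be checked against it.
    """
    rng = random.Random(seed)
    # Hypothetical effect sizes: only the first two features carry signal.
    weights = [1.5, -1.0] + [0.0] * (n_features - 2)
    intercept = -2.0  # keeps the simulated event rate well below 50%
    features, outcomes = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        logit = intercept + sum(w * v for w, v in zip(weights, x))
        risk = 1.0 / (1.0 + math.exp(-logit))
        features.append(x)
        outcomes.append(1 if rng.random() < risk else 0)
    return features, outcomes


X, y = simulate_cohort(n=1000)
print(len(X), len(X[0]), sum(y) / len(y))
```

Because the generating weights are known, a model trained on such data should recover the first two features as the informative ones, which makes this kind of simulation a useful sanity check before real data arrives.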
Our pipeline, called TyoEngine, integrates standard machine learning (ML) models like XGBoost and state-of-the-art approaches identified in our literature review, such as Mogonet.
We also developed novel ML models that could outperform current best-in-class methods.
Although we cannot disclose much about these proprietary models, we can say they incorporate prior biomedical knowledge to produce more accurate and interpretable predictions.
Jonas Meirer, Data Scientist, CSEM.
Andres Lanzos (Co-Founder and CSO, TYO) at the DayOne accelerator.
5. Get first results
We finally gained access to UKB data in mid-November 2024.
As is often the case in data science, most of our time was devoted to data processing. We spent countless hours converting the data into a format compatible with TyoEngine.
Nonetheless, we obtained our first results before the end of the year: 70–80% accuracy in predicting whether a UKB participant would develop various cancers or die within a 10-year window after their initial blood draw.
This performance is comparable to that of state-of-the-art models, a promising outcome for our initial test.
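For readers less familiar with how such figures are computed, here is a minimal sketch of accuracy at a risk threshold, alongside ROC AUC as a threshold-free complement. The outcomes and predicted risks are made up for illustration, and we are not stating that these exact functions or this threshold were used in TyoEngine.

```python
def accuracy(y_true, risk, threshold=0.5):
    """Fraction of participants whose thresholded risk matches the outcome."""
    predictions = [1 if r >= threshold else 0 for r in risk]
    return sum(p == t for p, t in zip(predictions, y_true)) / len(y_true)


def roc_auc(y_true, risk):
    """Probability that a random case is ranked above a random non-case (ties count half)."""
    cases = [r for r, t in zip(risk, y_true) if t == 1]
    controls = [r for r, t in zip(risk, y_true) if t == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical outcomes (1 = event within the window) and predicted risks
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
risk = [0.1, 0.2, 0.05, 0.3, 0.7, 0.6, 0.4, 0.15, 0.25, 0.8]

print(accuracy(y_true, risk))         # accuracy at the 0.5 threshold
print(accuracy(y_true, [0.0] * 10))   # baseline: predict "no event" for everyone
print(roc_auc(y_true, risk))
```

In this toy example, predicting "no event" for everyone already reaches 0.7 accuracy because events are rare, which is why accuracy is best read together with the event rate or a ranking metric such as AUC.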
Science outlook for 2025