Science recap of 2024

Early in 2024, we posed a central question at TrueYouOmics (TYO): Can we use multi-omics in blood to predict whether someone will develop cancer (or even die) within the next few years?

In other words, we wanted to see if we could estimate someone’s risk for future diseases.

Below is the path we followed to answer this question.


TYO founding team at the BioTechX conference.


1. Read prior work

Our initial step was to delve into the latest academic research addressing similar questions.

We found remarkable studies using data from the UK Biobank (UKB), which demonstrated that risks for conditions like dementia, cancer, and cardiovascular events can be predicted years in advance using DNA and protein profiles from blood samples. However, these predictions achieved accuracies of around 80%, leaving room for improvement.

Moreover, we found no commercial product offering a comparable service, nor any clear evidence of one under development.

Drawing on our extensive experience in biotech and pharma, and inspired by opportunities to refine and advance these academic methodologies, we decided to take up the challenge and put our ideas into action.

2. Request data access

To reproduce and potentially outperform existing research, we requested access to UKB data. This process took almost 16 weeks and required extensive preparation and training.

The power of the UKB dataset lies in the fact that participants were recruited 10–15 years ago, at which time they provided blood samples, and they have been followed ever since.

We know which of the 50,000 participants developed new diseases or died over time. This makes it possible to correlate specific DNA mutations and protein patterns in those samples with later disease onset or death, which is precisely the question we set out to answer at the beginning of the year.


3. Assemble the team

In parallel, we outlined the skills required to address our main question.

With help from CSEM, we assembled a stellar team:
- Carine Poussin (Head of AI for Life Sciences, CSEM)
- Lucas Witter (Data Scientist, CSEM)
- Jonas Meirer (Data Scientist, CSEM)
- Andres Lanzos (Co-Founder and CSO, TYO)
- Kevin Yar (Co-Founder and CEO, TYO)

We chose to work under the Scrum framework and use GitLab for version control and collaboration.

4. Code first version

While waiting for UKB approval, we began coding our initial models using artificial data to simulate future disease outcomes (e.g., the likelihood of developing cancer in five years).
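
As an illustration of what "artificial data" can mean here, the following is a minimal sketch in plain Python. It is not the TyoEngine code: the function name `simulate_cohort`, the single hypothetical blood marker, the 5% base rate, and the logistic risk model are all assumptions made only for this example.

```python
import math
import random


def simulate_cohort(n, base_rate=0.05, effect=1.0, seed=0):
    """Simulate n hypothetical participants.

    Returns (features, labels), where each feature is the standardized level
    of one made-up blood marker and each label is 1 if the simulated person
    develops the disease within five years, else 0.
    """
    rng = random.Random(seed)  # seeded so the artificial cohort is reproducible
    # Log-odds of disease for a participant with an average marker level.
    intercept = math.log(base_rate / (1.0 - base_rate))
    features, labels = [], []
    for _ in range(n):
        marker = rng.gauss(0.0, 1.0)
        # Assumed logistic link: higher marker level means higher risk.
        risk = 1.0 / (1.0 + math.exp(-(intercept + effect * marker)))
        features.append(marker)
        labels.append(1 if rng.random() < risk else 0)
    return features, labels


features, labels = simulate_cohort(5_000)
cases = [x for x, y in zip(features, labels) if y == 1]
controls = [x for x, y in zip(features, labels) if y == 0]
print(f"simulated 5-year incidence: {sum(labels) / len(labels):.3f}")
print(f"mean marker level in cases:    {sum(cases) / len(cases):.2f}")
print(f"mean marker level in controls: {sum(controls) / len(controls):.2f}")
```

Because the outcome is generated from a known risk model, a pipeline trained on such data can be checked against the signal that was deliberately planted, before any real cohort data is available.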

Our pipeline, called TyoEngine, integrates standard machine learning (ML) models like XGBoost and state-of-the-art approaches identified in our literature review, such as MOGONET.

We also developed novel ML models that could outperform current best-in-class methods.

Although we cannot disclose much about these proprietary models, we can say they incorporate prior biomedical knowledge to produce more accurate and interpretable predictions.

Jonas Meirer, Data Scientist, CSEM.

Andres Lanzos (Co-Founder and CSO, TYO) at the DayOne accelerator.

5. Get first results

We finally gained access to UKB data in mid-November 2024.

As is often the case in data science, most of our time was devoted to data processing. We spent countless hours converting the data into a format compatible with TyoEngine.

Nonetheless, we obtained our first results before the end of the year: 70–80% accuracy in predicting whether a UKB participant would develop various cancers or die within 10 years of their initial blood draw. You can read more about our lung cancer case study here.

This performance is comparable to state-of-the-art models, a promising outcome for our initial test.

Science outlook for 2025

Moving forward, we aim to continue refining TyoEngine to surpass state-of-the-art benchmarks and exceed 90% accuracy for predicting outcomes such as cancer, death, and other conditions (e.g., cardiovascular, metabolic, and neurological diseases).

A key area of focus will be reducing the number of false positives (i.e., maintaining a low False Discovery Rate, or FDR, which is the share of positive results that turn out to be false). As Andres (our co-founder) has emphasized, accuracy alone is not the best metric for imbalanced datasets where most tested individuals do not have the disease.

We want to avoid scenarios like:
- Mammograms for breast cancer with FDRs of up to 75% (Refs 1, 2)
- PAP tests for cervical cancer with FDRs of up to 80% (Refs 3, 4)
- CT scans for lung cancer with FDRs of up to 96% (Refs 5, 6). This means that for every 100 people testing positive, 96 do not have cancer.
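
To see how a test can look accurate and still produce mostly false alarms, here is a small worked example in Python. The numbers are hypothetical and chosen only for illustration (10,000 people screened, 1% prevalence, 90% sensitivity, 90% specificity); they do not describe any of the tests listed above or any TYO model, and the names `ScreeningOutcome` and `screen` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningOutcome:
    true_pos: float
    false_pos: float
    true_neg: float
    false_neg: float

    @property
    def accuracy(self) -> float:
        # Share of all screened people who were classified correctly.
        total = self.true_pos + self.false_pos + self.true_neg + self.false_neg
        return (self.true_pos + self.true_neg) / total

    @property
    def fdr(self) -> float:
        # False Discovery Rate: share of positive results that are false.
        positives = self.true_pos + self.false_pos
        return self.false_pos / positives if positives else 0.0


def screen(population, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for a screening test."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity
    true_neg = healthy * specificity
    return ScreeningOutcome(
        true_pos=true_pos,
        false_pos=healthy - true_neg,
        true_neg=true_neg,
        false_neg=diseased - true_pos,
    )


# Hypothetical test: 1% prevalence, 90% sensitivity, 90% specificity.
outcome = screen(10_000, prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"accuracy = {outcome.accuracy:.3f}")
print(f"FDR      = {outcome.fdr:.3f}")

# A "test" that never flags anyone is even more accurate, yet useless.
never_positive = ScreeningOutcome(true_pos=0, false_pos=0, true_neg=9_900, false_neg=100)
print(f"accuracy of never predicting disease = {never_positive.accuracy:.3f}")
```

Under these assumptions the test reaches 90% accuracy, yet about 92% of its positive results are false (990 false positives against 90 true positives), and a classifier that never predicts disease scores 99% accuracy while detecting nothing. This is why FDR is tracked alongside accuracy when the disease is rare.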

Our goal is to deliver highly accurate yet practical tools that minimize this burden of false positives, ensuring consumers and healthcare professionals can trust our predictions.

Kevin Yar (Co-Founder and CEO, TYO) at the START Summit final.