In Silico Drug Repurposing Using Machine Learning Models

Authors

  • Daniel Lindberg, Research Scientist
  • Matteo Klein, Senior Lecturer

DOI:

https://doi.org/10.62648/v21.i04.2025.pp19-27

Keywords:

Drug repurposing, Machine learning, Graph neural network, Transformer, Knowledge graph embedding

Abstract

Drug repurposing, the systematic identification of new therapeutic indications for approved or investigational
drugs, offers a time- and cost-efficient alternative to de novo drug discovery, leveraging existing safety profiles and
pharmacokinetic data to accelerate clinical translation. Machine learning has emerged as the dominant computational
paradigm for large-scale in silico drug repurposing, enabling the integration of heterogeneous biomedical data
sources (drug-target interaction networks, gene expression signatures, chemical structure fingerprints, disease-gene
associations, and phenome-wide associations from electronic health records) into unified predictive frameworks that can
generate repurposing hypotheses across thousands of drug-disease pairs simultaneously. This study develops and
benchmarks five machine learning architectures for drug repurposing prediction: a gradient boosting machine (GBM), a
graph neural network (GNN) operating on drug-target-disease heterogeneous networks, a variational autoencoder (VAE)
learning latent drug-disease embeddings from transcriptomic signatures, a transformer-based multi-modal fusion model
integrating structure, target, and phenotype features, and a knowledge graph embedding model (TransE/RotatE).
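
For the knowledge graph embedding baseline, the two named models differ mainly in how a (drug, relation, disease) triple is scored: TransE treats a relation as a translation (h + r ≈ t), while RotatE treats it as an element-wise rotation of complex embeddings (h ∘ r ≈ t with every |r_i| = 1). The sketch below is a minimal illustration under stated assumptions, not the study's implementation: it uses an L2 distance for both models (L1 variants and a margin term are also common), hand-made toy vectors instead of trained embeddings, and hypothetical helper names, relation vector, and disease labels. Training (negative sampling, margin loss) is omitted.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence

# Illustrative scoring of (drug, relation, disease) triples.
# All embeddings used with these helpers are assumed inputs, not learned here.


def _check_dims(*vectors: Sequence) -> None:
    """Raise if the embeddings do not share one dimensionality."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("embedding dimensions must match")


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE: a true triple should satisfy h + r ≈ t, so score = -||h + r - t||_2."""
    _check_dims(h, r, t)
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rotate_score(h: Sequence[complex], phases: Sequence[float], t: Sequence[complex]) -> float:
    """RotatE: the relation rotates each complex coordinate of h by a phase,
    so a true triple should satisfy h * exp(i*phase) ≈ t; score = -||h∘r - t||_2."""
    _check_dims(h, phases, t)
    return -math.sqrt(
        sum(abs(hi * cmath.exp(1j * p) - ti) ** 2 for hi, p, ti in zip(h, phases, t))
    )


def rank_diseases(
    drug: Sequence,
    relation: Sequence,
    diseases: Dict[str, Sequence],
    score_fn: Callable[[Sequence, Sequence, Sequence], float],
) -> List[str]:
    """Order candidate diseases from most to least plausible 'drug treats disease' link."""
    return sorted(diseases, key=lambda name: score_fn(drug, relation, diseases[name]), reverse=True)


if __name__ == "__main__":
    drug = [0.2, -0.1, 0.5]  # hypothetical drug embedding
    treats = [0.3, 0.4, -0.2]  # hypothetical 'treats' relation vector
    # disease_A is constructed to equal drug + treats, so it should rank first.
    diseases = {"disease_A": [0.5, 0.3, 0.3], "disease_B": [-0.4, 0.9, 0.1]}
    print(rank_diseases(drug, treats, diseases, transe_score))
```

In this formulation TransE cannot represent a symmetric relation without collapsing r to zero, whereas RotatE can (each phase set to 0 or π). Both return higher scores for more plausible triples, so ranking unobserved drug-disease pairs by score yields candidate repurposing hypotheses.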
Evaluated on the Gottlieb benchmark (593 drugs, 313 diseases, 1,933 known associations) using 10-fold
cross-validation, the transformer fusion model achieved the highest AUROC (0.947) and AUPR (0.891), outperforming
the GNN (AUROC 0.923), VAE (AUROC 0.908), GBM (AUROC 0.874), and TransE (AUROC 0.856) models. In a prospective
validation, the top 50 novel repurposing predictions were checked against post-2020 clinical trial registrations: 14 of
the 50 (28%) had entered clinical evaluation, a rate substantially higher than the 3-5% random baseline. Highlighted top
predictions include metformin for hepatocellular carcinoma (supported by GWAS and observational evidence), baricitinib
for amyotrophic lateral sclerosis (now in a Phase II trial), and sildenafil for Alzheimer's disease (supported by
population-based cohort data and animal model evidence).
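
The two headline evaluations in the abstract, ranking metrics under 10-fold cross-validation and the 14-of-50 prospective hit rate, can be illustrated with a small dependency-free sketch. It is not the study's evaluation code: the function names and the `fit_predict` model hook are hypothetical, AUROC is computed as the rank (Mann-Whitney) statistic, AUPR is approximated by average precision, and folds are stratified over labeled drug-disease pairs, which assumes every fold receives at least one positive and one negative. The binomial tail asks how surprising 14 or more confirmations out of 50 would be if each prediction independently succeeded at the baseline rate, a simplifying assumption because ranked predictions are not independent random draws.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical helpers using standard metric definitions, not the study's own code.


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive pair outscores a random negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUPR estimated as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


def cross_validate(
    labels: Sequence[int],
    fit_predict: Callable[[List[int], List[int]], List[float]],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean AUROC and AUPR over k stratified folds of drug-disease pair indices.

    fit_predict(train_idx, test_idx) is a hypothetical hook that trains a model on
    train_idx and returns one score per index in test_idx.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Deal positives and negatives round-robin so each fold keeps the class ratio.
    folds = [pos[i::k] + neg[i::k] for i in range(k)]
    roc, pr = [], []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores = fit_predict(train_idx, test_idx)
        y = [labels[i] for i in test_idx]
        roc.append(auroc(scores, y))
        pr.append(average_precision(scores, y))
    return sum(roc) / len(roc), sum(pr) / len(pr)


def binomial_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # 14 of 50 top predictions confirmed, tested against the upper (5%) end of the baseline.
    print(f"P(X >= 14 | n=50, p=0.05) = {binomial_tail(14, 50, 0.05):.2e}")
```

Under the independence assumption, even the upper 5% baseline implies an expected 2.5 confirmations out of 50, so observing 14 or more is very unlikely by chance; this is the sense in which the 28% hit rate is a meaningful signal.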

Published

2025-09-01

How to Cite

In Silico Drug Repurposing Using Machine Learning Models. (2025). International Journal of Life Sciences Biotechnology and Pharma Sciences, 21(04), 19-27. https://doi.org/10.62648/v21.i04.2025.pp19-27
