Mining Alzheimer’s–Related Gene Mentions from PubMed Using NLP and Enrichment Analysis: A Temporal and Network Perspective

Authors

  • My Abdelmajid Kassem, Plant Genomics and Bioinformatics Lab, Department of Biological and Forensic Sciences, Fayetteville State University, Fayetteville, NC 28301, USA. ORCID: https://orcid.org/0000-0003-3478-0327
  • Khalid Lodhi, Plant Genomics and Bioinformatics Lab, Department of Biological and Forensic Sciences, Fayetteville State University, Fayetteville, NC 28301, USA
  • Youssef Jouad, IT Programs Data Center, Durham Technical Community College, Durham, NC 27703, USA
  • Jiazheng Yuan, Plant Genomics and Bioinformatics Lab, Department of Biological and Forensic Sciences, Fayetteville State University, Fayetteville, NC 28301, USA

DOI:

https://doi.org/10.5147/jaimlb.267

Keywords:

Alzheimer’s disease, Gene mining, Named entity recognition, BioBERT, PubMed text mining, Co-mention networks, Temporal trends, Biomedical informatics, Neurodegenerative diseases, Literature surveillance

Abstract

Alzheimer’s disease (AD) remains a leading cause of morbidity and mortality worldwide, with genetics playing a critical role in disease onset and progression. However, systematically mapping the evolving landscape of gene-focused AD research remains challenging because of the rapid growth of the biomedical literature. We applied large-scale named entity recognition (NER) with BioBERT to 9,742 PubMed abstracts on AD genetics published from 2010 to 2023. Entities were extracted, quantified, and visualized using bar plots, temporal trend analyses, and co-mention network graphs. Enrichment analysis was performed on the most frequently mentioned gene entities using the Enrichr API. “AD” and “Alzheimer” dominated mentions across the dataset, validating the retrieval strategy. Geographical trends aligned with global research output, while co-mention networks revealed thematic clustering among “AD”, “Alzheimer’s disease”, and key genes. Temporal trends showed a consistent focus on the top genes over the 14-year period, underscoring stable scientific interest in the genetic underpinnings of AD. Enrichment analysis confirmed associations with known neurodegenerative pathways. This study highlights the feasibility and value of scalable biomedical NER and network analysis for mapping and monitoring the research landscape of AD genetics. The workflow provides a quantitative foundation for tracking emerging gene targets and research gaps, facilitating hypothesis generation and informed prioritization in neurodegenerative research.
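
The quantification steps named in the abstract (mention counts, temporal trends, co-mention networks) can be illustrated with a minimal sketch. It assumes the NER step has already produced a list of entities per abstract. The record layout, the `summarize` helper, and the toy entities (APOE, APP, TREM2) are hypothetical placeholders, not the authors' code or data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical NER output: one record per abstract (publication year, entities).
# These toy records are illustrative only, not data from the study.
records = [
    {"year": 2010, "entities": ["AD", "APOE", "APP"]},
    {"year": 2010, "entities": ["AD", "APOE"]},
    {"year": 2023, "entities": ["AD", "TREM2", "APOE", "APOE"]},
]

def summarize(records):
    """Return overall mention counts, per-year counts, and co-mention edge weights."""
    mentions = Counter()            # total mentions per entity
    by_year = defaultdict(Counter)  # mentions per entity, split by year
    edges = Counter()               # number of abstracts in which a pair co-occurs
    for rec in records:
        ents = rec["entities"]
        mentions.update(ents)
        by_year[rec["year"]].update(ents)
        # Each unordered pair of distinct entities counts once per abstract.
        for a, b in combinations(sorted(set(ents)), 2):
            edges[(a, b)] += 1
    return mentions, by_year, edges

mentions, by_year, edges = summarize(records)
```

Under these assumptions, `mentions.most_common()` would feed a bar plot, `by_year` a temporal trend plot, and `edges` the weighted edge list of a co-mention graph. Counting each pair once per abstract is one possible weighting; the study may have used a different one.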

Published

02/08/2026 — Updated on 02/12/2026

Section

Articles

How to Cite

Kassem, M. A., Lodhi, K., Jouad, Y., & Yuan, J. (2026). Mining Alzheimer’s–Related Gene Mentions from PubMed Using NLP and Enrichment Analysis: A Temporal and Network Perspective. Journal of Artificial Intelligence, Machine Learning, and Bioinformatics, 47–54. https://doi.org/10.5147/jaimlb.267