To improve prediction accuracy, the research team developed a new method, the Genetic Progression Score (GPS), to predict progression from the preclinical stage to the disease stage. GPS leverages the idea behind transfer learning, a machine learning technique in which a model is trained on one task or dataset and then fine-tuned for another, related task or dataset, explained Bibo Jiang, assistant professor of public health sciences at the Penn State College of Medicine and lead author of the study. The approach allows researchers to glean better information from smaller data samples.
For example, in medical image processing, artificial intelligence models can be trained to determine whether a tumor is cancerous or non-cancerous. Creating the training dataset requires medical professionals to label images one by one, which is time-consuming and limits the number of images available. Instead, Liu said, transfer learning starts with images that are far easier to label, such as cats and dogs, which makes a much larger dataset possible. The labeling work can also be outsourced. The model learns to differentiate between the animals and can then be refined to distinguish between malignant and benign tumors.
“There’s no need to train a model from scratch,” Liu said. “The way the model segments elements from an image to determine whether it’s a cat or dog is transferable. With some tweaking, the model can be refined to separate images of tumors from images of normal tissue.”
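The pretrain-then-refine idea Liu describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, a simple logistic regression stands in for an image model, and every name here (`train_logistic`, `simulate`, the sample sizes and learning rates) is hypothetical rather than taken from the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, w_init, lr=0.1, steps=200):
    """Gradient descent on the mean logistic loss, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def accuracy(X, y, w):
    """Fraction of examples whose predicted class matches the label."""
    return float(np.mean((sigmoid(X @ w) > 0.5) == (y == 1)))

rng = np.random.default_rng(0)
n_features = 5

# Assumption: the source and target tasks are related, so their true
# weights are similar but not identical.
w_source_true = rng.normal(size=n_features)
w_target_true = w_source_true + 0.3 * rng.normal(size=n_features)

def simulate(n, w_true):
    """Draw n simulated examples with labels generated from w_true."""
    X = rng.normal(size=(n, n_features))
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y

X_src, y_src = simulate(5000, w_source_true)    # large, easy-to-label task
X_tgt, y_tgt = simulate(40, w_target_true)      # small, expensive-to-label task
X_test, y_test = simulate(2000, w_target_true)  # held-out target examples

zeros = np.zeros(n_features)

# Step 1: pretrain on the large source dataset.
w_pre = train_logistic(X_src, y_src, zeros)

# Step 2: fine-tune on the small target dataset, starting from w_pre.
w_finetuned = train_logistic(X_tgt, y_tgt, w_pre, lr=0.05, steps=20)

# Baseline: the same small budget, but starting from scratch.
w_scratch = train_logistic(X_tgt, y_tgt, zeros, lr=0.05, steps=20)

print("fine-tuned accuracy:  ", accuracy(X_test, y_test, w_finetuned))
print("from-scratch accuracy:", accuracy(X_test, y_test, w_scratch))
```

The only difference between the two final models is the starting point: the fine-tuned model begins from weights learned on the larger dataset, while the baseline begins from zeros.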
GPS is trained on data from a large case-control genome-wide association study (GWAS). GWAS is a common approach in human genetics research to identify genetic differences between people with and without specific autoimmune diseases and to detect potential risk factors. It also incorporates data from electronic medical record-based biobanks, which contain a wealth of information about patients, including genetic mutations, laboratory tests, and clinical diagnoses. This data can help identify individuals in preclinical stages and characterize the progression from preclinical to disease stages. Data from both sources is then integrated to refine the GPS model by incorporating factors related to actual disease onset.
“By integrating the biobank with a large case-control study, we gained the strength and improved predictive accuracy that comes from the large sample size of the case-control study,” Liu said, explaining that people with higher GPS scores are at higher risk of progressing from the preclinical stage to the disease stage.
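One way to picture combining the two data sources is to start from effect sizes estimated in a large case-control study and adjust them using a smaller set of biobank progression outcomes, while penalizing large departures from the starting values. The sketch below is not the published GPS algorithm. It is a generic shrinkage-toward-prior logistic regression on simulated data, and all names (`refine_weights`, `beta_gwas`, `shrink`) and numbers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def refine_weights(G, progressed, beta_gwas, shrink=1.0, lr=0.1, steps=300):
    """Fit progression weights by logistic regression on biobank-style data,
    with a quadratic penalty that pulls them toward the GWAS weights."""
    beta = beta_gwas.copy()
    n = len(progressed)
    for _ in range(steps):
        grad_fit = G.T @ (sigmoid(G @ beta) - progressed) / n
        grad_penalty = shrink * (beta - beta_gwas)
        beta -= lr * (grad_fit + grad_penalty)
    return beta

rng = np.random.default_rng(1)
n_people, n_variants = 300, 10

# Simulated stand-ins, not real data: per-variant weights from a large
# case-control study, and true progression weights that are assumed to be
# similar but not identical.
beta_gwas = rng.normal(scale=0.3, size=n_variants)
beta_progression = beta_gwas + rng.normal(scale=0.15, size=n_variants)

# Allele counts (0, 1 or 2) for a small preclinical cohort, centered per variant.
G = rng.binomial(2, 0.3, size=(n_people, n_variants)).astype(float)
G -= G.mean(axis=0)

# 1.0 if the simulated person progressed to disease, else 0.0.
progressed = (rng.random(n_people) < sigmoid(G @ beta_progression)).astype(float)

beta_refined = refine_weights(G, progressed, beta_gwas, shrink=1.0)

# A higher score means a higher predicted risk of progression.
scores = G @ beta_refined
ranked = np.argsort(-scores)
print("Five highest-scoring individuals:", ranked[:5])
```

In this sketch, a larger `shrink` keeps the refined weights closer to the case-control estimates, while `shrink=0.0` relies on the small progression cohort alone.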
The research team used real-world data from the Vanderbilt University biobank to predict the progression of rheumatoid arthritis and lupus, then validated the GPS risk scores with data from the National Institutes of Health’s All of Us biobank, a health data initiative. GPS predicted disease progression more accurately than 20 other models that relied solely on biobank or case-control samples, or that combined biobank and case-control samples in other ways.
Accurate prediction of disease progression using GPS allows for early intervention, targeted monitoring, and individualized treatment decisions, leading to improved patient outcomes, Liu said. It also has the potential to improve clinical trial design and recruitment by identifying individuals most likely to benefit from new treatments. Although the study focused on autoimmune diseases, the researchers said a similar framework could be used to study other types of diseases.
“When we talk about underrepresented populations, it’s not just about race. It’s also about patient groups that are understudied in the medical literature because they make up only a small portion of typical datasets. AI and transfer learning can help study these populations and reduce health disparities,” Liu said. “This study reflects the strength of Penn State’s comprehensive research program in autoimmune diseases.”
Liu and Jiang, along with study co-authors Laura Carrel, professor of biochemistry and molecular biology; Galen Falk, associate professor of dermatology; and Nancy Olsen, the H. Thomas and Dorothy Willits Hallowell Chair in Rheumatology, formed the Autoimmune Working Group and have been working together for nearly a decade. They lead innovative clinical trials, conduct research studies to understand the biological mechanisms of autoimmune diseases, and develop AI methods to address a variety of problems related to autoimmune diseases.
Chen Wang, who earned a doctorate in bioinformatics and genomics from Penn State, and Havell Markus, a joint degree student in the MD/PhD Medical Scientist Training Program, are co-first authors of the study. Other Penn State authors on the paper include Avantika R. Diwadkar, graduate student; Chakrit Kunsriraksakul, who graduated from the MD/PhD Medical Scientist Training Program during the research period; and Xingyan Wang, who was a research assistant at Penn State College of Medicine at the time of the study.
Other contributors include Bingshan Li, professor of molecular physiology and biophysics at Vanderbilt University School of Medicine; Xue Zhong, research assistant professor of medical genetics; and Xiaowei Zhan, associate professor of public health at the University of Texas Southwestern Medical Center.
This research was supported by funding from the National Institutes of Health, including the National Institute of Allergy and Infectious Diseases’ Office of Data Science and Emerging Technologies.