There are small sequences in our genome that have immense power to control nearby genes.
These DNA sequences, known as cis-regulatory elements (CREs), can turn neighboring genes on and off.
Now, researchers at Yale University School of Medicine (YSM), the Jackson Laboratory, and the Broad Institute of MIT and Harvard have developed a new approach to controlling precisely how genes are turned on. They have created a generative AI method to design synthetic regulatory elements that are not found in nature. These AI-designed DNA sequences can switch on genes only in certain types of cells in the body.
Researchers describe an AI platform known as Computational Optimization of DNA Activity (CODA) in a paper published October 23 in the journal Nature.
Controlling how genes are expressed in specific cell types could one day greatly improve gene therapy. Gene therapy has the potential to correct disease-causing mutations, but it requires better ways to deliver treatments specifically to the cells affected by a disease, such as certain types of neurons that malfunction in Parkinson’s disease or immune cells infected with HIV.
By making it possible to switch on gene therapies only in diseased cells, the new AI platform could one day help keep treatments inactive in healthy parts of the body, where they could cause harm. Some early experimental gene therapies failed to advance to clinical use because of such harmful off-target effects. Ultimately, CODA’s designers hope to use the method to develop targeted gene therapies for brain, metabolic, and blood diseases.
Beyond human ability
“This project essentially asks the question, ‘Can we learn to read and write the code for these regulatory elements?’” said Dr. Steven Reilly, assistant professor of genetics at YSM and the study’s senior author. “If you think about it from a language perspective, the grammar and syntax of these elements are poorly understood. So we set out to build a machine learning method that could learn a more complex code than we could on our own.”
That complex code stands in contrast to the code of genes themselves, which is fairly simple and was cracked decades ago. Each three-letter “word” (codon) in a gene’s sequence specifies an amino acid, a building block of proteins, or a signal to stop. Because there are only 64 possible three-letter combinations of the four DNA letters, learning the language of genes is not difficult.
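As a rough illustration of why the gene-level code is tractable, the Python sketch below enumerates all 4^3 = 64 codons and translates a short sequence using a handful of entries from the standard genetic code. The table is deliberately partial (a real table has all 64 entries), so this `translate` helper raises a `KeyError` on any codon it does not list.

```python
from itertools import product

# Every possible three-letter codon over the four DNA bases: 4 * 4 * 4 = 64.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

# Only a few entries from the standard genetic code, for illustration.
# "*" marks a stop codon.
CODON_TABLE = {"ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K", "TGG": "W", "TAA": "*"}


def translate(dna: str) -> str:
    """Read DNA three letters at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError for codons not in this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODONS))                 # 64
print(translate("ATGTTTGGCTAA"))   # MFG
```

A lookup table of 64 entries fully describes this code, which is the contrast the researchers draw with regulatory elements.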
Regulatory elements are different. They lie within the nearly 99% of the human genome that does not code for proteins, and their sequences do not seem to follow a simple code, at least not one easily discernible to humans. The space of possible DNA sequences that could make up these elements is also vast: for a regulatory element of average size, the number of possible sequences is larger than the number of atoms in the known universe, Reilly said.
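The scale of that search space follows from simple arithmetic: with four possible bases at each position, an element of length L has 4^L possible sequences. The sketch below assumes a 200-base-pair element purely for illustration (the article does not state a length) and compares the result with the commonly cited rough estimate of about 10^80 atoms in the observable universe.

```python
import math

# Assumption: a 200-bp element, chosen only for illustration.
ELEMENT_LENGTH = 200
NUM_BASES = 4  # A, C, G, T

# Python integers have arbitrary precision, so this is exact: 4^200 = 2^400.
n_sequences = NUM_BASES ** ELEMENT_LENGTH
log10_sequences = ELEMENT_LENGTH * math.log10(NUM_BASES)

# Commonly cited rough estimate of atoms in the observable universe: ~10^80.
LOG10_ATOMS = 80

print(f"4^{ELEMENT_LENGTH} has {len(str(n_sequences))} digits (about 10^{log10_sequences:.1f})")
print(f"That exceeds 10^{LOG10_ATOMS} by a factor of about 10^{log10_sequences - LOG10_ATOMS:.0f}")
```

Running it reports that 4^200 has 121 digits, roughly 10^120.4, or about 40 orders of magnitude more than the atom estimate, which is why exhaustive enumeration is out of reach.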
“Not even all the computers in the world could search every possible combination of sequences, so we have to figure out clever ways to search,” he said.
Machine learning approaches have only recently become available
Only recently has deep learning, a type of artificial intelligence, made problems of this scale computationally tractable, allowing researchers to generate new DNA sequences. Similar to the generative AI approaches underlying well-known tools such as DALL-E and ChatGPT, CODA can create new CREs based on what it learns from training data.
Pardis Sabeti, MD, DPhil, co-senior author of the study, a core member of the Broad Institute, and a professor at Harvard University, said the new technology has extraordinary potential. “By applying machine learning and molecular biology to the logic of when and where CREs function, we can use generative AI to leverage that knowledge and experimentally develop tools to regulate gene expression in new ways. And perhaps one day we can build on it therapeutically as well,” Sabeti said.
The research is complex, and more work lies ahead. “Combining computational models with large-scale experimental approaches is a powerful strategy,” said Dr. Ryan Tewhey, associate professor at the Jackson Laboratory and co-senior author of the study. “But a model is only as good as the data it learns from. By validating our findings, we can quickly identify where improvements can be made.”
The scientists trained CODA on data from naturally occurring regulatory elements so that it could build on DNA sequences already known to work, rather than searching through every possible sequence. They used activity data from more than 775,000 different regulatory elements measured in lab-grown human blood, liver, and brain cells. Regulatory elements act like molecular control knobs on genes, determining whether a gene is switched on or off and to what extent. These elements are often active only in certain cell types, such as liver cells, meaning that the genes they control are turned on only in those cells.
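To make the idea of building on working sequences concrete, here is a minimal sketch of model-guided sequence design. It is an illustration under stated assumptions, not CODA’s actual method: `toy_predict` is a hypothetical stand-in for a trained model that simply counts arbitrary placeholder motifs, the specificity score (predicted activity in the target cell type minus the highest off-target activity) is one plausible objective, and plain greedy hill climbing stands in for whatever optimizer the researchers used. Each step mutates one base and keeps the change if predicted specificity does not drop, so the search explores the neighborhood of a starting sequence instead of all 4^L possibilities.

```python
import random

BASES = "ACGT"

# Hypothetical stand-in for a trained activity model (NOT the model used by CODA).
# It "predicts" activity per cell type by counting arbitrary placeholder motifs.
PLACEHOLDER_MOTIFS = {"blood": "GATA", "liver": "TGTTT", "brain": "CAGCTG"}


def toy_predict(seq: str) -> dict[str, float]:
    """Return a made-up activity score per cell type for a DNA sequence."""
    return {cell: float(seq.count(motif)) for cell, motif in PLACEHOLDER_MOTIFS.items()}


def specificity(seq: str, target: str) -> float:
    """Predicted target-cell activity minus the highest off-target activity."""
    preds = toy_predict(seq)
    off_target = max(v for cell, v in preds.items() if cell != target)
    return preds[target] - off_target


def mutate(seq: str, rng: random.Random) -> str:
    """Change one randomly chosen position to a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def hill_climb(seed: str, target: str, steps: int = 2000, rng_seed: int = 0) -> tuple[str, float]:
    """Greedy local search: keep a single-base mutation if it does not lower the score."""
    rng = random.Random(rng_seed)
    best, best_score = seed, specificity(seed, target)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = specificity(candidate, target)
        if score >= best_score:  # accepting ties lets the search drift across plateaus
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    gen = random.Random(42)
    # Stand-in for a natural starting sequence; a real pipeline would start from measured elements.
    start = "".join(gen.choice(BASES) for _ in range(60))
    designed, score = hill_climb(start, target="liver")
    print(f"start score:    {specificity(start, 'liver')}")
    print(f"designed score: {score}")
```

Only a few thousand candidates are scored here, a vanishing fraction of the full sequence space; a realistic design loop would swap in a trained predictor and usually a stronger optimizer.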
Targeting specific cell types
When the scientists tested the AI-designed regulatory elements in these same three cell types, they found that in many cases the synthetic elements were more specific to a particular cell type than any naturally occurring sequences. They then tested a subset of these synthetic elements in living zebrafish and mice and found that the sequences also switched on test genes in specific cell types in the living animals. In one case, an AI-designed regulatory element turned on a reporter gene only in a very specific cell layer of the mouse brain, even though it was delivered throughout the animal’s body.
“We were impressed with how effectively the CODA-designed sequences achieved cell type specificity,” said Dr. Rodrigo Castro, a computational scientist at the Jackson Laboratory and co-first author of the paper.
Next, the researchers plan to extend the approach to additional cell types and develop regulatory elements with even greater cell type specificity. They also plan to combine AI-designed elements with other technologies needed for gene therapy, starting with specific diseases of the brain, metabolism, or blood. In theory, Reilly said, this approach could be used for all kinds of genetic diseases.
Dr. Sagar Gosai, co-first author of the study and a postdoctoral fellow in the Sabeti lab at the Broad Institute, said this method has the potential to go beyond what evolution has produced as a means of treating disease. “Natural CREs are abundant, but they represent a small fraction of possible genetic elements, and natural selection has constrained their function,” Gosai said.
Reilly agreed.
“There are a lot of potential solutions for the different things we would like regulatory elements to do,” Reilly said. “Evolution may never have needed to develop a really good driver for an Alzheimer’s drug, but that doesn’t mean one can’t exist.”
—-
This research was supported by the Howard Hughes Medical Institute and National Institutes of Health grants UM1HG009435, R00HG010669, R01HG012872, and R35HG011329.