A new genome editing approach inactivates vast numbers of transposable elements in genomes, with the potential to improve our understanding of their biology and to help engineer cells and organs
By Benjamin Boettner
(BOSTON) — The genome of our cells is scattered with thousands and thousands of repetitive short stretches of DNA known as “endogenous transposable elements” (TEs) or “selfish genetic elements” that together account for up to a mind-boggling 45% of our DNA. Belonging to different families that carry mysterious names like Alu, LINE-1 and HERV, some originate from “retroviruses” or so-called mobile “transposable elements” that have either settled into silent co-existence with the genome, or remain active, copying themselves out of one location and inserting into new ones with varying frequencies. What’s more, each cell likely differs from another in its TE repertoire.
Having first referred to TEs as “junk DNA” that litters our genomic DNA, researchers have come to realize that these elements have actively participated in the evolution of our genomes. The idea is even gaining momentum that they have taken over vital functions by helping to control the expression of genes, for example those that respond to hormones or are involved in maternal-fetal communication and immune tolerance in the developing uterus during pregnancy. At the other extreme, however, they have also been linked to the development of diseases like multiple sclerosis, and they stand in the way of using organs from pigs as transplants for human patients (porcine TEs may jump into human cells, potentially triggering the generation of new infectious viruses) and of engineering cells for medical and biotechnological purposes.
The ever-expanding toolbox of genome engineers offers hope for overcoming some of these challenges. In particular, the CRISPR system, which started a genome editing revolution in laboratories, has been used to inactivate TEs in genomes. In 2015, Wyss Institute Core Faculty member George Church, Ph.D., and his group used the classical CRISPR-Cas9 system to inactivate 62 PERVs, the pig counterparts of human endogenous retroviruses (HERVs), in one fell swoop, setting a record in TE editing. Only two years later, in 2017, the team generated the first live pigs using the engineered cells and a technique known as “somatic cell nuclear transfer”. This early success prompted Church, with help from the Wyss Institute’s translation engine, to co-found the startup company eGenesis, which further pursues CRISPR-mediated organ engineering. Church is also Professor of Genetics at Harvard Medical School (HMS) and Professor of Health Sciences and Technology at Harvard and MIT.
“Multiplexed genome editing is at the heart of many of my groups’ projects, and developing new editing tools and computational pipelines enabling us to do this is a real continuum that feeds many of our ultimate goals in cell, tissue and genome engineering, as well as the de-extinction of historical genes,” said Church, who also is a co-founder and a driving force in Genome Project Write (GPW). As part of it, his group works to create cells and organs that are resistant to viruses, and whole genome methods that could enable the controlled investigation of genetic variants.
The accomplishments reported in the earlier pig studies show that, in theory, the CRISPR system has the potential for larger-scale genome operations. In practice, however, CRISPR-Cas9 can be toxic to cells, especially when it is used to edit many loci simultaneously. This is because the Cas9 enzyme, which researchers guide to its target sequences in the genome with the help of a small “guide RNA” (gRNA), creates double-strand breaks (DSBs) in the DNA double helix. These breaks are then used to inactivate sequences or to introduce precise edits into the DNA code with the help of the cell’s DNA repair machinery. As the number of intended simultaneous edits and their associated DSBs increases, however, the DNA repair machinery becomes overwhelmed and cannot repair all DNA lesions, which eventually causes the cells to die.
“The dogma in our lab for the longest time was that nobody could make more than 100 edits in a cell. We set out to break that barrier and by doing so create a way to systematically inactivate and study the roles of entire TE families,” said Postdoctoral Fellow Cory Smith, co-first author of the new study from the Church group. The findings of their study are reported in a recent article in the journal Nucleic Acids Research.
To overcome Cas9’s harmful effects, genome researchers have used “nicking base editors” (nBEs). By fusing enzymes known as nucleotide deaminases to Cas9 variants that can still be guided to target sequences but have lost the ability to cut one of the two strands of the DNA double helix, they could convert a C:G base pair into a T:A pair (via cytidine deamination), or an A:T pair into a G:C pair (via adenine deamination), and thereby change the genetic code. Importantly, nBEs were much less damaging to DNA, which enabled a higher degree of multiplexed editing.
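The base conversions described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the sequence, the editing window, and the names `base_edit` and `reverse_complement` are hypothetical, every target base inside the window is converted, and no guide RNA targeting or editing efficiency is modeled. It is not the enzymes' actual behavior or the study's analysis code.

```python
# Toy model of base editing on one DNA strand (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def base_edit(seq: str, start: int, end: int, editor: str) -> str:
    """Convert target bases inside seq[start:end].

    'CBE' (cytidine deamination) turns C into T.
    'ABE' (adenine deamination) turns A into G.
    """
    source, product = {"CBE": ("C", "T"), "ABE": ("A", "G")}[editor]
    window = seq[start:end].replace(source, product)
    return seq[:start] + window + seq[end:]


site = "GACCAGTTCA"                        # hypothetical target site
edited_cbe = base_edit(site, 2, 6, "CBE")  # C -> T inside the window
edited_abe = base_edit(site, 2, 6, "ABE")  # A -> G inside the window

print(site, "->", edited_cbe, "(CBE)")
print(site, "->", edited_abe, "(ABE)")
# A C-to-T change on one strand corresponds to a G-to-A change on the other.
print(reverse_complement(site), "->", reverse_complement(edited_cbe))
```

Because no strand is cut in this model, the change is a pure substitution of letters, which mirrors why base editing avoids the double-strand breaks made by classical Cas9.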
“To set the stage for inactivating large numbers of TEs in genomes with the precision and safety we were aiming for, we further engineered the base editors by removing their ability to cut both of the DNA strands since we identified that nicking was responsible for much of the genotoxicity observed,” said Oscar Castanon, Ph.D., a Postdoctoral Fellow working with Church and the other co-first author. These advanced synthetic genome editing enzymes are called “dead base editors” (dBEs) to distinguish them from merely “nicking base editors”.
Smith, Castanon, and other members of Church’s group focused their attention on two different types of cells: a common human cell line called 293T, which developers of genome engineering agents use as their workhorse, and human induced pluripotent stem cells (hiPSCs), which can be used to create many cell and tissue types in the culture dish. 293T cells are very easy to handle in the laboratory and have a low DNA damage response. In contrast, hiPSCs are very sensitive to DNA damage, which often results in cell death. “These two cell types are on opposite ends of a spectrum, into which other cells of interest researchers are eager to study effects of repetitive DNA in would likely fall,” explained Smith.
Before getting to work with their dead Cas9 BEs in the two cell lines, the team had to remove an additional roadblock. They had to be able to determine how many copies of HERV, LINE-1 and Alu elements the two types of cells contained in the first place, because only then could they assess the efficiency of their TE inactivation approach. In addition, it is estimated that common whole genome sequencing methods miss 1 to 5% of genomic DNA, and that this part of the genome likely contains large chunks that are filled with TEs. “This initially was a challenge, as high-throughput DNA sequencing produces short reads of the genome that are assembled into larger sequences using considerable computer power,” said Castanon. The nature of repetitive DNA often makes many of these short reads indistinguishable from one another. However, the researchers used both experimental and bioinformatic approaches that enabled them to more accurately estimate the actual copy numbers of TEs.
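Why repetitive DNA confounds short reads can be shown with a minimal toy example. The sequences and the helper `find_all` below are hypothetical and use exact string matching only; the study's actual experimental and bioinformatic methods are not described here and are certainly more involved.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every start position where the read matches the genome exactly."""
    positions = []
    i = genome.find(read)
    while i != -1:
        positions.append(i)
        i = genome.find(read, i + 1)  # continue the search, allowing overlaps
    return positions


# Hypothetical toy genome carrying three identical copies of a repeat element.
repeat = "GGCCGGGCGC"
genome = "ATTACA" + repeat + "TTGACT" + repeat + "CAGTAA" + repeat + "TGCA"

read_inside_repeat = "CCGGGCG"  # lies entirely within the repeat
read_at_junction = "ACAGGCC"    # spans unique flanking DNA and the repeat

print(find_all(genome, read_inside_repeat))  # several equally good placements
print(find_all(genome, read_at_junction))    # a single placement
```

A read that falls entirely inside the repeat fits every copy equally well, so it cannot say which copy it came from or how many copies exist, whereas a read that overlaps unique flanking sequence can be placed unambiguously.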
“Almost as we expected, the copy numbers of all three classes of TEs were much higher than previously estimated, higher in 293T cells than hiPSCs, in part also because 293T cells were known to contain an almost complete additional set of chromosomes, which makes them near-triploid, and spanning a huge range from 32 for HERV TEs in hiPSCs to about 161,000 for Alu TEs in 293T cells,” said Castanon.
After introducing their optimized dead-Cas9 BEs for editing specific cytidines and adenines, together with a single gRNA that targeted one of the three types of TEs in the two cell types, they cultured the cells for up to four weeks while the reagents went about their business. Finally, they assessed their success in the cells that survived the procedure unscathed by again sequencing their genomic DNA and analyzing it with their bioinformatic methods. The effects were stunning: “First, cells whose genomes were exhaustively edited remained viable, while cells that were provided with a classical ‘live’ Cas9 enzyme that was able to create double-strand breaks and the same gRNA did much more poorly and ultimately died,” said Smith. “Then, we demonstrated that up to 13,000 TEs and 12,200 TEs were precisely edited in 293T cells and hiPSCs, respectively. These numbers were three orders of magnitude greater than those reached in previous attempts.”
Smith and Castanon think that to unequivocally show whether and how a specific type of TE matters for a specific biological or disease process, it might be important to inactivate nearly all of its copies in the genome, likely by combining several different gRNAs targeting different parts of it. But besides the sheer number of edits they were able to perform, their work already provided several additional interesting insights. In particular, their adenine-directed dead Cas9 BE demonstrated high editing efficacy and extremely low off-target effects at DNA outside of any TE sequences. It also became clear that their approach targeted silent as well as actively jumping TEs, something they had been uncertain about when they began the project.
“We have unlocked new genome editing technology in this work that brings us a big step closer to tackling central questions regarding the function and relevance of TEs in various biological processes and diseases,” said Church. “These base editing tools are also becoming highly relevant for our organ-engineering efforts, and GPW in which we develop technologies for synthesis and editing at the genomic scale even in the most challenging regions of the genome.”
Additional authors on the study from the Wyss Institute and/or HMS include Khaled Said, Verena Volf, Parastoo Khoshakhlagh, Ph.D., Amanda Hornick, Raphael Ferreira, Ph.D., Wyss Associate Faculty member Chun-Ting Wu, Ph.D., Marc Güell, Shilpa Garg, Alex Ng, and Hannu Myllykallio in the order they are listed on the article. The work was funded by the National Human Genome Research Institute of the National Institutes of Health under grants RM1HG008525 and R01GM987654, and the Boehringer Ingelheim Fonds.