Genome engineering technology transforms living cells into archival data storage devices that capture, store, and propagate information over time
By Benjamin Boettner
(BOSTON) — Researchers are developing ways to harness DNA, the blueprint of biological life, as a synthetic raw material for storing large amounts of digital information outside of living cells, a process that relies on expensive machinery. But what if they could instead coerce living cells, such as large populations of bacteria, into using their own genomes as a biological hard drive that records information and can be read out at any time? Such an approach could not only open entirely new possibilities for data storage, but could also be engineered into an effective memory device that records, in chronological order, the molecular events cells experience during their development or upon exposure to stresses and pathogens.
In 2016, a team at the Wyss Institute for Biologically Inspired Engineering and Harvard Medical School (HMS) led by Wyss Core Faculty member George Church, Ph.D., built the first molecular recorder based on the CRISPR system. Working with bacteria as a cell model, the recorder allows cells to capture pieces of DNA-encoded information supplied in chronological order and to store them as a memory in their genomes. The information, stored as an array of sequences in the CRISPR locus, can be recalled and used to reconstruct a timeline of events. However, “as promising as this was, we did not know what would happen when we tried to track about a hundred sequences at once, or if it would work at all. This was critical since we are aiming to use this system to record complex biological events as our ultimate goal,” said Seth Shipman, Ph.D., a Postdoctoral Fellow working with Church.
Now, in a new study published in Nature, the same team shows in foundational proof-of-principle experiments that the CRISPR system, developed further in a first-of-its-kind approach, can encode complex information in living cells: a digitized image of a human hand, reminiscent of some of the first paintings drawn on cave walls by early humans, and a sequence of frames from one of the first motion pictures ever made, that of a galloping horse.
The CRISPR system helps bacteria develop immunity against the constant onslaught of viruses in their environments. As a memory of survived infections, it captures viral DNA and generates short sequences from it, so-called “spacers,” which are inserted upstream of earlier spacers in a growing array within the CRISPR locus of the bacterial genome. The now-famous Cas9 protein draws on this memory to destroy the same viruses when they return. Unlike Cas9, which has become a widely used genome engineering tool, other parts of the CRISPR system have so far seen little technological use.
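Because each new spacer is inserted at the leader end of the array, a spacer's position doubles as a timestamp: the further a spacer sits from the leader, the earlier it was acquired. The toy Python model below illustrates that idea under simplifying assumptions that are ours, not the study's: each cell's array is a list ordered newest-first, each cell captures only some of the events, and the population-level order is estimated with a simple net pairwise vote. The helper names `acquire`, `infer_order` and `build_array` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def acquire(array: list[str], spacer: str) -> list[str]:
    """Toy model of acquisition: the new spacer goes in at the leader end (index 0)."""
    return [spacer] + array


def infer_order(arrays: list[list[str]]) -> list[str]:
    """Estimate the chronological (oldest-first) order of spacers across many cells.

    Hypothetical heuristic: within each array (newest first), every pair of
    spacers casts one vote, +1 for the spacer further from the leader (older)
    and -1 for the one closer to it (newer). Spacers are then sorted by net
    vote, highest (oldest) first, with ties broken alphabetically.
    """
    net_votes: dict[str, int] = defaultdict(int)
    for array in arrays:
        for spacer in array:
            net_votes.setdefault(spacer, 0)  # register spacers seen only alone
        for i, j in combinations(range(len(array)), 2):
            # combinations yields i < j, so array[j] was acquired before array[i]
            net_votes[array[j]] += 1
            net_votes[array[i]] -= 1
    return sorted(net_votes, key=lambda s: (-net_votes[s], s))


def build_array(events_seen: list[str]) -> list[str]:
    """Replay the events one cell happened to capture, in the order they occurred."""
    array: list[str] = []
    for event in events_seen:
        array = acquire(array, event)
    return array


if __name__ == "__main__":
    # Events E1..E4 are supplied in that order; each cell captures only some of them.
    cells = [
        build_array(["E1", "E2", "E4"]),
        build_array(["E2", "E3"]),
        build_array(["E1", "E3", "E4"]),
        build_array(["E3", "E4"]),
    ]
    print(cells[0])            # ['E4', 'E2', 'E1']  (newest spacer first)
    print(infer_order(cells))  # ['E1', 'E2', 'E3', 'E4']
```

No single cell in the example recorded all four events, yet pooling the arrays recovers the full order, which is why sequencing a whole population rather than one cell is useful.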
“In this study, we show that two proteins of the CRISPR system, Cas1 and Cas2, that we have engineered into a molecular recording tool, together with new understanding of the sequence requirements for optimal spacers, enables a significantly scaled-up potential for acquiring memories and depositing them in the genome — as information that can be provided by researchers from the outside, or that, in the future, could be formed from the cells’ natural experiences,” said Church, who also is the Robert Winthrop Professor of Genetics at Harvard Medical School and a Professor of Health Sciences and Technology at Harvard and MIT. “Harnessed further, this approach could present a way to cue different types of living cells in their natural tissue environments into recording the formative changes they are undergoing into a synthetically created memory hotspot in their genomes.”
To tackle complex information on a much larger scale, the team turned to still and moving images because they represent constrained and clearly defined data sets; a movie, in addition, offers the opportunity to have bacteria acquire information frame by frame over time. “We designed strategies that essentially translate the digital information contained in each pixel of an image or frame as well as the frame number into a DNA code, that, with additional sequences, is incorporated into spacers. Each frame thus becomes a collection of spacers,” said Seth Shipman, the study’s first author. “We then provided spacer collections for consecutive frames chronologically to a population of bacteria which, using Cas1/Cas2 activity, added them to the CRISPR arrays in their genomes. And after retrieving all arrays again from the bacterial population by DNA sequencing, we finally were able to reconstruct all frames of the galloping horse movie and the order they appeared in.”
While realizing this new concept of molecular recording, Shipman and second author Jeff Nivala, Ph.D., also a Postdoctoral Fellow, defined in their analysis a valuable set of requirements that make spacer sequences more likely to be acquired, and identified sequence features that prevent their acquisition into growing CRISPR arrays: the do’s and don’ts of spacer design.
In future work, the team will focus on establishing molecular recording devices in other cell types and on further engineering the system so that it can memorize biological information. “One day, we may be able to follow all the developmental decisions that a differentiating neuron is taking from an early stem cell to a highly-specialized type of cell in the brain, leading to a better understanding of how basic biological and developmental processes are choreographed,” said Shipman, who is mentored by Church as well as by neurobiologist and co-author Jeffrey Macklis, Ph.D., the Max and Anne Wien Professor of Life Sciences and Professor of Stem Cell and Regenerative Biology at Harvard University. Once adapted to specific paradigms, the approach could also lead to better methods for generating cells for regenerative therapy, disease modeling and drug testing.
“This groundbreaking technology advances the field of DNA-based information storage by leveraging the biological machinery of living cells to record, archive and propagate that information, in addition to potentially providing a new way to study dynamic biological and developmental processes inside the living body. It is yet another example of bioinspired engineering at its best,” said Wyss Institute Founding Director Donald Ingber, M.D., Ph.D., who also is the Judah Folkman Professor of Vascular Biology at HMS and the Vascular Biology Program at Boston Children’s Hospital, as well as Professor of Bioengineering at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS).
The study was supported by grants from the National Institute of Mental Health, the National Human Genome Research Institute, the Simons Foundation Autism Research Initiative, the National Institute of Neurological Disorders and Stroke, the Paul G. Allen Frontiers Group, and the Wyss Institute. In addition, Shipman is a Shurl and Kay Curci Foundation Fellow of the Life Sciences Research Foundation.