There are thousands of mutations that occur in the breast cancer-linked genes BRCA1 and BRCA2. Some of these cause breast or ovarian cancer, while others are harmless. When we design a genetic test for predisposition to breast cancer, we have to know which ones to test for. The same is true of almost any gene that plays a role in disease—you’ll find many mutations in that gene in the general population, only some of which cause health problems. So how do we know which mutations to worry about?
We start by using the genetic code. The genetic code, cracked by scientists in the 1960s, makes it surprisingly easy to “read” our DNA and understand how a particular mutation affects a gene. As genetic testing takes on a bigger role in predicting, diagnosing, and treating disease, we rely on this code to help us make sense of the data. Unfortunately, the genetic code applies to less than two percent of our DNA. In an effort to read the rest, researchers are trying to crack a new genetic code—and this next one is turning out to be much more difficult to solve than the first. In fact, scientists may have to give up the idea that we can use a “code” to “read” the rest of our DNA.
When scientists were working out the original genetic code in the 1950s and ’60s, all sorts of complicated schemes were proposed to explain how information is stored in our genes. The problem they were trying to solve was how a gene, made of DNA, encodes the information needed to make a particular protein—an enzyme, a pump, a piece of cellular scaffolding, or some other critical component of the cell’s working machinery. They were looking for a code that would translate the four-letter DNA alphabet of genes into the 20-letter amino acid alphabet of proteins.
The actual solution turned out to be incredibly simple: the genetic code works much like a child’s decoder wheel. Three letters of the DNA alphabet spell out a single letter of the amino acid alphabet. For example, the DNA sequence ‘TTA’ in a gene corresponds to the amino acid leucine. The sequence ‘TTT’ codes for phenylalanine. So when a breast cancer patient has a mutation that replaces an ‘A’ with a ‘T’ at a point in her BRCA2 gene, researchers can use the genetic code to see that ‘TTA’ has changed to ‘TTT,’ causing phenylalanine to replace leucine in the resulting protein. Merely knowing about the leucine-to-phenylalanine swap doesn’t tell you exactly how the mutation contributes to the disease. But it is often a significant clue, and it helps researchers sort the most potentially damaging mutations from ones that are likely harmless.
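To make the decoder-wheel idea concrete, here is a minimal Python sketch of the standard codon table and a rough way to label a single-codon change. The function names (`translate`, `classify_substitution`) are hypothetical, and the synonymous/missense/nonsense labels are a common simplification rather than the method of any particular genetic test. The sketch looks only at one codon and ignores everything else that can matter, such as splicing or where in the protein the change falls.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Full names for the one-letter amino acid codes used above.
NAMES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "*": "stop",
}


def translate(codon: str) -> str:
    """Look up the full amino acid name for a three-letter DNA codon."""
    return NAMES[CODON_TABLE[codon.upper()]]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Roughly classify a codon change by its effect on the protein.

    Hypothetical helper: it ignores splicing, gene regulation, and the
    position of the codon within the gene.
    """
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        return "synonymous (same amino acid)"
    if alt == "*":
        return "nonsense (premature stop)"
    if ref == "*":
        return "stop-loss"
    return f"missense ({NAMES[ref]} -> {NAMES[alt]})"


if __name__ == "__main__":
    print(translate("TTA"))  # leucine
    print(translate("TTT"))  # phenylalanine
    print(classify_substitution("TTA", "TTT"))  # missense (leucine -> phenylalanine)
```

The BRCA2 example above comes out as a missense change. The same lookup also shows why some mutations are likely harmless: ‘TTA’ to ‘TTG’ still spells leucine, so the protein is unchanged.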
Thanks to its simplicity, the genetic code is a powerful tool in our hunt for mutations that cause disease. Unfortunately, it has also led to the genetic equivalent of a drunk looking for his lost keys under the lamppost. Researchers have put much of their effort into looking for disease mutations in those parts of our genomes that we can read with the genetic code—that is, parts that consist of canonical genes that code for proteins. But these genes make up less than two percent of our DNA; much of our genetic function lies outside genes, in the relatively uncharted “non-coding” portions. We have no idea what fraction of disease-causing mutations lie in that non-coding portion; for some types of mutations, it could be as high as 90 percent.
This non-coding portion of our DNA is the reason scientists are searching for a new genetic code. The problem is that—unlike the beautifully simple genetic code, which is easily comprehended by the human mind—this new genetic code may not be much of a code at all.
In the original code, we treat DNA like a text, something we read without worrying about the messy details of protein and DNA chemistry. But the metaphor of DNA as a text starts to unravel when we consider what lies outside of our standard genes. Many of these other parts of our DNA function differently, as regulators of genes that ensure that, say, brain genes are switched on in the brain and not in the liver. The chemical details really do matter here, because a segment of regulatory DNA serves as an assembly point for dozens or hundreds of pieces of cellular machinery. A mutation here can act in many different ways that are difficult to predict unless you understand the pieces of cellular machinery involved. To make sense of these types of mutations, researchers have begun to assemble a complicated set of rules, each of which applies in some situations but not others. As one group of scientists recently wrote, it’s not clear “whether such a multi-rule system will ever be condensed into a single code.”
Why does this ultimately matter to anyone but the poor scientists trying to figure it all out? It matters because we need to look critically at the frequently made promises that genetics will revolutionize how we approach our health and how we make decisions in society. The original genetic code was cracked by little more than a dozen scientists in about a decade. To crack the next code, which may not even be much like a code, we have invested in enormous consortia involving dozens of institutions and hundreds of scientists, who have been working for more than a decade and could easily continue for another one. Nobody knows what the end result will look like; we only know that we need to decipher our “non-coding” DNA in order to fully address diseases ranging from cancer to mental illness.
We lucked out with the original genetic code—nature turned out to work in a way we could easily understand. We aren’t always so lucky. Reading the rest of our DNA will be much more difficult.