Your genetic code has many ‘words’ for the same thing – information theory can help explain the redundancy

Almost all organisms, from bacteria to humans, use the same genetic code. This code acts as a dictionary, translating genes into the amino acids used to build proteins. The universality of the genetic code reflects the shared ancestry of all living things and the essential role this code plays in the structure, function and regulation of biological cells.

Understanding how the genetic code works is the foundation of genetic engineering and synthetic biology. But there are still many unsolved mysteries, including why the code matters for biological processes such as protein folding.

As a student working at the interface of biology and physics, I use information theory – the mathematics of how information is stored and transmitted – to study some of these interesting questions. Just as computers need strings of binary code to function, biological systems also rely on bits of information.

In my recent research, I suggest that information theory may provide a possible explanation for a long-standing mystery about certain idiosyncrasies in how amino acids are encoded.

Different words mean the same thing

The genetic code consists of “words” built from four letters: A, C, G and U. Each of these letters represents a different building block called a nucleotide: adenine, cytosine, guanine and uracil. A molecular machine called a ribosome reads these code words to translate genes into proteins.

A circular diagram that connects all 64 possible combinations of the letters A, C, G, and U, which are colored red, yellow, blue, and green respectively.  Abbreviations for different codons are listed on the outer edge of the circle.
Codon sequences are read outward from the center of the genetic code wheel.
Mouagip via Wikimedia Commons

Ribosomes read three-letter words called codons, and there are 64 different three-letter combinations of the four letters. Of these 64 words, 61 encode amino acids, and three signal the ribosome to stop synthesizing the protein. For example, “AUG” codes for the amino acid methionine and also marks the beginning of a protein.
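The arithmetic behind those numbers can be checked with a short Python sketch. The three stop codons listed here (UAA, UAG and UGA) are those of the standard genetic code; the article itself does not name them, and the variable names are purely illustrative.

```python
from itertools import product

NUCLEOTIDES = "UCAG"                  # the four RNA letters
STOP_CODONS = {"UAA", "UAG", "UGA"}   # stop signals in the standard code (not named in the article)
START_CODON = "AUG"                   # methionine, also the start signal

# Every three-letter word that can be built from four letters: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Codons that encode an amino acid rather than a stop signal.
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))                  # 64
print(len(sense_codons))            # 61
print(START_CODON in sense_codons)  # True
```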

But as in any other language, there are synonyms – different codons can encode the same amino acid. In fact, since there are only 20 amino acids but 61 different words to encode them, there is a lot of redundancy. An amino acid can have anywhere from one to six synonymous codons. Only two amino acids, methionine and tryptophan, have a single codon. This redundancy also helps ribosomes do their job correctly even when there is a typo in the genetic code.
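To make the uneven spread of synonyms concrete, here is a minimal Python sketch that tallies how many codons each amino acid has. It assumes the standard genetic code, written as the usual compact 64-letter string of one-letter amino acid abbreviations (with `*` for stop) in U, C, A, G order; the names are illustrative and not taken from the research described here.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one amino acid letter per codon, "*" marks a stop codon.
# Codons are ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (stop codons excluded).
synonyms = Counter(aa for aa in codon_table.values() if aa != "*")

# How many amino acids fall into each group size.
group_sizes = Counter(synonyms.values())
single_codon = sorted(aa for aa, count in synonyms.items() if count == 1)

print(sorted(group_sizes.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
print(single_codon)                 # ['M', 'W'] (methionine and tryptophan)
```

Under this table, group sizes of five never occur, and only isoleucine has exactly three codons.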

Nature’s engineering guidelines

Why some amino acids have more synonyms than others is a mystery that has puzzled scientists for decades. Is there a pattern to this difference, or is it random? To answer this question, scientists study the laws that govern decision making in nature.

If a human engineer designed the genetic code, they would probably give each amino acid the same amount of redundancy to guard against errors and promote uniformity. The 61 codons would be divided evenly among the 20 amino acids, with each amino acid assigned about three codons.

But nature has its own priorities. Evolutionary models of natural systems such as bacteria show that nature is always optimizing. Not only does the final form of a protein need to be optimal, but so do its intermediate forms. This kind of optimization helps ensure that natural systems can adapt to different environments.

Scientists understand some of the principles that nature followed in shaping the genetic code. For example, the spatial arrangement of atoms and molecules in and around the genetic code can affect its function, as can the coordination of other cellular structures involved in making proteins.

Information theory and genetics

My research suggests that there are two other important considerations for natural systems: the information-theoretic nature of the genetic code and the principle of maximum entropy.

Much as a computer processes data made up of 0s and 1s, life processes a genetic code made up of the four letters A, C, G and U. Mathematically, however, the most efficient way to represent data is not binary (base 2, the 0s and 1s that computers use) but base e. Short for Euler’s number, e is an irrational number, which means its value cannot be written exactly as a fraction or as a finite decimal, though it is approximately 2.718.
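One common way to make the "most efficient base" claim precise is the so-called radix economy, and the sketch below assumes that measure; the article does not say which measure is meant. Writing a number N in base b takes about log_b(N) digits, each needing b distinct symbols, so the cost scales as b / ln(b) times ln(N). The function name is hypothetical.

```python
import math

def radix_economy(base: float) -> float:
    """Relative cost of a number base: b / ln(b), the factor multiplying ln(N)."""
    return base / math.log(base)

# Compare binary, base e, ternary and the four-letter alphabet of the genetic code.
for base in (2, math.e, 3, 4):
    print(f"base {base:.3f}: relative cost {radix_economy(base):.4f}")
```

Under this measure the cost is lowest at base e, base 3 is the cheapest whole-number base, and bases 2 and 4 cost exactly the same.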

Mandelbrot set, a mathematical fractal, shown in black on a blue background.  The edges of the fractal are blue and white
A Mandelbrot set is a mathematically generated fractal.
PantheraLeo1359531 via Wikimedia Commons, CC BY

Nature’s tendency to scale by this irrational number is linked to the repeating fractal patterns seen in winding coastlines, fern leaves, snowflakes and trees. Beyond biology, this kind of information processing also shows up in mathematics and cosmology.

Another principle that applies to nature is that of maximum entropy. Entropy is a measure of disorder in a system, and the principle of maximum entropy states that systems evolve into states of greater disorder. This principle allows researchers to make inferences from limited data and is used to explain how amino acids interact in proteins.
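In information theory, entropy is usually measured with Shannon's formula, and a small Python sketch shows why "maximum entropy" favors the most spread-out distribution. This assumes the Shannon definition in bits; the two example distributions are made up for illustration and are not data from the research.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two hypothetical distributions over four outcomes (for example, four letters).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]

print(shannon_entropy(uniform))  # 2.0 bits, the maximum for four outcomes
print(shannon_entropy(skewed))   # lower, because one outcome dominates
```

With no other constraints, the uniform distribution has the highest entropy. Adding constraints changes which distribution comes out on top, which is what makes the principle useful for inference from limited data.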

In the context of codon groups, the maximum entropy principle means that nature is compressing the data as much as possible – which means that the function describing the distribution of codon groups must be computationally difficult to derive. Working out how to maximize the computational complexity of this function reveals potential patterns underlying codon groups.

I believe that these two principles can help explain the distribution of codon groups in the genetic code and demonstrate the usefulness of mathematics in analyzing natural processes. Although there are many biological mysteries that scientists have yet to solve, information theory can be a powerful tool to help unravel the genetic code.
