Bioencryption can store almost a million gigabytes of data inside bacteria

A new method of data storage that converts information into DNA sequences allows you to store the contents of an entire computer hard drive on a gram’s worth of E. coli bacteria, and perhaps considerably more than that.

The idea of storing data inside bacteria has been around for about a decade. Even very simple bacteria have long strands of DNA with tons of bases available for data encryption, and bacteria are by their nature far more resilient to damage than more traditional electronic storage. Bacteria are nature’s hardiest survivors, capable of surviving just about any disaster that would finish off a regular hard drive. Besides, bacteria’s natural reproduction would create lots of redundant copies of the data, which would help preserve the integrity of the information and make retrieval easier.

Preparing traditional data for storage inside bacteria is simple enough. There are four DNA bases that can be used to make up the DNA strings: adenine, cytosine, guanine, and thymine. That basically means we’re working with a base-4 number system, also known as quaternary.

In a presentation on their breakthrough, the Hong Kong researchers showed how to change the word “iGEM” into DNA-ready code. They used the ASCII table to convert each of the individual letters into a numerical value (i=105, G=71, etc.), which can then be changed from base-10 to base-4 (105=1221, 71=1013, etc.). Finally, those digits can be changed into their DNA base equivalents, with 0, 1, 2, and 3 replaced with A, T, C, and G. Under that mapping, iGEM becomes TCCTTATGTATTTAGT.
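The conversion described above can be sketched in a few lines of Python. This is only an illustration of the stated rules (ASCII code, then four base-4 digits, then 0, 1, 2, 3 mapped to A, T, C, G); the function names are made up for this example and are not the researchers' own software.

```python
# Digit-to-base mapping as stated in the article: 0->A, 1->T, 2->C, 3->G.
DIGIT_TO_BASE = "ATCG"


def to_base4(value: int, width: int = 4) -> str:
    """Return `value` as a zero-padded base-4 string (one byte fits in 4 digits)."""
    digits = []
    for _ in range(width):
        digits.append(str(value % 4))
        value //= 4
    return "".join(reversed(digits))


def text_to_dna(text: str) -> str:
    """Encode ASCII text as DNA bases, four bases per character."""
    out = []
    for byte in text.encode("ascii"):
        out.append("".join(DIGIT_TO_BASE[int(d)] for d in to_base4(byte)))
    return "".join(out)


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna by reading four bases at a time."""
    chars = []
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = value * 4 + DIGIT_TO_BASE.index(base)
        chars.append(chr(value))
    return "".join(chars)
```

For instance, `to_base4(105)` gives `"1221"`, and `dna_to_text(text_to_dna("iGEM"))` returns the original word, which shows the scheme is reversible.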

Once the raw data is ready, the researchers say a few algorithms can be used to weed out redundant and repetitive information. That doesn’t just save a ton of space – lots of repetition in the DNA sequence can actually be biologically harmful to the wellbeing of the DNA and bacteria, so this step rather neatly solves two problems at once.
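The article does not say which algorithms the researchers use, so the sketch below swaps in Python's standard zlib (DEFLATE) compressor purely as a stand-in. It shows how removing redundancy before encoding reduces the number of bases needed, assuming four bases per byte as in the ASCII scheme described earlier.

```python
import zlib

BASES_PER_BYTE = 4  # one byte = four base-4 digits = four DNA bases

# Deliberately repetitive sample data (hypothetical, for illustration only).
raw = b"ABABABABABABABABABABABABABABABAB" * 8

# Stand-in compressor: zlib at its highest compression level.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

bases_before = len(raw) * BASES_PER_BYTE
bases_after = len(packed) * BASES_PER_BYTE
bases_saved = bases_before - bases_after

print(f"bases before: {bases_before}, after: {bases_after}")
```

Decompression restores the data exactly, which is what lets the step be reversed after sequencing. Very short or already random inputs may not shrink at all.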

DNA strands aren’t long enough to store complicated information like a photograph or a book, so the best available solution is to fragment the data into lots of little pieces and spread it among the different cells. To make that work, the researchers have to create a system that allows the fragments to be identified and ultimately put back in the right order. So they created a three-part structure for all the DNA: header, message, and checksum.

The header is an 8-base-long sequence that is divided into four levels of identifying information – zone, region, area and district – which allows each fragment to be put back in the right order. The message carries the actual usable data. The checksum that follows repeats the original header, which is useful in controlling for minor mutations to the bacteria.
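A rough sketch of the header, message, checksum layout follows. The 8-base header and the repeated-header checksum come from the description above; everything else is an assumption made for illustration: the header is treated as a fragment index written in base 4 (so each of the four levels gets two bases), and the helper names are hypothetical.

```python
BASES = "ATCG"  # digit order 0->A, 1->T, 2->C, 3->G
HEADER_LEN = 8  # from the article; two bases per level is an assumption


def index_to_header(index: int) -> str:
    """Write a fragment index as 8 base-4 digits: zone|region|area|district."""
    if not 0 <= index < 4 ** HEADER_LEN:
        raise ValueError("index does not fit in an 8-base header")
    digits = []
    for _ in range(HEADER_LEN):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def make_fragments(dna: str, message_len: int) -> list[str]:
    """Split `dna` into header + message + checksum fragments."""
    fragments = []
    for n, start in enumerate(range(0, len(dna), message_len)):
        header = index_to_header(n)
        # Per the article, the checksum repeats the header.
        fragments.append(header + dna[start:start + message_len] + header)
    return fragments


def reassemble(fragments: list[str]) -> str:
    """Drop fragments whose header and checksum disagree, then sort by header."""
    valid = {}
    for frag in fragments:
        header = frag[:HEADER_LEN]
        message = frag[HEADER_LEN:-HEADER_LEN]
        checksum = frag[-HEADER_LEN:]
        if header == checksum:
            valid[header] = message
    order = sorted(valid, key=lambda h: [BASES.index(b) for b in h])
    return "".join(valid[h] for h in order)
```

Shuffling the output of `make_fragments` and passing it to `reassemble` returns the original sequence. A fragment whose header or checksum has mutated is simply discarded in this sketch, whereas a real system would presumably try to recover it from redundant copies.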

So, let’s say the information has been encrypted and placed in lots of different cells of bacteria. How then does someone retrieve the data on the other end? The decrypter would take the DNA and run it through what’s known as next-generation high-throughput sequencing, or NGS. This particular type of sequencing analyzes and compares multiple copies of the same sequence and then uses majority-voting to figure out which bases are correct if parts of the data have decayed. Then the compression algorithms could be reversed to restore the raw data to its original form.
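The majority-voting idea can be illustrated with a toy consensus function. This is a simplified sketch that assumes the reads are already aligned and of equal length; real sequencing pipelines also handle alignment, insertions, deletions and per-base quality scores, none of which is modeled here.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the per-position majority base across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must all have the same length")
    result = []
    # zip(*reads) yields one column of bases per sequence position.
    for column in zip(*reads):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Five copies of the same fragment, two of them with a single decayed base.
reads = ["TCCTTATG", "TCCTTATG", "TCGTTATG", "TCCTAATG", "TCCTTATG"]
recovered = consensus(reads)
```

Because each damaged position is outvoted by the intact copies, `recovered` equals the undamaged sequence. Ties are broken arbitrarily in this sketch, which is one reason more copies make recovery more reliable.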

The last step would be snapping the fragments back together in the correct order so that the DNA strands could be translated back into useful data. This is where we go from just data storage to data encryption. The person trying to read the data needs a formula that will reveal the right order of the headers and checksums – without that formula, the data remains meaningless.

That’s the theory – how about the application? Well, let’s hear straight from the researchers themselves:

This rci-system is feasible in DH5-alpha strain of E. coli, as supported by extracted plasmid DNA size. It is found that the size of the DNA extracted is consistent with that of DNA stored in the plasmid before extraction. There is no loss of DNA, implying that no large deletion has occurred during the experimental procedure.

In the first trial, we encoded a short message in a single vector, together with two inverted repeats. We designed primers which targets the encoded message either in normal orientation or reverse-complementary orientation. Both sets of primers could be used to generate PCR products, indicating that encoded message exists in both recombinated and normal forms. Sequencing results confirmed the correctness of the PCR product.

The possibilities of this biotechnology are truly amazing. A single gram of E. coli cells could hold up to 900,000 gigabytes (or 900 terabytes) of data, meaning these bacteria have almost 500 times the storage capacity of a top-of-the-line commercial hard drive.

Indeed, my best hard drive is a 1.5 terabyte drive that weighs almost exactly one kilogram. If I had that hard drive’s weight in storage bacteria, I’d have 900 petabytes of storage space that could sit unobtrusively in the corner of my desk. Of course, we don’t know yet the precise practical applications – it’s quite possible this will remain strictly used for complex encryption work.
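The arithmetic behind those figures is easy to check. The snippet below takes the article's numbers at face value and assumes decimal units (1 PB = 1,000 TB); the variable names are just for illustration.

```python
TB_PER_GRAM = 900      # article's figure: 900 terabytes per gram of E. coli
DRIVE_TB = 1.5         # capacity of the author's hard drive
DRIVE_GRAMS = 1000     # that drive weighs about one kilogram

# Storage in one kilogram of bacteria, in terabytes and petabytes.
bacteria_tb = TB_PER_GRAM * DRIVE_GRAMS
bacteria_pb = bacteria_tb / 1000

# How many times more data than the drive, at equal weight.
ratio_at_equal_weight = bacteria_tb / DRIVE_TB

# Drive size implied by "almost 500 times" for a single gram.
implied_top_drive_tb = TB_PER_GRAM / 500

print(bacteria_pb, ratio_at_equal_weight, implied_top_drive_tb)
```

So a kilogram of bacteria works out to 900 petabytes, about 600,000 times the capacity of the 1.5 terabyte drive of the same weight, and the "almost 500 times" comparison implies a top-end drive of a little under 2 terabytes.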

Now, there does seem to be one potential concern with using E. coli to store data: isn’t E. coli dangerous? It appears there’s not too much to worry about there – the researchers used non-virulent strains of the bacteria, and the bacteria can’t do much more than store the data and reproduce. The DNA sequences that represent the data are total gibberish when it comes to encoding potentially dangerous proteins.

For more, check out the researchers’ website and presentation on their new biotechnology.
