Three simulated galaxies grouped by a neural network on top, followed by three real galaxies in the corresponding buckets on the bottom
Image: Top row: Greg Snyder, Space Telescope Science Institute, and Marc Huertas-Company, Paris Observatory. Bottom Row: Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS).

With all the warped images and funny names, it can be easy to forget that machine learning can have important uses in science—specifically, when it comes to categorizing things. Scientists have lately been putting a neural network to good use in identifying distant galaxies.

An international team of researchers points out that there are a buttload of space pictures out there, from both the nearby and distant universe. And more surveys are around the corner that will produce tons more data—more than humans can effectively sift through. It can be tough to synthesize all of it, and to connect the dots between young and old galaxies. That’s where neural networks come in.

“Once we’ve trained a computer on many thousands of images from our simulations, the computers can see things that we just can’t,” Joel Primack, distinguished professor of physics emeritus from the University of California, Santa Cruz, told Gizmodo. “That’s very helpful.”

The researchers started with a powerful simulation to create 35 model galaxies, then used further software to create around 10,000 images, both clear and fuzzed up. They trained a neural network on the images in order to identify their similarities. The researchers then fed the trained network real data—images of distant galaxies from the CANDELS survey. It successfully lumped the galaxies into three categories based on their shape. These categories correspond to three phases in galactic evolution, which they call the pre-blue nugget phase, the blue nugget phase, and the post-blue nugget phase.

Basically, after feeding the neural network images of simulated galaxies, the researchers were able to get useful information about real galaxies. That’s pretty cool.
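To get a feel for the idea, here’s a toy sketch of that train-on-simulations, classify-real-data workflow. To be clear, everything here is invented for illustration: the “galaxies” are just Gaussian blobs whose width stands in for the evolutionary phase, and a simple nearest-centroid classifier stands in for the team’s actual neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# The three phases from the paper; the blob widths below are
# arbitrary stand-ins, not anything from the actual simulations.
PHASES = ["pre-blue-nugget", "blue-nugget", "post-blue-nugget"]

def fake_galaxy(phase, size=16):
    """Render a toy 'galaxy': a Gaussian blob whose width encodes
    the phase, plus a little observational noise."""
    width = {0: 5.0, 1: 2.0, 2: 3.5}[phase]
    y, x = np.mgrid[:size, :size] - size // 2
    img = np.exp(-(x**2 + y**2) / (2 * width**2))
    return img + rng.normal(0, 0.05, img.shape)

# "Train" on many simulated images per phase by averaging them
# into one centroid image per class.
train = [(fake_galaxy(p), p) for p in range(3) for _ in range(200)]
centroids = np.stack([
    np.mean([im for im, p in train if p == k], axis=0) for k in range(3)
])

def classify(img):
    """Assign an image to the phase whose centroid it most resembles."""
    dists = [np.linalg.norm(img - c) for c in centroids]
    return PHASES[int(np.argmin(dists))]

# Apply the "trained" classifier to a new observed image.
print(classify(fake_galaxy(1)))  # compact blob, expect "blue-nugget"
```

The real pipeline differs in every detail—a deep network instead of centroids, thousands of realistically degraded simulation images instead of blobs—but the shape of the trick is the same: learn categories entirely from simulations, then let the model sort real observations into them.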


A neural network would obviously be super helpful for large-scale surveys. The Wide Field Infrared Survey Telescope, slated to launch in the 2020s, could capture millions of galaxies at Hubble’s resolution in a single image. The Large Synoptic Survey Telescope will image a huge chunk of the southern sky every night from Earth, and could record 15 terabytes of data each day. A neural network could quickly flag the standouts of most interest to astronomers, or point out things a human eye might miss.

Others are excited. “There’s a hope among some researchers that if artificial intelligence can sort through astronomical data, classify it, and tell us about interesting things that it finds, then the human capacity for learning about the universe is expanded beyond what we imagine we’re capable of alone,” said Michael Oman-Reagan, PhD candidate at Memorial University in Newfoundland and Labrador, Canada, who researches exploration beyond the solar system and the potential for extraterrestrial life.

Perhaps machine learning could even help humans in the search for extraterrestrial intelligence, he suggested.


This is exciting stuff, but Primack warned me to be cautious—this is just a proof-of-concept. These training sets can take many hours’ worth of processing time to generate, so they’re not a realistic way to categorize the data just yet. On top of that, the simulations might still be too limited to fully capture the diversity of galaxies, according to the paper slated for publication in The Astrophysical Journal.

But things are progressing, and Primack’s isn’t the only team working on this. Others are also training computers on large galactic datasets, and Primack offered a shoutout to the Feedback in Realistic Environments collaboration, which is building realistic galaxy models.

Ultimately, it shouldn’t be AI’s job to replace scientists, but instead help them manage the incredible amounts of data recorded by the newest observatories.