As computation-heavy research gains momentum, the amount of data researchers generate is exploding; a single DNA sequencing run can produce as much as 28 terabytes. So where do American researchers store their most ginormous data sets? In the largest academic cloud storage system in the US, at the San Diego Supercomputer Center.
The SDSC Cloud launches with an initial capacity of 5.5 petabytes (roughly 250 billion pages of text) and achieves sustained read rates of 8 to 10 gigabytes per second, or about 250GB every 30 seconds. And that's just to start: the Cloud can scale on demand to hundreds of petabytes. "We believe that the SDSC Cloud may well revolutionize how data is preserved and shared among researchers, especially massive datasets that are becoming more prevalent in this new era of data-intensive research and computing," said Michael Norman, director of SDSC, in a press release. "The SDSC Cloud goes a long way toward meeting federal data sharing requirements, since every data object has a unique URL and could be accessed over the Web."
What Norman means is that every file uploaded to the Cloud gets a persistent URL, with access levels set by the user, anywhere from completely private to fully open. What's more, users will know where each copy of their data is stored on the HIPAA- and FISMA-compliant servers, and will soon have the option of keeping an offsite copy on UC Berkeley's servers.
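The per-object URL scheme is easy to picture: OpenStack's object store addresses every object by account, container, and object name. A minimal sketch of how such a persistent URL could be assembled, using a hypothetical hostname and account ID (neither is published in the article), might look like this:

```python
from urllib.parse import quote

def object_url(container, obj,
               host="https://cloud.sdsc.edu",  # hypothetical hostname
               account="AUTH_example"):        # hypothetical account ID
    """Build a Swift-style persistent URL: <host>/v1/<account>/<container>/<object>."""
    return f"{host}/v1/{account}/{quote(container)}/{quote(obj)}"

# Every upload gets a stable, shareable web address:
print(object_url("genomics", "run-042/reads.fastq"))
# https://cloud.sdsc.edu/v1/AUTH_example/genomics/run-042/reads.fastq
```

Because the address is just an HTTP URL, sharing a dataset is as simple as handing out the link, and access control decides who can actually fetch it.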
The SDSC Cloud is built on OpenStack, the open-source cloud platform developed by NASA and Rackspace Hosting. The system is disk-based: data is written to multiple independent storage servers, validated for consistency, and kept in two (sometimes three) redundant copies spread across those servers. Networking runs through two Arista Networks 7508 switches, which provide 768 10-gigabit Ethernet ports configured with multi-chassis link aggregation (MLAG). And since the system supports HTTP-based protocols, the data is reachable from any web browser on Windows, OS X, or UNIX, even on mobile devices.
Built as part of UC San Diego's Research Cyberinfrastructure (RCI) project, the Cloud already hosts data from UC San Diego's Libraries, School of Medicine, Rady School of Management, and Jacobs School of Engineering, as well as federally funded research projects from the National Science Foundation, National Institutes of Health, and Centers for Medicare and Medicaid Services.
The only thing small about this gargantuan system is the price of admission. Storage costs $3.25 per month for each 100GB, or $32.50 per terabyte per month. Users with grant money can also effectively lease the actual hardware that stores their data for five years, at a significant savings over the monthly rate.
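The pricing arithmetic is simple enough to check in a few lines. A minimal sketch, assuming a flat rate of $3.25 per 100GB per month as quoted above:

```python
RATE_PER_100GB = 3.25  # dollars per month, per the published pricing

def monthly_cost(gigabytes):
    """Monthly storage cost in dollars at $3.25 per 100GB."""
    return gigabytes / 100 * RATE_PER_100GB

print(monthly_cost(100))    # 3.25
print(monthly_cost(1000))   # 32.5  -- one terabyte, matching the quoted rate
print(monthly_cost(28000))  # 910.0 -- a full 28TB sequencing run
```

So even that 28-terabyte sequencing run from the opening paragraph would cost under a thousand dollars a month to keep online.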
Monster Machines is all about the most exceptional machines in the world, from massive gadgets of destruction to tiny machines of precision, and everything in between.
You can keep up with Andrew Tarantola, the author of this post, on Twitter, Facebook, or Google+.