A 100-gigabit highway for science

May 2, 2012

Climate researchers are expected to generate 2 petabytes of data by 2014 (credit: Prabhat and M. Wehner/LBNL)

Over the last decade, the amount of scientific data transferred by thousands of researchers around the world over the Department of Energy’s ESnet (Energy Sciences Network) has increased at a rate of about 72 percent per year, says Greg Bell, acting director of ESnet, which is managed by Lawrence Berkeley National Laboratory.
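To put that growth rate in perspective, a quick compounding calculation (a back-of-envelope sketch, not ESnet’s own traffic model) shows that 72 percent annual growth means traffic doubles roughly every 15 months and grows by more than two orders of magnitude over a decade:

```python
# Back-of-envelope compounding for roughly 72 percent annual traffic growth.
# Illustrative arithmetic only; not ESnet's actual year-by-year measurements.
import math

RATE = 0.72  # 72% growth per year

for years in (1, 5, 10):
    factor = (1 + RATE) ** years
    print(f"after {years:2d} year(s): traffic grows by a factor of {factor:,.1f}")

# Doubling time: solve (1 + RATE)**t == 2 for t.
doubling_years = math.log(2) / math.log(1 + RATE)
print(f"doubling time: about {doubling_years:.1f} years")
```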

In an effort to spur U.S. scientific competitiveness, as well as accelerate development and widespread deployment of 100-gigabit technology, the Advanced Networking Initiative (ANI) was created with $62 million in funding from the American Recovery and Reinvestment Act (ARRA) and implemented by ESnet. ANI was established to build a 100 Gbps national prototype network and a wide-area network testbed.

To deploy ANI cost-effectively, ESnet partnered with Internet2, a consortium that provides high-performance network connections to universities across America and that also received a stimulus grant from the Department of Commerce’s Broadband Technology Opportunities Program.

So far, more than 25 groups have taken advantage of ESnet’s wide-area testbed, which is open to researchers from government agencies and private industry to test new, potentially disruptive technologies without interfering with production science network traffic. The testbed currently connects three unclassified DOE supercomputing facilities: the National Energy Research Scientific Computing Center (NERSC) in Oakland, Calif., the Argonne Leadership Computing Facility (ALCF) in Argonne, Ill., and the Oak Ridge Leadership Computing Facility (OLCF) in Oak Ridge, Tenn.

“No other networking organization has a 100-gigabit network testbed that is available to researchers in this way,” says Brian Tierney, who heads ESnet’s Advanced Networking Technologies Group. “Our 100G testbed has been about 80 percent booked since it became available in January, which just goes to show that there are a lot of researchers hungry for a resource like this.”

Climate 100

Climate researchers are producing some of the fastest growing datasets in science. Five years ago, the amount of information generated for the Nobel Prize-winning United Nations Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report was 35 terabytes (trillion bytes) — equivalent to the amount of text in 35 million books, occupying a bookshelf 248 miles (399 km) long.

Experts predict that by 2014, when the next IPCC report is published, 2 petabytes (quadrillion bytes) of data will have been generated for it, more than a 50-fold increase in data production.

To ensure that researchers will be able to use future 100-gigabit networks effectively, another ARRA-funded project called Climate 100 brought together middleware and network engineers to develop tools and techniques for moving unprecedentedly massive amounts of climate data.

“Increasing network bandwidth is an important step toward tackling ever-growing scientific datasets, but it is not sufficient by itself; next-generation high-bandwidth networks need to be evaluated carefully from the applications perspective as well,” says Mehmet Balman of Berkeley Lab’s Scientific Data Management group, a member of the Climate 100 collaboration.

According to Balman, climate simulation data consists of a mix of relatively small and large files with an irregular size distribution in each dataset, which calls for advanced middleware tools to move the data efficiently over long-distance, high-bandwidth networks.
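The Climate 100 middleware itself is not shown here, but a minimal sketch can illustrate the general approach Balman describes: pack many small files together with the large ones into a few roughly equal-sized bundles, so that each parallel network stream carries about the same amount of data. The plan_bundles helper, the four-stream default, and the file names below are hypothetical choices for illustration.

```python
# Minimal sketch of one common strategy for datasets that mix many small files
# with a few large ones: pack files into similarly sized bundles, one per
# parallel transfer stream. Illustration only; not the Climate 100 middleware.

def plan_bundles(file_sizes, num_streams=4):
    """Greedily assign files (largest first) to the currently lightest bundle."""
    bundles = [{"files": [], "bytes": 0} for _ in range(num_streams)]
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = min(bundles, key=lambda b: b["bytes"])
        target["files"].append(name)
        target["bytes"] += size
    return bundles

if __name__ == "__main__":
    # Hypothetical dataset: a few large model runs plus many small output files.
    dataset = {f"diag_{i:03d}.nc": 200 * 2**20 for i in range(20)}  # 200 MiB each
    dataset["hist_run.nc"] = 300 * 2**30                            # 300 GiB
    dataset["rcp85_run.nc"] = 250 * 2**30                           # 250 GiB
    for i, bundle in enumerate(plan_bundles(dataset)):
        gib = bundle["bytes"] / 2**30
        print(f"stream {i}: {len(bundle['files']):2d} files, {gib:7.1f} GiB")
```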

At the 2011 Supercomputing Conference (SC11) in Seattle, Wash., the Climate 100 team used their tool and the ANI testbed to transport 35 terabytes of climate data from NERSC’s data storage to compute nodes at ALCF and OLCF.

“It took us approximately 30 minutes to move 35 terabytes of climate data over a wide-area 100 Gbps network. This is a great accomplishment,” says Balman. “On a 10 Gbps network, it would have taken five hours to move this much data across the country.”
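As a rough sanity check on those figures, the short calculation below (an illustrative sketch, assuming decimal terabytes and zero protocol overhead) gives the idealized transfer time for 35 terabytes over a single link at 10 and 100 Gbps. Because the demonstration streamed data from NERSC to both ALCF and OLCF, the measured end-to-end time is not strictly bounded by a single link’s line rate.

```python
# Idealized, single-link transfer time for a given data volume and link speed.
# Real transfers also depend on protocol overhead, disk I/O, tuning, and how
# many paths are used in parallel.

def transfer_hours(terabytes, gbps):
    bits = terabytes * 1e12 * 8      # decimal terabytes -> bits
    seconds = bits / (gbps * 1e9)    # bits / (bits per second)
    return seconds / 3600

for gbps in (10, 100):
    hours = transfer_hours(35, gbps)
    print(f"35 TB at {gbps:>3} Gbps (line rate): ~{hours:.1f} h ({hours * 60:.0f} min)")
```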

Space Exploration

In 2024, the most powerful radio telescope ever constructed will go online. Comprising 3,000 dish antennas spread over 250 acres, this instrument will generate more data in a single day than the entire Internet carries today. Optical fibers will connect each of these 15-meter-wide (50 ft.) dishes to a central high-performance computing system, which will combine all of the signals to create a detailed “big picture.”

“Given the immense sensor payload, optical fiber interconnects are critical both at the central site and from remote stations to a single correlation facility,” says William Ivancic, a senior research engineer at NASA’s Glenn Research Center. “Future radio astronomy networks need to incorporate next generation network technologies like 100 Gbps long-range Ethernet links, or better, into their designs.”

In anticipation of these future networks, Ivancic and his colleagues are using a high-speed transfer protocol called Saratoga to carry data effectively over 100-gigabit long-range Ethernet links. But because it was cost-prohibitive to upgrade their local network with 100-gigabit hardware, the team could not determine how their software would perform in a real-world scenario, at least not until they got access to the ANI testbed.

Within the next few months, the official ANI project will come to an end, but the community will continue to benefit from its investments for decades to come. The 100-gigabit prototype network will be converted into ESnet’s fifth-generation production infrastructure, which will be scaled to 44 times its current size. ESnet will also seek new sources of funding for the 100-gigabit testbed to ensure that it remains available to network researchers on a sustained basis.

Approximately 13.7 billion years ago, the Universe was almost homogeneous, meaning that every location in the cosmos was similar. Today, this is no longer the case. This simulation starts from a nearly homogeneous Universe and shows how it has changed over billions of years. Performed on 4,096 cores of NERSC’s “Hopper” system with the Nyx code, the movie was generated from over 5 terabytes of data and was transferred over ESnet to the SC11 Conference exhibit floor in Seattle, Wash., last November. The video at top shows the simulation streaming on a 10 Gbps link, while the one at bottom shows the same model streaming on a 100 Gbps link. These simulations were generated by Prabhat (LBNL).