The University of Rochester today provided an advance look at its new data visualization lab, one of the centerpieces of its commitment to applying high performance computing and data science approaches to scientific problems. The lab creates the immersive visual experience necessary to allow researchers to understand and manipulate large and complex sets of scientific information.
“This new visualization lab represents the next step in the University’s plans to create the state-of-the-art infrastructure necessary to become a leader in the field of data science,” said Rob Clark, vice president for research and the dean of the Hajim School of Engineering and Applied Sciences. “This resource will bring together experts in the fields of computer science, the biological and life sciences, the physical sciences, and the social sciences and humanities, to develop new tools that enable researchers to harness the potential of big data.”
The new Visualization-Innovation-Science-Technology-Application (VISTA) Collaboratory was created with the support of $5 million from New York State after it was identified by the Finger Lakes Regional Economic Development Council as a priority project in 2012. The lab is part of a $30 million investment made by the University, New York State, and IBM in the Health Sciences Center for Computational Innovation (HSCCI) and more than $50 million that has been invested in recent years to expand the University’s high performance computational resources.
The VISTA Collaboratory, which is located in the Carlson Science and Engineering Library, completes the creation of what is essentially a massive scale, integrated, high performance supercomputing system. The display in the new lab consists of an array of 24 monitors, is 20 feet wide and 8 feet tall, and has a resolution (50 megapixels) approaching that of IMAX theaters. The visualization lab has a direct high speed fiber optic connection to the University’s Data Center, linking the display with an IBM Blue Gene/Q – which, with 16,384 processing cores, is one of the most powerful supercomputers on the planet – and the new IBM “BlueHive 2” Linux supercomputer cluster – which has a data storage capacity of 2 petabytes or 2 million gigabytes. To put this in perspective, 1 petabyte could store the complete human genome – which consists of 3 billion base pairs – of every individual in America.
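The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal (SI) storage units, and it compares the per-monitor pixel count against a 1920 x 1080 panel, which is an assumption made here for comparison and not a specification given for the lab's monitors.

```python
# Figures quoted in the text above.
TOTAL_MEGAPIXELS = 50      # stated resolution of the full display
MONITOR_COUNT = 24         # stated number of monitors in the array
STORAGE_PETABYTES = 2      # stated BlueHive 2 storage capacity

# Decimal (SI) units: 1 petabyte = 1,000,000 gigabytes.
GIGABYTES_PER_PETABYTE = 1_000_000

# Assumption for comparison only: a 1920 x 1080 panel (about 2.07 megapixels).
FULL_HD_MEGAPIXELS = 1920 * 1080 / 1_000_000


def megapixels_per_monitor(total_megapixels, monitor_count):
    """Average pixel budget of one monitor, in megapixels."""
    return total_megapixels / monitor_count


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GIGABYTES_PER_PETABYTE


per_monitor = megapixels_per_monitor(TOTAL_MEGAPIXELS, MONITOR_COUNT)
print(f"Average per monitor: {per_monitor:.2f} megapixels")
print(f"24 assumed 1920 x 1080 panels: {MONITOR_COUNT * FULL_HD_MEGAPIXELS:.2f} megapixels")
print(f"{STORAGE_PETABYTES} petabytes = {petabytes_to_gigabytes(STORAGE_PETABYTES):,} gigabytes")
```

Under these assumptions, the 50 megapixel figure works out to roughly 2 megapixels per monitor, and 2 petabytes corresponds to the 2 million gigabytes cited.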
Only a handful of other U.S. institutions – such as Stanford University and Oak Ridge National Laboratory – have developed similar capabilities.
The visualization lab will not only help scientists understand data, but will also enable them to develop new analytical tools, collaborate with colleagues from other institutions, and train new generations of researchers and engineers in the field of data science. As a user facility available to industry, the lab is also expected to strengthen and expand existing research collaborations with companies like IBM, Xerox, and Wegmans, and attract new private sector partners.
From a research perspective, the large scale display helps scientists overcome one of the primary hurdles associated with big data: our ability to generate large and complex sets of data has largely outpaced our ability to understand and extract meaningful observations from this information. The size, orientation, and high resolution capabilities of the display create an immersive experience that allows scientists to look at and compare large sets of data on one screen or observe fine detail in the context of larger structures.
“The best analytical tool we have is still the human brain,” said David Topham, the executive director of the HSCCI and a professor in the Department of Microbiology and Immunology. “We can see relationships between data that computers cannot. But in order to do that you have to have the information in front of you so you can see the patterns and connections that matter. In other words, you need to be able to see the forest and the trees simultaneously.”
The new visualization lab, combined with the Blue Gene/Q and BlueHive 2 systems, places the University of Rochester at the forefront of the national trend to unlock the potential of big data. Federal funding institutions, such as the National Institutes of Health, are pressing scientists not only to employ high performance computing in their research, but also to develop new ways to analyze large sets of data and build sophisticated computer-generated simulations.
The unique capabilities of the HSCCI and other University computational resources have already helped generate more than $300 million in research funding over the last six years. The University’s new Institute for Data Science (IDS) will create an estimated 460 construction and permanent jobs and generate $530 million in additional research funding over a ten-year period.
The University will soon break ground on a new 50,000-square-foot building that will house the IDS and bear the Wegman name in recognition of the Wegman Family Charitable Foundation’s recent $10 million gift. The IDS was also recently named by New York State as a Center for Excellence for Data Sciences. The building will serve as a hub for faculty and new data science research and education programs in the fields of medicine, science and engineering, the humanities, education, and business.