GigaScience publishes big data biological and biomedical sciences studies with high-speed transport from Aspera
SINGAPORE - 2013 BIO-IT WORLD ASIA - May 29, 2013 - Aspera, Inc., creators of next-generation technologies that move the world’s big data at maximum speed, today announced that GigaScience, an online open-access, open-data life sciences journal co-published by BGI and BioMed Central, has adopted a suite of software products from Aspera to provide authors, reviewers, and other users with the tools to upload and download, at maximum speed, the extremely large data sets that accompany manuscripts.
GigaScience publishes big-data articles covering the full spectrum of biological and biomedical sciences, including fields based on difficult-to-access data such as imaging studies, neuroscience, and systems biology. All manuscripts accepted and published in the journal focus on the use or analysis of large-scale data sets, or on the development of tools for working with them.
Perceiving a problem with the reproducibility of data-heavy scientific studies, GigaScience set out to provide a solution. With the goals of making research articles transparent, the research itself reproducible and reusable, and large-scale data easily accessible and citable, GigaScience hosts the complete data sets associated with each published article in a comprehensive public database, GigaDB. It also assigns each data set a digital object identifier (DOI), which makes it easier for people to locate the files they are looking for and gives them a way to cite the data directly when reusing or reproducing research. The data sets submitted in support of published articles can easily reach multiple terabytes in size.
To handle the transfer of such enormous data sets, GigaScience gives authors, reviewers, and other users Aspera-based tools to upload and download the data that accompany manuscripts quickly and easily. The journal selected Aspera Connect Server to rapidly transfer the data sets that accompany submitted manuscripts to the GigaScience database, and Aspera Console to manage and monitor the entire end-to-end transfer process.
Prior to using Aspera, GigaScience struggled with FTP and even tried shipping physical drives. Neither the journal nor researchers can wait weeks for the large data sets attached to manuscripts to finish uploading: the journal typically aims to return reviews to authors within two weeks and to publish immediately upon acceptance, and researchers need rapid download times to be most effective.
“People want to use this data; they don’t want to sit and wait for a week while the data is downloading,” said Laurie Goodman, editor-in-chief of GigaScience. “Aspera is the only solution currently out there that can meet the journal’s needs to provide a reasonable way for people to access data in a timely manner.”
Authors use Aspera’s free downloadable Connect Web Browser Plug-in to submit manuscript-associated large data sets to a private data storage site at GigaScience. Staff reviewers then access the files, using the browser plug-in to download and upload files at high speed. If a paper is accepted for publishing, the data is then transferred to the journal’s public database, GigaDB, via Aspera, where it is readily available for journal readers to view and download, again using the Aspera Connect Web Browser Plug-in.
“Scientific research relies on very large data sets that need to be transferred and shared,” said Richard Heitmann, vice president of marketing for Aspera. “We are pleased to have helped GigaScience realize its goal of improving the accessibility and reproducibility of research and articles for the life science community.”
GigaScience (http://www.gigasciencejournal.com) aims to revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish ‘big-data’ studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database, GigaDB, that hosts and provides a citable format for all associated data and downloadable data analysis tools, as well as Galaxy platform resources. GigaScience is based out of BGI, the world’s largest genomics institute, which carries out research relevant to human diseases, prenatal care, agriculture, and the environment. BGI co-publishes GigaScience with BioMed Central, the world’s largest open-access publisher.
GigaDB is available at http://gigadb.org. GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk) is a joint project between the Chinese University of Hong Kong and BGI at the CUHK-BGI Innovation Institute of Trans-omics. The Galaxy platform and GigaDB are supported by BGI and the China National Genebank.