Management and Storage of Scientific Data
Funding Agency:
- Department of Energy
The DOE SC program in Advanced Scientific Computing Research (ASCR) hereby announces its interest in basic research in computer science exploring innovative approaches to the management and storage of scientific data.
Modern scientific computing relies on processing a deluge of data coming from both experiments and simulations, with even relatively modest scientific activities generating petabytes of data. Planned upgrades of experimental facilities in the foreseeable future, combined with the increased computing capabilities of DOE’s exascale supercomputers and other state-of-the-art computing capabilities coming online over the next few years (for more information, see https://science.osti.gov/ascr/Facilities/User-Facilities/Upgrades), promise to compound the many challenges in storing and managing data such that it can be effectively used to fuel scientific discovery [2-12].
Traditional large-scale scientific data management has relied on the use of file formats optimized for simple access patterns on parallel, distributed file systems. These files have tended to be metadata poor and complicated to access, lacking flexible indexing for efficient searching, where enabling new kinds of analysis often requires writing new, low-level code [2-5]. Scientific workflows have also become increasingly complicated, integrating both simulation and the analysis of data from experiments, exploiting advanced machine-learning techniques [4,8-10], and requiring distributed, multi-stage processing [5-7]. Additionally, significant opportunities exist to enhance trust and aid scientific reproducibility by enhancing our ability to record data provenance and verify data integrity. Fortunately, through a combination of past scientific-data-management investments and leveraging the growing ecosystem of big-data and database technologies, scientific endeavors have made significant improvements in their data management and use. While the ever-increasing scale of scientific data threatens that progress, new “smart” storage and networking technologies that provide embedded computational capabilities; novel methods for indexing, representing, and distributing data; and advanced techniques for interfacing with data management systems and integrating into programming environments promise significant breakthroughs. Moreover, new techniques for scientific data management can help integrate data into large scientific-data and computational ecosystems that embody the FAIR principles of Findability, Accessibility, Interoperability, and Reuse, thereby enabling collaborative, responsive science at yet-unprecedented scales [2-5].
Ceiling: DOE National Laboratories: $750,000 per year • All other applicants: $300,000 per year
$13,000,000
Internal Submission Deadline for Institutional Submission Selection: Please submit a draft Pre-Application document, following the format required in the FOA, to Atam Dhawan at dhawan@njit.edu, with a cc to the respective department chair, the college dean, and Kathleen O’Neill at ko86@njit.edu, by April 24 for institutional review. The decision on institutional selection will be provided by April 26, 2022.
Pre-Application Submission Deadline: May 5, 2022 at 5:00 PM ET
June 13, 2022 at 11:59 PM ET
Dr. Hal Finkel [Primary] 301-903-1304 hal.finkel@science.doe.gov