Advances in AI continue to depend on broad access to high-quality data, models, and computational infrastructure. The Federal Government has significant data and computing resources that are of vital benefit to the Nation’s AI research and development efforts. Increased access to data and computing resources will broaden the community of experts, researchers, and industries participating at the cutting edge of AI R&D. Increased access will strengthen the competitiveness of experts across the country, support more equitable growth of the field, expand AI expertise, and enable the application of AI to a broader range of fields. To realize this potential, a number of actions are underway.
National AI Research Resource (NAIRR)
The National AI Initiative Act of 2020 calls for the National Science Foundation (NSF), in coordination with the White House Office of Science and Technology Policy (OSTP), to form the National AI Research Resource (NAIRR) Task Force. This task force will investigate the feasibility of establishing the NAIRR, and propose a roadmap and implementation plan detailing how such a resource should be established and sustained.
The NAIRR is envisioned as a shared computing and data infrastructure that will provide AI researchers with access to compute resources and high-quality data, along with appropriate educational tools and user support. The roadmap and implementation plan developed by the NAIRR Task Force will consider topics such as the appropriate ownership and administration of the NAIRR; a model for governance; required capabilities of the resource; opportunities to better disseminate high-quality government datasets; requirements for security; assessments of privacy, civil rights, and civil liberties requirements; and a plan for sustaining the resource, including through public-private partnerships.
Data Resources for AI R&D
credit: Nicolle Rager Fuller, National Science Foundation
NSF’s initiative on Harnessing the Data Revolution is helping transform research through a national-scale approach to research data infrastructure
High-quality datasets are critically important for training many types of AI systems. The National AI Initiative directs Federal agencies to provide and facilitate the availability of curated, standardized, secure, representative, aggregate, and privacy-protected datasets for AI R&D. These directives build on a number of ongoing Federal actions to increase access to data while also maintaining safety, security, civil liberties, privacy, and confidentiality protections. For example, twenty-seven Federal agencies developed the 2020 Action Plan to implement the Federal Data Strategy, which defines principles and practices to generate a more consistent approach to the use, access, and stewardship of Federal data. The Data.gov resource provides access to a broad range of the U.S. Government’s open data, tools, and resources. The Department of Energy is supporting an Open Data Initiative at Lawrence Livermore National Laboratory to share rich and unique datasets with the larger data science community. Additionally, the National Science Foundation is leading the development of a cohesive, federated, national-scale approach to research data infrastructure through the Harnessing the Data Revolution Big Idea. This initiative is helping to transform research across all areas of science and engineering, including AI.
The National AI Initiative Act of 2020 (NAIIA) calls on the National Institute of Standards and Technology (NIST) to develop guidance to facilitate the creation of voluntary data sharing arrangements between industry, federally funded research centers, and Federal agencies to advance AI research and technologies. Additionally, NIST is developing best practices for documenting datasets, including standards for metadata and for the privacy and security of datasets.
Together, these and related actions to increase the availability of data resources are driving top-notch AI research toward new technological breakthroughs and promoting scientific discovery, economic competitiveness, and national security.
HPC Infrastructure for AI
credit: Carlos Jones, Oak Ridge National Laboratory/U.S. Dept. of Energy
Summit scientific supercomputer at Oak Ridge National Laboratory
While algorithms and data play strong roles in the performance of AI systems, equally important is the computing infrastructure upon which the AI systems run. The United States is a world leader in the development of high-performance computing infrastructure that supports AI research. The most recent strategy guiding U.S. activities in high-performance computing is laid out in the National Science and Technology Council’s strategic plan from November 2020, entitled “Pioneering the Future Advanced Computing Ecosystem”, which builds upon the 2015 National Strategic Computing Initiative defined by Executive Order 13702.
Examples of cutting-edge high-performance computing resources in the United States include the Department of Energy’s Summit scientific supercomputer at Oak Ridge National Laboratory, launched in 2018. Summit provides unprecedented computing power for research across a broad variety of scientific domains, including artificial intelligence, energy, and advanced materials. Summit also provides unmatched capabilities for integrating AI and scientific discovery. In May 2019, DOE announced plans to build the Frontier supercomputer, which is expected to debut in 2021 as the world’s most powerful computer, designed to accelerate innovation in AI.
The National Science Foundation (NSF) also invests significantly in the exploration, development, and deployment of a wide range of cyberinfrastructure technologies that can be useful for AI R&D, including next-generation supercomputers. In 2018, NSF funded the largest and most powerful supercomputer the agency has ever supported to serve the nation’s science and engineering research community. The new high-performance computing system, called Frontera, has the highest scale, throughput, and data analysis capabilities ever deployed on a university campus in the United States.
The National Aeronautics and Space Administration also has a strong high-end computing program, and has augmented its Pleiades supercomputer with new nodes specifically designed for machine learning and AI workloads.
Going forward, the National AI Initiative Act of 2020 directs DOE to make high-performance computing infrastructure at national laboratories available for AI, make upgrades needed to enhance computing facilities for AI systems, and establish new computing capabilities necessary to manage data and conduct high-performance computing for AI systems. Through these and related efforts, the Federal Government is ensuring that high-performance computing systems are increasingly available to advance the state of the art in AI.
Leveraging Cloud for AI R&D
Cloud platforms provide robust, agile, reliable, and scalable computing capabilities that can help accelerate advances in AI. Increased access to powerful cloud computing resources can broaden the ability of AI researchers to participate in the AI research and development (R&D) needed for cutting-edge technological advances. To capitalize on this opportunity, the 2019 Executive Order 13859 on Maintaining American Leadership in Artificial Intelligence directed Federal agencies to prepare recommendations on better enabling the use of cloud computing resources for federally funded AI R&D. The resulting NSTC report, Recommendations for Leveraging Cloud Computing Resources for Federally Funded Artificial Intelligence Research and Development, identified key recommendations on launching pilot projects, improving education and training opportunities, cataloguing best practices in identity management and single sign-on strategies, and establishing best practices for the seamless use of different cloud platforms. Actions are underway to adopt these recommendations.
Several Federal agencies have launched pilot projects to identify and explore the advantages and challenges associated with the use of commercial clouds in conducting federally funded research. One example is NSF’s Cloud Access program, which funded an entity that has established partnerships with public cloud providers, assists NSF in allocating cloud computing resources, manages cloud computing accounts and resources, provides user training on cloud computing, and provides strategic technical guidance in using public cloud computing platforms. NIH is also conducting cloud and data pilots through two initiatives – STRIDES (Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability) and AIBLE (AI for BiomedicaL Excellence). These initiatives are addressing challenges associated with data storage and accessibility by establishing partnerships with commercial cloud service providers and harnessing the power of the commercial cloud in support of biomedical research.
Lessons learned from these and other pilot projects will provide invaluable input into potential concepts for the National AI Research Resource.