
Summary
For the United States to remain on the cutting edge of artificial intelligence (AI) development, model developers need access to novel, high-quality, and underused datasets. This is true for frontier model advancement, and even more so for ensuring that models can be effectively fine-tuned for specific tasks, such as industrial operations, drug development, and climate prediction, that contribute to scientific discovery, economic dynamism, and national security. To secure that access, the Office of the United States Trade Representative (USTR), within the Executive Office of the President, should be directed to prioritize working with foreign governments, particularly those with which the US already has strong relationships, to establish a policy of licensing crucial data for training AI models. The USTR should also work with Congress when crafting new trade agreements or treaties to include language that expressly calls for data sharing and unencumbered cross-border data flows for the purpose of training AI models.
Problem
Now that leading AI model developers have scraped the web and incorporated most, if not all, publicly available data into their training sets, they are increasingly seeking access to high-quality proprietary datasets to drive system improvements. The US government has an opportunity to secure the nation's technological advantage in AI by negotiating access to key data flows on behalf of US industry. Specifically, access to data from allies in areas of strategic importance will enable the continued building and fine-tuning of AI models and their diffusion domestically, while also establishing mutually beneficial relationships globally around a key input for future AI model development. Where possible, the US may also aim to secure these arrangements on an exclusive basis, denying access to geopolitical rivals as they attempt to catch up in AI.
Exclusive agreements are particularly important in the context of competition with China. Presently, Chinese model developers benefit from a domestic legal framework that facilitates firms' access to information, as well as from efforts by regional governments, such as those in Shanghai and Shenzhen, to collect and curate datasets to spur AI development. The federal government should promote these exclusive agreements to mitigate potential data shortages in the near term, while extending existing norms related to cross-border data flows and AI model training over the medium to long term. Such agreements could be a critical plank of Western collaboration around the sharing and use of key inputs for training AI models, serving as a counter to the techno-authoritarian ecosystem being developed by the People's Republic of China and its collaborators.
Continue reading at rebuilding.tech.
