Large language models (LLMs) help researchers answer important, timely, and interesting questions without having to spend a fortune on data collection. But there is no consensus about when and how to use LLMs to produce high-quality data from historical source materials. This lack of consensus can be problematic because these data are often entirely novel, so researchers may have no other data or statistics to compare them against. The Federal Reserve Bank of Philadelphia’s Center for the REstoration of Economic Data (CREED) uses LLMs to digitize historical data of economic interest, and its researchers are developing best practices for doing so. This article describes the best practices we and our coauthors have developed for two recent Philadelphia Fed projects.
This article appeared in the Fourth Quarter 2025 issue of Economic Insights.