Harvesting Historical Data with LLMs

Q4 2025

Economic Insights — Researchers at the Philadelphia Fed are using large language models to do what was once too expensive and time-consuming to do at scale: collect and analyze data from historical records.

Large language models (LLMs) help researchers answer important, timely, and interesting questions without having to spend a fortune on data collection. But there is no consensus about when and how to use LLMs to produce high-quality data from historical source materials. This lack of consensus is a problem because these data are often entirely novel, so researchers may have no other data or statistics against which to check them. The Federal Reserve Bank of Philadelphia’s Center for the REstoration of Economic Data (CREED) uses LLMs to digitize historical data of economic interest, and its researchers are developing best practices for doing so. This article describes the best practices we and our coauthors have developed for two recent Philadelphia Fed projects.

This article appeared in the Fourth Quarter 2025 issue of Economic Insights.

Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?

Verónica Bäcker-Peral,
Vitaly Meursault &
Christopher Severen

September 2025

WP 25-28 – Multimodal LLMs are fast and cost effective at digitizing historical tables. Data from our LLM-based digitization pipeline achieve 98.6 percent fidelity and are statistically indistinguishable from manually digitized data.

Regional Economics

Working Paper

The Price of Housing in the United States, 1890–2006

Ronan C. Lyons,
Allison Shertzer,
Rowena Gray &
David Agorastos

Revised: October 2025

WP 24-12/R – This paper introduces new rental and home sales price series for the 20th century constructed from historical newspapers, and it explores the different rent and home price trajectories of cities across the United States.

CREED

Regional Economics

Working Paper

Lockdowns and Innovation: Evidence from the 1918 Flu Pandemic

Enrico Berkes,
Olivier Deschênes,
Ruben Gaetani,
Jeffrey Lin &
Christopher Severen

Revised: May 2023

WP 20-46/R – Does social distancing harm innovation? Nonpharmaceutical interventions adopted during the 1918 flu pandemic did not cause local declines in patenting. Instead, NPIs may have preserved other inventive factors.