Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing centered on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. This pipeline is estimated to be 100 times less expensive than outsourcing options, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold standard data with an R² of 98.6%. Analyses of growth and persistence in vehicle adoption are statistically indistinguishable whether using LLM or gold standard data. LLM-based digitization unlocks complex historical tables, enabling new economic analyses and broader researcher participation.
Working Paper
Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?
September 2025
WP 25-28 – Multimodal LLMs are fast and cost-effective at digitizing historical tables. Data from our LLM-based digitization pipeline achieve 98.6 percent fidelity and are statistically indistinguishable from manually digitized data.