This paper describes methodological approaches for extracting structured data from large-scale historical document archives, comparing “hyperspecialized” with “adaptive modular” strategies using 56 years of Philadelphia property deeds as a case study. Hyperspecialized approaches rely on custom models optimized for specific tasks, potentially achieving superior accuracy but requiring substantial investment while offering limited flexibility. Adaptive modular approaches combine existing off-the-shelf or lightly customized tools in an ensemble to balance accuracy, cost-effectiveness, and flexibility. We find that large language models (LLMs) can be particularly valuable within a modular approach: complementary tools can simplify and focus the tasks given to LLMs, improving performance and reliability, lowering costs, and decreasing review time. We also demonstrate that our approach proves valuable when research questions evolve organically, as modular components can be repurposed across multiple projects without building or rebuilding specialized models. Our approach should be easily adaptable to other research involving deeds and similar documents.
