Boston's public institutions are sitting on a quiet mess. Duplicate and poorly documented images, from misfiled photographs to redundant digital scans and unverified visual records, have accumulated across city archives, university libraries, and nonprofit collections to the point where administrators can no longer reliably certify what they hold. The immediate question is not whether to fix the problem, but who decides what gets deleted, what gets kept, and on what timeline.
The issue has sharpened this summer because several federally funded digitization grants, including programs tied to the Institute of Museum and Library Services, are approaching reporting deadlines. Institutions that received IMLS funding in 2023 and 2024 must demonstrate clean, deduplicated digital collections to qualify for renewal cycles opening in early 2027. For Boston, where the university and biotech economy underwrites a substantial share of the city's archival and research infrastructure, that deadline carries real financial weight.
Where the Problem Is Concentrated
Two institutions are at the center of the local reckoning. The Boston Public Library's Digital Repository, headquartered on Boylston Street in Copley Square, holds hundreds of thousands of digitized images spanning neighborhood history, city planning records, and protest photography from the 1970s busing crisis. Staff there have flagged that a significant share of the repository's photograph holdings consists of near-identical duplicate scans created during successive digitization campaigns: sometimes three or four versions of the same print, each catalogued separately under slightly different metadata.
Separately, Northeastern University's Archives and Special Collections on Huntington Avenue has been working through its own deduplication project since 2025, focused on visual materials related to the South End and Roxbury. The challenge, as archivists there have described in public presentations, is that automated deduplication tools frequently fail to recognize near-duplicates, treating two scans of the same print as distinct images when they differ only slightly in lighting or cropping. Human review remains necessary for any collection where the historical record is contested or legally sensitive.
The Boston City Archives in West Roxbury faces a related but distinct problem: analog photographs that were scanned multiple times by different contractors between 2018 and 2022 under separate city procurement contracts. The result is digital redundancy that consumes server storage and complicates responses to public records requests, since staff must manually confirm which version of an image is the authoritative record before releasing it.
The Decisions That Cannot Wait
Three choices are now unavoidable. First, institutions must decide whether to adopt a unified deduplication standard or allow each archive to set its own threshold for what counts as a true duplicate. The Library of Congress has published guidance recommending a perceptual hash comparison standard, but Boston's institutions have not collectively adopted it.
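To make the idea of a duplicate threshold concrete, here is a minimal sketch of one simple perceptual hash, the average hash (aHash), in Python. It is an illustration under stated assumptions, not the method in the Library of Congress guidance, which may specify a different hash and different settings. The function names, the 8x8 hash grid, and the threshold of 5 differing bits are hypothetical choices, and images are assumed to arrive already decoded as grids of grayscale values. Each image is shrunk to an 8x8 grid, each cell becomes one bit depending on whether it is brighter than the image's average, and two images count as near-duplicates when their 64-bit hashes differ in no more than the threshold number of bits.

```python
"""Average-hash (aHash) sketch for flagging near-duplicate scans.

Assumptions (hypothetical, for illustration only):
- images are already decoded into 2-D lists of grayscale values 0-255;
- image height and width are multiples of HASH_SIZE (no resizing step);
- a Hamming distance of at most DEFAULT_THRESHOLD bits means "duplicate".
"""

HASH_SIZE = 8           # 8x8 grid -> 64-bit hash (assumed)
DEFAULT_THRESHOLD = 5   # max differing bits to call two scans duplicates (assumed)


def average_hash(pixels: list[list[int]], size: int = HASH_SIZE) -> int:
    """Shrink the image to size x size by block averaging, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells, else 0."""
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(img_a, img_b, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(img_a), average_hash(img_b)) <= threshold


if __name__ == "__main__":
    # Synthetic 16x16 "scan": a smooth gradient standing in for a photograph.
    scan = [[8 * x + 4 * y for x in range(16)] for y in range(16)]
    rescan = [[v + 20 for v in row] for row in scan]   # uniformly brighter copy
    other = [[255 - v for v in row] for row in scan]   # unrelated (inverted) image

    h = average_hash(scan)
    print(hamming_distance(h, average_hash(rescan)))  # 0
    print(hamming_distance(h, average_hash(other)))   # 64
    print(is_near_duplicate(scan, rescan), is_near_duplicate(scan, other))  # True False
```

Because every bit compares a cell against the image's own average, a uniformly brightened rescan hashes identically to the original, while an unrelated image lands far away. The threshold is the policy lever the paragraph describes: raising it merges more scans as duplicates and risks false matches, while lowering it leaves more near-duplicates for human review.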
Second, there is a budget question. Commercial deduplication software licensed for institutional use typically runs between $8,000 and $25,000 annually depending on collection size, according to vendor pricing sheets published by companies serving the U.S. library market. For the Boston Public Library, which operates under the city's budget authority, any new software acquisition above $10,000 requires a procurement process that can take four to six months. That timeline sits uncomfortably close to the 2027 IMLS renewal window.
Third, and most consequentially, someone must establish a retention policy with legal standing. Massachusetts public records law requires that government-held images with evidentiary value be retained on a schedule approved by the Secretary of the Commonwealth's office. Deleting a presumed duplicate that turns out to be the only surviving version of a legally significant photograph would expose a city agency to liability for unlawful records destruction.
The Mayor's Office of New Urban Mechanics, which has previously coordinated city technology and civic data projects, is one body positioned to broker a cross-institutional framework. Whether it takes that role — or whether BPL, the City Archives, and the universities proceed independently — will shape how quickly Boston can bring its visual record into reliable order. Institutions that move first to establish clean, verified collections will be better placed for the next round of federal digitization money. Those that wait may find the window has closed before the backlog is cleared.