Boston's city government is sitting on a digital mess years in the making. Across at least a dozen municipal departments — from the Office of Arts and Culture on City Hall Plaza to the Boston Public Library's Digital Repository Program on Boylston Street — duplicate image files have quietly metastasized inside public archives, consuming server space, slowing retrieval times, and making it harder for researchers and city staff alike to find original, authoritative records.
The problem didn't arrive overnight. It is the accumulated result of three separate waves of digitization funding that Boston received between 2015 and 2023, each administered by a different department with different technical standards and almost no coordination across agencies. When one grant cycle ended and another began, contractors frequently re-scanned documents that had already been digitized, uploading new versions alongside — never replacing — the originals. The Boston City Archives, housed at 201 Rivermoor Street in West Roxbury, estimates it manages upwards of four million digital image files. No public audit has established exactly how many are duplicates, but archivists who work there have described the redundancy problem in public budget testimony as a growing operational burden.
How the Duplication Problem Took Root
The first major digitization push came through a state-administered Library of Massachusetts grant program in 2015, which encouraged institutions citywide to scan historical records quickly and cheaply. Speed was rewarded; deduplication was not required. The Boston Public Library's Kirstein Business Branch on Boylston Street digitized tens of thousands of permit and property records under that program. A second wave, funded partly through federal IMLS grants around 2018 and 2019, introduced a different file-naming convention, meaning the same photograph of, say, a Roxbury triple-decker from 1962 might now exist under three separate identifiers in three separate folders.
Mayor Michelle Wu's administration inherited the backlog when she took office in November 2021. Her progressive technology agenda has emphasized open data and civic transparency — principles that sit awkwardly alongside an archive that, in practice, is difficult for the public to search. The city's Open Data portal, launched through Boston's Department of Innovation and Technology, lists dozens of datasets but does not yet include a unified image repository. That gap has drawn repeated criticism from preservationists and genealogists who rely on historical records tied to neighborhoods like Dorchester and Jamaica Plain, where property turnover and community change have made old photographs and survey documents especially valuable.
Duplicate records also carry a direct financial cost. Cloud storage is not free. Boston's IT department budget for fiscal year 2026 — approved by the City Council in June 2025 — allocated roughly $14.2 million to citywide data infrastructure, a figure that includes licensing fees for storage platforms that hold redundant files. Archivists and technology staff have noted in public budget hearings that a systematic deduplication effort could reduce active storage needs materially, though no official cost-savings projection has been published.
What a Fix Actually Looks Like
The path forward involves both technology and policy. Deduplication software can identify duplicate image files using hash-matching algorithms, flagging them for review before deletion: a cryptographic hash catches byte-for-byte copies, while a perceptual hash is needed to catch near-identical images such as a re-scan of the same photograph. The Boston Public Library's Digital Repository Program has piloted one such tool on a subset of its newspaper photograph collection (roughly 80,000 images scanned from the old Boston Herald-Traveler archive), with results expected later this summer. If the pilot is successful, the Library plans to present findings to the Department of Innovation and Technology by September 2026.
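For readers curious what hash matching means in practice, here is a minimal Python sketch of the exact-duplicate case. It is an illustration only, not the tool the Library is piloting, which the city has not described in detail. The function names and the choice of SHA-256 are assumptions made for the example, and files that differ by even one byte (a re-scan, a re-compressed copy) would not be caught this way.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by content hash; return groups with 2+ files.

    File names and folders are ignored, so the same image stored under
    different identifiers still lands in the same group.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group is a set of candidates for human review rather than automatic deletion, which matches the flag-then-review approach described above.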
Separately, the City Archives in West Roxbury is developing a new metadata standard that would require any future digitization contractor to check files against a master registry before uploading. The standard, still in draft form, would apply to any project receiving city funding after January 2027.
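The draft standard has not been published, so how the registry check would work is not yet known. One plausible shape, sketched below in Python under that caveat, is a registry keyed by a hash of each file's contents: a contractor's upload is accepted only if no existing record has the same contents. The class name `MasterRegistry`, the method `check_and_register`, and the sample identifiers are all hypothetical.

```python
import hashlib
from typing import Optional


class MasterRegistry:
    """Hypothetical in-memory registry mapping content hashes to record identifiers."""

    def __init__(self) -> None:
        self._by_digest = {}  # SHA-256 hex digest -> identifier of first registered copy

    def check_and_register(self, identifier: str, content: bytes) -> Optional[str]:
        """Return the existing identifier if these contents are already registered.

        If the contents are new, record them under `identifier` and return None.
        """
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing
        self._by_digest[digest] = identifier
        return None


# Usage: a second upload of the same bytes is pointed back to the original record.
registry = MasterRegistry()
first = registry.check_and_register("grant2015/scan_001", b"pixels")
second = registry.check_and_register("grant2018/IMG-7", b"pixels")
```

Here `first` is `None` because the file is new, while `second` is `"grant2015/scan_001"`, telling the uploader that an authoritative copy already exists.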
For Bostonians who use these archives (historians, attorneys pulling property chains in South Boston, community groups documenting neighborhood change in Hyde Park), the practical advice for now is straightforward: when requesting historical images through the city's public records process, specify the date range and original source department. That extra detail helps archivists navigate toward verified originals rather than landing on a duplicate that may carry an incorrect metadata tag. The cleanup is underway. It will take time.