Boston city staff finished a preliminary sweep of the city's digital records infrastructure this week, confirming a problem that archivists and IT managers had flagged for months. Duplicate image files, in some cases hundreds of identical or near-identical photographs, have accumulated across municipal databases serving the Boston City Archives in West Roxbury, the Boston Planning Department's Roxbury office, and at least two branches of the Boston Public Library's digital collections unit.
The audit, completed by July 2, found redundant files spread across shared network drives and public-facing portals. The problem is not cosmetic. Duplicate images consume storage and slow retrieval times for researchers. Critically for the planning and permitting workflow, they have also caused misfiled documentation on several active development projects in Jamaica Plain and Dorchester, according to the scope-of-work summary circulated internally this week.
Why the Problem Compounded So Fast
Digital records at Boston municipal agencies expanded sharply after 2020, when remote work requirements pushed staff to upload documents and site photographs through multiple cloud platforms simultaneously. The Boston Planning Department alone shifted permitting documentation for hundreds of Dorchester parcels onto a new content management system during 2021 and 2022, a transition that was completed without a deduplication protocol in place. Files migrated from the old system were not checked against incoming uploads, meaning a single construction site photograph could exist under three or four different file names within the same database.
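The failure described above, one photograph stored under several file names, is the kind of duplication a content-based check can catch. The sketch below is illustrative only and is not the city's tooling: it assumes the files sit in an ordinary directory tree, and the function names `content_digest` and `find_exact_duplicates` are hypothetical. It groups files by a SHA-256 digest of their bytes, so the file name plays no part in the comparison.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content; return groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[content_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A check like this only finds byte-identical copies. The same photograph rescanned at a different resolution produces different bytes and would not be grouped.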
The Boston City Archives, housed at 201 Rivermoor Street in West Roxbury, holds photographic records dating to the nineteenth century. Staff there began flagging the digital duplication issue formally in March after a cataloguing project for the Charlestown neighborhood survey turned up 1,400 near-duplicate images of the same set of streets, each scanned at slightly different resolutions during separate digitisation runs between 2018 and 2024. The redundancies made the online finding aid nearly unusable for researchers trying to trace property histories.
The Boston Public Library's Digital Repository Services team, which maintains the citywide BPL Digital Collections portal, saw similar drift across the Norman B. Leventhal Map Center's photograph holdings. A BPL project update published on the library's staff intranet in June placed the number of flagged duplicate or near-duplicate image assets across the entire BPL digital system at roughly 23,000 files, a figure that represents approximately 4 percent of the repository's total holdings as of the end of fiscal year 2025.
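Taken together, the two figures in the BPL update imply a rough size for the repository as a whole. The short calculation below works that out. Both inputs are rounded in the source, so the result is an order-of-magnitude estimate and not a number the library has reported.

```python
flagged_files = 23_000   # "roughly 23,000" flagged assets, per the BPL update
flagged_share_pct = 4    # "approximately 4 percent" of total holdings

# If 23,000 files are 4 percent of the total, the total is 23,000 / 0.04.
implied_total = flagged_files * 100 // flagged_share_pct
print(f"Implied total holdings: about {implied_total:,} files")
```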
What Comes Next for Researchers and Residents
The city's Department of Innovation and Technology, working alongside the Archives and the Planning Department, is piloting perceptual hashing software on a subset of 5,000 photographs drawn from the Dorchester neighborhood planning files. The software compares images by their visual content rather than by file name, so it can match copies of the same picture saved under different names or at different resolutions. Results from that pilot are expected before the end of July. If the approach works at scale, it would be applied to the broader municipal photo estate before the end of calendar year 2026.
For residents and researchers who rely on the BPL Digital Collections portal or the city's Inspectional Services document lookup tool on City Hall Plaza, the practical near-term effect is intermittent slowness in image retrieval, particularly for records tied to Jamaica Plain parcels along Centre Street, where active housing production has generated a high volume of recent site photography uploads. The ISD tool was down for approximately 90 minutes on the morning of July 1. City IT attributed the outage, without elaboration, to database maintenance related to the audit.
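The city has not said which perceptual hashing algorithm the pilot uses. As a minimal sketch of the general idea, the example below implements one common variant, an average hash, under stated assumptions: the image has already been decoded into a grid of grayscale values from 0 to 255, and the function names `average_hash` and `hamming_distance` are hypothetical. The image is shrunk to an 8 by 8 grid of block averages, each cell becomes one bit depending on whether it is brighter than the overall mean, and two images are compared by counting the bits that differ.

```python
def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash from a 2D grayscale grid.

    pixels is a non-empty list of equal-length rows of numbers (0-255).
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(size):
        top = row * height // size
        bottom = max((row + 1) * height // size, top + 1)
        for col in range(size):
            left = col * width // size
            right = max((col + 1) * width // size, left + 1)
            # Average the block of source pixels that maps to this cell.
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions at which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")
```

Two scans of the same photograph at different resolutions should produce hashes that differ in few or no bits, while unrelated images should differ in many. The cutoff for treating a pair as a near-duplicate is a tuning decision that would need to be set against real files.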
Archivists at the City Archives say researchers working on neighborhood history projects should, for now, cross-reference the online finding aids with the physical card catalogue available on-site during the Archives' public hours, Tuesday through Friday, 9 a.m. to 4 p.m. The deduplication work will not alter or delete any original files without review: every removal requires sign-off from the city archivist before a record is permanently retired from the system.