Boston Archives and City Agencies Move to Stamp Out Duplicate Images in Digital Records This Week
A quiet but consequential push to clean up redundant visual data is reshaping how Boston's public institutions manage their digital collections.
Boston city agencies and several of the region's major research institutions have accelerated efforts this week to address a persistent but underappreciated problem in digital records management: duplicate images clogging databases, inflating storage costs, and complicating public access to archives. The push, driven in part by a broader digital infrastructure review tied to Mayor Michelle Wu's open-data initiatives, surfaced as a practical priority for at least three departments managing large visual collections.
The timing matters. The region's biotech corridor along Binney Street in Cambridge and Boston's Longwood Medical Area both house institutions that maintain vast repositories of imaging data, from clinical research scans to laboratory documentation. At the same time, city institutions including the Boston Public Library's Norman B. Leventhal Map and Education Center on Boylston Street have been migrating decades of analog material into searchable digital formats. Without automated deduplication tools, the same image can exist in dozens of slightly altered versions, each tagged differently and occupying redundant server space.
On July 2, the Mayor's Office of New Urban Mechanics circulated an internal memo — confirmed by city records staff familiar with the process — flagging duplicate image replacement as a line item in the FY2027 technology budget review. The document identified redundant file management as a contributor to avoidable annual storage expenditure across city departments. No final figure has been attached to that budget line yet, but industry benchmarks suggest municipal governments of Boston's size commonly absorb six-figure annual costs purely from unmanaged digital duplication.
The Boston Public Library system, which serves more than 3.6 million visitors annually across its 26 branches, has been piloting deduplication software in its digital collections unit since early spring. The Kirstein Business Branch on City Hall Plaza and the Central Library in Copley Square both feed into a shared digital asset management system. Librarians there have been manually flagging duplicate scans — a process that staff have described internally as time-consuming and error-prone, according to procurement documents reviewed this week. The library's technology team is now evaluating at least two vendor proposals for automated replacement workflows, with a decision expected before the end of August.
Northeastern University's Digital Scholarship Group, based on Huntington Avenue, has separately published guidance this month for graduate researchers on how to audit image collections before submitting them to institutional repositories. Their recommended protocol involves hash-based comparison tools, software that computes a compact fingerprint for each image file, to identify exact and near-duplicate matches before final archiving. The university's library system holds more than 1.2 million digital objects, a number that has grown roughly 18 percent since 2022 as remote research accelerated.
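For readers curious what hash-based comparison looks like in practice, here is a minimal Python sketch of the exact-match case. It is not the Digital Scholarship Group's tooling or any vendor's product. The function names, the choice of SHA-256, and the list of file extensions are all assumptions made for illustration. It catches only byte-for-byte identical files. Near-duplicates, such as a resized or recompressed scan, would need a perceptual hashing approach that is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical extension list; a real archive would tune this to its holdings.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, suffixes=IMAGE_SUFFIXES) -> list[list[Path]]:
    """Group image files under root whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            groups[file_fingerprint(path)].append(path)
    # Only fingerprints shared by two or more files indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("scans"))` on a hypothetical `scans` folder would return one list per set of identical files, leaving the decision about which copy to keep to a human reviewer.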
The pressure isn't just technical. Massachusetts public records law requires agencies to maintain accurate, retrievable documentation. Duplicate images can surface as separate records in response to public records requests, creating confusion about which version is authoritative and exposing agencies to legal challenge. The Secretary of State's office issued updated digital records guidance in March 2026 that specifically addressed version control for image files — a nod to how widespread the problem has become across state and municipal agencies.
For residents and researchers in Jamaica Plain and Dorchester, where community land trust organizations have been digitizing neighborhood planning documents going back to the 1970s, the practical stakes are real. The Dudley Street Neighborhood Initiative, which maintains its own archive of community development records, has flagged duplicate imagery in planning documents as an obstacle to clean historical searches.
What comes next is a combination of software procurement and staff training. The Boston Public Library expects its vendor selection to wrap up by late August, with implementation beginning in October. City departments under the Office of New Urban Mechanics have been asked to submit duplicate-image audits by September 12 as part of the FY2027 budget process. Researchers and community organizations working with digital archives should contact their relevant department records liaison now to flag known duplication issues before those audits close — getting problems on record before the formal review ends is the most direct way to influence how resources get allocated.
About this article
Published by The Daily Boston