The Daily Boston


Boston's Digital Archives Have a Duplicate Image Problem. Here's What Officials and Experts Are Saying About It.

From City Hall to the Boston Public Library, a quiet but costly crisis in digital records management is forcing a reckoning over how the city stores, tags, and retrieves visual assets.

By Boston News Desk · Published 4 July 2026, 2:58 pm


Boston's municipal and institutional archives are sitting on tens of thousands of duplicate digital images — redundant files that clog servers, inflate storage costs, and make retrieving accurate historical records a slow, error-prone process. That's the consensus emerging from archivists, city technology officers, and university librarians who have spent the better part of 2026 auditing the problem.

The issue has moved from a back-office nuisance to a genuine policy concern, driven in part by Mayor Michelle Wu's broader push to modernize city operations and digitize neighborhood planning documents, particularly those tied to housing development in Jamaica Plain and Dorchester. When duplicate images infest a records system, planners pulling permit photos, inspectors verifying site conditions, and residents appealing zoning decisions can end up looking at the wrong version of the same file — sometimes one that's years out of date.

What Institutions Are Actually Dealing With

The Boston Public Library's Digital Repository Services team, based at the Central Library on Copley Square, has been quietly working through a backlog of roughly 200,000 digitized image files that were ingested during a federally funded digitization sprint between 2020 and 2023. Duplicate detection software flagged approximately 18 percent of those files as either exact or near-exact copies during an internal review completed in early spring 2026. The library has not publicly released that figure, but the scope of the problem is consistent with what digital preservation specialists describe as typical for large-scale retroactive scanning projects.
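The library has not said what software it used or how that software defines a "near-exact" copy. One common technique for catching near-duplicates is perceptual hashing, which reduces each image to a short fingerprint and compares fingerprints bit by bit. The Python sketch below illustrates the idea with a simple average hash; it assumes each image has already been downscaled to a small grayscale grid, and the function names and the distance threshold of 5 are illustrative choices, not details from the BPL review.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale grid (a list of rows of brightness values).

    Each pixel contributes one bit: 1 if it is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two grids as near-duplicates if their fingerprints differ in few bits.

    max_distance is an assumed threshold; a real audit would tune it.
    """
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


# Hypothetical 8x8 scan, a slightly brighter rescan of it, and an inverted image.
original = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[value + 3 for value in row] for row in original]
inverted = [[63 - value for value in row] for row in original]

print(is_near_duplicate(original, brighter))
print(is_near_duplicate(original, inverted))
```

A uniform brightness shift leaves the fingerprint unchanged, which is why a rescan of the same photograph can be caught even though its bytes differ from the first copy.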

Northeastern University's library system, which manages its own substantial digital collections along Huntington Avenue, has faced a parallel challenge. Archivists there have pointed to the absence of a standardized metadata schema as a root cause — images get uploaded multiple times because different departments don't know a file already exists in the system. The university adopted the Dublin Core metadata standard for new ingestions beginning in January 2026, a change intended to reduce redundancy going forward, but it doesn't fix what's already in the repository.
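The failure archivists describe, in which one department uploads a file another has already added, can in principle be caught at ingest by checking a content checksum against a shared registry before accepting the upload. The sketch below is a hypothetical illustration, not Northeastern's system: the `IngestRegistry` class and the sample records are invented, and only the field names `title`, `creator`, and `identifier` are borrowed from the Dublin Core element set.

```python
import hashlib


class IngestRegistry:
    """Hypothetical in-memory registry that rejects files it has already seen."""

    def __init__(self):
        self._by_checksum = {}

    def ingest(self, content, record):
        """Try to register a file.

        content: the raw bytes of the image.
        record:  a dict of descriptive metadata (Dublin Core style field names).
        Returns (accepted, stored_record). If the bytes were ingested before,
        accepted is False and stored_record is the earlier entry.
        """
        checksum = hashlib.sha256(content).hexdigest()
        existing = self._by_checksum.get(checksum)
        if existing is not None:
            return False, existing
        stored = dict(record, checksum=checksum)
        self._by_checksum[checksum] = stored
        return True, stored


registry = IngestRegistry()
scan = b"example image bytes"

first = registry.ingest(scan, {"title": "Campus map", "creator": "Dept A", "identifier": "img-0001"})
second = registry.ingest(scan, {"title": "Map of campus", "creator": "Dept B", "identifier": "img-0002"})

print(first[0], second[0], second[1]["identifier"])
```

A check like this only prevents new byte-for-byte duplicates; like the metadata change described above, it does nothing for files already in the repository.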

At the city level, the Department of Innovation and Technology has been piloting a duplicate-detection tool within the city's internal content management system since March. The pilot covers roughly 40,000 image files associated with public works and building inspection records. Officials familiar with the project say the tool has flagged duplicate rates above 20 percent in some departmental folders, which translates directly into unnecessary cloud storage expenditure — a meaningful line item when cloud storage is billed monthly per terabyte.

The Stakes for Neighborhood Planning and Public Trust

The practical consequences aren't abstract. In Dorchester, where the Wu administration has pushed aggressively on affordable housing production along the Fairmount Line corridor, planning staff working on Section 3A compliance documentation have found site photographs in the city's system that lack reliable timestamps or geotags, partly because file migrations produced duplicate copies with that embedded metadata stripped out. Housing advocates working with Dorchester-based nonprofits say that kind of records confusion slows down appeals and community review processes.

Digital records experts note that the cost of remediation rises sharply the longer institutions wait. A 2024 report from the Council on Library and Information Resources estimated that retroactive deduplication and metadata repair for a mid-sized municipal archive can run between $80,000 and $250,000 depending on collection size and the degree of metadata damage — figures that city budget offices are increasingly factoring into capital planning cycles.

The Massachusetts State Archives has offered technical guidance to municipalities on image file standards, pointing to its own transition to the TIFF format as a baseline for archival-quality preservation. Several Boston-area municipalities have used that guidance as a starting point for their own remediation plans.

For institutions and city departments still assessing the scope of their own duplicate problems, archivists recommend starting with an automated hash-comparison audit before attempting any manual review — it's faster, cheaper, and surfaces the worst offenders quickly. The BPL and Northeastern have both signaled they intend to complete their deduplication work before the end of fiscal year 2027. Whether City Hall hits a similar target will depend on how the next budget cycle treats the Department of Innovation and Technology's remediation request.
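For readers curious what a hash-comparison audit involves, the idea is to compute a cryptographic digest of every file's contents and group files whose digests match. The Python sketch below is a minimal illustration using only the standard library; the function names are illustrative, SHA-256 is an assumed choice of hash, and it finds exact byte-for-byte copies only, not resized or re-encoded versions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Walk every file under root and return groups of files with identical contents."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates` on a folder returns a list of groups, each a list of paths to identical files, which staff could then review to decide which copy to keep.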


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
