The Daily Boston

Boston news, every day

Boston's Digital Archives Have a Duplicate Image Problem. Here's What Officials and Experts Are Saying About It.

From City Hall to the Boston Public Library, archivists and technologists are pushing for a coordinated fix to a sprawling redundancy crisis in municipal and institutional photo collections.

By Boston News Desk · Published 4 July 2026, 2:40 pm

Boston's public institutions are sitting on tens of thousands of duplicate digital images — redundant files clogging servers, distorting search results, and costing taxpayers real money in storage and staff hours. The problem, which archivists have flagged for years, is now drawing fresh attention from the Wu administration's Office of Digital Innovation as city departments undertake a broader push toward consolidated data infrastructure.

The stakes are higher than they might seem. As Boston's universities, hospitals, and municipal agencies digitize decades of physical records, the failure to flag and replace duplicate images early creates compounding problems downstream — inflated storage costs, degraded metadata accuracy, and public-facing archives that return muddled results when residents search for historical records. The issue is not unique to Boston, but the city's density of institutional collections — from the Boston Public Library's Copley Square headquarters to the Northeastern University Archives on Huntington Avenue — makes it a particularly acute local challenge.

What the Experts Are Saying

Digital preservation specialists at the Massachusetts Institute of Technology and Harvard's Weissman Preservation Center have for several years advocated for what the field calls "deduplication protocols" — automated processes that identify identical or near-identical image files and flag them for human review before removal or consolidation. The core argument is straightforward: storing five versions of the same photograph of Faneuil Hall wastes server capacity and muddies provenance records that librarians depend on.
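To make the idea concrete, here is a minimal Python sketch of the "flag for human review" stage for byte-identical files. It is an illustration under stated assumptions, not any institution's actual tooling: the function names and the choice of SHA-256 are ours, and the sketch only reports candidate groups without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose bytes are identical.

    Nothing is removed here; each returned group is a list of
    candidates for a person to review.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Calling `find_exact_duplicates` on a folder of scans returns one list per set of identical files. Two files land in the same group only if their contents match byte for byte, so a re-saved or resized copy of the same photograph would not be caught at this stage.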

The BPL, which holds one of the largest municipal photograph collections in New England, has been piloting image-hashing tools since early 2025 as part of its Digital Commonwealth initiative, a statewide collaborative program that aggregates digitized collections from libraries and historical societies across Massachusetts. The pilot covers a subset of the library's estimated 1.2 million digitized items, according to program documentation published by the Commonwealth.

On the municipal side, Boston's Department of Innovation and Technology has flagged duplicate media assets as a line item in its ongoing data governance review, a process that began under a framework the Wu administration introduced in fiscal year 2025. City officials have described the broader data consolidation effort as a prerequisite for more ambitious smart-city initiatives, though specific timelines for the image deduplication component have not been publicly released.

Archivists at smaller institutions, including the Jamaica Plain Historical Society on Centre Street and the Dorchester-based Dottie's Coffee Lounge community archive project, say they face the same problem with far fewer resources. For organizations operating on annual budgets well under $100,000, paying for commercial deduplication software is not feasible. Some have turned to open-source tools such as ExifTool, which reads the metadata embedded in image files, and perceptual hashing libraries, which compare how images look rather than how they are stored. Both can be run on standard laptops but require staff time that volunteer-run organizations struggle to spare.

What Comes Next

The most concrete near-term development is a working group that the Boston Art Commission and the Office of Digital Innovation are expected to convene later this summer, according to the commission's published 2026 program calendar. The group's mandate includes reviewing how city-owned image assets — photographs of public art, streetscapes, and civic events — are stored and tagged across departments.

For institutions not connected to that municipal process, preservation specialists recommend a three-step approach: run a checksum audit to identify exact duplicates first, then apply perceptual hashing to catch visually similar but technically distinct files, and finally establish a controlled vocabulary for metadata before any files are removed. Skipping the metadata step, archivists warn, is how institutions lose track of which version of an image is the authoritative one.
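The second step, perceptual hashing, is the least intuitive of the three, so a small Python sketch may help. It shows one simple variant, often called an average hash, and is an illustration rather than the method any particular library or institution uses. It assumes each image has already been decoded and shrunk to a small grayscale grid (real tools commonly use 8x8); the function names and the `max_distance` threshold are hypothetical choices for this example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a small grayscale grid into an integer, one bit per pixel.

    A bit is 1 when that pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def likely_same_image(a: list[list[int]], b: list[list[int]],
                      max_distance: int = 1) -> bool:
    """Flag two grids as near-duplicates if their hashes are close.

    max_distance is an illustrative threshold, not a recommended value.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Toy 2x2 grids standing in for downscaled scans.
original = [[10, 200], [220, 30]]
brighter_copy = [[20, 210], [230, 40]]   # same picture, slightly lighter
different = [[200, 10], [30, 220]]       # light and dark areas swapped

print(likely_same_image(original, brighter_copy))  # True
print(likely_same_image(original, different))      # False
```

Because the hash records only which areas are lighter or darker than average, a uniformly brightened copy produces the same bits as the original, while its file checksum would differ. Pairs flagged this way are candidates for human review, not confirmed duplicates.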

The Digital Commonwealth program, administered through the Boston Public Library system, accepts digitization grant applications on a rolling basis and has historically prioritized projects that include deduplication and metadata standardization as deliverables. For community groups in neighborhoods like Roxbury and East Boston that are actively digitizing local history collections, that program represents one of the few accessible funding pathways available before the end of the current state fiscal year on June 30, 2027.

The city has not announced dedicated funding for the deduplication effort independent of the broader data governance review. Until it does, the burden falls unevenly — on stretched library staff at Copley Square, volunteer archivists in Jamaica Plain, and IT teams already managing a transit data integration project for the MBTA that is consuming significant departmental bandwidth.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
