The Daily Boston

Boston news, every day

Boston's Digital Archives Have a Duplicate Image Problem. Here's What Officials and Experts Are Saying About It.

From City Hall to the Boston Public Library, administrators and technologists are wrestling with a sprawling redundancy crisis inside public digital collections.

By Boston News Desk · Published 4 July 2026, 4:17 pm

Photo: Kate Ryan / Public domain (Wikimedia Commons)

Boston's public institutions are sitting on thousands of duplicate digital images — redundant files clogging archival servers, inflating storage costs, and quietly undermining the integrity of records that residents, researchers, and historians depend on. The problem, which affects everything from the city's permitting database to the Boston Public Library's digitized photograph collections, has prompted a new round of scrutiny from municipal administrators and academic specialists this summer.

The issue gained fresh urgency after the Mayor's Office of New Urban Mechanics flagged the redundancy problem in a June 2026 internal review of the city's digital infrastructure. That review, which examined records management systems across several departments, found that duplicated image files were adding measurable overhead to cloud storage contracts renewed earlier this year. The city's IT department manages storage agreements that run into the millions of dollars annually across all departments combined.

Why Boston Archivists Are Pushing Back

At the Boston Public Library's Copley Square branch, staff in the Digital Services unit have been grappling with the challenge since the library began a large-scale digitization effort under its Transformative Technology Initiative. Scanning backlogs from the Leslie Jones photography collection and other historic holdings created multiple file versions — originals, compressed derivatives, and working copies — often stored without consistent naming conventions. That left archivists uncertain which version was authoritative and which was expendable.

Northeastern University's library system on Huntington Avenue faces a similar bind. The university's digital repository, which holds academic research materials alongside community-facing collections tied to the Boston Research Center, has grown substantially over the past three years. Librarians there have been working with metadata specialists to implement deduplication protocols, but the process is labor-intensive and requires resolving disagreements about which image version carries the highest fidelity.

Specialists in digital preservation say Boston is not unusual — the problem is endemic to institutions that digitized rapidly without unified standards. What makes the Boston situation distinctive is the scale of its ambitions. The Wu administration has made open data and digital access central planks of its technology agenda, meaning redundant or mislabeled archival images carry direct policy consequences, not just technical ones. Researchers pulling images from Analyze Boston, the city's open data portal, have flagged cases where the same historical photograph appeared under multiple accession numbers with different metadata attached.

What Comes Next for City and University Systems

The city's Department of Innovation and Technology has been in conversations with vendors about automated deduplication tools that use perceptual hashing, a technique that identifies visually identical or near-identical images even when file names, formats, or compression levels differ. Several peer institutions in cities including Chicago and New York have piloted similar tools against their municipal photograph archives.
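For readers curious how perceptual hashing works, the sketch below shows one common variant, a difference hash. It is an illustration only, not the tool any vendor or the city is evaluating. It assumes images have already been decoded into grids of grayscale values, and the function names and the cutoff of 5 differing bits are hypothetical choices made for this example.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values, one inner list per row


def shrink(pixels: Grid, width: int, height: int) -> Grid:
    """Downsample by averaging the block of source pixels behind each output cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, size: int = 8) -> int:
    """Difference hash: one bit per left/right neighbor comparison on a small thumbnail."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes disagree."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as duplicates when few hash bits differ (cutoff is an assumption)."""
    return hamming(dhash(a), dhash(b)) <= max_distance


# Toy data: a left-to-right brightness ramp, a larger copy of it, and a mirror image.
ramp = [[x for x in range(64)] for _ in range(64)]
scaled = [[x // 2 for x in range(128)] for _ in range(128)]
mirrored = [list(reversed(row)) for row in ramp]

print(is_near_duplicate(ramp, scaled))    # same picture at a different resolution
print(is_near_duplicate(ramp, mirrored))  # visually different picture
```

Because the hash records only whether each region is brighter than its neighbor, a rescaled or uniformly brightened copy tends to produce the same bits, while an ordinary checksum of the two files would not match.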

At the Massachusetts Institute of Technology, researchers in the Libraries' Digital Preservation group have published guidance recommending that institutions adopt the PRONOM file format registry and JHOVE validation tools before beginning any large deduplication pass. PRONOM, maintained by the UK National Archives, is a registry of file format information, and JHOVE is an open-source tool for identifying and validating file formats. Used together, they let archivists confirm that files are what they claim to be and are well-formed before any copy is deleted. That is a critical safeguard when the records in question include irreplaceable historical images of neighborhoods like Roxbury, Jamaica Plain, and the South End from the mid-twentieth century.
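The verify-before-delete principle can be illustrated with plain checksums. The sketch below does not use PRONOM or JHOVE and does not validate file formats; it swaps in a simpler technique, SHA-256 comparison from Python's standard library, to show the idea of confirming that a retained copy is byte-identical before a duplicate is removed. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(paths: Iterable[Path]) -> List[List[Path]]:
    """Group files whose contents are byte-for-byte identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        by_digest[sha256_of(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def safe_to_delete(candidate: Path, keep: Path) -> bool:
    """Allow deletion only if a distinct, readable copy with the same checksum remains."""
    if not keep.exists() or candidate.resolve() == keep.resolve():
        return False
    return sha256_of(candidate) == sha256_of(keep)
```

A workflow built on this would call `safe_to_delete` immediately before each removal and skip any file for which it returns `False`. Checksums catch only exact copies, so compressed derivatives of the same photograph would still need a separate review.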

The practical stakes are real. Storage costs for unmanaged digital archives can compound quickly; cloud storage pricing typically scales with volume, meaning redundant files directly inflate annual contracts. Institutions that have completed deduplication projects report storage reductions ranging from 15 to 40 percent depending on how aggressively files were duplicated during initial scanning runs, according to published case studies from the Digital Preservation Coalition.
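To see how volume-based pricing turns those percentages into dollars, consider a back-of-the-envelope calculation. The archive size and per-terabyte price below are hypothetical placeholders, not figures from Boston's contracts; only the 15 and 40 percent bounds come from the case studies cited above.

```python
def annual_storage_cost(stored_tb: float, price_per_tb_month: float) -> float:
    """Yearly cost when pricing scales linearly with stored volume."""
    return stored_tb * price_per_tb_month * 12


def annual_savings(stored_tb: float, price_per_tb_month: float,
                   reduction_percent: float) -> float:
    """Yearly savings if deduplication shrinks the archive by reduction_percent."""
    return annual_storage_cost(stored_tb, price_per_tb_month) * reduction_percent / 100


# Hypothetical archive: 100 TB at $20 per TB per month (illustrative numbers only).
for percent in (15, 40):
    saved = annual_savings(100, 20, percent)
    print(f"{percent}% reduction saves ${saved:,.0f} per year")
```

Under these assumed numbers the archive costs $24,000 a year, so the reported range corresponds to savings of between $3,600 and $9,600 annually.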

For Boston residents and scholars who rely on these collections, the immediate advice from archivists is straightforward: when downloading images from public repositories, check the accession number and metadata date stamp, and request clarification from the holding institution if multiple versions appear in search results. The Boston Public Library's Ask a Librarian service, reachable through the bpl.org portal, can direct researchers to the canonical version of contested files. The longer fix — systematic deduplication across city and university systems — is expected to take well into 2027 to complete.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
