The Daily Boston

Boston news, every day

Boston's Digital Archives Have a Duplicate Image Problem. Here's What Officials and Experts Are Saying About Fixing It.

From City Hall to the Boston Public Library, administrators and technologists are reckoning with a backlog of redundant digital files that is straining storage budgets and slowing public access to historical records.

By Boston News Desk · Published 4 July 2026, 2:45 pm

Photo: North End Union, Boston. School of Printing / Public domain (Wikimedia Commons)

Boston's public institutions are sitting on millions of duplicate digital images — redundant scans, copied photographs, and replicated records that eat up server space and make archival searches frustratingly slow. The problem has been building for years, but pressure is mounting in 2026 to actually do something about it.

The issue surfaced publicly this spring when the Boston Public Library's Digital Repository Program flagged it during a routine infrastructure review. The BPL, which holds more than 2.2 million digitized items across its Copley Square headquarters and branch network, acknowledged that an unquantified but significant share of its digital image holdings exists in multiple copies, the byproduct of overlapping digitization projects, staff turnover, and inconsistent file-naming conventions accumulated over more than a decade of scanning drives.

City archivists and library technologists say the situation is not unique to Boston. But the scale here, and the cost of the cloud storage contracts the city holds with third-party vendors, have pushed the conversation from background noise to budget line.

Why It Matters Right Now

The timing is tied directly to money. The Wu administration's FY2027 budget, passed by the City Council in June, includes a consolidated digital infrastructure allocation for municipal departments. Library and archives officials are now competing for a portion of those funds to upgrade deduplication software — tools that automatically detect and flag redundant files before they're ingested into long-term storage.

The Massachusetts Board of Library Commissioners, which distributes state and federal Library Services and Technology Act grants to institutions across the Commonwealth, is also reviewing a handful of pending proposals from Boston-area libraries that include deduplication components. Grants under that program have historically ranged from $10,000 to several hundred thousand dollars depending on project scope.

At Northeastern University's Snell Library on Huntington Avenue, digital preservation staff have been piloting an open-source deduplication workflow since late 2024. The project, which uses perceptual hashing — a technique that identifies visually identical or near-identical images even when file names differ — has so far flagged tens of thousands of redundant image files in Northeastern's special collections. Staff there have described the effort internally as ongoing, with no firm public completion date yet announced.
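For readers curious how perceptual hashing works in principle, here is a minimal sketch of one common variant, the "average hash." It is not Northeastern's workflow, whose details have not been published. It assumes an image has already been decoded into rows of grayscale values from 0 to 255, and the function names (`shrink`, `average_hash`, `hamming_distance`) are illustrative rather than taken from any particular tool.

```python
from typing import List

Grid = List[List[float]]  # rows of grayscale values, 0-255 (assumed already decoded)


def shrink(pixels: Grid, width: int, height: int) -> Grid:
    """Downsample by averaging rectangular blocks (a crude stand-in for resizing)."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [v for row in shrink(pixels, size, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two scans whose fingerprints differ in only a few bits are candidates for review; the cutoff is a judgment call each institution would have to tune, which is one reason flagged items still need a human look.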

Meanwhile, the Boston City Archives on School Street has been working through a separate challenge: historical photograph collections donated by community organizations in Dorchester and Jamaica Plain over the past decade. Many of those donations arrived as USB drives or burned DVDs, with no metadata standards applied, meaning the same image could appear under three different file names in three different folders.
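The "same image, three file names" case is the simplest kind of duplicate to catch, because the bytes are identical even when the names are not. The sketch below shows one standard way to do it, grouping files by a SHA-256 digest of their contents. It illustrates the general technique, not the City Archives' actual process, and `find_exact_duplicates` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large scans don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to the files sharing it, keeping only groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

This only finds byte-for-byte copies. A re-scan or re-saved version of the same photograph would produce a different digest and would need a perceptual comparison instead.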

What the Experts Are Recommending

Digital preservation specialists point to two broad approaches. The first is retroactive: run existing collections through deduplication software, review flagged items manually to confirm they're true duplicates rather than near-matches with documentary value, and then delete or consolidate. The second is preventive: establish ingestion protocols that check incoming files against existing holdings before they enter the archive at all.
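The preventive approach can be pictured as a gate at the archive's front door. The sketch below is a simplified illustration under stated assumptions: the `IngestGate` class and its in-memory index are hypothetical, and a real repository would keep its index in a database and would likely pair exact matching with a near-duplicate check and manual review.

```python
import hashlib
from typing import Dict, Optional


class IngestGate:
    """Checks incoming files against content digests of items already held."""

    def __init__(self) -> None:
        # digest -> identifier of the first item accepted with that content
        self._known: Dict[str, str] = {}

    def check(self, item_id: str, data: bytes) -> Optional[str]:
        """Return the ID of an existing identical item, or None after accepting a new one."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._known.get(digest)
        if existing is not None:
            return existing  # exact duplicate: caller can skip or route to review
        self._known[digest] = item_id
        return None
```

A caller would run `check` on each incoming file and ingest only those for which it returns `None`, setting the rest aside for an archivist to confirm.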

The American Library Association's digital preservation working groups have published guidelines recommending that institutions adopt both approaches in tandem, noting that retroactive cleanup alone tends to recreate the same problems within a few years if intake processes aren't also reformed.

For Boston's biotech and university sector — which generates enormous volumes of research imagery and frequently donates collections to public repositories — the practical advice from archivists is to establish file-naming and metadata standards at the point of creation, not after the fact. The Francis A. Countway Library of Medicine on Shattuck Street, which serves Harvard Medical School and holds significant medical photography collections, has been developing exactly that kind of pre-submission checklist for institutional donors.

The BPL has not set a public deadline for resolving its duplicate backlog. City officials have indicated that any software procurement tied to the FY2027 digital infrastructure allocation would need to go through standard bidding procedures, meaning contracts are unlikely to be awarded before early 2027. In the meantime, archivists say they are doing what they can manually — a slow process, but one they argue is better than waiting.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
