The Daily Boston

Boston's Duplicate Image Problem: The Key Decisions That Will Shape What Comes Next

City agencies and universities are sitting on thousands of redundant digital assets — and the choices made this summer will determine whether the cleanup effort holds.

By Boston News Desk · Published 4 July 2026, 2:57 pm

Photo: Mike Norris on Pexels

Boston's public institutions are staring down a surprisingly costly housekeeping problem. Across city government, the Boston Public Library's digital collections, and the sprawling research networks anchored at Northeastern University and Massachusetts General Hospital, duplicate digital images have quietly accumulated into a storage and accessibility burden that administrators can no longer defer. The question now is not whether to act — it is how, and who pays.

The timing matters because several major contracts for cloud storage and digital asset management are up for renewal before the end of fiscal year 2026, which closes September 30. Decisions made in the next sixty days will lock in infrastructure choices for the better part of a decade. Letting those renewals roll over without a deduplication strategy in place means paying for redundant data at rates that have only climbed since the pandemic-era digitization push.

How the Backlog Built Up

The problem did not happen overnight. Between 2020 and 2023, city departments and Boston-area research institutions poured resources into digitizing physical archives. The Boston Public Library's Central Library in Copley Square scanned tens of thousands of photographs from its Print Department collection. The City of Boston Archives, based in West Roxbury, digitized permit records, planning documents, and maps going back to the nineteenth century. Northeastern's Snell Library accelerated its own oral history and photograph digitization program.

Each initiative ran largely on its own timeline, with its own software stack. When projects overlapped — historical images of the South End, say, or photographs from the old West End neighborhood before its 1958 demolition — copies multiplied without a central registry to flag them. By some estimates used in comparable municipal digitization efforts in cities like Chicago and Philadelphia, duplicate files can account for anywhere from fifteen to thirty percent of total storage volume in large-scale archival projects. No Boston agency has yet published a specific local audit figure, but administrators at several institutions have acknowledged the issue is under active review.

The financial stakes are real. Commercial cloud storage pricing from major providers runs roughly two to four cents per gigabyte per month for standard tiers, with archival tiers priced lower. For a collection running into hundreds of terabytes, a realistic scale for a combined BPL and city archive footprint, the annual carrying cost of unnecessary duplicates at those rates lands in the tens of thousands of dollars, and it can reach six figures as the footprint approaches a petabyte.
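For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch. The 300-terabyte collection size is an assumption chosen purely for illustration, since no Boston agency has published an audit figure; the duplicate shares and per-gigabyte prices are the ranges quoted in this article.

```python
def annual_duplicate_cost(total_tb, duplicate_share, price_per_gb_month):
    """Estimate the yearly cost of storing duplicate files.

    total_tb:            collection size in decimal terabytes (assumed figure)
    duplicate_share:     fraction of storage taken up by duplicates (0.15 to 0.30)
    price_per_gb_month:  storage price in dollars per gigabyte per month
    """
    total_gb = total_tb * 1000          # 1 TB = 1,000 GB (decimal units)
    duplicate_gb = total_gb * duplicate_share
    return duplicate_gb * price_per_gb_month * 12


# Hypothetical 300 TB collection, low and high ends of the quoted ranges.
low = annual_duplicate_cost(300, 0.15, 0.02)
high = annual_duplicate_cost(300, 0.30, 0.04)

print(f"Low estimate:  ${low:,.0f} per year")
print(f"High estimate: ${high:,.0f} per year")
```

The estimate scales linearly with collection size, so a footprint ten times larger would carry ten times the cost at the same rates.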

The Decisions Ahead

Three choices will define the outcome. First, whether Boston adopts a shared deduplication platform across agencies or lets each institution handle the problem independently. A shared approach offers economies of scale but requires cooperation between entities that have historically guarded their own IT budgets. The Mayor's Office of New Urban Mechanics, which has coordinated cross-agency technology pilots before, is one natural convener for that conversation.

Second, who sets the metadata standards that allow systems to recognize a duplicate in the first place. Without agreed-upon tagging conventions, automated deduplication tools produce false positives — flagging legitimately distinct images as redundant and risking permanent deletion of irreplaceable material. The Boston Landmarks Commission, which holds authority over historically significant records, would need a seat at that table.
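One conservative way to limit false positives can be sketched in a few lines. This is an illustration only, not a description of any tool in use at a Boston institution: it flags files as duplicates only when their contents are byte-for-byte identical, and it reports them for human review instead of deleting anything. The function names and the folder layout are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` whose contents are byte-identical.

    Returns a list of groups; each group is a list of paths that share
    one digest. Nothing is deleted: the output is a review list only.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A check like this catches only exact copies. Two separate scans of the same photograph produce different files and would not match, which is where agreed metadata standards and an archivist's judgment come in.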

Third, whether the city pursues a vendor contract or builds on open-source tools already in use at local universities. MIT Libraries, just across the Charles River in Cambridge, has implemented open-source digital asset management software that handles deduplication as a native function. A formal knowledge-sharing arrangement between MIT and the BPL would cost far less than a proprietary enterprise contract and could be structured before the September 30 fiscal deadline.

The Fourth of July weekend has emptied most of the relevant offices. Real work resumes Monday. Budget directors, archivists, and IT procurement officers at City Hall have roughly twelve working weeks to move from acknowledgment to a signed framework. Miss that window, and the redundant files — and their carrying costs — roll into fiscal year 2027 with no plan attached.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
