The Daily Boston

Boston news, every day


Boston's Duplicate Image Problem: What Happens Next and the Key Decisions Ahead

City agencies and universities are sitting on thousands of redundant digital files — and the clock is ticking on who pays to fix it.

By Boston News Desk · Published 4 July 2026, 3:41 pm


Photo: Abdullah Almutairi on Pexels

Boston's public institutions are facing a quiet but expensive reckoning. Duplicate digital images — redundant photographs, scanned documents, and archival files stored simultaneously across multiple servers — have accumulated across city departments, Boston Public Library's digital collections, and the research networks of Northeastern University and Boston University, creating storage bloat that IT administrators say is measurable in the hundreds of terabytes. The question now is not whether to clean it up, but who makes the call, who absorbs the cost, and what gets deleted forever.

The issue has sharpened this summer because several city contracts with cloud storage vendors are due for renewal before September 30, 2026. Renegotiating those contracts without first auditing duplicate holdings means agencies risk paying inflated rates for data they don't need. Mayor Michelle Wu's administration has made digital infrastructure efficiency part of its broader operational reform agenda, and the upcoming budget cycle — with capital requests due to the Office of Budget Management by August 15 — forces the question into the open.

Where the Problem Lives

The duplication is concentrated in a handful of institutions. Boston Public Library's Digital Commonwealth project, headquartered on Boylston Street in Copley Square, holds more than 1.7 million digitized items, some of which were uploaded by partner libraries without deduplication protocols in place. Separately, the city's Department of Innovation and Technology, based in City Hall on Cambridge Street, manages image archives for permitting, inspections, and public works that span at least a decade of overlapping scanning projects. Jamaica Plain's Hyde Square and Dorchester's Uphams Corner have both been subjects of repeated community-documentation photography campaigns run by different nonprofits and city offices, often producing near-identical image sets stored in separate silos with no cross-referencing.

On the university side, Northeastern's Snell Library on Huntington Avenue and BU's Mugar Memorial Library on Commonwealth Avenue each maintain institutional repositories with their own duplication challenges, particularly in research datasets where multiple lab teams have deposited processed image files without checking what colleagues already submitted. Neither institution has confirmed publicly what their current deduplication backlog costs them annually.

The Decisions That Can't Wait

Three choices are coming fast. First, city IT leadership must decide whether to invest in automated deduplication software (tools that can cost between $40,000 and $150,000 for an enterprise license) or to assign staff time to manual audits, which IT managers in comparable mid-sized American cities have estimated at roughly 2,000 person-hours for a 50-terabyte archive. Second, institutions need to agree on a shared metadata standard so future uploads are tagged consistently enough to catch duplicates before they propagate. The Boston Area Research Initiative, based at Harvard's Ash Center in Cambridge, has been advocating for exactly this kind of inter-institutional data governance framework, though no binding agreement is in place. Third, and most consequentially, someone has to define what counts as a true duplicate versus a version with archival value, a distinction that matters enormously to librarians, historians, and community groups in neighborhoods like Roxbury and the South End, whose visual histories have already been incompletely preserved.
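For readers wondering what the automated option looks like at its simplest, the sketch below groups files by a SHA-256 hash of their contents, so that byte-for-byte copies stored in different folders end up in the same group. It is an illustration only, not a description of any tool the city or the libraries use; the function names and the idea of passing in a list of archive folders are assumptions made for the example. It also shows why the third decision is hard: a cropped, rescanned, or re-saved version of the same photograph produces a different hash and is not flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(roots):
    """Group files under the given folders by content hash.

    `roots` is a hypothetical list of archive folders (one per silo).
    Only groups containing more than one file are returned, i.e. sets
    of byte-identical copies. Near-duplicates are NOT detected.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A report like this only identifies candidates. Deciding which copy to keep, and whether an altered version has archival value of its own, remains a human judgment.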

The Wu administration's Smart City initiative, which allocated $2.1 million in the fiscal year 2026 budget for municipal technology upgrades, could theoretically cover some deduplication costs, but city officials have not publicly confirmed whether that fund is available for this specific use. The Office of Digital Equity and Emerging Technology, which operates out of City Hall, is the likely convening body for any cross-agency decision.

What comes next matters beyond storage bills. Get the decisions wrong — delete the wrong version, fail to agree on standards, or let vendors lock the city into proprietary formats — and institutions lose irreplaceable material while still paying too much. The MBTA's hard-learned lessons about deferred maintenance apply here too: cheap inaction now tends to generate expensive crises later. The August 15 budget deadline gives city departments roughly six weeks to put a coherent plan on paper. The library community, the university networks, and the neighborhoods whose histories live inside these files are all watching to see whether anyone actually does.


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
