Boston's Duplicate Image Problem: What Happens Next and the Key Decisions Ahead
City agencies and universities are sitting on thousands of redundant digital assets — and the clock is ticking on who pays to fix it.

Boston's public institutions are confronting a sprawling, unglamorous crisis hiding in plain sight: duplicate digital images clogging archives, wasting server capacity, and quietly undermining the accuracy of everything from city permit portals to hospital records. The problem has accelerated fast enough that IT administrators at several Boston agencies are now being pressed to make concrete decisions about replacement workflows before the end of fiscal year 2026.
The timing matters. Mayor Michelle Wu's administration has pushed hard on digital-services modernization as part of a broader open-government agenda, and the city's Department of Innovation and Technology has been tasked with auditing legacy data infrastructure across municipal departments. Duplicate image files — photographs, scanned documents, ID photos, building inspection records — are among the most common and most expensive forms of digital waste flagged in such audits. When the same file exists under four different names across three different servers, the downstream consequences range from billing errors to court-record disputes.
Two institutions in particular illustrate the scale of the challenge. The Boston Public Library's Digital Commonwealth project, which archives tens of thousands of historical photographs accessible from the Copley Square branch, has been working since at least 2024 to reconcile duplicate entries introduced during a mass digitization push. Separately, Northeastern University's library system on Huntington Avenue has identified redundant image records across its archival databases as a direct consequence of migrating to a new content management system last year.
Neither institution is alone. Across the Longwood Medical Area, hospitals and research institutions that share imaging data — including patient-intake photographs and lab-specimen scans — face strict HIPAA compliance requirements that make duplicate records not just inefficient but legally hazardous. A duplicate patient image attached to the wrong record is a liability, not merely a storage problem.
The financial pressure is real. Cloud storage costs for institutions managing large image libraries have climbed steadily: industry estimates place the average cost of storing one terabyte of redundant enterprise data at roughly $23 per month on standard cloud tiers, and municipal IT departments rarely have the capacity to purge duplicates as they appear. For a city agency holding, say, 40 terabytes of building-inspection photographs accumulated since the early 2010s (a plausible figure given Boston's pace of development in Dorchester and Jamaica Plain), every redundant terabyte is billed again each month, so the cost keeps accumulating for as long as the duplicates remain.
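To put rough numbers on that, the sketch below works through the arithmetic using the two figures cited above: $23 per terabyte per month and a 40-terabyte archive. The share of such an archive that is actually duplicated is not reported, so the 25 percent used in the example is purely a hypothetical assumption, as are the function and parameter names.

```python
def annual_redundancy_cost(archive_tb, duplicate_fraction, usd_per_tb_month=23.0):
    """Estimate the yearly cost, in US dollars, of storing duplicate data.

    archive_tb          total archive size in terabytes
    duplicate_fraction  share of the archive that is redundant (0.0 to 1.0);
                        this is an assumed input, not a reported figure
    usd_per_tb_month    storage price per terabyte per month (the article's
                        rough industry estimate is $23)
    """
    if not 0.0 <= duplicate_fraction <= 1.0:
        raise ValueError("duplicate_fraction must be between 0.0 and 1.0")
    redundant_tb = archive_tb * duplicate_fraction
    return redundant_tb * usd_per_tb_month * 12


if __name__ == "__main__":
    # Hypothetical scenario: a quarter of a 40 TB archive is duplicated.
    cost = annual_redundancy_cost(archive_tb=40, duplicate_fraction=0.25)
    print(f"Estimated yearly cost of duplicates: ${cost:,.2f}")
```

Under those assumptions the duplicates alone would cost about $2,760 a year. The figure scales linearly with both the archive size and the assumed duplicate share, so an agency would need its own inventory numbers before relying on any estimate of this kind.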
Three choices are now forcing themselves onto the agenda of anyone managing a Boston-area image archive. First, whether to invest in automated deduplication software — tools that scan file hashes and metadata to identify identical or near-identical images — or rely on manual review, which is cheaper upfront but far slower. Second, whether to consolidate storage onto city-managed servers or migrate to a vendor-managed cloud environment, a question with both cost and sovereignty implications. Third, and most politically fraught, who owns the decision when duplicate records span multiple agencies or institutions that don't share a chain of command.
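For readers curious what hash-based deduplication looks like in practice, here is a minimal sketch. It is not the method used by any institution named in this story; it assumes the images sit in an ordinary directory tree and uses SHA-256 from Python's standard library to group files whose contents are byte-for-byte identical. The function names are illustrative only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` by content hash.

    Returns a dict mapping each digest to the list of paths that share it,
    keeping only digests that occur more than once.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    # Report duplicate groups in the current directory; nothing is deleted.
    for digest, paths in find_exact_duplicates(".").items():
        print(digest[:12], [str(p) for p in paths])
```

A sketch like this catches the same file saved under different names, but not near-identical images such as a resized or recompressed copy. Those would need metadata comparison or perceptual hashing, which is outside what this example covers. It only reports groups and leaves any decision about removal to a human reviewer.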
That third question is where progress most often stalls. The City of Boston's open-data portal, accessible at data.boston.gov, lists dozens of datasets maintained by separate departments, and there is currently no single policy governing how image assets are versioned, retired, or replaced when errors are found. The Wu administration's digital-services team has signaled interest in a unified data governance framework, but no ordinance or formal policy has been enacted as of July 4, 2026.
For institutions wanting to get ahead of the problem, the practical path forward involves three steps: commission a full asset inventory before attempting any deletion, adopt a file-naming and metadata standard that makes duplicates detectable before they multiply, and designate a named data steward with actual authority to retire redundant records. The Boston Public Library's ongoing Digital Commonwealth reconciliation project offers a working model — imperfect, still in progress, but further along than most. Other institutions across the city would do well to study what Copley Square has already learned the hard way.
Published by The Daily Boston