Boston's public institutions are sitting on a growing problem buried inside their digital storage systems: millions of duplicate image files clogging servers, inflating IT budgets, and undermining the integrity of everything from property records in Dorchester to archival photo collections at the Boston Public Library on Boylston Street. The pressure to fix it is finally producing results — and a sharper conversation about who is responsible for cleaning it up.
The issue has been quietly escalating for several years, but city technology officers and university archivists say 2026 has become something of a reckoning. Cloud storage costs have risen sharply across municipal governments nationally, and Boston's Department of Innovation and Technology has been working through a broader digital infrastructure review that began in late 2025. Duplicate imagery — the same scan saved under multiple filenames, or the same photograph uploaded across different departmental databases — has emerged as one of the most stubborn inefficiencies in that review.
What Officials and Experts Are Saying
The conversation is no longer confined to IT back offices. Administrators connected to Mayor Michelle Wu's open data and transparency initiatives have flagged duplicate image accumulation as a direct obstacle to building reliable public-facing platforms. The city's online permitting portal, which serves thousands of contractors and homeowners each month in neighborhoods like Jamaica Plain and East Boston, has seen its back-end image libraries grow unwieldy as inspection photos, site documentation, and permit attachments pile up without automated deduplication protocols in place.
Researchers in partnership programs at Northeastern University's Roux Institute who work on machine learning applications have pointed to the Boston problem as a textbook case of what happens when institutions scale up their digitization efforts before establishing data governance policies. The practical consequence is not just wasted server space: duplicate images in property and zoning records can create genuine legal ambiguity, particularly when two versions of the same document carry different metadata timestamps.
Librarians and digital archivists at the Boston Public Library have been dealing with the problem on the cultural heritage side. The library's Digital Commonwealth program, which aggregates historical photographs and documents from institutions across Massachusetts, has invested in perceptual hashing tools — software that identifies visually identical or near-identical images even when file names and formats differ. The approach has become a model that smaller municipal agencies are now being encouraged to adopt.
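For readers curious how perceptual hashing can match images whose names and formats differ, here is a minimal sketch of one common variant, the "average hash." It is an illustration under stated assumptions, not a description of the software the library uses: the function names and the distance threshold are hypothetical, and the input is assumed to be an image already decoded, converted to grayscale, and shrunk to a small grid such as 8x8.

```python
def average_hash(pixels):
    """Build a bit fingerprint from a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the grid's
    mean brightness, otherwise 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_like_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two images as likely duplicates if their fingerprints are close.

    max_distance is an illustrative threshold, not a recommended value.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# A simple gradient stands in for a scan; the second copy is slightly brighter,
# as might happen when the same photo is re-saved in another format.
scan = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter_copy = [[min(255, value + 3) for value in row] for row in scan]
checkerboard = [[255 if (row + col) % 2 else 0 for col in range(8)] for row in range(8)]

print(looks_like_duplicate(scan, brighter_copy))
print(looks_like_duplicate(scan, checkerboard))
```

Because the fingerprint depends on relative brightness rather than on the file's bytes, a renamed or re-saved copy tends to land at or near the same value, while an unrelated image differs in many bit positions.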
The Cost and the Fix
Storage is not free. Industry benchmarks suggest that redundant files can account for anywhere from 20 to 40 percent of an organization's total data volume, though Boston-specific figures have not been publicly released as part of any official audit. The MBTA, which maintains an extensive archive of infrastructure inspection imagery used for safety compliance on the Red, Orange, and Green Lines, is among the transit agencies nationally that have begun piloting automated deduplication workflows to bring those ratios down.
Boston-based technology firm Crisp Data Solutions, which holds contracts with several Massachusetts municipalities, has publicly advocated for what it calls a "single source of truth" model: centralizing image repositories so that departments stop maintaining separate, redundant copies of the same files. The firm presented findings at a civic technology forum held at the Harvard Kennedy School in Cambridge in March 2026, drawing attendees from city agencies and regional planning bodies including the Metropolitan Area Planning Council.
For residents and smaller organizations, the practical effect is this: city officials have indicated that property owners and developers submitting documentation through Boston's Inspectional Services Department should expect new file submission guidelines by the fourth quarter of 2026, designed specifically to reduce duplicate uploads at the point of entry rather than after the fact. Archivists and records managers at institutions ranging from the Massachusetts State Archives on Columbia Point to neighborhood historical societies in Roxbury are being encouraged to request technical assistance through the Digital Commonwealth network before undertaking new digitization projects. Getting ahead of duplication, experts say, is far cheaper than sorting through it later.
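What a duplicate check "at the point of entry" might look like can be sketched in a few lines. The example below is an assumption-laden illustration, not the city's planned system: the class and method names are hypothetical, and it catches only byte-for-byte identical files by comparing SHA-256 digests of their contents. Resized or re-encoded copies would need a perceptual comparison instead.

```python
import hashlib


class SubmissionIndex:
    """Hypothetical in-memory record of files already submitted."""

    def __init__(self):
        # Maps a content digest to the filename it was first submitted under.
        self._first_seen = {}

    def submit(self, filename, data):
        """Return (accepted, filename of the first copy with these contents).

        The digest is computed from the file's bytes, so renaming a file
        does not hide the fact that it was already uploaded.
        """
        digest = hashlib.sha256(data).hexdigest()
        original = self._first_seen.get(digest)
        if original is not None:
            return False, original
        self._first_seen[digest] = filename
        return True, filename


index = SubmissionIndex()
print(index.submit("site_photo.jpg", b"example image bytes"))
print(index.submit("site_photo_copy.jpg", b"example image bytes"))
```

The second submission is turned away and pointed at the first file, which is the sense in which duplicates are stopped before they reach storage rather than cleaned up afterward.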