Boston's municipal digital infrastructure is carrying a hidden weight: tens of thousands of duplicate image files spread across city agency servers, public library archives, and civic data portals, according to an analysis of storage and records management data compiled by IT departments citywide. The problem is neither glamorous nor headline-grabbing, but the cost is real.
City technology officers have been quietly flagging the issue since early 2025, when an internal audit of the Boston Public Library's digital collections system — which serves the main branch on Copley Square as well as 24 neighborhood locations — found that roughly 34 percent of archived image assets existed in two or more identical copies. That single finding set off a broader review across other city systems.
The timing matters. Mayor Michelle Wu's administration has made a public commitment to modernizing city data infrastructure as part of its broader progressive governance agenda, and the FY2026 budget allocated $4.1 million toward digital services upgrades citywide. Inefficiencies like duplicate image storage directly eat into those resources — and they slow down the public-facing platforms that residents actually use to access permits, property records, and planning documents.
Where the Redundancy Piles Up
The worst offenders, by volume, are systems that handle permit photography and property inspection images. The Boston Inspectional Services Department, which operates out of 1010 Massachusetts Avenue in Roxbury, processes thousands of property photos annually across neighborhoods including Dorchester and Jamaica Plain — two of the city's most active zones for housing construction and code enforcement. Inspectors uploading photos from mobile devices often generate automatic cloud backups that duplicate files before human review ever occurs.
The city's Office of Housing, which administers programs like the Acquisition Opportunity Program and tracks affordable unit compliance across ZIP codes including 02122 and 02130, faces a similar problem in its document management system. Image attachments tied to compliance filings — site photos, unit-condition records, construction progress shots — frequently arrive duplicated because multiple caseworkers download and re-upload the same materials during handoffs.
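The duplicates described above, automatic backup copies and re-uploaded attachments, are often byte-for-byte identical, which is the easiest case to detect. A minimal sketch of how such copies can be found by content hashing follows. The function names and the directory layout are hypothetical and do not describe any tool the city actually uses.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root):
    """Return lists of paths under `root` whose contents are identical.

    Hypothetical helper: files are first grouped by size (cheap), and only
    files that share a size are hashed (expensive).
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an identical twin
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

This approach only catches exact copies. A photo that was resized or re-encoded during a re-upload has different bytes and would need a different technique, such as perceptual hashing, to be matched.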
Estimates from the broader public sector suggest the scale is significant. Research published by the Storage Networking Industry Association found that duplicate files typically account for between 20 and 40 percent of total unstructured data in large organizational storage environments. Applied to Boston's current city server footprint, which technology budget documents peg at over 2 petabytes of active storage, that range implies roughly 400 to 800 terabytes of avoidable data.
What Cleaning Up Actually Costs — and Saves
Deduplication software licenses for enterprise-scale systems typically run between $8,000 and $60,000 annually depending on volume tier and vendor, based on published pricing from major providers including Veritas and Commvault. For a mid-sized city department, the payback period on that investment can be less than 18 months when factoring in reduced cloud storage fees and faster retrieval times.
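To make the payback arithmetic concrete, the sketch below finds the first month in which cumulative savings cover cumulative costs. Every figure in it is an illustrative assumption rather than city data: the one-time deployment cost and the monthly savings are hypothetical, and the annual license fee is simply a value inside the $8,000 to $60,000 range quoted above.

```python
def payback_months(upfront_cost, annual_license, monthly_savings, horizon_months=60):
    """Return the first month in which cumulative savings cover cumulative
    costs, or None if that never happens within the horizon.

    Assumes the license is billed at the start of each 12-month period and
    savings accrue evenly every month (both are simplifying assumptions).
    """
    cumulative_cost = float(upfront_cost)
    cumulative_savings = 0.0
    for month in range(1, horizon_months + 1):
        if (month - 1) % 12 == 0:
            cumulative_cost += annual_license  # yearly license renewal
        cumulative_savings += monthly_savings
        if cumulative_savings >= cumulative_cost:
            return month
    return None


# Hypothetical department: $40,000 one-time deployment, $24,000 yearly
# license, $5,200 per month saved on storage and staff retrieval time.
print(payback_months(40_000, 24_000, 5_200))  # 17
```

Under these assumed inputs the investment breaks even in month 17. If monthly savings fall below one-twelfth of the license fee, the function returns None, meaning the tool never pays for itself.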
The MBTA, though a separate authority from the city, ran a smaller-scale pilot of automated image deduplication across its station surveillance archive system in late 2024. The transit authority has not published full results, but technology staff described the project at a January 2025 MassDOT data governance panel in Government Center as having cleared more than 12 terabytes of redundant visual data within the first 90 days.
For residents and city employees, the practical upshot is this: the longer duplicate images sit unaddressed in public systems, the slower searches run and the harder it becomes to surface accurate, current records. Anyone who has tried to pull a building permit history on a Dorchester triple-decker through the Accela online portal has likely encountered that lag firsthand.
City IT officials are expected to present a deduplication remediation roadmap to the Wu administration's Office of Digital Equity and Emerging Technology before the end of Q3 2026. Departments with the largest flagged backlogs — Inspectional Services, the Office of Housing, and the Parks and Recreation Department — are likely to receive prioritized attention. Whether off-the-shelf tools or a custom procurement approach wins out, the clock is ticking: cloud storage costs for the city are projected to increase 18 percent in FY2027 if current data growth trends continue.