Boston's city agencies are sitting on millions of duplicate digital images embedded in planning documents, transit records, and public housing files — and the people responsible for managing those archives say the problem has reached a point where it can no longer be ignored. Officials at City Hall and technology specialists working with the Wu administration are now pushing for a standardized duplicate-image-replacement protocol, arguing that bloated databases are driving up storage costs and creating real delays in public records requests.
The issue surfaced prominently this spring during a review of the Boston Planning Department's document management system, which handles rezoning applications, environmental impact filings, and neighborhood variance requests across districts from Jamaica Plain to East Boston. City technology staff found that a significant share of stored image files in the department's database were redundant copies — the same scanned site maps, engineering diagrams, and aerial photographs saved multiple times across different project folders.
Why This Is Landing on Officials' Desks Now
The timing is not accidental. The Wu administration has been pushing an ambitious digitization effort across city departments since 2024, part of a broader open-government initiative designed to make planning and housing documents more accessible to residents. That push sharply increased the volume of files being ingested into city systems, and more files managed without consistent deduplication standards mean more duplicates.
The MBTA has been grappling with a parallel version of the problem. The transit authority's capital project documentation — covering everything from the Green Line Extension punch-list items to the ongoing upgrades at North Station — runs through a document control system that was not built with modern image-deduplication tools in mind. Technology consultants working on MBTA infrastructure projects have flagged redundant image storage as a contributing factor in slower-than-expected retrieval times for engineering records.
At Northeastern University's Roux Institute, researchers studying municipal data infrastructure have been watching Boston's situation closely. The institute has been in discussions with the city's Department of Innovation and Technology, headquartered on City Hall Plaza, about piloting a deduplication framework that would use hash-based image matching — essentially a digital fingerprinting method — to automatically identify and replace redundant files with a single canonical version linked across all relevant documents.
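For readers curious what hash-based fingerprinting looks like in practice, the following is a minimal sketch, not the framework under discussion. It assumes exact byte-for-byte duplicates, uses SHA-256 as the fingerprint, and treats the first path in sorted order as the canonical copy. The file paths, the sample bytes, and the function names `fingerprint` and `build_canonical_map` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that serves as the file's fingerprint."""
    return hashlib.sha256(data).hexdigest()


def build_canonical_map(files: dict[str, bytes]) -> dict[str, str]:
    """Map every file path to the canonical path holding identical bytes.

    The first path (in sorted order) seen for each fingerprint is treated
    as the canonical copy. That tie-break rule is an assumption.
    """
    canonical_by_hash: dict[str, str] = {}
    mapping: dict[str, str] = {}
    for path in sorted(files):
        digest = fingerprint(files[path])
        # setdefault keeps the first path recorded for this fingerprint.
        mapping[path] = canonical_by_hash.setdefault(digest, path)
    return mapping


# Hypothetical archive: two project folders hold the same scanned site map.
archive = {
    "jamaica-plain/rezoning/site_map.png": b"site-map-bytes",
    "east-boston/variance/site_map_copy.png": b"site-map-bytes",
    "east-boston/variance/aerial.png": b"aerial-bytes",
}

links = build_canonical_map(archive)
# Paths that point at some other file are the redundant copies.
duplicates = {path: canon for path, canon in links.items() if path != canon}
print(duplicates)
```

A cryptographic hash only catches files that are identical down to the byte. A rescanned or recompressed copy of the same map would produce a different fingerprint, which is why similarity-based matching also comes up in the debate.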
What Experts and Advocates Are Actually Recommending
The core debate among technical experts is not whether to replace duplicate images, but how. One school of thought favors a retroactive batch-processing approach: run a deduplication algorithm across the entire archive, flag matches above a threshold of, say, 98 percent similarity, and replace them systematically. A second approach, favored by some archivists at the Boston Public Library's Digital Repository Service on Boylston Street, calls for embedding deduplication at the point of ingestion — catching duplicates before they enter the system rather than cleaning them up afterward.
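The ingestion-time approach and the similarity threshold can be sketched together. This is an illustration under stated assumptions, not any agency's implementation: images are assumed to be already reduced to small grayscale grids (a real system would use an imaging library for that step), the fingerprint is a simple average hash, and the class name `IngestionIndex` and the sample grids are hypothetical.

```python
SIMILARITY_THRESHOLD = 0.98  # the "say, 98 percent" figure quoted above


def average_hash(pixels: list[list[int]]) -> list[bool]:
    """One bit per pixel: True where the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [value > mean for value in flat]


def similarity(hash_a: list[bool], hash_b: list[bool]) -> float:
    """Fraction of matching bits (1.0 means identical fingerprints)."""
    if len(hash_a) != len(hash_b):
        raise ValueError("fingerprints must have the same length")
    matches = sum(a == b for a, b in zip(hash_a, hash_b))
    return matches / len(hash_a)


class IngestionIndex:
    """Checks each incoming image against those already stored."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self._entries: list[tuple[str, list[bool]]] = []

    def ingest(self, doc_id: str, pixels: list[list[int]]) -> str:
        """Return the id of the canonical image: an existing match, or
        doc_id itself if nothing similar enough is already stored."""
        candidate = average_hash(pixels)
        for known_id, known_hash in self._entries:  # linear scan, for clarity
            if similarity(candidate, known_hash) >= self.threshold:
                return known_id
        self._entries.append((doc_id, candidate))
        return doc_id


# Hypothetical 8x8 grayscale grids standing in for downscaled scans.
site_map = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
rescanned = [[value + 2 for value in row] for row in site_map]  # slightly brighter
aerial = [[252 - value for value in row] for row in site_map]   # different image

index = IngestionIndex()
first = index.ingest("permit-001/site_map", site_map)
second = index.ingest("permit-002/site_map", rescanned)
third = index.ingest("permit-002/aerial", aerial)
print(first, second, third)
```

The retroactive batch approach could reuse the same check by feeding the existing archive through `ingest` in a fixed order. The difference between the two schools of thought is mainly when the comparison runs, not how it works.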
The cost argument is straightforward. Cloud storage is not free, and city contracts for managed document storage have grown steadily. Deduplication projects in comparable municipal systems — including those in Philadelphia and Chicago — have produced storage reductions in the range of 20 to 40 percent for image-heavy archives, according to published case studies from the Urban Libraries Council. For Boston, where the Planning Department alone processed more than 3,400 permit applications in 2025, even modest efficiency gains translate into real budget relief.
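To make the 20 to 40 percent range concrete, here is a small worked calculation. The 50-terabyte archive size is purely hypothetical and is not a figure reported by the city; only the percentage range comes from the case studies cited above.

```python
def projected_storage_tb(current_tb: float, reduction_pct: int) -> float:
    """Storage remaining after removing reduction_pct percent of the archive."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return current_tb * (100 - reduction_pct) / 100


ARCHIVE_TB = 50.0  # hypothetical archive size, for illustration only

after_low = projected_storage_tb(ARCHIVE_TB, 20)   # 40.0 TB remain
after_high = projected_storage_tb(ARCHIVE_TB, 40)  # 30.0 TB remain
print(f"Remaining storage: {after_high} to {after_low} TB")
```

Under that assumption, the archive would shrink by 10 to 20 terabytes. The actual savings would depend on how much of the real archive consists of images and how many of them are true duplicates.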
The practical stakes for ordinary residents show up in public records timelines. Under the Massachusetts Public Records Law, agencies have 10 business days to respond to standard requests. Overcrowded, poorly indexed image databases make that deadline harder to meet, because staff must sort through redundant copies to find the right file. Community groups in Dorchester, particularly those tracking proposed developments along Morrissey Boulevard, have complained repeatedly about slow document delivery when requesting planning files.
City officials say a pilot deduplication program could launch as early as the fourth quarter of 2026, starting with the Planning Department's post-2020 files before expanding to older records. For residents, advocates, and developers waiting on documents, the practical advice from records specialists is straightforward: submit requests as specifically as possible — citing project addresses, permit numbers, and date ranges — to help overwhelmed staff locate the right files faster, whatever the state of the underlying archive.