Boston's city government is sitting on a digital mess it created itself. Across departments from the Office of Housing Stability on Tremont Street to the Boston Public Health Commission's offices near Ruggles Station, servers are clogged with duplicate image files — the same photograph scanned twice, the same permit graphic uploaded three or four times, the same infrastructure photo attached to half a dozen separate records. City IT administrators have been quietly working since late 2024 to quantify the scale of the problem, and the numbers are not flattering.
The issue matters right now for a specific reason. Mayor Michelle Wu's administration has been pursuing a sweeping open-data initiative tied to the city's expanded housing production effort in Jamaica Plain and Dorchester, where development applications, site photographs, and engineering graphics must be stored in publicly accessible databases. Redundant files in the underlying archives slow retrieval, inflate cloud storage costs, and make it harder for residents and advocates to search records, undermining the transparency goals the administration has staked political capital on.
How the Duplication Problem Built Up Over Years
The roots go back to roughly 2017, when Boston began an aggressive push to digitize paper permit and inspection records. The effort was well-intentioned. The city contracted with multiple vendors at different points to scan legacy files, and each vendor delivered its own archive with its own file-naming conventions. When those archives were merged into the city's central document management system — a platform administered through the Department of Innovation and Technology on City Hall Plaza — nobody ran a systematic deduplication sweep. Files piled on top of files.
The MBTA's separate but related transparency problems offered Boston a cautionary example close to home. After years of reliability crises on the Green and Orange Lines, the T's capital project archives were found to contain thousands of redundant engineering drawings and inspection photos that made it harder for contractors to pull accurate, current records quickly. The situation illustrated what happens when digital hygiene is deferred during rapid digitization: the short-term gain of getting records online is partly offset by the long-term cost of managing a bloated, disorganized archive.
Boston-area universities contributed indirectly to the expectation gap. Northeastern University's library systems and the MIT Libraries, both of which have run sophisticated digital asset management programs for more than a decade, have set a regional standard that city government has struggled to match. Researchers and advocates who routinely use those institutional systems and then try to pull city permit images from the Analyze Boston data portal notice the contrast immediately.
What the Fix Looks Like — and What It Costs
The city's Department of Innovation and Technology began issuing RFPs in early 2025 for deduplication and digital asset management tools. That process has moved slowly, in part because of competing budget priorities. The Wu administration's fiscal year 2026 budget, passed last spring, allocated roughly $4.2 million to citywide IT infrastructure upgrades — a figure that has to stretch across cybersecurity hardening, the MBTA-linked mobility data project, and the document management overhaul simultaneously.
Practically, the deduplication work involves more than running software. Human review is required for images where automated hash-matching fails — situations where the same photograph was scanned at different resolutions or compressed differently, producing files that are not technically identical but are functionally redundant. That review work is being piloted in the Inspectional Services Department, which handles building permits for high-activity corridors including Blue Hill Avenue in Dorchester and Centre Street in Jamaica Plain.
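The gap between exact and near-duplicate matching can be illustrated with a short Python sketch. It is not the city's tooling, which has not been described publicly. The function names are hypothetical, the "images" are synthetic grayscale grids rather than real scans, and the average-hash step is a common textbook technique substituted here for illustration. A byte-level SHA-256 comparison catches only identical files, while a coarse perceptual fingerprint can flag a lower-resolution copy as a candidate for human review.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group names whose bytes share a SHA-256 digest; keep groups of 2+."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels, size=4):
    """Coarse fingerprint of a grayscale grid (rows of 0-255 ints).

    Shrinks the grid to size x size by block averaging, then emits one bit
    per cell: 1 if the cell is brighter than the overall mean, else 0.
    Assumes the grid is at least size x size.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * height // size, (r + 1) * height // size)
            cols = range(c * width // size, (c + 1) * width // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(int(value > mean) for value in cells)


def hamming(a, b):
    """Count the positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def to_bytes(pixels):
    """Flatten a grid to raw bytes (a stand-in for a file's contents)."""
    return bytes(value for row in pixels for value in row)


# Synthetic example: the same picture at 8x8 and at 4x4 resolution.
high_res = [[10] * 4 + [200] * 4 for _ in range(8)]
low_res = [[10] * 2 + [200] * 2 for _ in range(4)]

files = {
    "scan_a": to_bytes(high_res),
    "scan_a_copy": to_bytes(high_res),   # byte-identical upload
    "scan_a_lowres": to_bytes(low_res),  # same picture, different bytes
}

exact = exact_duplicate_groups(files)
distance = hamming(average_hash(high_res), average_hash(low_res))
```

In this sketch `exact` contains only the byte-identical pair, while `distance` is zero for the low-resolution copy, which is the kind of match a reviewer would then confirm by eye. A real pipeline would decode actual image files and would need a tuned distance threshold, both of which are outside what this example shows.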
For residents and developers trying to pull records today, the advice from city technology staff is consistent: use the direct property lookup tool on the Analyze Boston portal rather than keyword image searches, which are slower and more likely to surface duplicates. The ISD's permit desk at 1010 Massachusetts Avenue can also provide direct file links that bypass the cluttered general archive.
A full remediation timeline has not been made public. The Department of Innovation and Technology has indicated a phased approach, with the Inspectional Services pilot expected to wrap by the fourth quarter of 2026 before rolling out to other departments. Whether that schedule holds will depend heavily on how the next round of IT contract awards shakes out this fall.