Boston's city government is sitting on hundreds of thousands of duplicate digital images — scanned permit documents, property photographs, infrastructure inspection records — that have quietly clogged municipal databases for more than a decade, costing taxpayer dollars in storage and slowing the clerks and planners who depend on accurate records every day. The problem did not emerge overnight. It accumulated through a series of decisions, budget constraints, and technology transitions that stretched from the Menino administration through today.
The issue matters now because Mayor Michelle Wu's office has made digital government efficiency a stated priority, and the city's housing production push in neighborhoods like Jamaica Plain and Dorchester is generating thousands of new permit applications monthly. Every time planning staff search property records and have to sort through duplicate image files, the work slows. A single parcel on Blue Hill Avenue can carry four or five versions of the same inspection photograph stored under different file names across incompatible systems, none of them flagged as redundant.
How the Duplicates Piled Up
The roots of the problem trace to the early 2010s, when the City of Boston's Department of Innovation and Technology began digitizing paper records at scale. That effort, while well-intentioned, ran across multiple city departments with no unified naming convention and no deduplication protocol. The Inspectional Services Department, the Boston Planning & Development Agency, and the Registry Division each maintained their own document management platforms. Files moved between systems — sometimes manually, sometimes through automated batch transfers — and copies multiplied without any audit trail.
A 2019 migration to a new enterprise content management platform was supposed to consolidate the mess. Instead, it imported the redundant files wholesale. Staff at City Hall Annex on Cambridge Street flagged the problem internally at the time, but a full remediation was deferred in favor of getting the new system operational. Then the pandemic hit, remote work expanded digital file creation further, and the backlog grew.
The MBTA's parallel experience with its own asset-management databases offers a local comparison. The transit authority spent roughly three years between 2021 and 2024 untangling duplicate infrastructure photos from its Green Line Extension documentation — a cautionary tale that city IT planners have reportedly studied as they map Boston's own cleanup effort.
What a Fix Actually Looks Like
Deduplication at municipal scale is not simply a matter of running a piece of software. Identical image files saved under different metadata tags — different upload dates, different user IDs, different department codes — do not always register as duplicates to standard detection tools. The city must reconcile metadata, verify which version of a record is legally authoritative, and then archive rather than delete the redundant copies, because Massachusetts public records law requires retention schedules to be followed even for duplicates.
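The archive-rather-than-delete rule can be illustrated with a short Python sketch. It is not the city's workflow. The `Record` fields, the `group_id` linking copies of the same image, and the `plan_actions` function are all hypothetical names, and the sketch assumes a human reviewer, not the software, has already marked which copy is legally authoritative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    group_id: str        # hypothetical ID shared by copies of the same image
    path: str
    department: str
    authoritative: bool  # assumed to be set by a human reviewer


def plan_actions(records):
    """Return {path: action}; action is 'keep', 'archive', or 'needs_review'.

    No action ever deletes a file: redundant copies are only archived.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r.group_id].append(r)

    plan = {}
    for members in groups.values():
        if len(members) == 1:
            # A file with no other copies is not a duplicate; leave it alone.
            plan[members[0].path] = "keep"
            continue
        flagged = [r for r in members if r.authoritative]
        if len(flagged) != 1:
            # Zero or several authoritative copies: a person must decide.
            for r in members:
                plan[r.path] = "needs_review"
            continue
        for r in members:
            plan[r.path] = "keep" if r.authoritative else "archive"
    return plan
```

In this sketch, a group of copies with no single authoritative version is never resolved automatically. It is routed to review, which mirrors the legal verification step described above.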
The Boston Public Library's Digital Repository Program, which manages the city's archival digitization partnerships, uses a checksum-based verification system that flags files sharing identical binary signatures regardless of filename. That same approach is now being evaluated for city permitting records. The library program has processed more than 1.2 million archival images since its launch, according to its publicly available program documentation, and the deduplication rate on ingested batches has run as high as 18 percent — meaning nearly one in five files submitted was already in the system.
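For readers curious how checksum-based matching works in general, here is a minimal Python sketch. It is not the library's actual system. SHA-256 is an assumed choice of hash, and the function names `file_digest` and `find_duplicates` are illustrative. The idea is that two files with the same bytes produce the same digest, whatever they are named or wherever they are stored.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest; return only groups of 2 or more."""
    groups = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```

A check like this catches only byte-for-byte copies. A photograph that was re-saved, resized, or re-scanned would produce a different digest and would not be flagged.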
If that ratio holds for municipal permit and inspection records, the cleanup could archive or consolidate tens of thousands of files across the Inspectional Services and BPDA databases alone. That would shrink cloud storage costs, which the city pays under a state-negotiated contract with tiered pricing based on data volume.
For residents and developers filing permits for projects in Roxbury, South Boston, or along the Washington Street corridor, the practical payoff would be faster document retrieval and fewer instances of clerks asking applicants to resubmit images already on file. City IT officials have indicated a phased remediation plan is under review, with a pilot targeting Inspectional Services records scheduled to begin before the end of the 2026 fiscal year. Whether the broader rollout gets funded in the next budget cycle will depend on how the pilot performs — and on whether the administration can keep the issue visible against competing municipal priorities.