Boston's city government is sitting on a digital housekeeping problem that has been building since at least 2015. Across departments ranging from the Boston Planning & Development Agency to the Mayor's Office of New Urban Mechanics, duplicate images — the same photograph filed under two, three, sometimes a dozen different filenames — have quietly inflated storage costs and made public records requests slower and more expensive to fulfill. A citywide audit completed this spring found the problem concentrated in databases tied to permitting, housing inspections, and community engagement documentation.
The timing matters. Mayor Michelle Wu's administration has pushed hard on transparency and digital access, particularly around housing production in neighborhoods like Jamaica Plain and Dorchester, where permit records and site inspection photographs are frequently requested by community groups, lawyers, and journalists. When the same image exists in multiple records, staff have to manually reconcile files before releasing them — adding hours to responses that are already subject to tight statutory deadlines under Massachusetts public records law.
How the Problem Accumulated
The duplication issue did not happen overnight. It traces back to a fundamental structural problem: Boston's departments built their own digital filing systems largely in isolation from one another. The Inspectional Services Department, which handles building and housing code enforcement across the city's 23 neighborhoods, used a different content management platform than the BPDA's permitting portal. Neither system talked to the other. When a property on Blue Hill Avenue in Dorchester was flagged for multiple inspections over several years, images uploaded at each visit were stored separately, with no automated deduplication running in the background.
The problem compounded when the city began digitizing legacy paper files around 2017 as part of a broader open-government push. Scanning vendors were paid per page or per image, a payment structure that, whether by design or not, gave them no reason to avoid redundancy. City IT staff later identified batches of images that had been scanned and uploaded two or three times from the same physical folder.
The MBTA's own public-facing document systems, which intersect with city planning files around transit-oriented development projects near Orange Line stations in Jamaica Plain, added another layer. Station-area planning photographs ended up in at least two separate city repositories as well as the T's own archive.
What the Audit Found and What Comes Next
The spring 2026 audit, conducted internally by the city's Department of Innovation and Technology, used hash-matching software to flag more than 40,000 image files across six major departmental databases as likely duplicates. That figure is a conservative estimate: the audit covered only databases migrated to the city's unified cloud infrastructure since January 2023, and older legacy systems were not included in the sweep.
Storage costs are not trivial. Cloud storage for municipal government, procured through state-negotiated contracts, runs roughly $0.023 per gigabyte per month at current rates under the Massachusetts Operational Services Division framework. High-resolution inspection photographs average between 4 and 8 megabytes each. At that size, forty thousand duplicate images add up to roughly 160 to 320 gigabytes of redundant data sitting on taxpayer-funded servers month after month.
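For readers who want to check the arithmetic, the figures above can be worked through in a few lines of Python. This is a rough sketch, not the audit's own method: the function name is illustrative, and decimal units (1 gigabyte = 1,000 megabytes) are an assumption. It covers only the 40,000 flagged files, not the legacy systems left out of the sweep or the staff time spent reconciling records.

```python
# Figures quoted in the article.
DUPLICATE_FILES = 40_000
MB_PER_IMAGE_RANGE = (4, 8)      # average size of a high-resolution inspection photo
USD_PER_GB_MONTH = 0.023         # quoted state-contract storage rate


def redundant_storage(files, mb_per_image, usd_per_gb_month):
    """Return (gigabytes, monthly cost in USD) for a set of duplicate images.

    Assumes decimal units: 1 GB = 1,000 MB.
    """
    gigabytes = files * mb_per_image / 1000
    return gigabytes, gigabytes * usd_per_gb_month


for mb in MB_PER_IMAGE_RANGE:
    gb, cost = redundant_storage(DUPLICATE_FILES, mb, USD_PER_GB_MONTH)
    print(f"{mb} MB per image: {gb:.0f} GB, about ${cost:.2f} per month")
```

Changing `DUPLICATE_FILES` or the size range shows how the estimate would move if older legacy systems were added to the count.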
The city is now piloting a deduplication protocol at Inspectional Services, with a target completion date of December 2026. The protocol uses automated hash-comparison tools to flag probable duplicates for human review before deletion — a safeguard against accidentally purging records that, while visually identical, were attached to distinct legal proceedings or permit numbers. Boston Public Library's Digital Repository Services team, which has managed large-scale digitization projects in its own right, has been consulted on best practices for metadata preservation during the process.
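The flag-then-review approach described above can be illustrated with a short Python sketch. It is not the city's actual tool, which the audit does not name. The function names, the `(permit_id, path)` record format, and the choice of SHA-256 are all assumptions made for illustration. Exact hashing of this kind catches only byte-for-byte copies; a photograph that was rescanned or re-saved would need a different technique.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_probable_duplicates(records):
    """Group byte-identical files and queue them for human review.

    `records` is an iterable of (permit_id, path) pairs (a hypothetical
    format). Nothing is deleted here; the function only reports groups.
    """
    by_hash = defaultdict(list)
    for permit_id, path in records:
        by_hash[sha256_of(path)].append((permit_id, Path(path)))

    review_queue = []
    for digest, members in by_hash.items():
        if len(members) < 2:
            continue  # a unique file is not a duplicate
        permit_ids = sorted({pid for pid, _ in members})
        review_queue.append({
            "sha256": digest,
            "files": [str(p) for _, p in members],
            "permit_ids": permit_ids,
            # Identical images tied to different permits may each be a
            # distinct legal record, so a reviewer should look first.
            "spans_multiple_permits": len(permit_ids) > 1,
        })
    return review_queue
```

Keeping deletion out of the function mirrors the safeguard in the pilot: the software only proposes groups, and a person decides whether copies attached to different permit numbers must all be kept.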
For residents and researchers who frequently file public records requests through the city's online portal at boston.gov, the practical benefits will arrive gradually. Response times on image-heavy requests should shorten as the cleanup proceeds, and the long-term goal is a unified asset management system that prevents new duplicates from accumulating. For now, the work is methodical and unglamorous: exactly the kind of infrastructure maintenance that rarely makes headlines until the years of neglect start adding up on the invoice.