Boston's municipal digital archives contain tens of thousands of duplicate image files: redundant photographs, scanned permits, and building inspection records stored multiple times across city servers. The problem has compounded quietly for years and is now forcing a systematic overhaul of how the city manages its document infrastructure. The issue, which touches departments from the Boston Inspectional Services Department to the Office of Housing Stability on City Hall Plaza, surfaced formally during an internal audit completed in early 2026.
The timing matters. Mayor Michelle Wu's administration has pushed hard on digitizing city services, particularly around housing and permitting, as part of a broader effort to accelerate affordable-unit approvals in neighborhoods like Jamaica Plain and Dorchester. That acceleration meant more documents, more scans, more uploads — and, it turns out, more duplication. The audit found that redundant files were clogging storage systems and, in some cases, causing version-control failures where inspectors could not confirm which image of a property was the most recent.
How the Duplication Problem Took Root
The roots of the problem stretch back to the city's 2019 migration to a cloud-based document management platform, when multiple departments uploaded legacy files independently rather than through a single coordinated pipeline. Inspectional Services, the Boston Planning Department, and the city's public works offices each maintained separate upload protocols. Files got tagged differently, transferred more than once, and in some cases scanned again from paper originals that had already been digitized.
The MBTA's push to digitize engineering documents during the same period created similar headaches, a cautionary example that city IT staff were apparently aware of but did not act on quickly enough. By 2023, the problem had grown large enough that staff at the Bolling Building on Washington Street, which houses several city administrative offices, were flagging duplicate records internally. No formal corrective process was launched until late 2025.
Storage costs are not trivial. Cloud storage for municipal governments running large image libraries typically runs between $0.02 and $0.05 per gigabyte per month depending on contract tier, and general municipal benchmarks suggest that an archive for a city of Boston's size can easily run into the hundreds of terabytes. Every redundant file is billed at the same rate as the original, so duplication adds to those costs directly. A 2024 report from the National Association of Government Archives and Records Administrators found that duplicate digital records account for an estimated 30 percent of unstructured data held by mid-size American cities, a figure that has risen sharply since 2018 as digitization programs scaled up without matching data-governance policies.
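To put those figures in rough dollar terms, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a city estimate: the 300-terabyte archive size is a hypothetical value chosen only to stand in for "hundreds of terabytes," the 30 percent duplicate share is applied to storage volume as a simplifying assumption, and the function names are invented for this example.

```python
GB_PER_TB = 1000  # decimal terabytes, an assumption about how storage is billed


def monthly_cost_range(archive_tb, low_rate=0.02, high_rate=0.05):
    """Return (low, high) monthly storage cost in dollars.

    Rates are dollars per gigabyte per month, taken from the range
    quoted in the article.
    """
    gigabytes = archive_tb * GB_PER_TB
    return gigabytes * low_rate, gigabytes * high_rate


def duplicate_cost_range(archive_tb, duplicate_share=0.30):
    """Return the (low, high) monthly cost attributable to duplicates.

    Assumes duplicates make up `duplicate_share` of stored volume.
    """
    low, high = monthly_cost_range(archive_tb)
    return low * duplicate_share, high * duplicate_share


ARCHIVE_TB = 300  # hypothetical archive size, not a reported figure

total_low, total_high = monthly_cost_range(ARCHIVE_TB)
dup_low, dup_high = duplicate_cost_range(ARCHIVE_TB)

print(f"Total storage: ${total_low:,.0f} to ${total_high:,.0f} per month")
print(f"Duplicates:    ${dup_low:,.0f} to ${dup_high:,.0f} per month")
```

Under those assumptions, the archive would cost $6,000 to $15,000 a month to store, of which $1,800 to $4,500 would be spent on redundant copies. A different archive size or duplicate share scales the result proportionally.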
What the City Is Doing About It Now
The Wu administration has directed the city's Department of Innovation and Technology to run a deduplication pass across affected databases, starting with the permitting and inspections records most critical to the housing pipeline. The Jamaica Plain Neighborhood Development Corporation and Dorchester Bay Economic Development Corporation, both of which interact regularly with city permitting systems on affordable housing projects, have been briefed on potential short-term slowdowns as the cleanup proceeds.
The deduplication effort is expected to run through the fall of 2026. Rather than attempt manual review, which would be impractical at scale, city IT staff are deploying hash-based comparison tools: software that computes a fingerprint from each image file's contents and flags files whose fingerprints match. The process is not without risk. Aggressive automated deduplication can occasionally delete files that appear identical but carry different metadata, for example the same scan filed under two separate records. To guard against that, staff will manually verify a sample of flagged records before deletion.
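For readers curious how hash-based comparison works in practice, the general technique can be sketched as follows. This is a minimal illustration under stated assumptions, not the city's actual tooling: the choice of SHA-256, the 10 percent review sample, and all function names are hypothetical. The sketch only reports groups of matching files and never deletes anything.

```python
import hashlib
import random
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents.

    The file is read in chunks so large scans do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Group files under `root` by fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_fingerprint(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def sample_for_review(groups, fraction=0.1, seed=0):
    """Pick a random subset of duplicate groups for manual checking.

    `fraction` is a hypothetical review rate; at least one group is
    returned whenever any duplicates were found.
    """
    if not groups:
        return []
    count = max(1, round(len(groups) * fraction))
    return random.Random(seed).sample(groups, count)
```

Calling `find_duplicate_groups` on a directory returns lists of paths whose contents are byte-for-byte identical, and `sample_for_review` selects some of those lists for a person to inspect. Because the fingerprint covers only file contents, two matching files may still belong to different records in a database, which is why a human check before deletion is a sensible safeguard.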
For residents and developers who rely on city records — whether pulling building permits on Blue Hill Avenue or checking inspection histories for properties near Egleston Square — the practical advice is straightforward: if you submitted documents to a city department between 2019 and 2024 and need to verify what the city has on file, now is the time to request a records confirmation through Boston's 311 system or directly through the relevant department. The cleanup will eventually make the archive more reliable, but the transition period carries its own uncertainties.