The Daily Boston

How Boston's Public Records Ended Up Flooded With Duplicate Images — And Why It Took Years to Notice

A quiet crisis in the city's digital archive system has been building since at least 2019, and officials are only now reckoning with the full scope of the problem.

By Boston News Desk · Published 4 July 2026, 2:40 pm

Boston's municipal digital archives contain tens of thousands of duplicate image files: redundant photographs, scanned permits, and building inspection records stored multiple times across city servers. The problem has compounded quietly for years and is now forcing a systematic overhaul of how the city manages its document infrastructure. The issue, which touches departments from the Inspectional Services Department to the Office of Housing Stability on City Hall Plaza, surfaced formally during an internal audit completed in early 2026.

The timing matters. Mayor Michelle Wu's administration has pushed hard on digitizing city services, particularly around housing and permitting, as part of a broader effort to accelerate affordable-unit approvals in neighborhoods like Jamaica Plain and Dorchester. That acceleration meant more documents, more scans, more uploads — and, it turns out, more duplication. The audit found that redundant files were clogging storage systems and, in some cases, causing version-control failures where inspectors could not confirm which image of a property was the most recent.

How the Duplication Problem Took Root

The roots of the problem stretch back to the city's 2019 migration to a cloud-based document management platform, when multiple departments uploaded legacy files independently rather than through a single coordinated pipeline. Inspectional Services, the Boston Planning Department, and the city's public works offices each maintained separate upload protocols. Files got tagged differently, transferred more than once, and in some cases scanned again from paper originals that had already been digitized.

The MBTA's push to digitize engineering documents during the same period created similar headaches, a cautionary example that city IT staff were apparently aware of but did not act on quickly. By 2023, the problem had grown large enough that staff at the Bolling Building on Washington Street, which houses several city administrative offices, were flagging duplicate records internally. No formal corrective process was launched until late 2025.

Storage costs are not trivial. Cloud storage for municipal governments with large image libraries typically costs between $0.02 and $0.05 per gigabyte per month depending on contract tier, and the archive of a city Boston's size, according to general municipal benchmarks, can easily reach hundreds of terabytes. Redundant files add to those costs directly. A 2024 report from the National Association of Government Archives and Records Administrators found that duplicate digital records account for an estimated 30 percent of unstructured data held by mid-size American cities, a figure that has risen sharply since 2018 as digitization programs scaled up without matching data-governance policies.
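To give a rough sense of scale, the figures above can be combined in a back-of-the-envelope calculation. The sketch below is illustrative only: the 300-terabyte archive size is a hypothetical stand-in for "hundreds of terabytes," not a number from the audit, and the 30 percent duplicate share is the national estimate for mid-size cities rather than a measurement of Boston's archive.

```python
GB_PER_TB = 1_000  # decimal terabytes, as cloud providers typically bill


def monthly_duplicate_cost(archive_tb: float, duplicate_share: float,
                           price_per_gb: float) -> float:
    """Monthly storage cost attributable to duplicate files, in dollars."""
    return archive_tb * GB_PER_TB * duplicate_share * price_per_gb


ARCHIVE_TB = 300        # hypothetical archive size, not an audit figure
DUPLICATE_SHARE = 0.30  # national estimate cited in the article

# The two prices are the ends of the per-gigabyte range quoted above.
for price in (0.02, 0.05):
    monthly = monthly_duplicate_cost(ARCHIVE_TB, DUPLICATE_SHARE, price)
    print(f"${price:.2f}/GB: ${monthly:,.0f} per month, "
          f"${monthly * 12:,.0f} per year")
```

Under those assumptions, duplicates alone would cost roughly $1,800 to $4,500 a month, or about $21,600 to $54,000 a year. A larger archive or a higher duplicate share would scale the figure proportionally.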

What the City Is Doing About It Now

The Wu administration has directed the city's Department of Innovation and Technology to run a deduplication pass across affected databases, starting with the permitting and inspections records most critical to the housing pipeline. The Jamaica Plain Neighborhood Development Corporation and Dorchester Bay Economic Development Corporation, both of which interact regularly with city permitting systems on affordable housing projects, have been briefed on potential short-term slowdowns as the cleanup proceeds.

The deduplication effort is expected to run through the fall of 2026. City IT staff are deploying hash-based comparison tools, software that computes a fingerprint from each image file's contents and flags files whose fingerprints match, rather than manual review, which would be impractical at scale. The process is not without risk: aggressive automated deduplication can occasionally delete files whose contents are identical but whose attached metadata, such as department tags, differs, so staff will manually verify a sample of flagged records before deletion.
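For readers curious how hash-based comparison works in practice, here is a minimal sketch of the general technique. It is not the city's actual tooling, which has not been described in detail. The function names and the choice of SHA-256 are assumptions made for illustration. The script only flags groups of byte-identical files and deletes nothing, leaving the manual verification step to a person.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large scans never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[file_fingerprint(path)].append(path)
    # Only fingerprints shared by two or more files are duplicate candidates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Because the fingerprint is computed from file contents alone, two copies of the same scan will match even if they were tagged differently in a records catalog. That is why a human check before deletion matters. The reverse also holds: an image that was rescanned from paper will usually differ at the byte level and would not be caught by this kind of exact-match comparison.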

For residents and developers who rely on city records — whether pulling building permits on Blue Hill Avenue or checking inspection histories for properties near Egleston Square — the practical advice is straightforward: if you submitted documents to a city department between 2019 and 2024 and need to verify what the city has on file, now is the time to request a records confirmation through Boston's 311 system or directly through the relevant department. The cleanup will eventually make the archive more reliable, but the transition period carries its own uncertainties.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
