The Daily Boston

Boston Archives and City Agencies Race to Fix a Duplicate-Image Crisis Buried in Digital Records

A week of audits and software patches has exposed how years of sloppy digitization left public databases riddled with redundant files — and what the city is doing about it right now.

By Boston News Desk · Published 4 July 2026, 3:16 pm

Boston's municipal digitization effort hit a wall this week when city technology staff confirmed that a systematic duplicate-image problem, accumulated over roughly five years of document scanning at City Hall on Cambridge Street, has inflated the public records database by an estimated 30 percent in raw storage volume. The disclosure came after IT workers running routine maintenance ahead of the July 4 holiday weekend flagged thousands of repeated image files across multiple departments, including the Office of Housing Stability and the Boston Planning Department.

The timing matters. Mayor Michelle Wu's administration has staked a significant part of its operational credibility on transparency and digital access to city services — including zoning documents and housing permits critical to the Jamaica Plain and Dorchester development pipelines. When duplicate scans clog those systems, permit searches slow down, staff waste hours manually cross-referencing files, and developers waiting on approvals for projects along Washington Street or Bowdoin Street face unpredictable delays.

What Happened This Week

On Wednesday, July 1, the city's Department of Innovation and Technology (DoIT) deployed an automated deduplication script across three internal servers that house scanned records dating to 2021. By Thursday evening, technicians had flagged more than 14,000 redundant image files in the housing permits archive alone, many of them identical TIFF scans uploaded twice during a 2022 batch-processing error tied to a now-discontinued third-party scanning contractor. The Boston Public Library's Digital Commonwealth partnership, which hosts some of the city's historical records for public access, was not affected, library officials said in a brief public statement posted to the BPL website on July 2.
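
The city has not described how its deduplication script works. For readers curious about the general technique, the sketch below shows one common way to find byte-identical scans, such as a TIFF uploaded twice: hash each file's contents and group files that share a hash. The function names, the `*.tif` pattern, and the directory layout are illustrative assumptions, not details of the city's tool. Exact hashing also yields a yes-or-no answer, so a tool that reports confidence scores presumably does additional, fuzzier matching that is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, pattern: str = "*.tif") -> dict[str, list[Path]]:
    """Group files under root by content hash; keep only groups of 2 or more.

    The file pattern and folder layout are assumptions for illustration.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Each group returned holds files with identical bytes, so every file after the first in a group is a candidate for removal. Two scans of the same page made at different times would differ at the byte level and would not be caught by this approach.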

The Massachusetts Secretary of State's public records division requires municipalities to maintain searchable, accurate digital archives under Chapter 66 of the General Laws. Duplicate files, depending on how indexing software handles them, can generate false search returns — meaning a resident requesting a specific permit record might receive the wrong version of a document. The city has not publicly disclosed whether any such incorrect records were provided to residents or attorneys during the affected period, and a request for clarification sent to the DoIT communications office Thursday had not been returned by press time.

Northeastern University's Cybersecurity and Privacy Institute, based on Huntington Avenue, has previously consulted with Boston on digital records management, though there is no confirmation the university has been engaged in this specific remediation effort. The Greater Boston Legal Services office on Tremont Street, which frequently pulls housing court records on behalf of low-income tenants, said in a general staff notice circulated earlier this year — unrelated to this week's events — that database search reliability is a persistent operational concern for its attorneys.

What Comes Next

City officials plan to run a second pass of the deduplication tool across the zoning and inspectional services archives by July 18, according to a project timeline posted to the DoIT internal portal and reviewed by The Daily Boston. The process requires manual verification for any file where the automated script assigns a confidence score below 95 percent — meaning human reviewers will need to clear an estimated 2,100 files individually before the corrected database goes live for public search.
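
The review rule described above amounts to a simple threshold split. As a minimal sketch, assuming each flagged file carries a score between 0 and 1, the routing could look like the following. The `Match` record and `triage` function are hypothetical names; only the 95 percent cutoff comes from the posted timeline.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.95  # per the timeline: scores below 95 percent go to a human


@dataclass(frozen=True)
class Match:
    file_name: str     # hypothetical identifier for a flagged file
    confidence: float  # hypothetical duplicate score in the range 0 to 1


def triage(matches: list[Match], threshold: float = REVIEW_THRESHOLD):
    """Split flagged files into an auto-cleared list and a manual-review list."""
    auto, manual = [], []
    for match in matches:
        # A score exactly at the threshold is treated as auto-cleared.
        (auto if match.confidence >= threshold else manual).append(match)
    return auto, manual
```

Under this rule, the estimated 2,100 files awaiting human review would be the ones landing in the second list.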

For residents and attorneys who rely on the Inspectional Services Department's online portal to pull building records in neighborhoods like South Boston or East Boston, the practical advice is straightforward: if a document search this week returns duplicate results or mismatched file names, submit a formal written request directly to the department rather than relying on the automated portal. ISD staff are currently triaging those requests manually.

The broader lesson here is less about technology than procurement. The 2022 scanning contract that introduced the original error was a short-term deal worth under $200,000, awarded without a long-term data quality clause, according to city procurement records posted to the Boston Finance Cabinet's transparency portal. The administration is now reviewing whether future digitization contracts will require vendors to certify deduplication standards before files are ingested into city systems — a policy change that, if adopted, would take effect for contracts issued after October 1.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
