The Daily Boston

How Boston's Public Records Ended Up Full of Duplicate Images — and Why Fixing It Took Years

A quiet data-quality crisis inside City Hall's digital archive systems has been building since the early 2010s, and the push to finally clean it up is reshaping how Boston manages its own institutional memory.

By Boston News Desk · Published 4 July 2026, 2:43 pm

Photo: Mohammed Abubakr on Pexels

Boston's city government is sitting on tens of thousands of duplicate digital images (photographs, scanned permits, zoning maps, and inspection records) spread across at least four separate document management platforms, according to city technology staff who have discussed the problem at public oversight hearings. The duplication did not happen overnight. It is the accumulated result of more than a decade of piecemeal digitization drives, agency mergers, and software migrations that were never fully reconciled.

The issue matters now because Mayor Michelle Wu's administration has tied its open-government commitments to the performance of Boston's public-facing data portals, including Analyze Boston, the city's open data hub hosted at data.boston.gov. Redundant image files bloat storage costs, slow search results, and — most critically — create conflicting records when the same document appears in two places with different metadata. For residents in Jamaica Plain trying to pull historical zoning records on Boylston Street, or Dorchester homeowners checking inspection histories on Bowdoin Avenue, a duplicate-image error isn't abstract. It can mean getting the wrong building permit date or a misfiled inspection photo from a different property entirely.

A Problem Rooted in the 2010s Digitization Push

The roots go back to roughly 2011 and 2012, when multiple Boston departments — the Inspectional Services Department, the Boston Planning and Development Agency's predecessor office, and the Registry Division — each launched independent scanning projects with separate vendors and separate file-naming conventions. There was no citywide standard at the time. A photograph of a condemned triple-decker on Geneva Avenue might be saved as a JPEG in one system and a TIFF in another, with different file names, different geographic tags, and different retention flags. Neither system knew the other copy existed.

Then came the 2014 consolidation of several permitting functions under what eventually became the Inspectional Services Department's online portal. Files were migrated, but a full deduplication sweep was never completed. The problem compounded again in 2018 and 2019, when the city moved toward cloud storage through its contract with a commercial records management vendor, pulling legacy files from on-premises servers at City Hall into a new environment. Duplicate images moved right along with the originals.

The Boston Public Library's Digital Repository, based at the Central Library in Copley Square, faced a parallel but distinct version of the same challenge when it expanded its digitized collections after 2017. Librarians there developed internal hashing protocols, a form of digital fingerprinting in which each file's contents are reduced to a short value that identical files share, to catch duplicate image files before they entered the archive. That approach is now being studied by city technology staff as a potential model for municipal records.
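
For readers curious what file fingerprinting can look like in practice, here is a minimal sketch in Python. It is an illustration only, not the library's actual protocol, and the function names `fingerprint` and `find_duplicates` are hypothetical. It assumes SHA-256 over the raw file bytes, so it catches only byte-identical copies. A JPEG and a TIFF of the same photograph would produce different digests and would need some form of visual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large scans never have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest; keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[fingerprint(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("some_folder"))` on a folder of scans would return each repeated digest alongside the paths that share it, leaving the decision about which copy to keep to a human reviewer.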

What a Fix Actually Looks Like

Deduplication at scale is not a one-afternoon project. Industry estimates for large municipal archives — those holding more than two million files, which Boston's combined systems almost certainly exceed — put the staff time required for a full audit-and-purge cycle at anywhere from 18 to 36 months, depending on how many legacy formats are involved. Boston has not publicly released a cost estimate or a firm project timeline for its current effort, which is being coordinated through the Department of Innovation and Technology on City Hall Plaza.

The practical advice for residents and researchers in the meantime is straightforward: when pulling records through Analyze Boston or through the Inspectional Services public portal, cross-reference any image file against its associated permit number and the listed record-creation date. If two images share a permit number but carry different dates, flag the discrepancy directly with the relevant department. City staff have indicated at past oversight sessions that user-reported conflicts are currently one of the more reliable ways duplicate entries get identified and escalated for manual review.
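
A researcher working from a downloaded list of records could automate that cross-check. The sketch below is one possible approach, not a city tool: the `ImageRecord` fields, the `conflicting_permits` function, and the sample permit numbers are all invented for illustration, and real portal exports may name their columns differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical fields; real portal exports may name these differently.
    file_name: str
    permit_number: str
    created: date


def conflicting_permits(records: list[ImageRecord]) -> dict[str, list[ImageRecord]]:
    """Return permits whose image records disagree on the creation date."""
    by_permit: dict[str, list[ImageRecord]] = defaultdict(list)
    for record in records:
        by_permit[record.permit_number].append(record)
    return {
        permit: recs
        for permit, recs in by_permit.items()
        if len({r.created for r in recs}) > 1
    }


# Invented sample data, for illustration only.
records = [
    ImageRecord("front.jpg", "EXAMPLE-001", date(2012, 3, 1)),
    ImageRecord("front_copy.tif", "EXAMPLE-001", date(2014, 6, 9)),
    ImageRecord("roof.jpg", "EXAMPLE-002", date(2015, 1, 20)),
]
for permit, recs in conflicting_permits(records).items():
    print(permit, [r.file_name for r in recs])
```

With the sample data above, only `EXAMPLE-001` is flagged, because its two images carry different dates. Those are the entries worth reporting to the relevant department.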

The longer-term goal, as framed in the Wu administration's 2025 digital-services priorities document, is a unified content repository with automated deduplication built into the ingest pipeline — meaning new files would be checked against the existing archive before being saved. Getting there requires finishing the audit first. On the Fourth of July 2026, that audit is still underway.
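
The idea of checking a file against the archive before saving it can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the city's planned design: the `IngestIndex` class is hypothetical, it keeps its index in memory rather than in a database, and it matches only byte-identical content using SHA-256.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index of content digests already archived."""

    def __init__(self) -> None:
        # Maps a content digest to the id of the record that first stored it.
        self._by_digest: dict[str, str] = {}

    def ingest(self, record_id: str, content: bytes) -> tuple[bool, str]:
        """Store new content; for a repeat, point to the existing record.

        Returns (stored, record_id), where record_id is the id that holds
        this content in the archive.
        """
        digest = hashlib.sha256(content).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        self._by_digest[digest] = record_id
        return True, record_id
```

Under this design a second upload of the same bytes is not stored again. The caller gets back the id of the original record and can link the new metadata to it.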

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
