The Daily Boston


Boston Leads U.S. Cities in Scrubbing Duplicate Images From Public Records — But Lags Behind London and Amsterdam

As municipalities worldwide digitize decades of archived documents, Boston's efforts to identify and remove redundant scanned images from public databases are drawing comparisons — both flattering and sobering — to peer cities abroad.

By Boston News Desk · Published 4 July 2026, 3:45 pm


Photo: Mohammed Abubakr on Pexels

Boston's city archives division has been quietly working through a backlog of duplicate scanned images embedded in public property and permit records since at least January 2025, a push that has gained new urgency as the Wu administration presses forward on a broader open-data initiative tied to housing production in Dorchester and Jamaica Plain. The problem is unglamorous but consequential: redundant image files clog database queries, slow permitting timelines, and inflate storage costs for systems that city departments — and residents — rely on daily.

The issue matters now because Boston, like dozens of other cities worldwide, has been racing to digitize paper records accumulated over generations. That process, done quickly and often with inconsistent scanning protocols, has left municipal databases riddled with duplicate images of the same documents — sometimes three or four versions of a single building inspection photograph or zoning map. As the city moves to open more of those records to the public through its Analyze Boston data portal, the redundancy problem has become harder to ignore.

What Boston Is Actually Doing

The Inspectional Services Department, headquartered on City Hall Plaza, has been the primary testing ground. Staff there began a systematic deduplication audit covering permit image files dating back to 2004, according to the department's publicly posted workflow documentation. The effort is being coordinated with the Mayor's Office of New Urban Mechanics, which has previously partnered with Northeastern University's Civic Data Design Lab on data quality projects. The Boston Public Library's digital repository on Boylston Street has run a parallel effort on its own scanned historical collections, piloting open-source hash-matching tools since the spring of 2025. Such software computes a compact fingerprint, or hash, of each image file and flags files whose fingerprints match as identical copies.
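For readers curious how hash matching works in practice, the following is a minimal sketch in Python, not the tooling the city or the library actually uses, which the article's sources do not name. The function names `file_digest` and `find_duplicates` are hypothetical, and the choice of SHA-256 is an assumption. The sketch catches only byte-for-byte identical files; two separate scans of the same paper document would normally differ at the byte level and would need a perceptual-hashing approach instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large scans are never loaded into memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by digest and return groups of two or more.

    Hypothetical helper: files in the same group have identical bytes.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("scans"))` on a folder of scanned files (the folder name here is illustrative) would return one list per set of identical files, leaving the decision about which copy to keep to a human reviewer.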

Jamaica Plain's neighborhood planning files have been among the first batches cleared, partly because active rezoning discussions around the Hyde Square corridor have made clean, accessible records a practical necessity for developers and residents attending community meetings. Dorchester's records are queued next, a priority given the volume of permit activity associated with new multi-family housing along Dorchester Avenue.

How Boston Compares to London, Amsterdam, and Chicago

The comparison with peer cities is instructive. London's Greater London Authority completed a systemwide duplicate-image purge of its planning portal in late 2023, reducing the portal's total image storage load by roughly 34 percent, according to figures the GLA published in its 2024 annual digital services report. Amsterdam's municipality went further, mandating standardized file-naming and resolution requirements for all scanned submissions beginning in 2022, which dramatically reduced the rate of duplicates entering the system in the first place — a prevention-over-cleanup model that city archivists have pointed to as the more efficient long-term approach.

Chicago's Department of Buildings launched a deduplication project in early 2024 using automated flagging software integrated directly into its permit management system, with the city reporting in a March 2025 budget document that the effort had freed approximately 12 terabytes of server space. Boston has not yet published equivalent metrics for its own audit, and the Analyze Boston portal does not currently include a public-facing dashboard tracking deduplication progress — a gap that transparency advocates have noted in public comments submitted to the city's open-data steering committee.

The absence of a prevention-first policy is the clearest vulnerability in Boston's current approach. Without standardized submission requirements, new duplicates continue to enter the system even as old ones are removed. That is the reform Amsterdam implemented four years ago, and the one that city data administrators say has saved the most staff time in the long run.

For residents and developers navigating Boston's permitting process, the practical upshot is straightforward: searches on the Analyze Boston portal and the Inspectional Services online permit lookup should become faster and more reliable as the audit progresses through 2026. Anyone filing permit applications through the city's online systems this summer should ensure they are submitting single, clearly labeled image files — not multiple scans of the same document — to avoid contributing to the backlog that city staff are still working to clear.


This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
