The Daily Boston


Boston's Digital Archives Are Drowning in Duplicate Images — Here's How the City Stacks Up Against London and Amsterdam

From Roxbury demolition records to Fenway construction permits, city agencies are grappling with terabytes of redundant visual data, and the clock is ticking on a federal digitization deadline.

By Boston News Desk · Published 4 July 2026, 3:45 pm

Photo: Richard Lathrop on Pexels

Boston's municipal archives hold somewhere north of 14 million scanned documents, and a significant chunk of that catalogue — city officials have privately acknowledged for months — is clogged with duplicate images. The same crumbling triple-decker on Dudley Street appears tagged under three different parcel numbers. A Jamaica Plain zoning hearing from 2019 shows up in four separate departmental folders. The problem is not unique to Boston, but the city's response increasingly is.

The issue landed on the desk of the Mayor's Office of New Urban Mechanics earlier this year when a routine audit of the City of Boston's Assessing Department database flagged thousands of redundant property photographs uploaded during the pandemic-era scramble to digitize paper records. The audit, completed in March 2026, was part of a broader push tied to a U.S. National Archives directive requiring municipalities receiving federal preservation grants to demonstrate data integrity by December 31, 2026.

Why Boston's Approach Differs From London and Amsterdam

London's approach, rolled out through the Greater London Authority's Datastore initiative in late 2024, relies on perceptual hashing — an algorithmic method that flags near-identical images even when filenames differ — applied across all 32 borough councils. Amsterdam went further, integrating deduplication directly into its Stadsarchief ingest pipeline so duplicates are caught before they enter the permanent record. Boston has not yet adopted either model at scale.
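For readers curious how perceptual hashing flags near-identical images, the sketch below shows one simple variant, an "average hash." It is an illustration only: the reporting does not say which algorithm London uses, the function names and the distance threshold are assumptions, and small synthetic grayscale grids stand in for real scans.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (hypothetical stand-in for a scan)


def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Shrink an image to size x size by averaging blocks of pixels.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        line = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            line.append(sum(block) // len(block))
        small.append(line)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [value for line in downscale(pixels, size) for value in line]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their fingerprints differ in few bits.

    The threshold of 5 bits is an illustrative assumption, not a published setting.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 32x32 test images: a left-to-right gradient, a slightly brighter
# copy of it (as a rescan might produce), and an unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brightened = [[min(255, value + 10) for value in line] for line in original]
different = [[y * 8] * 32 for y in range(32)]

print(is_near_duplicate(original, brightened))  # same picture, different exposure
print(is_near_duplicate(original, different))   # unrelated picture
```

Because the fingerprint depends on the picture's overall pattern of light and dark rather than on its filename or exact bytes, a rescanned or slightly brightened copy lands on the same or a nearby hash, while an unrelated image does not.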

What Boston does have is the Office of Digital Innovation, housed at City Hall on Cambridge Street, which has been piloting an AI-assisted deduplication tool across the Inspectional Services Department since January 2026. The pilot covers roughly 800,000 images tied to building permits in Dorchester and East Boston — two of the city's most active construction corridors. Early results, shared internally but not yet publicly released, suggest the tool flags duplicate or near-duplicate entries at a rate of about one in eleven images, a figure consistent with what Amsterdam's Stadsarchief reported when it audited its own backlog in 2023.

The comparison matters because both London and Amsterdam tied their deduplication programs to broader open-data commitments, making clean, deduplicated image sets available to researchers, journalists, and developers. Boston's version remains largely internal. The city's open data portal, Analyze Boston, currently hosts property inspection photographs only in limited, request-based formats — a gap that civic tech groups including Code for Boston, which meets weekly near South Station, have flagged as a barrier to independent accountability work.

Pressure Mounts Ahead of December Federal Deadline

The federal deadline is concentrating minds at the Massachusetts Archives on Columbia Point, which manages state-level records but coordinates with Boston on shared digitization projects. The National Archives' digitization grant program, administered under the National Historical Publications and Records Commission, requires grant recipients to submit a data-quality certification by year's end. Boston received an NHPRC grant of $249,000 in 2024 for the Inspectional Services digitization project, making compliance non-negotiable.

For residents and property owners, the practical stakes are real. Duplicate images in the Assessing Department database have, in documented cases, caused permit processing delays when inspectors pulled the wrong version of a photograph during a hearing. A homeowner on Blue Hill Avenue attempting to contest an assessed valuation in early 2026 found their file contained conflicting exterior photographs taken three years apart, both labeled as current. The case was eventually resolved, but it added weeks to the process.

City officials are expected to present a full deduplication roadmap to the Boston City Council's Committee on Government Operations before Labor Day. If the pilot in Dorchester and East Boston is expanded citywide on the current timeline, the cleanup will run through mid-2027, six months past the December 31, 2026 federal certification deadline. That would likely force the city to request an extension or submit a partial compliance report. Code for Boston and the Northeastern University Civic Data Design Lab have both signaled interest in partnering on a public-facing version of the tool, which could put Boston ahead of where London's GLA Datastore sat when it launched. That conversation is still early.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
