Boston's municipal archives hold somewhere north of 14 million scanned documents, and a significant chunk of that catalogue — city officials have privately acknowledged for months — is clogged with duplicate images. The same crumbling triple-decker on Dudley Street appears tagged under three different parcel numbers. A Jamaica Plain zoning hearing from 2019 shows up in four separate departmental folders. The problem is not unique to Boston, but the city's response increasingly is.
The issue landed on the desk of the Mayor's Office of New Urban Mechanics earlier this year when a routine audit of the City of Boston's Assessing Department database flagged thousands of redundant property photographs uploaded during the pandemic-era scramble to digitize paper records. The audit, completed in March 2026, was part of a broader push tied to a U.S. National Archives directive requiring municipalities receiving federal preservation grants to demonstrate data integrity by December 31, 2026.
Why Boston's Approach Differs From London and Amsterdam
London's approach, rolled out through the Greater London Authority's Datastore initiative in late 2024, relies on perceptual hashing — an algorithmic method that flags near-identical images even when filenames differ — applied across all 32 borough councils. Amsterdam went further, integrating deduplication directly into its Stadsarchief ingest pipeline so duplicates are caught before they enter the permanent record. Boston has not yet adopted either model at scale.
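For readers curious about the mechanics, the idea behind perceptual hashing can be sketched in a few lines of Python. The version below is a simplified difference hash, one common variant of the technique, and is not the code used in London or Amsterdam. The function names, the choice to represent an image as a grid of brightness values, and the five-bit matching threshold are all assumptions made for illustration.

```python
# Simplified difference hash (dHash) over plain grayscale grids.
# Illustrative sketch only: names and the threshold are assumptions,
# not any city's production code.

def resize_nearest(pixels, width, height):
    """Shrink a 2D grid of brightness values by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

def dhash(pixels, hash_size=8):
    """Return an integer fingerprint: one bit per pixel in a shrunken copy,
    set when that pixel is brighter than its right-hand neighbour."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Count the bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two images as near-duplicates if their fingerprints differ
    in at most max_distance bits (threshold is a hypothetical value)."""
    return hamming(dhash(pixels_a), dhash(pixels_b)) <= max_distance

# Synthetic stand-ins for scans: a left-to-right brightness ramp,
# a uniformly brighter copy of it, and the same ramp reversed.
ramp = [[x for x in range(72)] for _ in range(64)]
brighter = [[value + 20 for value in row] for row in ramp]
reversed_ramp = [[71 - x for x in range(72)] for _ in range(64)]

print(hamming(dhash(ramp), dhash(brighter)))       # 0
print(hamming(dhash(ramp), dhash(reversed_ramp)))  # 64
```

Because the fingerprint records only whether each pixel is brighter than its neighbour, a copy that has been uniformly lightened or saved under a different filename produces the same bits, while a genuinely different picture does not. Production systems typically add image decoding, smarter resizing and an index for searching millions of fingerprints, none of which is shown here.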
What Boston does have is the Office of Digital Innovation, housed at City Hall on Cambridge Street, which has been piloting an AI-assisted deduplication tool across the Inspectional Services Department since January 2026. The pilot covers roughly 800,000 images tied to building permits in Dorchester and East Boston, two of the city's most active construction corridors. Early results, shared internally but not yet publicly released, suggest the tool flags duplicate or near-duplicate entries at a rate of about one in eleven images, which would amount to roughly 73,000 flagged images across the pilot. That rate is consistent with what Amsterdam's Stadsarchief reported when it audited its own backlog in 2023.
The comparison matters because both London and Amsterdam tied their deduplication programs to broader open-data commitments, making clean, deduplicated image sets available to researchers, journalists, and developers. Boston's version remains largely internal. The city's open data portal, Analyze Boston, currently hosts property inspection photographs only in limited, request-based formats — a gap that civic tech groups including Code for Boston, which meets weekly near South Station, have flagged as a barrier to independent accountability work.
Pressure Mounts Ahead of December Federal Deadline
The federal deadline is concentrating minds at the Massachusetts Archives on Columbia Point, which manages state-level records but coordinates with Boston on shared digitization projects. The National Archives' digitization grant program, administered under the National Historical Publications and Records Commission, requires grant recipients to submit a data-quality certification by year's end. Boston received an NHPRC grant of $249,000 in 2024 for the Inspectional Services digitization project, making compliance non-negotiable.
For residents and property owners, the practical stakes are real. Duplicate images in the Assessing Department database have, in documented cases, caused permit processing delays when inspectors pull the wrong photograph version during a hearing. A homeowner on Blue Hill Avenue attempting to contest an assessed valuation in early 2026 found their file contained conflicting exterior photographs taken three years apart, both labeled as current. The case was eventually resolved, but it added weeks to the process.
City officials are expected to present a full deduplication roadmap to the Boston City Council's Committee on Government Operations before Labor Day. If the pilot in Dorchester and East Boston is expanded citywide on the current timeline, the cleanup process will run through mid-2027, about six months past the federal certification deadline of December 31, 2026. That gap would likely force the city to request an extension or submit a partial compliance report. Code for Boston and the Northeastern University Civic Data Design Lab have both signaled interest in partnering on a public-facing version of the tool, which could put Boston ahead of where London's GLA Datastore sat when it launched. That conversation is still early.