Boston's city archives hold millions of scanned images — property deeds, permit applications, zoning filings, historical photographs — but a significant share of that digital inventory is, simply, the same document scanned twice. Or three times. Sometimes more. The problem has a name inside City Hall: duplicate image proliferation, and fixing it has become an urgent priority for the Wu administration's open-government initiative heading into the second half of 2026.
The issue matters right now because the city is deep into a $14 million overhaul of its public-records portal, a project that began in earnest in early 2025 under the Office of Digital Transformation. That platform is supposed to give residents in Dorchester, Jamaica Plain, East Boston and every other neighborhood seamless access to municipal documents. Duplicate images slow search results, inflate storage costs and, in some cases, surface outdated or superseded versions of legally significant records ahead of the accurate ones.
A Decade of Patchwork Scanning
The roots of the problem stretch back to at least 2014, when the city's Inspectional Services Department on Tremont Street began digitizing paper permit files in-house while the Registry of Deeds simultaneously ran its own scanning operation at the Suffolk County Courthouse on Pemberton Square. Neither effort was coordinated with the Boston Public Library's digital collections program on Boylston Street, which was preserving its own tranche of historical municipal maps and photographs at roughly the same time.
When Mayor Marty Walsh's administration pushed an early open-data agenda through Analyze Boston — the city's public data portal — files were ingested from all three sources without a deduplication layer. The same parcel survey could appear under three different file names, each uploaded by a different department, each tagged with slightly different metadata. By the time the Wu administration inherited the system in 2022, internal estimates put the redundancy rate in the permit-filing archive at somewhere between 18 and 23 percent of total stored images, according to a city technology assessment document circulated among department heads. Backups compounded the count further.
The financial stakes are real. Cloud storage is not free, and at municipal scale, retaining hundreds of thousands of unnecessary image files carries a measurable annual cost. The state's Public Records Law also creates legal exposure: when a resident files a request under Chapter 66 of Massachusetts General Laws, staff must be confident they are producing the definitive version of a document, not a superseded scan.
What the Cleanup Actually Looks Like
The city's approach, developed in partnership with Northeastern University's Roux Institute and the nonprofit Boston Indicators research group at the Boston Foundation on Atlantic Avenue, involves a three-stage process: automated hash-matching to flag pixel-identical duplicates, a secondary review for near-duplicates produced by different scanner settings, and a final human audit for records flagged as legally sensitive.
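The first stage, automated hash-matching, can be illustrated with a minimal sketch. It is not the city's actual pipeline. It assumes that "identical" means byte-for-byte identical files, so two scans with the same pixels but different embedded metadata would not match unless the images were decoded before hashing. The function names, the file names and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents hash to the same digest.

    Only groups with two or more members are returned, each sorted by name.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[sha256_digest(data)].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


# Hypothetical archive: one scan uploaded by two departments under different
# names, plus one unrelated image. The byte strings stand in for file contents.
sample_archive = {
    "isd/permit_0001.tif": b"scan-bytes-A",
    "planning/parcel_survey.tif": b"scan-bytes-A",
    "bpl/map_1890.tif": b"scan-bytes-B",
}
duplicate_groups = find_exact_duplicates(sample_archive)
print(duplicate_groups)
```

A check like this catches only exact copies. The same page scanned at a different resolution or contrast setting produces different bytes, which is why the process described above adds a separate near-duplicate review and a human audit.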
The first stage is already running. Inspectional Services and the Boston Planning Department (reconstituted after the Boston Planning and Development Agency was restructured in 2024) are the initial test cases. Jamaica Plain's Washington Street corridor, one of the city's most active rezoning areas over the past four years, generated a particularly dense cluster of redundant filings, making it a useful stress test for the matching algorithm.
Progress is slower than officials had hoped. As of late June 2026, the human audit queue held roughly 40,000 flagged image pairs, and the team handling the work is small. Residents and attorneys who pull records through the city's online portal may still encounter duplicates for months.
For anyone filing a public records request in the near term, city archivists recommend specifying the exact document date and the department of origin in the request form. Those details help clerks route the query to the canonical version of a file rather than whichever copy surfaces first in a general search. The Office of Digital Transformation has published a plain-language guide to filing precise requests on the Boston.gov website, updated in May 2026. That guide is the most practical tool available until the deduplication project reaches its later stages, currently projected for the first quarter of 2027.