The Daily Boston


Boston's Duplicate Image Problem: The Key Decisions Ahead for City Records and Public Archives

As municipal digitization accelerates across Boston's neighborhoods, officials face a mounting backlog of redundant image files—and the choices they make now will shape public access to city records for years.

By Boston News Desk · Published 4 July 2026, 3:41 pm

Photo: Jack Sherman on Pexels

Boston's push to digitize decades of paper records has produced an unexpected headache: thousands of duplicate images clogging the city's digital archive systems, slowing public records requests and straining the storage infrastructure that underpins everything from permit lookups in Dorchester to deed searches near Faneuil Hall. The problem, which has grown alongside accelerating scanning efforts at City Hall and the Boston City Archives in West Roxbury, is now forcing a set of decisions that archivists and city technology officials can no longer defer.

The timing is not accidental. Mayor Michelle Wu's administration has pushed hard on open-data and transparency commitments since 2022, and the digitization effort was meant to be a flagship deliverable. But bulk scanning operations—particularly those tied to housing records in Jamaica Plain and building permits in South Boston—routinely produce multiple image files for the same document when scanners malfunction, operators re-run batches, or format conversions generate secondary copies. The result is an archive that is nominally comprehensive but practically difficult to search.

What the Backlog Actually Looks Like

The Boston City Archives holds physical and digital records dating to the colonial era, with most recent digitization concentrated on post-1980 municipal documents. Records-management industry estimates put duplicate rates in large-scale municipal scanning projects as high as 15 to 20 percent of total files before deduplication software is applied. Applied to Boston's known scanning volumes, that range implies tens of thousands of redundant files across the system. No official public count of Boston's duplicate inventory has been released as of this writing.

Under Chapter 66 of the Massachusetts General Laws, agencies generally have 10 business days to respond to public records requests; the Secretary of the Commonwealth's office oversees compliance statewide through its Supervisor of Records. Archivists and records managers working with large, duplicate-heavy databases consistently report that retrieval times climb when staff must manually verify which copy of a document is the authoritative one. For residents trying to pull building inspection records on Blue Hill Avenue or zoning decisions affecting Egleston Square, that delay is not abstract.

The Boston Public Library's Norman B. Leventhal Map Center, which has separately managed its own high-profile digitization projects, completed a deduplication and metadata audit of roughly 10,000 map images in 2023, a project that took about 14 months and required custom scripting alongside commercial software. That experience offers a rough benchmark for what City Hall would face with a larger, more legally sensitive document set.

The Decisions That Cannot Wait

Three choices sit at the center of what comes next. First, city technology staff must decide whether to pursue automated hash-matching deduplication, which is fast but prone to flagging legitimately distinct documents that scanned identically, or manual audit workflows, which are slower but legally defensible for records that may end up in court proceedings. Second, officials must determine which office owns the problem: the city's Department of Innovation and Technology on City Hall Plaza, or the City Archives in West Roxbury. Divided ownership has stalled similar projects in other American cities. Third, and most consequentially, the administration must decide how to handle the period between now and full remediation, specifically whether public records responses during that window will flag known duplication issues or simply deliver the file as found.
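For readers curious what "hash-matching" means in practice, here is a minimal sketch of the basic idea in Python. It is an illustration only, not the city's tooling: the function names and the choice of SHA-256 as the fingerprint are assumptions made for this example. Each file is reduced to a fingerprint, and files that share a fingerprint are grouped for review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical (same SHA-256).

    Hypothetical helper for illustration. Only byte-for-byte copies match:
    a fresh re-scan of the same page, or the same image saved in another
    format, produces different bytes and is NOT caught here. Each group is
    a candidate for human review, not an automatic deletion.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    # Keep only fingerprints shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("scans"))` on a folder of image files would return lists of paths whose contents are identical. Exact fingerprints only catch true copies, such as a file transferred twice. A re-scan of the same page slips through; catching those requires near-match (perceptual) comparison, which is also where the risk of flagging distinct but similar-looking documents, such as identical blank forms, enters. That trade-off is why the output here is treated as a review queue rather than a deletion list.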

Housing advocates in Roxbury and Dorchester have a direct stake in the outcome. Permit and inspection records are central to tenant-side litigation and code-enforcement complaints, and a database that produces duplicate or ambiguous returns undermines exactly the transparency the Wu administration has promoted. The MBTA's own parallel digitization of maintenance logs—relevant to the ongoing federal safety oversight still in effect as of mid-2026—illustrates how document integrity problems compound quickly when records feed into regulatory and legal processes.

A city-funded RFP for a broader digital records management overhaul was expected to move through the procurement process before the end of fiscal year 2026, which closed June 30. Whether that contract has been awarded, and what scope it covers for deduplication specifically, will be the clearest signal of how seriously the administration intends to treat the problem before it metastasizes further.


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
