The Daily Boston

Boston news, every day


Boston's Digital Archive Push Runs Into a Stubborn Problem: Duplicate Images Clogging City Databases

Officials, archivists and technology experts are weighing in on how to clean up years of redundant visual data before it derails a multimillion-dollar records modernization effort.

By Boston News Desk · Published 4 July 2026, 3:25 pm


Photo: Ren Aukeman on Pexels

Boston's push to digitize decades of city records has hit a wall that no one publicly planned for. Across multiple municipal departments, duplicate images — scanned permit documents, zoning photographs, historical property records — have accumulated in city-managed databases to the point where storage costs are climbing and retrieval times are slowing, according to city budget documents reviewed by The Daily Boston. The problem is now forcing a broader conversation about what responsible digital governance actually looks like in a city that has staked significant political capital on technology reform.

The timing matters. Mayor Michelle Wu's administration has made open government and digital accessibility central planks of her second term, and the City of Boston's Department of Innovation and Technology has been expanding its mandate since early 2025. Redundant image files are not a glamorous issue, but archivists and database administrators say they are exactly the kind of infrastructure problem that quietly erodes confidence in larger modernization projects — and can cost substantially more to fix later than to address now.

Where the Problem Shows Up

The Inspectional Services Department, based on City Hall Plaza, handles thousands of image uploads each month tied to building permits and code enforcement in neighborhoods like Jamaica Plain and Dorchester, where housing production has accelerated under Wu's tenure. Staff there have flagged internally that duplicate scans of the same permit — sometimes three or four copies of a single document — have been entering the system since at least 2023, when the department transitioned to a new permit-tracking platform. The Boston Planning Department, which merged with the former BPDA in 2024, is dealing with a parallel issue in its zoning map archive.

Library and records professionals at the Boston Public Library's Leventhal Map and Education Center on Boylston Street have been watching the municipal situation with interest. The Leventhal Center has its own rigorous deduplication workflow for its digitized historical collections, developed over several years, and staff there have spoken publicly at American Library Association forums about the operational cost of skipping that step early in a digitization project. Their experience offers a cautionary model: the center estimates that retroactive deduplication on a backlogged collection takes three to four times longer than catching duplicates at the point of upload.

What Experts Are Recommending

Database administrators and civic technology advocates in the Boston area are generally pointing toward two remedies: automated hash-based deduplication tools that flag identical or near-identical image files at ingestion, and updated upload protocols that require staff to confirm a record does not already exist before saving. Both approaches are well-established in the private sector. The question is whether the city can implement them without disrupting active workflows in departments that cannot afford downtime.

Code for Boston, the civic technology volunteer organization that has partnered with city agencies on open-data projects since 2013, has the technical capacity to advise on lightweight deduplication scripts that could run against existing databases without requiring a full system overhaul. Similar collaborations between civic tech groups and municipal governments have worked in cities like Chicago and New York, where volunteer developer communities helped audit public datasets for redundancy as part of broader transparency initiatives.
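A lightweight script of the kind described might look roughly like the Python sketch below. It is an assumption-laden illustration, not a tool any of the named organizations has published: it assumes the images are available as files under a single directory, and the function names `file_digest` and `audit_duplicates` are hypothetical. It groups files by SHA-256 hash and reports how many files and bytes are redundant, which is the kind of baseline count an audit would produce. It reads files without modifying or deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def audit_duplicates(root: Path) -> dict:
    """Walk root recursively and summarize byte-identical duplicate files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)

    # Keep only hashes shared by more than one file.
    duplicate_groups = [paths for paths in groups.values() if len(paths) > 1]
    return {
        "total_files": sum(len(paths) for paths in groups.values()),
        # Every copy beyond the first in a group counts as redundant.
        "redundant_files": sum(len(paths) - 1 for paths in duplicate_groups),
        "redundant_bytes": sum(
            paths[0].stat().st_size * (len(paths) - 1) for paths in duplicate_groups
        ),
        "duplicate_groups": duplicate_groups,
    }
```

Calling `audit_duplicates(Path("some/export/folder"))` on an exported copy of a department's images (the path here is a placeholder) would return the counts without touching the live system, which is the appeal of running an audit before any cleanup.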

The financial stakes are real. Cloud storage is not free, and as the city's image archive grows — Inspectional Services alone processes an estimated 40,000 permit-related documents annually — the cost of storing redundant files compounds. A rough industry benchmark puts the wasted storage cost of a 20 percent duplication rate in a mid-size municipal database at tens of thousands of dollars per year, though the city has not released a specific figure for Boston's situation.

The Department of Innovation and Technology has not publicly committed to a deduplication timeline as of July 4. Advocates say the most practical next step is a formal audit of image holdings across the three most active departments — Inspectional Services, the Planning Department, and the Registry Division at City Hall — before the fiscal year 2027 budget cycle begins in the fall. Without that baseline count, officials have no reliable number to bring to the table when negotiating storage contracts or staffing for records management. The audit, experts say, does not require a large team. It requires a deadline.


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
