The Daily Boston

Boston City Archivists Race to Fix a Digital Mess: Duplicate Images Clogging Public Records Systems

A weeks-long audit of the city's digital asset library has exposed thousands of redundant image files across municipal departments, prompting a scramble to clean up records before the next budget cycle.

By Boston News Desk · Published 4 July 2026, 3:06 pm

Photo: David Adam Kess / CC BY-SA 4.0 (Wikimedia Commons)

Boston city archivists wrapped up the first phase of a duplicate-image audit this week, identifying more than 14,000 redundant image files spread across at least six municipal departments, according to a summary presented to the city's Office of Digital Innovation on Thursday. The cleanup effort, which began in late May, affects everything from permit photographs stored by the Inspectional Services Department to aerial survey imagery held by the Boston Planning Department offices on City Hall Plaza.

The timing matters. The Wu administration has pushed hard over the past eighteen months to consolidate city data systems under a unified digital infrastructure — part of a broader modernization drive that also encompasses MBTA coordination data and the city's affordable housing production tracking in neighborhoods like Jamaica Plain and Dorchester. Bloated image libraries slow that consolidation down, add unnecessary cloud storage costs, and create legal headaches when duplicate files attached to public records requests produce conflicting document sets.

How the Problem Built Up

The duplication problem is not new, but it got measurably worse during the pandemic years. When city staff shifted to remote work beginning in March 2020, departments began saving photographs and scanned documents to personal shared drives, departmental servers, and a citywide SharePoint environment simultaneously. Nobody was reconciling the copies. By the time staff returned to offices at One City Hall Square, the overlap had compounded across hundreds of individual user accounts.

The Office of Digital Innovation, working alongside the Boston Archives division housed in the West End, deployed a deduplication software tool starting June 2 to flag identical and near-identical image files. The tool flagged files using hash-matching — a standard technique that identifies bit-for-bit copies — and a perceptual similarity algorithm that catches images cropped or resaved at slightly different resolutions. Inspectional Services alone accounted for roughly 4,200 of the flagged files, most of them construction-site photographs taken by inspectors who emailed images to supervisors while also uploading them to a central case-management platform.
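The two flagging methods can be illustrated with a minimal Python sketch. The city has not named its software or its perceptual algorithm, so everything here is an assumption for illustration: SHA-256 stands in for the hash-matching step, a simple "average hash" stands in for the perceptual step, and the file names and pixel grids are hypothetical sample data.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose contents are bit-for-bit identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the mean.

    `pixels` is a small grayscale grid (rows of 0-255 ints), standing in
    for an image that has already been downscaled to a fixed size.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical sample data, not taken from the city's records.
sample_files = {
    "inspection_email.jpg": b"site photo bytes",
    "inspection_upload.jpg": b"site photo bytes",
    "aerial_survey.tif": b"different bytes",
}
exact_groups = group_exact_duplicates(sample_files)

original = [[10, 200], [220, 30]]
resaved = [[12, 198], [215, 35]]    # same picture, slightly altered values
unrelated = [[200, 10], [30, 220]]  # a different picture

near_distance = hamming_distance(average_hash(original), average_hash(resaved))
far_distance = hamming_distance(average_hash(original), average_hash(unrelated))
```

In this sketch the emailed and uploaded copies land in the same exact-match group, while the resaved image, which a byte-level hash would miss, sits at a small Hamming distance from the original. A real tool would decode and downscale actual image files and pick a distance threshold; both steps are left out here.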

The practical cost is real. City cloud storage contracts, last renewed in fiscal year 2024, bill at tiered rates that increase once storage volumes cross defined thresholds. Carrying thousands of duplicate files pushes departments closer to those thresholds unnecessarily. While the city has not publicly released the line-item figure for excess storage charges, digital records specialists place the annual waste for mid-sized municipal systems of comparable scale at $40,000 to $120,000, money that could otherwise fund staff hours or equipment.
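A rough sketch shows how threshold-based pricing can magnify the cost of duplicates. The city's actual rates and thresholds are not public, so the tiers, volumes, and the assumption that each tier's rate applies only to the storage inside that tier are all hypothetical.

```python
# Hypothetical tiers: (upper bound in TB, monthly price per TB in dollars).
# These figures are illustrative only; the city's contract terms are not public.
HYPOTHETICAL_TIERS = [(10, 20.0), (50, 25.0), (float("inf"), 30.0)]


def monthly_cost(volume_tb, tiers=HYPOTHETICAL_TIERS):
    """Bill each slice of storage at the rate of the tier it falls in."""
    cost = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if volume_tb <= lower:
            break
        # Portion of the volume that sits inside this tier.
        billable = min(volume_tb, upper) - lower
        cost += billable * rate
        lower = upper
    return cost


# Assumed volumes: 12 TB with duplicates, 9 TB after cleanup.
cost_with_duplicates = monthly_cost(12)
cost_after_cleanup = monthly_cost(9)
monthly_excess = cost_with_duplicates - cost_after_cleanup
```

Under these assumed numbers, the duplicates push the department past the 10 TB threshold, so the last 2 TB are billed at the higher rate and cleanup saves more than the lowest-tier price alone would suggest.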

What Comes Next for Departments and the Public

Archivists plan to complete manual review of the flagged files by July 31, a deadline driven partly by the city's fiscal year 2027 budget taking effect August 1. Department heads at Inspectional Services and the Boston Public Library's Digital Commonwealth program — which shares some archival infrastructure with the city — have each been asked to designate a records liaison to sign off on deletions before anything is permanently removed.

The Boston Public Library's Copley Square main branch holds a parallel digital archive of historical photographs, some dating to the 1850s, that sits on separate infrastructure and is not part of this audit. Archivists were careful to draw that distinction publicly, given the sensitivity around permanent deletion of historical materials.

For residents who file public records requests, of which the city received roughly 38,000 in fiscal year 2025 according to figures it released earlier this year, the cleanup should eventually mean faster response times. Fewer redundant files mean records staff spend less time sorting through competing versions of the same document before responding.

The Office of Digital Innovation expects to publish a summary report on the audit's findings by mid-August. Department liaisons have until July 18 to flag any files they believe were incorrectly marked for removal. Anyone with questions about a specific public records request can contact the city's Records Access Officer through the Boston.gov portal.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
