The Daily Boston

Boston's Digital Archives Push Forward on Duplicate Image Cleanup This Week

City agencies and local universities accelerate a long-delayed effort to purge redundant photographs from public-facing databases, with real consequences for residents searching housing and transit records online.

By Boston News Desk · Published 4 July 2026, 3:28 pm

Photo: Dominik Gryzbon on Pexels

Boston's digital records infrastructure hit a visible milestone this week. The city's Department of Innovation and Technology, working alongside archivists at Boston Public Library's Copley Square branch, pushed through the first major batch deletion of duplicate images cluttering public databases. The housekeeping effort has stalled repeatedly since a 2023 audit flagged the problem.

The cleanup matters now because the backlog has grown expensive and disorienting. Duplicate photographs have piled up across at least three major public-facing platforms: the MBTA's online service alerts portal, the Boston Planning and Development Agency's permitting database, and the city's own 311 constituent services interface. When residents search for housing permits in Dorchester or Jamaica Plain, two neighborhoods where the Wu administration has concentrated new construction approvals, they routinely pull up the same project photograph four or five times before finding updated imagery. That friction slows processing and, according to city technology staff, has contributed to a backlog of more than 1,400 open permit inquiries as of late June.

What Happened This Week

On July 1, the Department of Innovation and Technology completed the first phase of an automated deduplication run on the BPDA's image repository. The script, built using open-source tooling and piloted internally since March, identified roughly 18,000 redundant image files across the permitting archive. Staff confirmed the initial pass removed approximately 11,200 confirmed duplicates, reducing server load on the city's AWS-hosted environment by around 14 percent, according to a project summary circulated to department heads ahead of the July Fourth holiday weekend.
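The city has not published its script, so the following is only a minimal sketch of one common way such a pass could work: grouping files by a hash of their contents and treating every copy after the first as redundant. The function names and the directory in the usage note are hypothetical, and the choice of SHA-256 exact matching is an assumption, not a description of the city's method. Exact hashing only catches byte-identical copies; resized or re-encoded images would need a different technique.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def group_by_digest(items: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Group item names by the SHA-256 digest of their bytes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in items:
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return dict(groups)


def redundant_copies(groups: Dict[str, List[str]]) -> List[str]:
    """Return every name except the first in each group (the first is kept)."""
    return [name for names in groups.values() for name in names[1:]]


def read_images(root: Path) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, bytes) for each file under root, in sorted path order."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield str(path), path.read_bytes()
```

Under these assumptions, a call such as `redundant_copies(group_by_digest(read_images(Path("permit_images"))))` would list candidate files for review. A real cleanup would presumably confirm each candidate before deleting anything, which may be why the reported count of removed files is lower than the count identified.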

Northeastern University's Civic Data Lab on Huntington Avenue has been a quiet partner in the effort. Researchers there helped the city validate the deduplication algorithm against a test set of 2,500 images drawn from MBTA station-condition reports filed between January 2024 and April 2026. The collaboration grew out of a broader data-quality initiative the lab launched in late 2024 to support municipal transparency goals under Mayor Michelle Wu's open-government agenda.

Boston Public Library's digital services team at Copley Square is separately working to resolve a parallel problem in its own collections portal, where digitized historical photographs of neighborhoods like the South End and Roxbury have been ingested multiple times through inconsistent upload workflows since 2019. The library has not published removal totals yet, but staff have confirmed the project is ongoing.

Why Redundant Images Are a Practical Problem

Storage is only part of the issue. The MBTA's public-facing elevator and escalator status pages have drawn repeated criticism from accessibility advocates because broken images — often duplicates that failed to render — left riders with disabilities unable to confirm station conditions before boarding at stops like Downtown Crossing and Back Bay. The T has been under a federal consent decree with the Federal Transit Administration since 2022, and image-quality failures on accessibility tools carry compliance implications that go beyond mere inconvenience.

For housing, the stakes are equally concrete. Jamaica Plain and Dorchester together accounted for more than 2,300 new housing unit approvals between January 2023 and December 2025, per BPDA project data. Each application typically generates between 15 and 40 uploaded images. Even a modest duplication rate multiplies quickly into tens of thousands of redundant files that complicate title searches, appeals, and community review processes.

City technology officials did not provide an exact cost figure for the full deduplication project before the Independence Day weekend, but procurement records show the Department of Innovation and Technology allocated $340,000 in fiscal year 2026 for database maintenance contracts that include image-management work.

The second phase of the automated cleanup, targeting the 311 system's photo attachments from constituent service requests, is scheduled to begin the week of July 13. Residents and community organizations that regularly submit photographs through 311 — including neighborhood associations in East Boston and Charlestown that document sidewalk and street-condition issues — should expect no disruption to active submissions during the migration window, according to the project timeline distributed internally. Longer term, the city plans to implement upload validation rules that flag suspected duplicates at the point of entry, which would prevent the backlog from rebuilding.
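The city has not described how its planned upload validation would work. As a rough illustration only, a point-of-entry check could remember a hash of each accepted image and flag any later upload with identical contents. The class name, method name, and upload identifiers below are hypothetical, and exact SHA-256 matching is an assumption made for the sketch.

```python
import hashlib
from typing import Dict, Optional


class DuplicateFlagger:
    """Remembers digests of earlier uploads and flags exact repeats."""

    def __init__(self) -> None:
        # Maps content digest -> id of the first upload seen with that content.
        self._seen: Dict[str, str] = {}

    def submit(self, upload_id: str, data: bytes) -> Optional[str]:
        """Record an upload.

        Returns the id of an earlier upload with identical bytes,
        or None if this content has not been seen before.
        """
        digest = hashlib.sha256(data).hexdigest()
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = upload_id
        return earlier
```

In this sketch a flagged upload is reported rather than rejected, which matches the article's wording that suspected duplicates would be flagged at the point of entry.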


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
