The Daily Boston

Boston news, every day

Boston's Digital Archives Are Full of Duplicate Images. Officials and Experts Say That's a Bigger Problem Than You'd Think.

City agencies, universities, and cultural institutions are grappling with how to clean up cluttered digital collections — and who should pay for it.

By Boston News Desk · Published 4 July 2026, 2:43 pm

Photo: Dominik Gryzbon on Pexels

Boston's public institutions are sitting on digital image libraries riddled with duplicates, outdated files, and redundant records — and the effort to fix that problem is drawing attention from city hall to Copley Square. Archivists, municipal tech officers, and academic librarians are now pushing for a coordinated approach, arguing that bloated digital collections cost real money and undermine public access to government records.

The issue has sharpened over the past year as Mayor Michelle Wu's administration has accelerated its open-data and digital-services agenda. Managing accurate, searchable image databases is foundational to that work — whether it's construction permit photos filed with the Boston Inspectional Services Department, heritage images held by the Boston Public Library's Digital Commonwealth program, or public-health documentation archived by Boston Public Health Commission staff at 1010 Massachusetts Avenue.

Why Duplicate Images Become an Institutional Headache

Duplicate image replacement (identifying redundant files, selecting a canonical version, and systematically retiring the rest) sounds mundane. It is not. When city agencies and universities each maintain separate digital asset management systems with overlapping content, researchers and journalists requesting records under the Massachusetts public records law, Chapter 66 of the General Laws, can receive inconsistent document sets. Archivists at institutions including Northeastern University's Snell Library and the Boston Athenaeum on Beacon Street have flagged this as a growing concern: collections digitized in the early 2010s are aging toward obsolescence while duplicate files continue to accumulate.
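To make the first step concrete, here is a minimal sketch of how redundant files might be identified, assuming duplicates are byte-for-byte identical copies. The function names and the folder layout are illustrative assumptions, not any institution's actual tooling. Resized or recompressed copies would not be caught this way and would need a separate similarity check plus human review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-identical.

    Only groups with two or more members are returned. Nothing is
    deleted or modified; the result is a list for review.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_exact_duplicates(Path("archive")) on a hypothetical "archive" folder would return lists of matching paths, leaving the choice of canonical version to a later, reviewed step.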

At the Boston Public Library's Kirstein Business Branch, digital services staff have been working through the Digital Commonwealth repository — a statewide platform hosting collections from more than 170 Massachusetts institutions — to address metadata gaps and image redundancy. The effort is part of a broader push to make holdings more usable for the public, but it requires specialized labor that many smaller partner institutions struggle to fund.

Academic technologists at MIT's Digital Humanities group, based in Cambridge just across the Charles River, have noted that automated deduplication tools can flag roughly 15 to 30 percent of images in large institutional repositories as probable duplicates — though human review remains essential before any file is retired. That range reflects findings published in library and information science literature over the past several years, not a figure specific to any single Boston institution.

Money, Staffing, and the City's Role

Cost is the central argument city officials are making for taking the problem seriously now rather than later. Cloud storage is not free. The City of Boston's Department of Innovation and Technology, which oversees municipal data infrastructure, has been reviewing its digital asset overhead as part of budget planning that carried into fiscal year 2026. Storage costs for unmanaged, duplicate-heavy image archives scale quickly when agencies like the Boston Parks and Recreation Department or the Office of Arts and Culture are each uploading event photography without centralized deduplication protocols.

Jamaica Plain's Spontaneous Celebrations arts organization and the Dorchester-based Codman Square Neighborhood Development Corporation have both contributed images to city-affiliated community documentation projects, adding to the complexity of who owns what version of a file once it enters a municipal or quasi-public repository.

Library and records professionals point to a practical path forward: institutions need written deduplication policies, not just software. They recommend designating a canonical file standard — typically the highest resolution original — before any automated replacement process runs. They also stress that audit trails matter; any replaced image should be logged with a timestamp and a reason code, both for legal defensibility under Massachusetts records-retention schedules and for the integrity of the historical record.

For Boston residents and researchers, the practical upshot is straightforward. If you file a public records request with a city agency and receive image files that appear inconsistent or mislabeled, you have the right under state law to request clarification on which version is the official record. The Secretary of the Commonwealth's office in Boston maintains guidance on that process. For institutions still sorting out their internal systems, the message from archivists is consistent: start with an audit, not a deletion.

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
