The Daily Boston


Boston's Duplicate Image Problem: The Key Decisions Ahead for City Archives and Public Records

A growing backlog of duplicate and unverified photographs in Boston's municipal and institutional archives is forcing city agencies, universities, and libraries to decide how to clean house — and who pays for it.

By Boston News Desk · Published 4 July 2026, 2:45 pm


Photo: Dominik Gryzbon on Pexels

Boston's public institutions are sitting on a quiet mess. Duplicate images — misfiled photographs, redundant digital scans, and unverified visual records — have accumulated across city archives, university libraries, and nonprofit collections to the point where administrators can no longer reliably certify what they hold. The immediate question is not whether to fix it, but who makes the call on what gets deleted, what gets kept, and on what timeline.

The issue has sharpened this summer because several federally funded digitization grants, including programs tied to the Institute of Museum and Library Services, are approaching reporting deadlines. Institutions that received IMLS funding in 2023 and 2024 must demonstrate clean, deduplicated digital collections to qualify for renewal cycles opening in early 2027. For Boston, where the university and biotech economy underwrites a substantial share of the city's archival and research infrastructure, that deadline carries real financial weight.

Where the Problem Is Concentrated

Two institutions are at the center of the local reckoning. The Boston Public Library's Digital Repository, headquartered on Boylston Street in Copley Square, holds hundreds of thousands of digitized images spanning neighborhood history, city planning records, and protest photography dating to the 1970s busing crisis. Staff there have flagged that a significant share of the repository's photograph holdings consists of near-identical duplicate scans created during successive digitization campaigns, sometimes three or four versions of the same print, each catalogued separately under slightly different metadata.

Separately, Northeastern University's Archives and Special Collections on Huntington Avenue has been working through its own deduplication project since 2025, focused on visual materials related to the South End and Roxbury. The challenge, as archivists there have described in public presentations, is that automated deduplication tools frequently classify near-duplicates as distinct images even when the lighting or crop differences between them are minor. Human review remains necessary for any collection where the historical record is contested or legally sensitive.

The Boston City Archives in West Roxbury faces a related but distinct problem: analog photographs that were scanned multiple times by different contractors between 2018 and 2022 under separate city procurement contracts. The result is digital redundancy that consumes server storage and complicates Freedom of Information responses, since staff must manually confirm which version of an image is the authoritative record before releasing it.

The Decisions That Cannot Wait

Three choices are now unavoidable. First, institutions must decide whether to adopt a unified deduplication standard or allow each archive to set its own threshold for what counts as a true duplicate. The Library of Congress has published guidance recommending a perceptual hash comparison standard, but Boston's institutions have not collectively adopted it.

Second, there is a budget question. Commercial deduplication software licensed for institutional use typically runs between $8,000 and $25,000 annually depending on collection size, according to vendor pricing sheets published by companies serving the U.S. library market. For the Boston Public Library, which operates under the city's budget authority, any new software acquisition above $10,000 requires a procurement process that can take four to six months. That timeline sits uncomfortably close to the 2027 IMLS renewal window.

Third, and most consequentially, someone must establish a retention policy with legal standing. Massachusetts public records law requires that government-held images with evidentiary value be retained on a schedule approved by the Secretary of State's office. Deleting a duplicate that turns out to be the only surviving version of a legally significant photograph would expose a city agency to records destruction liability.
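
To make the first decision concrete, here is a minimal sketch of how a perceptual hash comparison with a duplicate threshold can work. It uses a simple difference hash (dHash) as an illustration. Nothing here is taken from the Library of Congress guidance or from any Boston institution's tooling. The function names, the 9x8 grid size, and the threshold of 5 bits are all assumptions chosen for the example, and the input is assumed to be an image already converted to grayscale and downscaled.

```python
from typing import List

# Assumption: each image has already been converted to grayscale and
# downscaled to a small grid (here 8 rows of 9 pixels, values 0-255).
Grid = List[List[int]]


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag a pair for review if the hashes differ by at most `threshold` bits.

    The threshold is a hypothetical policy value, not a published standard.
    """
    return hamming(dhash(a), dhash(b)) <= threshold


# Synthetic sample data standing in for three scans.
base = [[(x * 37 + y * 91) % 200 for x in range(9)] for y in range(8)]
brighter = [[v + 10 for v in row] for row in base]   # same print, lighter scan
inverted = [[200 - v for v in row] for row in base]  # a clearly different image

same_print = is_near_duplicate(base, brighter)
different_print = is_near_duplicate(base, inverted)
```

In this sketch a uniformly brighter rescan produces the same hash as the original, so the pair is flagged, while the inverted image differs in every bit and is not. A lower threshold flags fewer pairs and a higher one flags more, which is the trade-off each archive would be setting if it chose its own standard. A flagged pair is a candidate for human review, not a deletion decision.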

The Mayor's Office of New Urban Mechanics, which has previously coordinated city technology and civic data projects, is one body positioned to broker a cross-institutional framework. Whether it takes that role — or whether BPL, the City Archives, and the universities proceed independently — will shape how quickly Boston can bring its visual record into reliable order. Institutions that move first to establish clean, verified collections will be better placed for the next round of federal digitization money. Those that wait may find the window has closed before the backlog is cleared.


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
