The Daily Boston


How Boston's Digital Archives Ended Up Riddled With Duplicate Images — And What's Being Done About It

A years-long backlog of redundant photographs in city and institutional databases has finally forced a reckoning, from City Hall to the Fenway cultural corridor.

By Boston News Desk · Published 4 July 2026, 2:40 pm


Boston's public institutions are sitting on a problem that has been quietly compounding since at least 2018: thousands of duplicate images clogging digital archives at municipal agencies, universities, and nonprofits across the city, slowing systems, inflating storage costs, and making public records harder to search. The issue has come to a head this summer as several major institutions, including the Boston Public Library's Kirstein Business Branch at Government Center and the city's Office of Digital Innovation, have begun formal audits to identify and remove redundant files.

The timing matters. Mayor Michelle Wu's administration has made digital transparency a plank of its progressive governance agenda, pushing city departments to make permitting, housing applications, and public meeting records more accessible online. That push has meant more documents, more photographs of public projects, and more scanned files entering city servers — often uploaded multiple times, by multiple staff members, with no automated system to catch the overlap.

How the Backlog Built Up

The roots of the problem stretch back to the early digitization drives that followed Boston's 2014 Open Data Executive Order, which required city departments to publish datasets and records online. Departments without dedicated IT staff, and that included most of them, defaulted to uploading files manually through shared drives and content management systems. The same photograph of a Dorchester streetscape or a Jamaica Plain housing development might be uploaded by a planner, a communications staffer, and a public affairs officer to three separate folders, with none of the three aware of the others' copies.

Boston Public Schools went through a similar reckoning in 2021, when its communications team discovered that its digital asset library contained roughly 14,000 image files; an internal review found that nearly a third were exact or near-exact duplicates. The school department did not publish the full findings of that review, but staff familiar with the process described a cleanup effort that took the better part of six months. The MBTA's public affairs division faced a comparable situation after its 2019 rebranding campaign generated large batches of photographs of new Red Line and Orange Line rolling stock, many of which were uploaded in multiple resolutions without a naming convention.

The city's nonprofit and cultural sector has not been immune. The Boston Arts Academy Foundation and several Fenway-area institutions have been working since early 2025 with a digital asset management consultant to standardize how photographs are stored and tagged. The problem is not unique to Boston — major archives in Chicago and New York have faced similar audits — but the city's dense concentration of universities, hospitals, and public agencies sharing overlapping subject matter has made the duplication especially acute here.

What an Audit Actually Involves

Cleaning up a duplicate-image problem is less dramatic than it sounds, but more labor-intensive than most administrators expect. A standard deduplication audit involves running files through hash-matching software that identifies bit-for-bit identical copies, followed by a manual review of near-duplicates: images that differ only in resolution, crop, or file format. For an archive of 50,000 images, that process typically takes four to eight weeks, depending on staff capacity.
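For readers curious what the hash-matching step looks like in practice, here is a minimal sketch in Python. It is an illustration, not the tool any Boston institution is known to use: the function names, the list of image extensions, and the choice of SHA-256 are all assumptions made for the example. It groups files whose contents are byte-for-byte identical, regardless of file name or folder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; a real archive may need a longer list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Group image files under `root` that share identical contents.

    Returns a list of groups; each group is a list of two or more paths
    whose bytes hash to the same digest.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A sketch like this catches only exact copies. The same photograph saved at a different resolution, crop, or file format has different bytes and a different hash, which is why the near-duplicate stage of an audit still falls to manual review.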

Storage costs are a genuine driver. Cloud storage costs for municipal governments in Massachusetts have risen sharply since 2022, with some city departments reporting per-terabyte annual costs that have more than doubled under renegotiated state contracts. Removing redundant files is one of the few cost-reduction levers that doesn't require a capital appropriation or City Council approval.

The Office of Digital Innovation, housed at Boston City Hall on Cambridge Street, has been developing a city-wide digital asset policy that would impose file naming standards and require deduplication checks before new uploads. A draft of that policy was circulating among department heads as of late June. If adopted, departments would have until January 1, 2027, to bring their existing archives into compliance, a deadline that leaves little margin for institutions still working through backlogs accumulated over a decade. For anyone managing images on behalf of a Boston public institution, the practical advice is straightforward: start the audit before the mandate lands on your desk.


About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
