The Daily Boston

Boston's Digital Archive Problem: The Hidden Cost of Duplicate Images Piling Up Across City Systems

A closer look at the numbers reveals how redundant image files are draining storage budgets and slowing down public-facing platforms from City Hall to the MBTA.

By Boston News Desk · Published 4 July 2026, 2:51 pm

Photo: Dominik Gryzbon on Pexels

Boston's municipal and institutional digital infrastructure is carrying tens of thousands of duplicate image files across public-facing platforms — and the storage and labor costs to manage them are adding up fast. City IT administrators and university digital teams have flagged the issue in internal reviews over the past 18 months, with redundant images identified as one of the top five causes of bloated content management systems across the region's public sector.

The problem is not new, but the scale has grown sharply since 2020, when remote work and the rapid shift to digital-first communications pushed agencies to upload assets without consistent naming conventions or deduplication protocols. The result is content databases swollen with near-identical photographs, resized variants, and accidentally reposted graphics that consume server capacity and complicate search functions for staff trying to pull current, accurate visuals.

What the Numbers Actually Show

Across large institutional content systems, including those typical of Boston's major hospital networks, universities, and city agencies, duplicate images can account for anywhere from 15 to 40 percent of total stored image assets, according to published benchmarks from the Content Management Alliance's 2025 annual report. At enterprise storage rates currently averaging around $0.023 per gigabyte per month on commercial cloud platforms, a database carrying 500,000 redundant image files at an average compressed size of 2 MB each holds about 1 TB of duplicates, which works out to roughly $23 a month, or about $276 a year, in raw storage charges. That figure does not account for staff hours spent manually reviewing and tagging assets.
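The storage arithmetic in the example above can be reproduced with a short sketch. It assumes decimal gigabytes (1 GB = 1,000 MB) and counts only the per-gigabyte storage rate, ignoring request, transfer, and backup charges. The function name is illustrative.

```python
def annual_storage_cost(file_count, avg_size_mb, rate_per_gb_month):
    """Return the yearly storage cost in dollars for the given files."""
    # Assumption: decimal gigabytes, i.e. 1 GB = 1,000 MB.
    total_gb = file_count * avg_size_mb / 1000
    return total_gb * rate_per_gb_month * 12


# Figures from the example above: 500,000 files, 2 MB each, $0.023 per GB-month.
cost = annual_storage_cost(500_000, 2, 0.023)
print(f"${cost:,.2f} per year")
```

Changing any of the three inputs shows how quickly the total scales with file count and average size.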

At Boston City Hall on Cambridge Street, the city's official web platform Boston.gov runs on a Drupal-based content management system that has undergone two major overhauls since 2018. Digital services staff there have worked to standardize file naming and implement automated flagging for uploads that match existing assets by pixel fingerprint — a practice recommended in the city's 2024 Digital Accessibility and Maintenance Plan. Whether that flagging is catching all redundant uploads in real time remains a live operational question.
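The article does not say which fingerprinting method the city uses. As a rough illustration only, the sketch below uses average hashing, one common perceptual-hash technique, on an image that is assumed to be already decoded into a grid of grayscale values. The function names and the 5-bit threshold are hypothetical choices for this example.

```python
def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash from a 2D grid of grayscale values.

    Assumes the image is already decoded to grayscale (0-255) and that its
    height and width are both at least `size`.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(size):
        # Integer bounds of the block of source rows that feeds this cell row.
        y0, y1 = row * height // size, (row + 1) * height // size
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(new_hash, existing_hashes, threshold=5):
    """Flag an upload whose hash is within `threshold` bits of a stored hash."""
    return any(hamming_distance(new_hash, h) <= threshold for h in existing_hashes)
```

An upload hook could compute the hash of each new file and call `is_probable_duplicate` against the hashes already stored. A lower threshold flags fewer near-matches, and a higher one catches more resized or lightly edited copies at the cost of false alarms.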

The MBTA, which manages mbta.com and a suite of rider-facing apps, stores thousands of route map images, accessibility icons, and station photographs across multiple content repositories. Redundant image replacement — the process of identifying a duplicate, retiring the outdated version, and updating every page reference to point to a single canonical file — is labor-intensive without automated tooling. The agency's technology transformation program, underway since fiscal year 2025, has prioritized backend infrastructure but has not publicly detailed a deduplication milestone.
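The replacement process described above (pick one canonical file, retire the rest, repoint every page) can be sketched in a few lines. This is a minimal illustration, not the MBTA's tooling. The data shapes, the function name, and the rule that the first path in each group is canonical are all assumptions made for the example.

```python
def consolidate_references(page_images, duplicate_groups):
    """Point every page at one canonical file per duplicate group.

    page_images: dict mapping a page path to the list of image paths it uses.
    duplicate_groups: list of lists of image paths; the first path in each
    group is treated as canonical (an arbitrary rule for this sketch).
    Returns (updated_pages, retired_paths).
    """
    canonical_for = {}
    for group in duplicate_groups:
        canonical = group[0]
        for path in group[1:]:
            canonical_for[path] = canonical

    # Replace any retired path with its canonical copy; leave others alone.
    updated = {
        page: [canonical_for.get(path, path) for path in images]
        for page, images in page_images.items()
    }
    retired = sorted(canonical_for)
    return updated, retired


# Hypothetical pages and file paths, for illustration only.
pages = {
    "/stations/a": ["/img/map.png", "/img/icon-old.png"],
    "/stations/b": ["/img/map-copy.png"],
}
groups = [["/img/map.png", "/img/map-copy.png"]]
updated, retired = consolidate_references(pages, groups)
```

In practice the retired files would be deleted or archived only after the updated references are published, so no page points at a missing image in between.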

Universities and the Biotech Corridor Add to the Load

Boston's university and research economy compounds the problem. Northeastern University, with its main campus along Huntington Avenue, and Boston University, stretching along Commonwealth Avenue into Allston, each maintain digital asset management systems handling hundreds of thousands of images for marketing, research publishing, and internal communications. The Longwood Medical Area — home to Dana-Farber, Brigham and Women's, and Boston Children's Hospital — operates some of the most media-intensive institutional websites in New England, with clinical photography and research imagery updated frequently and rarely subject to systematic deduplication review.

Enterprise digital asset management (DAM) platforms with duplicate detection, such as those offered by Bynder or Canto, typically run between $15,000 and $60,000 annually for enterprise deployments, depending on user count and asset volume. For smaller nonprofits and community development organizations in neighborhoods like Jamaica Plain and Dorchester, many of which run their own housing and community program websites, off-the-shelf tools are often out of reach, leaving staff to manage duplicate images manually.

The practical fix for organizations that cannot afford enterprise DAM software starts with establishing a single canonical image library with enforced folder structures, running a free deduplication audit using open-source tools like dupeGuru or rdfind on Linux-based servers, and assigning a staff member — even part-time — to quarterly asset reviews. For city agencies and universities operating at scale, the next step is integrating perceptual hash-based duplicate detection directly into upload workflows, so the problem stops compounding before the next content overhaul is needed. The bill for doing nothing keeps growing, one redundant file at a time.
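For a sense of what a basic audit involves, the sketch below groups image files whose bytes are identical by comparing SHA-256 digests. It is an illustration using only the Python standard library, not a description of how dupeGuru or rdfind work internally. It will not catch resized or re-encoded variants, which need perceptual hashing. The extension list and function names are assumptions.

```python
import hashlib
import os
from collections import defaultdict

# Assumed set of extensions to audit; adjust for the library in question.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group image files under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.join(folder, name)
                by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a list of paths holding the same image, which a reviewer can then reduce to a single canonical copy during the quarterly asset review.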

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
