The Daily Boston

Boston news, every day

The Numbers Behind Boston's Duplicate Image Crisis: What the Data Actually Shows

City agencies, universities, and nonprofits are drowning in redundant digital assets — and the costs, measured in storage dollars and staff hours, are adding up fast.

By Boston News Desk · Published 4 July 2026, 2:48 pm

Photo: kolektif / CC0 (Wikimedia Commons)

Boston's public-sector digital infrastructure is carrying a hidden weight. Across municipal agencies, research institutions on Longwood Avenue, and community nonprofits in Dorchester and Jamaica Plain, duplicate image files (identical or near-identical photographs stored multiple times across disconnected servers) now account for an estimated 30 to 40 percent of total digital asset storage at mid-sized organizations, according to industry benchmarks published by the Digital Asset Management Society in its 2025 annual report. The city's own technology ecosystem, one of the densest in the Northeast, makes Boston a particularly acute case study.

The timing matters. The Wu administration's broader push toward consolidated city services — including an ongoing overhaul of Boston.gov and the work of the Mayor's Office of New Urban Mechanics — has forced department heads to audit what they actually hold in cloud and on-premise storage. What they are finding, administrators at several agencies have acknowledged in public budget presentations, is redundancy on a scale that surprises even seasoned IT managers. With the city's fiscal year 2027 budget debate already underway at City Hall on Cambridge Street, storage consolidation has moved from a back-office annoyance to a line-item concern.

What the Numbers Actually Look Like

The math is not abstract. Enterprise cloud storage rates from major providers currently run between $0.02 and $0.023 per gigabyte per month for standard access tiers. An organization holding 10 terabytes of image assets (well within range for a mid-sized Boston nonprofit or a university department) and carrying 35 percent duplication is paying for roughly 3.5 terabytes, or about 3,500 gigabytes, of storage it does not need. At those rates the redundant copies cost roughly $70 to $80.50 a month, or between $840 and $966 a year in pure storage fees, before factoring in bandwidth, retrieval costs, or staff time spent searching through redundant libraries.
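The storage arithmetic above is easy to rerun with an organization's own numbers. The sketch below is illustrative only: the function name is hypothetical, and it assumes decimal units (1 TB = 1,000 GB, as cloud providers typically bill) and a single flat per-gigabyte rate with no tiering, retrieval, or egress charges.

```python
# Hypothetical helper for the storage-cost arithmetic.
# Assumes decimal units (1 TB = 1,000 GB) and one flat monthly rate per GB,
# ignoring tiering, retrieval, and egress charges.


def annual_duplicate_cost(total_tb: float, duplicate_share: float,
                          rate_per_gb_month: float) -> float:
    """Yearly storage fees attributable to duplicate files."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    redundant_gb = total_tb * 1_000 * duplicate_share  # 10 TB at 35% -> 3,500 GB
    return redundant_gb * rate_per_gb_month * 12        # twelve monthly invoices


if __name__ == "__main__":
    for rate in (0.02, 0.023):
        cost = annual_duplicate_cost(10, 0.35, rate)
        print(f"at ${rate}/GB-month: ${cost:,.2f} per year")
```

With the article's inputs (10 TB, 35 percent duplication) it returns $840 at $0.02 and $966 at $0.023, matching the range quoted.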

Staff time is where the real cost accumulates. Researchers at Northeastern University's Roux Institute, which studies digital workflow efficiency, have documented that knowledge workers in content-heavy roles spend an average of 2.5 hours per week searching for digital assets, with a significant share of that time lost to navigating duplicate or misfiled files. Scaled across a department of 20 people earning Boston's median professional wage of roughly $78,000 annually, that search burden can translate to more than $75,000 in lost productive hours each year.
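The staff-time estimate rests on an hourly rate the paragraph does not spell out. Here is a minimal sketch, assuming a 2,080-hour work year (40 hours a week for 52 weeks) and counting the full 2.5 weekly search hours rather than only the duplicate-related share; the constants and the function name are hypothetical.

```python
# Hypothetical staff-time cost estimate. Assumptions not stated in the article:
# a 2,080-hour work year (40 hours x 52 weeks) and 52 weeks of searching per year.

HOURS_PER_YEAR = 2_080
WEEKS_PER_YEAR = 52


def annual_search_cost(staff: int, salary: float, search_hours_per_week: float) -> float:
    """Yearly wage cost of time spent searching for digital assets."""
    hourly_rate = salary / HOURS_PER_YEAR                       # $78,000 -> $37.50/hour
    hours_per_person = search_hours_per_week * WEEKS_PER_YEAR   # 2.5 h/week -> 130 h/year
    return staff * hours_per_person * hourly_rate


if __name__ == "__main__":
    print(f"${annual_search_cost(20, 78_000, 2.5):,.0f}")  # prints $97,500
```

Under those assumptions a $78,000 salary works out to $37.50 an hour and 130 search hours per person per year, or $97,500 for a 20-person department, consistent with the "more than $75,000" figure. Fewer working weeks, or counting only the duplicate-related share of search time, would lower the total.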

Boston Public Schools, which manages image assets across more than 120 school buildings from Roxbury to East Boston, and the Boston Public Library system — with its central branch on Boylston Street and 24 neighborhood branches — both flag digital asset management as an operational pressure point in their respective technology road maps submitted to the city in fiscal year 2026.

Detection Tools and What Comes Next

Deduplication software has existed for years, but adoption among public-sector and nonprofit entities has lagged behind the private sector. Open-source tools and commercial platforms from vendors such as Canto and Bynder can scan image libraries and flag duplicates using perceptual hashing, a technique that identifies visually identical or near-identical images even when file names or metadata differ. Rather than comparing raw pixel data, perceptual hashing reduces each image to a short numerical fingerprint, typically derived from a shrunken, simplified version of the picture, and treats images whose fingerprints differ in only a few positions as matches. As a result, even files resaved at different resolutions or with minor edits get flagged.
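To make the perceptual-hashing idea concrete, here is a minimal sketch of one simple variant, the "average hash." It is not a description of how Canto, Bynder, or any particular product works. It assumes images have already been decoded into 2-D lists of grayscale values (0 to 255) so that no imaging library is needed, and the 8x8 grid and the match threshold of 5 differing bits are illustrative choices.

```python
# Minimal "average hash" (aHash) sketch: one simple form of perceptual hashing.
# Images are modeled as 2-D lists of grayscale values (0-255); a real tool would
# first decode and grayscale the file with an imaging library.


def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Shrink the image to size x size cells by block-averaging, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells, else 0."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)  # row-major; first cell is the top bit
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def looks_duplicate(a: list[list[int]], b: list[list[int]], threshold: int = 5) -> bool:
    """Treat two images as duplicates if their hashes differ in at most `threshold` bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


if __name__ == "__main__":
    gradient = [[x * 255 // 63 for x in range(64)] for _ in range(48)]  # 64x48, dark to light
    brighter = [[min(255, v + 20) for v in row] for row in gradient]    # a lightly edited copy
    half_size = [row[::2] for row in gradient[::2]]                      # a 32x24 resave
    print(looks_duplicate(gradient, brighter), looks_duplicate(gradient, half_size))  # True True
```

A brightened copy and a half-resolution copy of the same gradient hash identically to the original, while a gradient running top-to-bottom instead of left-to-right differs in 32 of the 64 bits. Production tools usually decode real files with an imaging library and often rely on sturdier variants such as difference or DCT-based hashes.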

For Boston institutions running active deduplication programs, the reported storage reclamation rates range from 20 to 45 percent of total image library size in the first pass. The Massachusetts Institute of Technology's libraries in Cambridge, which have run digital preservation audits since 2019, have publicly discussed reclaiming significant server capacity through routine deduplication sweeps as part of their digital stewardship program.
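A "reclamation rate" of the kind reported here can be read as bytes freed divided by total library size. A minimal sketch, assuming each file has already been reduced to a content key (for example a SHA-256 digest, shown here as hypothetical short strings) and that files sharing a key are exact copies, so it does not matter which one is kept:

```python
# Hypothetical helper: first-pass "reclamation rate" for exact duplicates.
# Each record is (content_key, size_in_bytes); files sharing a key are treated
# as copies of one another, and one copy per key is kept.


def reclaimable_share(files: list[tuple[str, int]]) -> float:
    """Fraction of total bytes freed by deleting all but one copy per key."""
    total = sum(size for _, size in files)
    if total == 0:
        return 0.0
    kept: dict[str, int] = {}
    for key, size in files:
        kept.setdefault(key, size)  # the first copy seen is the one retained
    return (total - sum(kept.values())) / total


if __name__ == "__main__":
    library = [("a1f3", 200), ("a1f3", 200), ("9c0e", 300), ("77bd", 150), ("77bd", 150)]
    print(f"{reclaimable_share(library):.0%} of the library is reclaimable")  # prints 35% ...
```

Near-duplicates found by perceptual hashing usually differ in size, so a real sweep would also need a rule for which version to keep, such as the highest-resolution original.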

Practical next steps for Boston-area organizations center on three actions: conducting a baseline audit of all image repositories before the end of calendar year 2026, adopting a single digital asset management platform rather than allowing departments to maintain separate drives, and establishing file-naming conventions enforced at upload. For smaller nonprofits operating out of Jamaica Plain or Fields Corner in Dorchester, free-tier tools including Google's duplicate finder or open-source utilities like dupeGuru offer a starting point that costs nothing but an afternoon of staff time. The storage savings will show up on the next invoice.
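For the baseline audit, a short script can find byte-identical copies before any tool is purchased. The sketch below is a starting point under stated assumptions, not a substitute for dupeGuru or a full asset management platform: it walks a folder, skips files whose size is unique (they cannot have an exact copy), and groups the rest by SHA-256 digest. The set of image extensions and the function names are hypothetical, and it will not catch resized or re-encoded versions, which need perceptual hashing.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to what the repository actually holds.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".webp", ".heic"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: str) -> dict[str, list[Path]]:
    """Return {sha256: [paths...]} for every group of two or more identical images."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size means no exact copy exists; skip hashing
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    groups = find_exact_duplicates(root)
    # Every copy beyond the first in each group is redundant storage.
    wasted = sum(paths[0].stat().st_size * (len(paths) - 1) for paths in groups.values())
    print(f"{len(groups)} duplicate groups, {wasted / 1e9:.2f} GB reclaimable")
```

Saved as, say, `audit.py` and run with `python audit.py /path/to/shared/drive`, it prints the number of duplicate groups and an estimate of reclaimable space in decimal gigabytes.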

About this article

Published by The Daily Boston

This article was produced by The Daily Boston editorial desk and covers news in Boston. See our editorial standards for how we use AI.
