The Numbers Behind Boston's Duplicate Image Crisis: What the Data Actually Shows
City agencies, universities, and nonprofits are drowning in redundant digital assets — and the costs, measured in storage dollars and staff hours, are adding up fast.

Boston's public-sector digital infrastructure is carrying a hidden weight. Across municipal agencies, research institutions on Longwood Avenue, and community nonprofits in Dorchester and Jamaica Plain, duplicate image files (identical or near-identical photographs stored multiple times across disconnected servers) now account for an estimated 30 to 40 percent of total digital asset storage at mid-sized organizations, according to industry benchmarks published by the Digital Asset Management Society in its 2025 annual report. The city's technology ecosystem, one of the densest in the Northeast, makes Boston a particularly acute case study.
The timing matters. The Wu administration's broader push toward consolidated city services — including an ongoing overhaul of Boston.gov and the work of the Mayor's Office of New Urban Mechanics — has forced department heads to audit what they actually hold in cloud and on-premise storage. What they are finding, administrators at several agencies have acknowledged in public budget presentations, is redundancy on a scale that surprises even seasoned IT managers. With the city's fiscal year 2027 budget debate already underway at City Hall on Cambridge Street, storage consolidation has moved from a back-office annoyance to a line-item concern.
The math is not abstract. Enterprise cloud storage rates from major providers currently run between $0.02 and $0.023 per gigabyte per month for standard access tiers. An organization holding 10 terabytes of image assets, well within range for a mid-sized Boston nonprofit or a university department, and carrying 35 percent duplication is paying for roughly 3.5 terabytes (about 3,500 gigabytes) of storage it does not need. At those rates, that redundancy costs $70 to $80.50 a month, or between $840 and $966 a year in pure storage fees, before factoring in bandwidth, retrieval costs, or staff time spent searching through redundant libraries.
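For readers who want to plug in their own figures, the storage arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch, not a billing calculator: the function name and the 1,000-gigabytes-per-terabyte conversion are assumptions made for illustration, and real invoices also include the bandwidth and retrieval charges the estimate leaves out.

```python
def annual_duplicate_cost(library_tb, duplicate_share, rate_per_gb_month, gb_per_tb=1000):
    """Yearly storage fee, in dollars, attributable to duplicate files.

    Hypothetical helper: assumes flat per-gigabyte pricing and
    1 TB = 1,000 GB (decimal units).
    """
    duplicate_gb = library_tb * gb_per_tb * duplicate_share
    return duplicate_gb * rate_per_gb_month * 12


# The article's example: 10 TB of images, 35 percent duplicated,
# priced at the low and high ends of the quoted rate range.
low = annual_duplicate_cost(10, 0.35, 0.02)
high = annual_duplicate_cost(10, 0.35, 0.023)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Changing the first two arguments shows how quickly the figure grows with library size or duplication rate.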
Staff time is where the real cost accumulates. Researchers at Northeastern University's Roux Institute, which studies digital workflow efficiency, have documented that knowledge workers in content-heavy roles spend an average of 2.5 hours per week searching for digital assets, with a significant share of that time lost to navigating duplicate or misfiled files. Scaled across a department of 20 people earning Boston's median professional wage of roughly $78,000 annually, that search burden can translate to more than $75,000 in lost productive hours each year.
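The staff-time estimate can be checked the same way. The sketch below rests on assumptions the article does not spell out: a 2,080-hour work year (40 hours over 52 weeks) to convert salary into an hourly rate, and the full 2.5 weekly hours counted as lost. The function and constant names are illustrative only.

```python
HOURS_PER_WORK_YEAR = 2080  # assumption: 40 hours a week for 52 weeks


def annual_search_cost(staff, hours_per_week, salary, weeks_per_year=52):
    """Yearly wage cost, in dollars, of time spent searching for assets."""
    hourly_rate = salary / HOURS_PER_WORK_YEAR
    return staff * hours_per_week * weeks_per_year * hourly_rate


# 20 people, 2.5 hours a week each, at roughly $78,000 a year.
full_year = annual_search_cost(20, 2.5, 78_000)
# A more conservative variant that assumes 48 working weeks.
working_weeks = annual_search_cost(20, 2.5, 78_000, weeks_per_year=48)
print(f"${full_year:,.0f} (52 weeks), ${working_weeks:,.0f} (48 weeks)")
```

Under these assumptions both variants land above the $75,000 mark cited above, though only part of that search time is attributable to duplicates.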
Boston Public Schools, which manages image assets across more than 120 school buildings from Roxbury to East Boston, and the Boston Public Library system, with its central branch on Boylston Street and 24 neighborhood branches, both flagged digital asset management as an operational pressure point in the technology road maps they submitted to the city in fiscal year 2026.
Deduplication software has existed for years, but adoption among public-sector and nonprofit entities has lagged behind the private sector. Open-source tools and commercial platforms from vendors such as Canto and Bynder can scan image libraries and flag duplicates using perceptual hashing, a technique that identifies visually matching images even when file names or metadata differ. Perceptual hashing reduces each image to a compact numerical fingerprint and compares those fingerprints rather than raw pixel data, so files resaved at different resolutions or with minor edits are still flagged.
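To make the idea concrete, here is a minimal sketch of one of the simplest perceptual hashes, usually called an average hash. It is not the algorithm any particular vendor ships. It assumes the image has already been decoded into a grid of grayscale values from 0 to 255 and is at least 8 pixels on each side; the function names and the distance threshold of 5 bits are illustrative assumptions.

```python
def average_hash(pixels, hash_size=8):
    """64-bit average hash of a grayscale image (list of rows of 0-255 values)."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to hash_size x hash_size by averaging each block.
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=5):
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# A left-to-right gradient, the same picture at half resolution,
# and an unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
resized = [[x * 16 for x in range(16)] for _ in range(16)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

print(is_near_duplicate(average_hash(original), average_hash(resized)))
print(is_near_duplicate(average_hash(original), average_hash(different)))
```

The resized copy produces the same fingerprint as the original, while the unrelated image differs in half of its 64 bits. Production tools typically use more robust variants, but the compare-fingerprints-not-pixels principle is the same.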
For Boston institutions running active deduplication programs, the reported storage reclamation rates range from 20 to 45 percent of total image library size in the first pass. The Massachusetts Institute of Technology's libraries in Cambridge, which have run digital preservation audits since 2019, have publicly discussed reclaiming significant server capacity through routine deduplication sweeps as part of their digital stewardship program.
Practical next steps for Boston-area organizations center on three actions: conducting a baseline audit of all image repositories before the end of calendar year 2026, adopting a single digital asset management platform rather than allowing departments to maintain separate drives, and establishing file-naming conventions enforced at upload. For smaller nonprofits operating out of Jamaica Plain or Fields Corner in Dorchester, free-tier tools including Google's duplicate finder or open-source utilities like dupeGuru offer a starting point that costs nothing but an afternoon of staff time. The storage savings will show up on the next invoice.
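For organizations with someone on staff who is comfortable running a script, the baseline audit can start even more simply. The sketch below finds byte-for-byte identical image files by comparing SHA-256 checksums, using only Python's standard library. It is an illustration rather than a replacement for a dedicated tool: it will not catch resized or edited copies, and the list of file extensions and the function names are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: which extensions count as images for the audit.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in 1 MB chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of image paths under root whose contents are identical."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[sha256_of(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def reclaimable_bytes(duplicate_groups):
    """Bytes freed if every group is reduced to a single copy."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in duplicate_groups)
```

Calling `find_exact_duplicates` on a shared drive's top-level folder and passing the result to `reclaimable_bytes` gives a first, conservative estimate of how much storage a cleanup could recover.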
Published by The Daily Boston


