At least one in five digital image files stored across Boston's major public and academic institutions is an exact or near-exact duplicate, according to an analysis of digital asset management audits conducted by library technology consultants working with several of the city's universities. That figure — 20 percent or higher redundancy in large image repositories — has become a quiet driver of unnecessary IT spending at a moment when every dollar in municipal and university budgets is under scrutiny.
The timing matters. Mayor Michelle Wu's administration has pushed hard on digital equity and government transparency, which means city departments are migrating more records online faster than ever. The Boston Public Library's Copley Square branch alone digitized more than 400,000 archival images between 2022 and 2025 as part of its ongoing Digital Commonwealth partnership with the Massachusetts Board of Library Commissioners. Rapid digitization without duplicate-detection protocols is where the problem compounds.
The Scale of the Problem in Greater Boston
Northeastern University's library systems team flagged the duplicate image problem internally during a 2024 infrastructure review of its Snell Library digital collections on Huntington Avenue. Consultants found that redundant image files were consuming roughly 18 terabytes more storage than the unique-image count would justify. At current cloud storage pricing of around $23 per terabyte per month for enterprise-grade archival tiers, that works out to about $414 a month, or roughly $5,000 a year, in avoidable cost for a single institution's collection.
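The arithmetic behind that estimate is simple enough to check by hand. The short Python sketch below multiplies it out using only the two figures quoted above; the variable names are illustrative, and the price is the article's approximate rate, not a quote from any particular vendor.

```python
# Figures quoted in the article; variable names are illustrative.
redundant_tb = 18          # terabytes of storage attributed to duplicate images
price_per_tb_month = 23    # US dollars per terabyte per month (approximate)

monthly_cost = redundant_tb * price_per_tb_month
annual_cost = monthly_cost * 12

print(f"Monthly: ${monthly_cost:,}")  # Monthly: $414
print(f"Annual:  ${annual_cost:,}")   # Annual:  $4,968
```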
Scale that across the Fenway and Mission Hill corridor, where Wentworth Institute of Technology, Massachusetts College of Art and Design, and Simmons University all run their own digital repositories within a half-mile radius of each other, and the aggregate waste adds up fast. Industry benchmarks from the Digital Preservation Coalition suggest that large research libraries globally carry duplicate-image overhead averaging between 15 and 30 percent of their total visual asset storage — a range that Boston-area auditors say is consistent with what they are finding locally.
The City of Boston's own Department of Innovation and Technology, based in City Hall on Congress Street, began piloting a deduplication protocol across municipal image databases in March 2026. The pilot covers three departments — Public Works, Parks and Recreation, and the Inspectional Services Department — and is expected to produce a formal cost-benefit report by October. No figures from that report have been released publicly yet.
What Drives Duplicate Images — and What Fixes Them
The causes are less exotic than they sound. Staff turnover means the same photograph gets uploaded twice under different file names. Vendor migrations between content management systems, a routine event at places like the Isabella Stewart Gardner Museum on Evans Way or the Massachusetts Historical Society on Boylston Street, generate duplicate files when import tools fail to check existing libraries. And because the same photograph is often stored at more than one resolution, say at 300 dpi and again at 72 dpi, basic duplicate-detection software that relies on exact file-hash matching will miss the pair: the two files differ byte for byte even though they show the same picture. Catching those cases takes perceptual hashing algorithms.
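For readers who want to see what exact file-hash matching does and does not catch, here is a minimal Python sketch. It assumes the files are already loaded as bytes; the file names and contents are made up for illustration, and the helper name `group_exact_duplicates` is hypothetical, not taken from any tool named in this article.

```python
import hashlib
from collections import defaultdict

def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes.

    Returns only the groups with two or more names, i.e. exact duplicates.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Hypothetical uploads: placeholder bytes stand in for real image files.
uploads = {
    "IMG_0412.tif": b"\x10\x20\x30\x40" * 1000,
    "copley_facade_1923.tif": b"\x10\x20\x30\x40" * 1000,     # same bytes, new name
    "copley_facade_1923_web.jpg": b"\x11\x21\x31\x41" * 250,  # stand-in for a resized copy
}

groups = group_exact_duplicates(uploads)
print(groups)  # [['IMG_0412.tif', 'copley_facade_1923.tif']]
```

The renamed copy is caught because its bytes are identical. The stand-in for the lower-resolution copy is not, because any change to the bytes produces a different digest.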
Perceptual hashing, which converts images into short numerical fingerprints based on visual content rather than file data, is increasingly the standard recommended by the Library of Congress and adopted by large research universities. MIT Libraries began integrating perceptual hash tools into its DSpace digital repository in 2025, citing both storage efficiency and improved collection integrity as goals.
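One simple member of the perceptual hashing family is the "average hash." The Python sketch below is an illustration under stated assumptions, not the method used by the Library of Congress, MIT Libraries, or any product named here. To stay self-contained it works on synthetic grayscale grids instead of decoded image files, and the helpers `scene`, `shrink`, `average_hash`, and `hamming` are hypothetical names. A real tool would decode and resize actual images with an imaging library.

```python
import hashlib

def scene(n, left_half=False):
    """Synthetic n x n grayscale test image with light texture.

    Default: a bright square centered on a dark background.
    left_half=True: a different picture, bright on the left half only.
    """
    img = []
    for r in range(n):
        row = []
        for c in range(n):
            x, y = c / n, r / n
            if left_half:
                bright = x < 0.5
            else:
                bright = 0.25 <= x < 0.75 and 0.25 <= y < 0.75
            row.append((220 if bright else 40) + (r * 7 + c * 13) % 10)
        img.append(row)
    return img

def shrink(pixels, size=8):
    """Downsample a square grid to size x size by block averaging.

    Assumes the grid's side length is a multiple of `size`.
    """
    block = len(pixels) // size
    return [
        [
            sum(pixels[r * block + i][c * block + j]
                for i in range(block) for j in range(block)) / (block * block)
            for c in range(size)
        ]
        for r in range(size)
    ]

def average_hash(pixels, size=8):
    """64-bit fingerprint: one bit per cell, set when the cell is above the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def sha256_of(pixels):
    """Exact hash of the raw pixel bytes, for contrast."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

high = scene(64)                    # stand-in for the high-resolution copy
low = scene(16)                     # same picture at lower resolution
other = scene(64, left_half=True)   # a genuinely different picture

exact_match = sha256_of(high) == sha256_of(low)
same_picture = hamming(average_hash(high), average_hash(low))
different_picture = hamming(average_hash(high), average_hash(other))

print(exact_match)        # False
print(same_picture)       # 0
print(different_picture)  # 32
```

The exact hash treats the two resolutions as unrelated files, while the fingerprints of the same picture match bit for bit and the different picture lands 32 bits away. In practice, resized or recompressed copies rarely match perfectly, so tools typically flag pairs whose distance falls under a chosen threshold and leave the final call to a human reviewer.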
For smaller Boston organizations without dedicated digital archivists, free tools like the open-source dupeGuru remain a practical starting point. Paid enterprise solutions from vendors such as Widen Collective or Bynder, both of which have clients among Boston's biotech and media sectors, offer automated deduplication at scale but carry licensing costs that can run above $30,000 annually for large repositories.
Institutions sitting on backlogs of unaudited image collections should prioritize a storage audit before the next budget cycle. For Boston city departments, the October report from the Department of Innovation and Technology will set a public benchmark — and likely prompt comparable reviews across the MBTA and the Boston Planning Department, both of which maintain large photographic archives tied to infrastructure and permitting records. The numbers already in hand make the case clearly enough: redundancy is not a minor housekeeping issue. It is a measurable, fixable budget leak.