Boston's public agencies, universities, and cultural institutions are sitting on millions of duplicated digital images: redundant files that clog servers, inflate storage costs, and make accurate historical records harder to find. The problem has quietly worsened for more than a decade, but budget pressure and a citywide push toward streamlined digital infrastructure are now forcing administrators to act this summer.
The timing matters. Mayor Michelle Wu's administration has tied its broader government modernization agenda to reducing waste in city IT departments, and duplicate image management — unglamorous as it sounds — sits squarely inside that mandate. Storage costs are not trivial. Cloud storage for municipal governments typically runs between $0.02 and $0.05 per gigabyte per month, and agencies hoarding multiple copies of the same photographs, maps, and scanned documents can accumulate terabytes of redundant data over a single budget cycle. With Boston's capital planning office already under pressure to justify technology expenditures heading into fiscal year 2027, the directive to clean house is real.
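A rough calculation shows how those per-gigabyte rates add up. The sketch below is illustrative only: the 10 TB of duplicates is a hypothetical volume, not a figure reported for any Boston agency, and it assumes decimal units (1 TB = 1,000 GB) and a flat monthly rate with no tiered pricing, retrieval, or data-transfer fees.

```python
def redundant_storage_cost(redundant_tb: float, price_per_gb_month: float, months: int = 12) -> float:
    """Dollar cost of keeping `redundant_tb` terabytes of duplicate data for
    `months` months at a flat per-gigabyte monthly rate.

    Assumes decimal units (1 TB = 1,000 GB) and no tiered pricing,
    retrieval, or data-transfer charges.
    """
    gigabytes = redundant_tb * 1_000
    return gigabytes * price_per_gb_month * months


if __name__ == "__main__":
    # Hypothetical example: 10 TB of duplicate images kept for one year,
    # priced at the low and high ends of the $0.02 to $0.05 range.
    for rate in (0.02, 0.05):
        annual = redundant_storage_cost(10, rate)
        print(f"${rate:.2f}/GB-month -> ${annual:,.0f} per year")
```

At those rates, 10 TB of duplicates would cost roughly $2,400 to $6,000 a year, and the total grows linearly with both the volume of duplicated data and the number of extra copies kept.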
What's at Stake Across the City
The Boston City Archives, housed at 201 Rivermoor Street in West Roxbury, maintains the official photographic and document record of city government. Archivists there have flagged the duplicate problem as a long-standing obstacle to efficient cataloguing. The issue is compounded when multiple departments, such as the Boston Planning Department, the Parks and Recreation Department, and the Office of Emergency Management, independently scan and store the same infrastructure photographs without a shared deduplication protocol.
The Boston Public Library's Digital Repository, which holds digitized collections from Copley Square and branches across Dorchester and Jamaica Plain, faces a parallel challenge. Institutions contributing to shared repositories such as Digital Commonwealth, the statewide Massachusetts network, have identified duplicate ingestion (the same image uploaded by multiple partner organizations) as one of the top three data quality problems affecting search accuracy. Fixing it requires both technical tools and negotiated agreements about which institution holds the authoritative copy of a given file.
Boston University's Mugar Memorial Library and Northeastern University's Archives and Special Collections have each invested in deduplication software in the past two years, according to publicly available IT procurement records. The commercial tools most commonly deployed — including products from vendors like Cloudinary and ImageKit — use perceptual hashing algorithms to flag visually identical or near-identical images even when file names and metadata differ. Licensing costs for institutional-scale deployments typically start around $500 per month and scale with storage volume.
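The sketch below is not drawn from any vendor's product; it illustrates one widely used perceptual hash, the difference hash (dHash). The image is shrunk to a tiny grayscale grid, each pixel is compared with its left-hand neighbor to produce one bit, and two images are compared by counting how many of their 64 bits differ (the Hamming distance). Because the fingerprint reflects coarse brightness patterns rather than file names, metadata, or exact bytes, a resized or re-exported copy typically lands within a few bits of the original. To stay dependency-free, the sketch assumes the image has already been decoded into rows of grayscale values, and the 10-bit match threshold is an illustrative assumption, not an industry standard.

```python
from typing import List

Grid = List[List[float]]  # grayscale image: rows of brightness values


def resize(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink a grayscale image to width x height by averaging each block of
    source pixels (a simple box filter)."""
    src_h, src_w = len(pixels), len(pixels[0])
    out: Grid = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: shrink to (hash_size + 1) x hash_size, then emit one bit
    per horizontally adjacent pair: 1 if brightness increases left to right."""
    small = resize(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 10) -> bool:
    """Treat two images as duplicates if their 64-bit dHashes differ in at most
    `threshold` bits (10 is an illustrative choice, not a standard)."""
    return hamming(dhash(a), dhash(b)) <= threshold


if __name__ == "__main__":
    # Synthetic stand-ins for decoded scans: a left-to-right gradient, a
    # uniformly brightened copy, a half-size copy, and a mirrored image.
    original = [[x * 4 for x in range(64)] for _ in range(64)]
    brighter = [[v + 3 for v in row] for row in original]
    half_size = [[x * 8 for x in range(32)] for _ in range(32)]
    mirrored = [list(reversed(row)) for row in original]

    print(is_near_duplicate(original, brighter))   # True
    print(is_near_duplicate(original, half_size))  # True
    print(is_near_duplicate(original, mirrored))   # False
```

Where to set the threshold is the technical side of a policy question: a low value flags only near-exact copies, while a higher one starts to catch crops, re-scans, and edited versions that an archivist may want to keep as distinct records.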
The Decisions That Will Define the Outcome
Three choices now sit in front of Boston's institutional decision-makers. First, who holds the authoritative copy? When the Boston Planning Department and the Office of Historic Preservation both store the same photograph of a Roxbury triple-decker, someone has to designate a canonical version and delete — or archive offline — the rest. That governance question has no purely technical answer.
Second, how far back does the cleanup go? Deduplicating files created after a certain date is manageable. Going back to scanned records from the 1990s and early 2000s, which often exist in incompatible formats and inconsistent resolutions, requires significantly more staff time and a clear policy on what counts as a true duplicate versus a meaningfully different version of an image.
Third, what happens to images that have already been published externally, whether on city websites, in planning documents that form part of the MBTA's Green Line Extension record, or in court filings? Deleting a file without tracking its downstream uses can break links in official documents and create legal headaches.
The Wu administration's Department of Innovation and Technology is expected to issue updated data governance guidelines later this summer. Institutions that move now — auditing their holdings, adopting deduplication tools, and establishing clear ownership agreements with partner organizations — will be better positioned to comply before the guidelines land. Those that wait risk a rushed, poorly executed cleanup that creates new problems rather than solving old ones. The Fourth of July holiday gives staff a rare quiet window. The real work starts Tuesday.