Boston's municipal and institutional databases are carrying thousands of duplicate images — redundant photographs, scanned documents, and design files stored multiple times across disconnected servers — and the people responsible for managing those systems say the problem has grown from a nuisance into a genuine administrative liability.
The issue surfaced publicly this spring when the city's Department of Innovation and Technology flagged the redundancy problem during a broader audit of Boston's data infrastructure, undertaken as part of Mayor Michelle Wu's open-government initiative. The audit, which examined systems used by agencies ranging from the Boston Planning Department to Boston Public Schools, found that duplicate image files were consuming server space, slowing retrieval times, and complicating compliance with the state's public records law.
Why This Is Coming to a Head Now
Several forces are colliding at once. The Wu administration has been migrating legacy city databases onto a consolidated cloud platform since late 2024, a project managed out of City Hall on Cambridge Street. That migration, designed to cut long-term IT costs, has exposed just how much redundant data accumulated over the previous decade when individual departments ran independent filing systems with no deduplication protocols.
At the same time, Boston's university and biotech sectors, major contributors to the city's economic base, have their own pressing reasons to care. Institutions along the Longwood Medical Area corridor, where teaching hospitals and research labs generate enormous volumes of diagnostic imaging and laboratory photography, have been wrestling with the same problem at a far larger scale. Healthcare imaging standards require strict version control, and a duplicate file in the wrong place can mean a clinician reviews an outdated scan.
The Massachusetts Institute of Technology's Libraries, based in Cambridge, published guidance in March 2026 recommending that research institutions adopt automated deduplication workflows before consolidating archival collections. Harvard's Countway Library of Medicine, which holds one of the largest medical history photograph collections in New England, launched its own deduplication pilot program in January, covering roughly 400,000 digitized images accumulated since 2009.
Archivists and records managers describe the core challenge clearly: duplicate images don't just waste storage, they undermine the integrity of a public record. When two identical files carry different metadata, different access dates, or different classification tags, neither copy can be fully trusted without manual verification. That's a particular problem for the city, whose records are subject to requests under Massachusetts General Laws Chapter 66.
What the Experts and Advocates Are Recommending
Digital preservation specialists at Northeastern University's Library on Huntington Avenue have been advising smaller Boston-area nonprofits and city agencies since at least 2023 on best practices for image file management. Their guidance emphasizes hash-based deduplication, a technique that computes a digital fingerprint from each image file's contents and automatically flags files whose fingerprints match, as the most reliable method for large collections.
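For readers curious what hash-based deduplication looks like in practice, the following is a minimal sketch, not the method any of the institutions named here has said it uses. It assumes SHA-256 as the fingerprint, and the function names `file_fingerprint` and `find_duplicates` are hypothetical, chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large scans never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by fingerprint; keep groups with two or more files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_fingerprint(path)].append(path)
    return {fp: paths for fp, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("scans"))` on a hypothetical `scans` directory would return one entry per set of identical files. A sketch like this catches only byte-for-byte copies; an image that has been resized or re-saved in another format produces a different fingerprint and would need a different technique.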
The Boston Public Library's Digital Repository Services team, operating out of the Central Library on Copley Square, has been running its own deduplication effort across its Norman B. Leventhal Map & Education Center collection, which holds more than 200,000 digitized maps and photographs. A BPL spokesperson confirmed the project began in the second quarter of 2025 but declined to provide completion timelines or cost figures.
On the policy side, advocates connected to the Massachusetts Digital Equity Coalition have argued that deduplication isn't just an IT housekeeping issue — it has direct consequences for how quickly residents in Dorchester and Jamaica Plain can access city documents through the online public records portal, which runs slower when databases are bloated with redundant files.
The practical path forward, according to digital archivists and city IT administrators who have spoken on the record at public technology forums, involves three steps: running an initial automated audit to identify duplicates, establishing a clear retention policy that designates a single authoritative copy, and building deduplication checks into the upload workflow so new redundancies can't accumulate. Boston's Department of Innovation and Technology has indicated it plans to present a formal recommendation to the Mayor's Office before the end of the third quarter of 2026. Until that policy lands, agencies have been told to hold off on major new image uploads to shared city servers.
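The three steps described above can be sketched in a few dozen lines. What follows is an illustration under stated assumptions, not the city's planned system: the `ImageStore` class and its methods are hypothetical, files are held in memory as name-to-bytes mappings, and the retention rule (the alphabetically first name becomes the authoritative copy) is an arbitrary stand-in for whatever policy an agency actually adopts.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Toy in-memory store mapping each content fingerprint to one authoritative name."""
    authoritative: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def fingerprint(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def audit(self, files: dict[str, bytes]) -> dict[str, list[str]]:
        """Step 1: report groups of names whose contents are identical."""
        groups: dict[str, list[str]] = {}
        for name, data in files.items():
            groups.setdefault(self.fingerprint(data), []).append(name)
        return {fp: names for fp, names in groups.items() if len(names) > 1}

    def apply_retention(self, files: dict[str, bytes]) -> None:
        """Step 2: record one authoritative copy per fingerprint.

        Assumed rule for illustration: the alphabetically first name wins.
        """
        for name in sorted(files):
            self.authoritative.setdefault(self.fingerprint(files[name]), name)

    def upload(self, name: str, data: bytes) -> str:
        """Step 3: check new uploads against known fingerprints.

        Returns the authoritative name for this content. If it differs from
        `name`, the upload is a duplicate and is not stored again.
        """
        return self.authoritative.setdefault(self.fingerprint(data), name)
```

In use, an agency would run `audit` once over its existing files, call `apply_retention` to fix the authoritative copies, and route every later upload through `upload`, which hands back the existing file's name whenever the content is already on record.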