Boston's municipal digital archive contains thousands of duplicate images — the same photograph of a Dorchester triple-decker or a Jamaica Plain community meeting stored two, three, sometimes four times across different city databases. The problem is not new, but a confluence of overlapping digitization efforts, vendor contracts, and pandemic-driven remote-work workarounds has pushed it to a point where city IT managers say the redundancy is actively slowing retrieval times and complicating public records requests.
The issue matters now because Mayor Michelle Wu's administration has staked a significant part of its open-government platform on accelerating public access to city records. Duplicate image files inflate storage costs, muddy search results, and make it harder for community groups in neighborhoods like Roxbury and East Boston to pull historical documentation they need to challenge or support development proposals. When the same image is indexed under four different filenames, a resident searching the city's online portal may not know whether they have retrieved one document or four.
How the Duplication Built Up Over a Decade
The roots of the problem run back to at least 2014, when the City of Boston launched a piecemeal effort to digitize paper planning records held at City Hall on Cambridge Street. Different departments — the Boston Planning and Development Agency, the Inspectional Services Department, and the Archives division of the City Clerk's office — each contracted separately with document management vendors, producing parallel image libraries with no shared naming convention or deduplication protocol.
The problem accelerated sharply in 2020. When offices closed in March of that year, staff across agencies scrambled to scan physical files, some working from home and others as part of skeleton crews operating out of City Hall Plaza. Without centralized oversight, the same property photograph or zoning map was frequently scanned by two different employees on two different days and uploaded to two different shared drives. Those drives were later migrated, imperfectly, into the city's enterprise content management system, carrying the duplicates along with them.
The MBTA's parallel records challenges offer a local comparison. The transit authority spent years trying to reconcile maintenance documentation held in separate depots across the Red and Orange lines, a process that transit advocates and the Federal Transit Administration flagged during oversight reviews. The underlying dynamic, rapid digitization without a unified data standard, is the same one that has plagued City Hall.
The Boston Public Library's Norman B. Leventhal Map & Education Center on Boylston Street dealt with a smaller-scale version of this problem when it digitized its historic map collection beginning around 2016. Librarians there developed a checksumming protocol — essentially a mathematical fingerprint for each image file — that flagged identical files before they were uploaded. City agencies did not adopt a comparable standard.
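The fingerprinting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's actual protocol: the choice of SHA-256 and the names `file_checksum` and `find_duplicates` are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large scans do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by checksum; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A check like this catches only byte-for-byte copies, whatever their filenames. Two separate scans of the same paper photograph produce different bytes and different checksums, so matching those would require some form of image-similarity comparison rather than a plain checksum.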
The Cost of Not Fixing It Sooner
Storage is not free. Enterprise cloud storage for government entities typically runs between $0.02 and $0.05 per gigabyte per month, and municipal image archives can run into hundreds of terabytes. Duplicate files mean duplicate costs, billed again every month the files remain in place. Beyond money, the practical damage shows up in public records compliance: under Massachusetts General Laws Chapter 66, agencies must respond to public records requests within ten business days. Retrieval systems cluttered with redundant files slow that process, potentially exposing agencies to appeals to the Supervisor of Public Records.
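To give a sense of scale, the per-gigabyte price range can be turned into a rough monthly figure. The sketch below uses hypothetical inputs: the 200-terabyte archive size and the 15 percent duplicate share are illustrative assumptions, not city figures, and `monthly_duplicate_cost` is a name invented for the example.

```python
def monthly_duplicate_cost(archive_tb, duplicate_share, price_per_gb):
    """Monthly cost in dollars of storing the duplicated share of an archive."""
    duplicate_gb = archive_tb * 1000 * duplicate_share  # decimal terabytes to gigabytes
    return duplicate_gb * price_per_gb


# Hypothetical inputs, not city figures: a 200 TB archive that is 15% duplicates.
low = monthly_duplicate_cost(200, 0.15, 0.02)
high = monthly_duplicate_cost(200, 0.15, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per month")  # prints "$600 to $1,500 per month"
```

Under those assumptions the redundant copies would cost roughly $7,200 to $18,000 a year in storage alone, before counting staff time spent sorting through them.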
Northeastern University's library system on Huntington Avenue, which manages its own large digital repository, implemented automated deduplication tools as part of a 2022 infrastructure upgrade. City officials have informally looked at that model as a potential template, though no formal partnership or procurement has been announced.
The Wu administration's IT office has circulated an internal assessment, its existence confirmed through public records filings, calling for a citywide image deduplication audit as part of a broader digital modernization initiative. The practical next step for residents and community organizations: when filing public records requests with the BPDA or Inspectional Services, ask explicitly for confirmation that the records produced are unique files. It is a small step, but until the city completes its audit and implements a deduplication standard, it is the most reliable check available to anyone trying to build a clean, usable archive from Boston's digital records.