Boston's municipal technology offices are sitting on a problem that sounds trivial until you see the storage bills. City departments across Boston, from the Office of Planning to the Boston Public Library's digital preservation unit on Boylston Street, have accumulated hundreds of thousands of duplicate image files on their shared servers, a sprawl of redundant data that archivists and IT managers say is quietly draining public resources and scrambling institutional records.
The issue has sharpened this spring and summer as Mayor Michelle Wu's administration pushes its Open Government initiative, a broad effort to digitize city records and make them searchable by the public. Officials working on the project say the deduplication question — specifically, what to do with image files that appear twice, three times, or dozens of times across different departmental databases — is one of the stickiest technical and legal complications they have encountered.
Why It Matters Now
The timing is not accidental. Boston is in the middle of a significant infrastructure investment in digital records. The city's FY2026 budget allocated funds for expanded cloud storage capacity, and technology staff say duplicate image accumulation is one reason storage costs have climbed faster than projected. A single high-resolution scanned document — say, a zoning map from the Dorchester neighborhood or a building permit photograph from Jamaica Plain — can run several megabytes. Multiply that across years of uploads, departmental transfers, and backup cycles, and the redundancy becomes a genuine fiscal issue.
Archivists at the Boston City Archives, located on the ground floor of City Hall Plaza, say the deduplication problem is partly a legacy of how departments operated before centralized digital storage. Each office maintained its own filing system, so the same photograph of a South End property might have been uploaded independently by the Inspectional Services Department, by the Assessing Department, and again during a legal proceeding. Nobody set a policy requiring a check before upload.
Roxbury Community College, which runs a digital media program that has partnered with Boston Public Schools on archiving projects, has grappled with the same issue at the institutional level. Faculty there have pointed out that without consistent file-naming conventions and metadata standards, even automated deduplication tools struggle to distinguish true duplicates from images that are nearly, but not exactly, identical.
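The distinction between a true duplicate and a near-duplicate can be illustrated with a short sketch. It is not drawn from any tool the city or the college is known to use. It assumes images have already been reduced to tiny grayscale grids (real tools typically downscale to something like 8x8 first), and the function names and sample pixel values are hypothetical. A byte-level hash catches only exact copies, while a simple "average hash" compared by Hamming distance can flag a rescan whose pixels differ slightly.

```python
import hashlib


def exact_digest(pixels):
    """SHA-256 of the raw pixel bytes: equal only for byte-identical images."""
    flat = bytes(v for row in pixels for v in row)
    return hashlib.sha256(flat).hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 2x2 grayscale grids standing in for downscaled scans.
original = [[10, 200], [220, 30]]
rescanned = [[12, 198], [221, 29]]   # same picture, slightly different pixels
different = [[200, 10], [30, 220]]   # a different picture

# The exact hash treats the rescan as a brand-new file.
print(exact_digest(original) == exact_digest(rescanned))

# The average hash treats the rescan as the same picture (small distance)
# and the unrelated image as different (large distance).
print(hamming(average_hash(original), average_hash(rescanned)))
print(hamming(average_hash(original), average_hash(different)))
```

How small a distance should count as "the same image" is a policy choice, not something the sketch settles; any threshold would need to be tuned against real holdings.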
What Experts and Officials Are Saying
Technology policy specialists familiar with municipal digital governance note that Boston is hardly alone. Cities including Chicago and Denver have faced comparable cleanup exercises when centralizing legacy departmental data. What sets Boston's situation apart, they say, is the volume of material coming from its university and biotech sectors: institutions like Northeastern University on Huntington Avenue and those in the Longwood Medical Area generate substantial quantities of public-facing digital content that intersects with city permitting and planning records.
Digital preservation professionals have argued publicly — in forums including the New England Archivists conference held in Providence in March 2026 — that municipalities need written image-retention policies before they expand storage, not after. The practical recommendation circulating among archivists is a three-step approach: audit existing holdings to identify duplicates, establish a single canonical file location for each unique image, and implement upload validation protocols that flag potential duplicates at the point of entry.
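The three-step approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the policy under review or any system the city runs: it treats two files as duplicates only when their bytes are identical, it picks the first path in sorted order as the canonical copy, and the function names (`audit`, `canonical_index`, `check_upload`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(root: Path) -> Dict[str, List[Path]]:
    """Step 1: group every file under root by content digest."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return dict(groups)


def canonical_index(groups: Dict[str, List[Path]]) -> Dict[str, Path]:
    """Step 2: choose one canonical path per unique file.

    Assumption: the first path in sorted order wins. A real policy might
    prefer the oldest copy or a designated department's copy instead.
    """
    return {digest: paths[0] for digest, paths in groups.items()}


def check_upload(candidate: Path, index: Dict[str, Path]) -> Optional[Path]:
    """Step 3: return the canonical path if these bytes are already held."""
    return index.get(file_digest(candidate))
```

In use, an audit of a shared folder would produce the groups, any group with more than one path is a set of duplicates, and `check_upload` would run before a new file is accepted. Exact byte matching will miss rescans and resized copies, so a near-duplicate check would have to be layered on separately.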
City technology staff have indicated that a formal deduplication policy is under internal review, though no adoption date has been announced. The Office of Emerging Technology, which coordinates digital infrastructure for Boston city government, is expected to present recommendations to the Wu administration later this summer.
For residents and neighborhood groups, including community development organizations in Jamaica Plain and Dorchester that rely on city planning documents, the practical stakes are real. Duplicate and mislabeled images in public databases can delay permit searches, complicate title research, and muddy the historical record for neighborhoods whose built environment is changing rapidly. Getting this right, archivists say, matters as much as the square footage of new housing going up on Washington Street.