Boston's public institutions are facing a reckoning over how they manage, store, and ultimately purge duplicate digital imagery from their archives — and the window for decisive action is narrowing fast. Across city departments, the MBTA, and several Fenway-area universities, duplicated image files are consuming server capacity, inflating storage contracts, and in some cases creating conflicting versions of the public record. The problem is not new, but pressure to resolve it has sharpened heading into fiscal year 2027 budget negotiations at City Hall.
The stakes are higher than they might appear. Boston is midway through a sweeping digital infrastructure overhaul tied to Mayor Michelle Wu's open-data commitments and a citywide records modernization effort launched in 2024. Duplicate images lodged in multiple databases don't just waste money: they complicate public-records requests, slow Freedom of Information responses, and can produce inconsistencies in the planning documents that shape development decisions in Jamaica Plain and Dorchester, two neighborhoods where housing production is running at its highest pace in decades.
Where the Pressure Points Are
The MBTA's capital projects division has emerged as one of the more acute cases. The agency maintains photographic documentation of station conditions, construction progress, and accessibility upgrades across the Red, Orange, and Green lines. Sources familiar with the agency's document management say the problem is structural: project contractors submit images independently of internal MBTA photographers, meaning the same cracked platform at, say, Andrew Station on the Red Line may exist as a dozen near-identical files spread across separate vendor portals and internal drives. No figures for the total storage cost have been made public, but the MBTA's broader IT modernization contract, awarded in 2023, was valued in the tens of millions of dollars, and storage redundancy was cited as a known cost driver in the agency's own capital planning documents from that period.
At Northeastern University on Huntington Avenue and at the Boston Public Library's Digital Commonwealth program on Boylston Street, archivists have been grappling with a related but distinct version of the challenge. Digitization projects from the early 2010s produced multiple scans of the same historical photographs at varying resolutions, and the metadata tagging was inconsistent enough that automated deduplication tools flag legitimate variants as duplicates and treat true duplicates as legitimate variants. The Digital Commonwealth program, which hosts imagery from more than 100 Massachusetts cultural institutions, has been working since 2022 to standardize its ingestion protocols, but the cleanup of legacy material remains incomplete.
The Decisions That Can't Wait
Three choices will define how this plays out over the next six to twelve months. First, city and institutional IT leaders must decide whether to deploy automated deduplication software: tools that can process large archives quickly but require human review of edge cases, which adds labor costs that smaller agencies often cannot absorb. Second, there is the question of governance: who has authority to delete a file from a shared archive? At the MBTA and within city departments answerable to the Wu administration, that chain of custody is still being formalized. Third, and most consequential for the public, is the retention question: whether a duplicate image that was cited in a filed planning document (for a Dorchester housing project, for example) must be preserved for legal defensibility even after the original has been identified.
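For readers curious what the automated step and the human-review step look like in practice, the following is a minimal sketch in Python, not a description of any tool the city, the MBTA, or the Digital Commonwealth actually uses. It assumes duplicates are byte-identical files, and the function name `triage_duplicates` and the `cited_paths` list of images referenced in filed documents are hypothetical. Nothing is deleted: groups containing a cited file are routed to a review queue, and the rest become candidates for consolidation.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large image files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage_duplicates(paths, cited_paths=frozenset()):
    """Group byte-identical files and sort each group into a queue.

    Hypothetical helper: returns (auto_candidates, needs_review).
    This function never deletes anything.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(Path(path))].append(Path(path))

    cited = {Path(p) for p in cited_paths}
    auto_candidates, needs_review = [], []
    for members in groups.values():
        if len(members) < 2:
            continue  # a unique file is not a duplicate
        members.sort()
        if any(member in cited for member in members):
            # Assumption: a cited copy raises the retention question,
            # so a person decides what happens to the group.
            needs_review.append(members)
        else:
            auto_candidates.append(members)
    return auto_candidates, needs_review
```

A matching hash only catches exact copies. Rescans of the same photograph at different resolutions produce different bytes and would pass through this sketch unnoticed, which is one reason legacy digitization projects still depend on human judgment.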
The City of Boston's IT cabinet is expected to present updated records-management guidelines to the Mayor's Office before Labor Day 2026. That timeline gives departments roughly eight weeks to weigh in on proposed deduplication standards. Institutions outside the city's direct authority — the MBTA and private universities — will make their own calls, though the Digital Commonwealth's standardization work provides a potential model for regional alignment.
For residents and researchers who rely on public image archives, whether tracking construction timelines in Jamaica Plain or accessing historical photographs through the Boston Public Library, the practical advice is to download and save copies now of any specific images you depend on. Until the deduplication and retention policies are locked in, the archive landscape remains unsettled, and files that exist today may be consolidated or reclassified before the end of the calendar year.