Boston's public and research institutions are staring down a surprisingly costly housekeeping problem. Across city government, the Boston Public Library's digital collections, and the sprawling research networks anchored at Northeastern University and Massachusetts General Hospital, duplicate digital images have quietly accumulated into a storage and accessibility burden that administrators can no longer defer. The question now is not whether to act but how, and who pays.
The timing matters because several major contracts for cloud storage and digital asset management are up for renewal before the end of fiscal year 2026, which closes September 30. Decisions made in the next sixty days will lock in infrastructure choices for the better part of a decade. Letting those renewals roll over without a deduplication strategy in place means paying for redundant data at rates that have only climbed since the pandemic-era digitization push.
How the Backlog Built Up
The problem did not happen overnight. Between 2020 and 2023, city departments and Boston-area research institutions poured resources into digitizing physical archives. The Boston Public Library's Central Library in Copley Square scanned tens of thousands of photographs from its Print Department collection. The City of Boston Archives, based in West Roxbury, digitized permit records, planning documents, and maps going back to the nineteenth century. Northeastern's Snell Library accelerated its own oral history and photograph digitization program.
Each initiative ran largely on its own timeline, with its own software stack. When projects overlapped — historical images of the South End, say, or photographs from the old West End neighborhood before its 1958 demolition — copies multiplied without a central registry to flag them. By some estimates used in comparable municipal digitization efforts in cities like Chicago and Philadelphia, duplicate files can account for anywhere from fifteen to thirty percent of total storage volume in large-scale archival projects. No Boston agency has yet published a specific local audit figure, but administrators at several institutions have acknowledged the issue is under active review.
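The central registry these projects lacked can, at its simplest, be a shared table of content fingerprints. The following is a minimal sketch, assuming each institution computes a SHA-256 digest of every image file on its own systems and submits only (record ID, digest) pairs; the function names and record IDs are hypothetical and not drawn from any Boston system. Files with matching SHA-256 digests are, for practical purposes, byte-for-byte identical, so this check does not flag distinct images. It also misses re-scans and re-compressed copies of the same photograph, whose bytes differ.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Tuple, Union


def digest_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


def digest_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading 1 MiB at a time
    so that large image scans never have to fit in memory at once."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (record_id, digest) pairs by digest and keep only digests
    shared by two or more records, i.e. exact duplicate files."""
    registry: Dict[str, List[str]] = defaultdict(list)
    for record_id, digest in records:
        registry[digest].append(record_id)
    return {d: ids for d, ids in registry.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical submissions from three institutions; in practice each
    # digest would come from digest_file() run against the real scan.
    west_end = digest_bytes(b"west end photo scan")
    south_end = digest_bytes(b"south end map scan")
    submissions = [
        ("bpl:0001", west_end),
        ("city:0778", west_end),
        ("neu:0042", south_end),
    ]
    for digest, ids in find_duplicates(submissions).items():
        print(digest[:12], ids)
```

Exchanging digests rather than files lets agencies compare holdings without copying images to one another. Any group the registry flags would still go to an archivist for review before anything is deleted.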
The financial stakes are real. Commercial cloud storage from major providers runs roughly two to four cents per gigabyte per month for standard, readily accessible tiers; cold archival tiers cost far less per gigabyte but add retrieval fees and delays. For a collection running into hundreds of terabytes, a realistic scale for a combined BPL and city archive footprint, the annual carrying cost of unnecessary duplicates can range from roughly ten thousand dollars to well over $100,000, depending on the collection's size and duplication rate.
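The arithmetic behind that range is straightforward: duplicate gigabytes, times the monthly price, times twelve. The sketch below assumes a flat per-gigabyte price, treats 1 TB as 1,000 GB for simplicity, and ignores retrieval, request, and egress fees. The 300 TB and 900 TB collection sizes are illustrative assumptions chosen to bracket "hundreds of terabytes," not audited Boston figures; the duplication shares and prices come from the ranges quoted above.

```python
def annual_duplicate_cost(total_tb: float, duplicate_share: float,
                          usd_per_gb_month: float) -> float:
    """Yearly cost of storing the duplicate fraction of a collection.

    Assumes 1 TB = 1,000 GB and a flat per-gigabyte monthly price;
    ignores retrieval, request, and egress fees.
    """
    duplicate_gb = total_tb * 1_000 * duplicate_share
    return duplicate_gb * usd_per_gb_month * 12


if __name__ == "__main__":
    # Illustrative low and high cases within the quoted ranges.
    low = annual_duplicate_cost(300, 0.15, 0.02)   # about $10,800
    high = annual_duplicate_cost(900, 0.30, 0.04)  # about $129,600
    print(f"low case:  ${low:,.0f} per year")
    print(f"high case: ${high:,.0f} per year")
```

Because the cost scales linearly with each input, doubling either the collection size or the duplication rate doubles the annual bill, which is why an early audit figure matters more than small differences in per-gigabyte pricing.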
The Decisions Ahead
Three choices will define the outcome. First, whether Boston adopts a shared deduplication platform across agencies or lets each institution handle the problem independently. A shared approach offers economies of scale but requires cooperation between entities that have historically guarded their own IT budgets. The Mayor's Office of New Urban Mechanics, which has coordinated cross-agency technology pilots before, is one natural convener for that conversation.
Second, who sets the metadata standards that allow systems to recognize a duplicate in the first place. Without agreed-upon tagging conventions, automated tools that match images by metadata or visual similarity can produce false positives, flagging legitimately distinct images as redundant and risking permanent deletion of irreplaceable material. The Boston Landmarks Commission, which oversees the city's designated historic buildings and districts, would likely want a seat at that table.
Third, whether the city pursues a vendor contract or builds on open-source tools already in use at local universities. MIT Libraries, just across the Charles River in Cambridge, has implemented open-source digital asset management software that handles deduplication as a native function. A formal knowledge-sharing arrangement between MIT and the BPL would cost far less than a proprietary enterprise contract and could be structured before the September 30 fiscal deadline.
The Fourth of July weekend has emptied most of the relevant offices. Real work resumes Monday. Budget directors, archivists, and IT procurement officers at City Hall have roughly twelve working weeks to move from acknowledgment to a signed framework. Miss that window, and the redundant files — and their carrying costs — roll into fiscal year 2027 with no plan attached.