Boston's public institutions are sitting on millions of duplicate image files — and the storage bill is climbing. A review of publicly accessible metadata from city technology procurement records shows the problem spans municipal departments from Roxbury to downtown, touching everything from the MBTA's maintenance photo archives to the Boston Planning & Development Agency's neighborhood documentation databases.
The timing matters. Mayor Michelle Wu's administration has pushed hard on digital transparency initiatives, and the city's Office of Digital Equity & Engagement has expanded its scope since 2024. Redundant data quietly undermines those goals, inflating cloud storage costs and making it harder for planners, transit managers, and archivists to find accurate, up-to-date visual records. The city's fiscal year 2027 budget is already under pressure (the Wu administration submitted a spending plan of roughly $4.6 billion to the City Council this spring), so every wasted gigabyte carries a real dollar cost.
Where the Redundancy Lives
The Boston Public Library's Digital Repository, headquartered at the Central Library on Copley Square, manages more than 1.2 million digitized items according to the library's own published collection statistics. Archivists and digital preservation professionals who work in institutional settings commonly report duplicate rates between 15 and 30 percent in large unmanaged repositories, a benchmark drawn from studies published by the Digital Preservation Coalition. Applied to BPL's holdings, that range works out to roughly 180,000 to 360,000 redundant files occupying server space the library pays to maintain.
Over at MBTA headquarters at 10 Park Plaza, the transit authority's capital project teams generate thousands of inspection and condition-assessment photographs every month across the 51-station Orange and Red Line corridors currently undergoing rehabilitation. Without a systematic deduplication protocol, project managers accumulate overlapping image sets from contractors, subcontractors, and internal engineers, each uploading their own version of the same cracked platform or waterlogged tunnel wall. The MBTA declined to provide exact storage figures for this story, but cloud storage at enterprise scale typically runs between $0.02 and $0.08 per gigabyte per month depending on the vendor tier. At those rates, even a modest 50-terabyte redundancy problem (50,000 gigabytes, billed twelve months a year) costs an institution between $12,000 and $48,000 annually before retrieval and processing fees.
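The storage arithmetic above is easy to reproduce for any archive size. The short Python sketch below uses only the per-gigabyte rates and the 50-terabyte figure quoted in this story; the function name and the choice of decimal terabytes (1 TB = 1,000 GB) are assumptions made for illustration, not anything the MBTA or a vendor has published.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes, which matches the figures in the story


def annual_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Yearly cost of keeping `terabytes` of data at a flat per-GB monthly rate."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12


# The story's example: 50 TB of redundant images at the low and high vendor tiers.
low = annual_storage_cost(50, 0.02)
high = annual_storage_cost(50, 0.08)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

The calculation leaves out retrieval and processing fees, as the estimate in the text does, so a real bill would likely run higher.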
The Boston Planning & Development Agency, which maintains photographic records of development projects across Jamaica Plain, Dorchester, and the Seaport, has been building out its GIS and visual documentation systems since the Imagine Boston 2030 plan launched. BPDA project files routinely incorporate images submitted by developers, community groups, and city inspectors — often the same site photographed three times by three different parties and filed under three different project numbers.
What Deduplication Actually Costs — and Saves
The fix is neither simple nor free. Enterprise-grade deduplication tools — products from vendors like Veritas or open-source solutions used by universities such as Northeastern on its Huntington Avenue campus — typically run between $5,000 and $50,000 for initial implementation depending on collection size, plus annual licensing. But the return on investment tends to be fast. Published case studies from peer institutions suggest storage reduction of 40 to 60 percent after a single deduplication pass, which at city-scale volumes can translate to six-figure annual savings in avoided cloud costs within two to three years.
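To see what those ranges imply, the Python sketch below works backward from the figures in this story. It asks how large an annual storage bill would need to be for a 40 to 60 percent reduction to produce $100,000 in savings, and how many months of such savings would cover the quoted implementation cost. The function names are hypothetical, the $100,000 target is simply the smallest six-figure amount, and annual licensing is left out because no figure for it is given here.

```python
def required_annual_bill(target_savings: float, reduction: float) -> float:
    """Annual storage bill at which a fractional `reduction` yields `target_savings`."""
    return target_savings / reduction


def payback_months(implementation_cost: float, annual_savings: float) -> float:
    """Months of savings needed to cover a one-time implementation cost."""
    return 12 * implementation_cost / annual_savings


TARGET = 100_000  # assumption: the low end of "six-figure" annual savings

for reduction in (0.40, 0.60):
    bill = required_annual_bill(TARGET, reduction)
    print(f"{reduction:.0%} reduction needs a bill of about ${bill:,.0f} per year")

for cost in (5_000, 50_000):
    months = payback_months(cost, TARGET)
    print(f"${cost:,} implementation is covered in about {months:.1f} months")
```

These are back-of-envelope numbers only. An actual business case would need each institution's real storage bill and licensing terms.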
For the MBTA, where a Federal Transit Administration oversight framework is already scrutinizing capital project management, clean image archives aren't just an efficiency question — they're a documentation and accountability issue. Redundant or mislabeled photos in contractor submittals have caused delays in sign-off processes on comparable transit rehabilitation projects in other large American systems.
City technology officials and department heads reviewing their FY2027 digital infrastructure budgets should treat duplicate image audits as a line item, not an afterthought. The BPL's Digital Repository team, the BPDA's GIS office at 1 City Hall Square, and MBTA's capital delivery division all have holdings at a scale where even a basic hash-matching deduplication sweep, which flags byte-for-byte identical files and is the simplest and cheapest method, would generate measurable savings before the end of the calendar year. The data is there. Someone needs to run the scan.
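For readers wondering what a hash-matching sweep involves, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the method any of the agencies named here uses: the function names are hypothetical, SHA-256 is one reasonable choice of hash, and grouping files by size first is a common shortcut to avoid hashing files that cannot possibly match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to the files sharing it, keeping only groups of two or more."""
    # Pass 1: group by size; a file with a unique size cannot have an exact duplicate.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Pass 2: hash only the files whose size collides with another file's.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    return {digest: sorted(ps) for digest, ps in by_hash.items() if len(ps) > 1}


def reclaimable_bytes(groups: dict[str, list[Path]]) -> int:
    """Bytes freed if every group keeps one copy and drops the rest."""
    return sum(ps[0].stat().st_size * (len(ps) - 1) for ps in groups.values())
```

Pointing `find_duplicates` at an archive's root folder returns the groups of identical files, and `reclaimable_bytes` turns that into a storage figure that can be priced. A sweep like this only catches exact copies. The same site photographed three times by three different parties produces three different files, and finding those would take perceptual or metadata-based matching, which is a larger project.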