Boston's institutions are moving fast on a problem that has quietly drained storage budgets and tangled public records requests for years. This week, the City of Boston's Department of Innovation and Technology (DoIT) confirmed it is rolling out a duplicate-image-replacement protocol across municipal databases, targeting redundant photograph files that have accumulated in systems used by the planning, housing, and public works departments. The cleanup covers records dating back to at least 2018, when the city migrated several legacy document platforms to a unified cloud infrastructure.
The timing matters. Mayor Michelle Wu's administration has made open, efficient city data a visible part of its progressive governance pitch, and its housing production push in Jamaica Plain and Dorchester depends heavily on permitting workflows that draw on the same image-heavy databases now burdened by duplicates. A clogged system slows permit reviewers, delays community feedback cycles, and lengthens responses to public records requests, all problems the Wu administration has said it wants to eliminate.
What Happened This Week
On Tuesday, July 1, the city's DoIT team began a phased rollout of automated deduplication software across three departments: the Boston Planning Department, the Office of Housing, and the Public Works Department. The software flags near-identical image files, such as photos of building facades, construction site documentation, and neighborhood survey images, and queues them for human review before any file is deleted or replaced with a canonical version. Sources familiar with the project say the first phase targets an estimated 40 terabytes of redundant image data on city servers, though the city has not released official figures.
Separately, Northeastern University's library system on Huntington Avenue announced Thursday that it had completed its own internal duplicate-image audit as part of a broader digital preservation initiative tied to the Boston Research Library Consortium. The university found that its digitized historical photograph collection, which includes images of the South End and Roxbury dating to the early twentieth century, had accumulated significant duplication during a 2022 migration to a new content management system. Northeastern's digital archivists said the cleanup would free roughly 12 terabytes of storage and improve search accuracy for researchers using the archive.
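The city has not said how its software decides that two images are "near-identical." One common technique is perceptual hashing, and the Python sketch below illustrates it under stated assumptions; it is not the city's method. It assumes each image has already been shrunk to a small grayscale thumbnail (here 8x8), computes an "average hash" (one bit per pixel, set when that pixel is brighter than the thumbnail's mean), and pairs files whose hashes differ in at most a few bits. The 5-bit threshold, the rule of keeping the alphabetically first filename as the canonical copy, the file names, and all function names are hypothetical. Matching pairs only go into a review queue; nothing is deleted, mirroring the human-review step the city describes.

```python
from dataclasses import dataclass
from itertools import combinations

# A grayscale thumbnail: rows of pixel brightness values (0-255).
# Assumption: images were already downscaled (e.g. to 8x8) upstream.
Grid = list[list[int]]


def average_hash(grid: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


@dataclass
class ReviewItem:
    keep: str       # proposed canonical file (hypothetical rule: first by name)
    candidate: str  # file proposed for replacement, pending human sign-off
    distance: int   # number of differing hash bits


def build_review_queue(images: dict[str, Grid], max_distance: int = 5) -> list[ReviewItem]:
    """Pair up near-identical images for human review; deletes nothing."""
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    queue = []
    for first, second in combinations(sorted(hashes), 2):
        d = hamming(hashes[first], hashes[second])
        if d <= max_distance:
            queue.append(ReviewItem(keep=first, candidate=second, distance=d))
    return queue


# Demo: facade_edit is facade with one pixel altered; facade_negative is its inverse.
facade = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
facade_edit = [row[:] for row in facade]
facade_edit[0][0] = 10
facade_negative = [[255 - v for v in row] for row in facade]
queue = build_review_queue({
    "facade.jpg": facade,
    "facade_edit.jpg": facade_edit,
    "facade_negative.jpg": facade_negative,
})
```

In the demo, only the slightly edited copy is queued against the original; the inverted image differs in every bit and is left alone. Comparing every pair of files, as this sketch does, would not scale to terabytes of records, so a production system would need some form of indexing on top of the core idea shown here.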
The Massachusetts Institute of Technology's Digital Collections program in Cambridge has been running a similar deduplication effort since February, according to documentation posted to its library blog. MIT cited retrieval accuracy as the primary driver: duplicate images were appearing multiple times in search results, undermining the reliability of the archive for researchers and journalists pulling historical visual material.
Why It Matters for Residents and Researchers
For ordinary Bostonians, the practical stakes are modest but real. The Planning Department's image records feed directly into the online permit portal used by homeowners applying for renovations in neighborhoods like South Boston and East Boston, where construction activity has surged since 2023. Redundant files slow the portal's image-loading functions, a complaint that appeared repeatedly in public feedback sessions held at City Hall Plaza last spring.
Cloud storage is not cheap. Commercial rates for the kind of enterprise storage the city uses typically run between $20 and $30 per terabyte per month, depending on redundancy tier. At those rates, 40 terabytes of redundant images represents a recurring cost of $800 to $1,200 a month, money that city budget advocates have argued should go toward services rather than duplicate data.
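The cost figure is easy to check. This short Python sketch multiplies the estimated 40 terabytes by the low and high ends of the quoted rate range; the function name is illustrative, and the calculation assumes flat per-terabyte pricing with no volume discounts or egress charges. Annualized, the same range comes to $9,600 to $14,400.

```python
def monthly_cost_range(terabytes: float, low_rate: float, high_rate: float) -> tuple[float, float]:
    """Recurring monthly cost (dollars) for storage billed per terabyte-month."""
    return terabytes * low_rate, terabytes * high_rate


# 40 TB of redundant images at $20-$30 per TB per month (figures from the article).
low, high = monthly_cost_range(40, 20, 30)
annual_low, annual_high = low * 12, high * 12
print(f"${low:,.0f}-${high:,.0f} per month; ${annual_low:,.0f}-${annual_high:,.0f} per year")
# prints: $800-$1,200 per month; $9,600-$14,400 per year
```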
The Boston Public Library's Copley Square branch, which manages its own digital image archive as part of the Digital Commonwealth network, is expected to join the city's deduplication working group later this summer. The library's collections include tens of thousands of neighborhood photographs that overlap substantially with city planning records.
For residents and researchers trying to access these systems now, the city's DoIT office says the permitting portal and public records request tool will remain fully operational throughout the cleanup. Requests submitted during the transition may take slightly longer to process, and the office advises allowing an extra three to five business days for image-heavy records requests filed before September 1.