Boston's Office of Digital Innovation quietly hit a milestone last spring: city archivists flagged more than 340,000 duplicate image files clogging municipal databases maintained by agencies ranging from the Inspectional Services Department to the Boston Planning & Development Agency. The cleanup, part of a broader records modernization push tied to Mayor Michelle Wu's smart-city agenda, has already freed up an estimated 14 terabytes of server space across city systems.
The effort matters now because state procurement rules updated in January 2026 require any Massachusetts municipality seeking federal broadband infrastructure grants to demonstrate active data hygiene practices — including documented duplicate-file audits — before applications clear the Executive Office of Technology Services and Security. Boston filed its first compliance report in March. Several other cities in the commonwealth have yet to file at all.
What Boston Is Actually Doing
The practical work is happening at two main sites. The City Archives on Boylston Street has been running a pilot since October 2025 using open-source perceptual hashing software to scan building-permit photograph libraries, a category notorious for near-identical images submitted by contractors. Separately, the BPDA's real estate imaging database in Roxbury, which holds property-condition photos going back to 2009, has been under a parallel audit contracted to a Kendall Square technology firm.
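For readers unfamiliar with perceptual hashing, the idea can be sketched in a few lines of Python. The city has not named the software or algorithm in its pilot, so what follows is an assumption-laden illustration of one common variant, the average hash, and not a description of Boston's tooling. It assumes each image has already been shrunk to an 8x8 grayscale grid, and the sample grids are invented for the example.

```python
def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255)."""
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of pixel values")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Each pixel contributes one bit: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical sample data: a gradient, a slightly brighter copy, and its inverse.
base = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter = [[min(255, value + 3) for value in row] for row in base]
inverted = [[255 - value for value in row] for row in base]

near_copy_distance = hamming_distance(average_hash(base), average_hash(brighter))
different_distance = hamming_distance(average_hash(base), average_hash(inverted))
```

A small distance suggests two files show the same picture even when their bytes differ, which is why the technique suits contractor photos that have been resaved or lightly re-exposed.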
Boston Public Library's Digital Commonwealth project, headquartered at the Central Library in Copley Square, has faced its own version of the problem for years. Digitization campaigns that ran between 2014 and 2022 left thousands of redundant scans distributed across BPL's servers and partner repositories. Librarians there have been manually reconciling records since late 2024, a process staff describe as painstaking given the volume of historical photographs involved.
The city's approach leans heavily on automated flagging followed by human review — a two-step model that slows throughput but reduces wrongful deletion. That caution is partly a response to a 2023 incident in which Chicago's municipal mapping division permanently deleted a set of infrastructure survey photos that turned out to be unique, after an algorithm incorrectly classified them as duplicates. Chicago has not publicly disclosed the full scope of that loss.
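The two-step model can be illustrated with a short Python sketch. This is a hypothetical outline rather than the city's actual workflow: the file names, the 5-bit threshold, and the `Candidate` record are all assumptions made for the example. The point it shows is that the automated step only builds a review queue, and nothing becomes eligible for deletion until a person approves it.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    keep_id: str
    duplicate_id: str
    distance: int
    approved: bool = False  # set only by a human reviewer, never by the scanner


def flag_candidates(hashes, max_distance=5):
    """Step 1: flag pairs whose hashes differ by at most max_distance bits."""
    flagged = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = bin(hash_a ^ hash_b).count("1")
        if distance <= max_distance:
            flagged.append(Candidate(id_a, id_b, distance))
    return flagged


def files_to_delete(candidates):
    """Step 2: only reviewer-approved pairs yield a file eligible for deletion."""
    return [c.duplicate_id for c in candidates if c.approved]


# Hypothetical hashes, shortened to 8 bits for readability.
hashes = {
    "permit-001.jpg": 0b11110000,
    "permit-002.jpg": 0b11110001,  # one bit away from permit-001
    "survey-417.jpg": 0b00001111,  # far from both
}

queue = flag_candidates(hashes)
before_review = files_to_delete(queue)  # nothing approved yet
queue[0].approved = True                # a reviewer confirms the pair
after_review = files_to_delete(queue)
```

Keeping the approval flag separate from the flagging step is what trades throughput for safety: an overly loose threshold lengthens the review queue rather than destroying unique files.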
How Boston Compares Globally
Amsterdam's City Archives completed a comparable project in 2024, processing roughly 2.1 million digitized images and cutting duplicate-file volume by 38 percent using a hybrid AI-and-archivist workflow developed with the University of Amsterdam's information science faculty. The Dutch model is widely cited in European municipal data circles as the benchmark for speed without sacrificing metadata integrity.
Seoul's Smart City Division tackled the problem differently. Beginning in 2023, the city integrated duplicate detection directly into its upload pipeline, meaning redundant images are rejected at the point of entry rather than cleaned out retroactively. That upstream approach has kept Seoul's public-records image database growing at a manageable rate, though critics in South Korea's archival community have argued it risks discarding genuinely distinct files that a hash algorithm reads as identical.
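An upload-time check of this kind might look like the Python sketch below. Seoul has not been described here in enough detail to reproduce its pipeline, so the `UploadGate` class and its behavior are assumptions for illustration. The sketch uses an exact SHA-256 content hash, which rejects only byte-identical files; a looser perceptual match at the same point in the pipeline is where the critics' concern about discarding distinct images would arise.

```python
import hashlib


class UploadGate:
    """Hypothetical gate that rejects byte-identical files at upload time."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> name of the first file stored

    def submit(self, name, data):
        """Return (accepted, name of the stored file with this content)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            # Same bytes already stored: reject and point to the original.
            return False, self._seen[digest]
        self._seen[digest] = name
        return True, name


gate = UploadGate()
first = gate.submit("site-a.jpg", b"example image bytes")
repeat = gate.submit("site-a-copy.jpg", b"example image bytes")
distinct = gate.submit("site-b.jpg", b"different image bytes")
```

Because nothing redundant is ever stored, there is no retroactive cleanup to run, but there is also no second look at a file once the gate turns it away.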
London's situation is closer to Boston's. The Greater London Authority is still working through legacy duplicates generated during a digitization push that ran from 2018 to 2022, and the GLA's digital team has acknowledged publicly that a full audit is unlikely to conclude before 2027.
By that measure, Boston is roughly on pace with London and meaningfully behind Amsterdam and Seoul. The city's March compliance filing with the state showed that 61 percent of targeted municipal image libraries had completed at least a first-pass duplicate scan. That figure sounds substantial until you note that the BPDA's own construction-photography archive, one of the largest in the city system, was not yet included in the count.
For residents and developers who interact with city permitting portals, the cleanup has a direct practical payoff: faster load times on the Inspectional Services online portal, whose heaviest users include homeowners in Jamaica Plain and Dorchester, neighborhoods with high volumes of renovation permits. City officials expect a second compliance report, due in September 2026, to show whether the BPDA archive has been brought into scope. That report will also determine whether Boston qualifies for the next round of federal broadband infrastructure funding under the NTIA's State Digital Equity Capacity Grant program.