Boston's city archives division, housed within the Department of Innovation and Technology on City Hall Plaza, has spent the better part of 18 months working through a backlog of roughly 340,000 duplicate digital images embedded in public permit, inspection, and planning records. The effort, which began in earnest in January 2025 under a storage-optimization directive tied to the Wu administration's broader open-data push, has cleared about 60 percent of those redundant files — freeing an estimated 4.2 terabytes of server space and cutting retrieval times for property records accessed through the city's Analyze Boston portal.
The timing is not accidental. Across municipal governments worldwide, the explosion of digital documentation tied to pandemic-era remote permitting left city IT departments drowning in duplicated imagery: the same building facade photographed six times, the same zoning map scanned from four different PDFs, identical inspection photos uploaded by field workers using separate mobile apps. Boston is not alone in facing this problem, but its response is drawing attention from peer cities that are still in the early stages of diagnosis.
What Boston Is Actually Doing
The city's deduplication work is concentrated in two record-heavy neighborhoods: Jamaica Plain and Dorchester, both of which saw sustained construction permitting activity between 2021 and 2024 tied to the Wu administration's housing production targets. The Inspectional Services Department, which operates out of 1010 Massachusetts Avenue, flagged the image redundancy problem internally after staff noticed that property case files for some Dorchester two-family conversion projects contained upward of 80 images, many of them pixel-for-pixel identical copies created when contractors resubmitted applications through the city's ePLACE permitting portal.
The city brought in MassIT, the state's central IT agency, as a technical partner rather than going directly to a private vendor, a deliberate procurement choice that kept the work under a pre-existing state contract and avoided a separate competitive bidding process. The deduplication software uses a hash-matching method to flag pairs or clusters of files that are byte-for-byte identical, and a human reviewer confirms each deletion. That hybrid approach of automated flagging and human sign-off is slower than a fully automated purge but reduces the risk of accidentally deleting a legitimately distinct image that happens to share metadata with another file.
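For readers curious what hash matching looks like in practice, the following is a minimal Python sketch, not the city's actual software. It assumes SHA-256 as the hash and a local folder of image files; the function names `file_digest` and `find_duplicate_clusters` are hypothetical. It only reports clusters of byte-identical files and deletes nothing, leaving the final decision to a human reviewer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_clusters(root: Path) -> list[list[Path]]:
    """Group files under root by digest; keep only groups with 2+ members.

    Hypothetical helper: it flags candidates for review and never deletes.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_clusters(Path("some_case_folder"))` on a placeholder directory would return one list per set of identical files. Two files with the same digest are, for practical purposes, the same bytes, which is why this method catches exact resubmissions but misses rescans or re-encoded copies.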
How Other Cities Are Handling the Same Problem
Chicago's Department of Buildings, which manages one of the largest municipal permitting image databases in the United States, launched a comparable cleanup in March 2026 after an internal audit found that its digital archive had grown by 28 percent in two years with no corresponding increase in unique properties under review. Chicago opted for a fully automated approach using a commercial deduplication platform, moving faster but accepting a higher error rate on edge cases.
Amsterdam's municipal records office — the Stadsarchief Amsterdam — has been dealing with a version of this problem at a different scale, focused on historical digitization projects where multiple scanning runs of the same physical document produced near-identical but not byte-perfect image pairs. The Dutch city has used perceptual hashing, a technique that catches visually similar images even when file metadata differs, since at least 2023. Boston's current system does not yet use perceptual hashing, which city IT officials have publicly identified as the next upgrade phase, though no contract or timeline has been formally announced.
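To illustrate the difference from exact hash matching, here is a minimal Python sketch of one simple perceptual technique, the "average hash." Which algorithm Amsterdam uses is not stated, so this is an assumption chosen for illustration. The sketch also assumes each image has already been reduced to a small grayscale grid of 0 to 255 values, a step normally handled by an image library; the names `average_hash` and `hamming_distance` are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the mean.

    Assumes `pixels` is a small, already-downscaled grayscale grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Illustrative data only: a checkerboard "scan" and a slightly brighter rescan.
original = [[10, 200, 10, 200], [200, 10, 200, 10]] * 2
rescan = [[value + 3 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

near = hamming_distance(average_hash(original), average_hash(rescan))
far = hamming_distance(average_hash(original), average_hash(inverted))
```

A small distance suggests two files show the same picture even though their bytes differ, while a large distance suggests different content. Any threshold separating the two would need tuning against real records, and near-matches would still be candidates for human review rather than automatic deletion.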
London's planning portal, managed by the Greater London Authority, handles duplicate imagery as part of a continuous ingestion pipeline rather than periodic cleanup batches — a structural difference that reflects the GLA's larger IT budget and the volume of applications flowing through the system daily. Boston processes roughly 28,000 building permit applications annually, a fraction of London's volume, which gives the city more flexibility to run batch-cleanup cycles without disrupting live workflows.
For residents and small contractors in Boston, the practical effect of the cleanup is already visible: property record searches on the Analyze Boston portal that previously returned dozens of redundant thumbnail images now surface cleaner, faster-loading file sets. The next phase of the project, expected to roll out before the end of 2026, will extend the deduplication sweep to historical records predating 2019 — a dataset that city officials have described publicly as significantly larger and more complex than the pandemic-era backlog already addressed.