Boston's network of public institutions is sitting on a growing problem: sprawling digital archives riddled with duplicate images that are slowing systems, inflating storage costs, and complicating public records requests. The question now is not whether to act, but how — and who pays for it.
The pressure is real and immediate. Boston City Hall, which moved aggressively toward digital record-keeping under Mayor Michelle Wu's open-data initiatives, has seen its internal document management systems expand substantially since 2022. Duplicate image files — often created when staff scan the same permits, inspection reports, or planning documents multiple times across departments — clog retrieval systems and create legal liability when conflicting versions of the same record surface during litigation or public records disputes.
Why This Summer Is the Decision Point
Several forces are converging at once. The Wu administration's Office of Digital Equity and the city's Department of Innovation and Technology are both in active budget cycles, with fiscal year 2027 allocations under review through July. Any large-scale deduplication or archive migration project would need to be scoped and funded before September, when the city's technology procurement windows close for the calendar year.
At the same time, the Boston Public Library's Central Branch on Boylston Street has been conducting a parallel audit of its digitized collections — a trove that includes historical photographs, deed records, and newspaper clippings going back more than a century. The BPL's digital team has flagged that duplicate image ingestion during a 2023 scanning initiative created redundant files estimated at roughly 15 to 20 percent of a particular collection segment, according to internal documentation reviewed by The Daily Boston. Resolving that overlap requires decisions about which version of a scanned image becomes the authoritative record and which gets retired.
The stakes extend beyond City Hall and the library. Northeastern University, which manages one of the largest proprietary research image databases in the Fenway corridor, and the Massachusetts Institute of Technology's Digital Humanities lab in Cambridge both grapple with similar deduplication challenges at institutional scale. In the biotech sector — which anchors the Seaport District and Kendall Square economies — laboratory imaging systems generate enormous volumes of proprietary visual data, and duplicate file management has direct regulatory implications under FDA documentation standards.
The Decisions That Can't Wait
Three specific choices will define what happens next. First, institutions must decide on a deduplication standard. The two main approaches carry different costs and error rates: cryptographic hash matching, which identifies only exact byte-for-byte duplicates, and perceptual hashing, which catches near-identical images even when file format, resolution, or compression differ. Exact matching will miss a rescan of the same page, while perceptual matching can wrongly pair two distinct documents that merely look alike, such as successive revisions of a site plan. For public archives, the wrong choice can mean permanent loss of a record that only appeared to be a duplicate.
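For readers who want to see the difference concretely, the following is a minimal sketch, not a description of any system the city or the BPL actually runs. It assumes images have already been reduced to small grayscale grids, uses SHA-256 for exact matching, and uses a simple "average hash" as a stand-in for perceptual hashing. The function names, the toy 4x4 scans, and the distance threshold are all hypothetical.

```python
import hashlib

def exact_digest(data: bytes) -> str:
    """Cryptographic hash: equal only when the bytes are identical."""
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels: list[list[int]]) -> int:
    """Simple perceptual hash: one bit per pixel, set when brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_near_duplicate(a, b, max_distance: int = 2) -> bool:
    # max_distance is an illustrative threshold, not a recommended value.
    return hamming(average_hash(a), average_hash(b)) <= max_distance

def to_bytes(pixels: list[list[int]]) -> bytes:
    return bytes(p for row in pixels for p in row)

# Toy 4x4 grayscale "scans" (hypothetical data).
scan_a = [[10, 10, 200, 200],
          [10, 10, 200, 200],
          [200, 200, 10, 10],
          [200, 200, 10, 10]]
# The same page rescanned slightly brighter: every pixel shifted by 3.
scan_b = [[min(255, p + 3) for p in row] for row in scan_a]
# A genuinely different page.
scan_c = [[200, 200, 200, 200],
          [10, 10, 10, 10],
          [200, 200, 200, 200],
          [10, 10, 10, 10]]

same_bytes = exact_digest(to_bytes(scan_a)) == exact_digest(to_bytes(scan_b))
print("exact match a/b:", same_bytes)
print("near duplicate a/b:", is_near_duplicate(scan_a, scan_b))
print("near duplicate a/c:", is_near_duplicate(scan_a, scan_c))
```

In this toy case the brighter rescan fails the exact test but passes the perceptual one, while the unrelated page fails both. A real archive would tune the threshold against known pairs and route perceptual matches to human review rather than deleting on a match alone.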
Second, there is the question of retention versus deletion. Massachusetts public records law, under Chapter 66 of the General Laws, sets mandatory retention schedules for government documents. City attorneys will need to certify that any automated deletion of duplicate image files does not inadvertently remove a record still within its mandatory retention window. That legal review alone typically takes six to eight weeks.
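One way to picture the safeguard the attorneys would be certifying is a guard that refuses to queue any file for deletion while its retention window is open. The sketch below is illustrative only: the `ImageRecord` fields, the idea of a single `retention_until` date per file, and the rule that the authoritative copy is never deleted are assumptions made for the example, not provisions of Chapter 66 or of any city system.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ImageRecord:
    record_id: str          # hypothetical identifier
    retention_until: date   # assumed: last day of the mandatory retention window
    is_authoritative: bool  # assumed: the copy chosen as the record of reference

def eligible_for_deletion(rec: ImageRecord, today: date) -> bool:
    """A duplicate may be queued for deletion only if it is not the
    authoritative copy and its retention window has already closed."""
    return (not rec.is_authoritative) and today > rec.retention_until

def split_duplicates(records: list[ImageRecord], today: date):
    """Separate duplicates into those safe to queue and those to hold for review."""
    deletable = [r for r in records if eligible_for_deletion(r, today)]
    held = [r for r in records if not eligible_for_deletion(r, today)]
    return deletable, held

# Hypothetical records for illustration.
records = [
    ImageRecord("permit-001-copy", date(2024, 12, 31), False),
    ImageRecord("permit-001", date(2024, 12, 31), True),
    ImageRecord("siteplan-007-copy", date(2030, 6, 30), False),
]
deletable, held = split_duplicates(records, today=date(2026, 7, 1))
print([r.record_id for r in deletable])
print([r.record_id for r in held])
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check reproducible for an audit.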
Third, somebody has to own the process. The BPL's Roxbury Branch in Nubian Square (formerly the Dudley Branch) and the Boston City Archives in West Roxbury have historically operated on separate technology tracks. Consolidating their deduplication workflows would be more efficient but requires a formal interoperability agreement between two separate city entities.
For residents in Jamaica Plain and Dorchester, where active rezoning and housing-production reviews mean planning documents are being digitized and filed at high volume right now, the practical stakes are clear. A duplicated or misfiled image of a site plan or environmental review can stall a permit, delay affordable housing construction, or trigger an unnecessary appeal. Getting the archive infrastructure right is unglamorous municipal work — but it directly affects the speed of housing delivery that the Wu administration has made central to its second term.
Expect formal procurement notices from the city's Office of Budget Management by late August, and watch whether the BPL and the City Archives announce a coordinated approach or continue on separate tracks. That fork in the road, more than any single technology choice, will signal whether Boston is serious about solving the problem or managing it indefinitely.