The University of Connecticut Library and the School of Engineering are developing technology that applies machine learning to handwriting text recognition, with the goal of giving researchers better access to handwritten historical documents.
Handwritten documents are essential for researchers, but they are often inaccessible because they cannot be searched, even after they are digitized.
The Connecticut Digital Archive, a project of the UConn Library, is working to change that with a $24,277 grant awarded through the Catalyst Fund of LYRASIS, a nonprofit organization that supports access to academic, scientific, and cultural heritage.
The irregular handwriting in many old documents leaves the historical information in them inaccessible to Optical Character Recognition (OCR), a text conversion method used for more than 20 years to make documents discoverable.
To address that, historians and computer scientists have worked to apply machine learning to handwriting text recognition (HTR).
In the summer of 2019, the university’s library, in partnership with the Massachusetts Historical Society, Greenhouse Studios, and UConn School of Engineering, created a set of more than 16,000 images of 22 different characters from the John Quincy Adams Papers.
Characters from the Adams Papers were used to develop a set of algorithms designed to recognize patterns in those images. The algorithms form a neural network, a system that learns from the handwritten characters in a way loosely modeled on the human brain.
The Adams Papers pilot project produced promising results, with an accuracy rate of more than 86 percent when tested on all 22 characters.
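For readers curious how an accuracy rate like this is typically calculated, the sketch below compares a model's predicted characters against the known correct ones and reports the share that match, overall and per character. It is only an illustration: the function names and the sample labels are invented for this example and do not come from the Adams Papers project, whose evaluation code is not described here.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted character matches the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no test examples")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_character_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true character."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {ch: hits[ch] / totals[ch] for ch in totals}


# Invented sample data: six test images covering three characters.
true_labels = list("aaeett")
predicted_labels = list("aaeotl")

print(f"overall accuracy: {accuracy(true_labels, predicted_labels):.1%}")
for ch, acc in sorted(per_character_accuracy(true_labels, predicted_labels).items()):
    print(f"  {ch!r}: {acc:.1%}")
```

A per-character breakdown of this kind can show whether a few hard-to-read letters account for most of the errors, which an overall figure alone would hide.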