In October 2023 I took a break from my PhD research to do a placement at the British Library. My project focused on Handwritten Text Recognition (HTR) of ancient Chinese manuscripts. I worked with the fabulous Stein collection from Dunhuang, which includes all kinds of handwritten Chinese documents dating back to the Tang dynasty (618–907) and beyond. My work centred on the use of eScriptorium, an open-source platform designed for HTR of scripts in any language. Recently, a group of sinologists have begun work on transcription models for Chinese texts, and my role was to help them develop these further. The ultimate goal is to provide the British Library and others with a tool capable of transcribing Chinese manuscripts on a vast scale.
My tasks involved selecting a variety of manuscripts, processing them with eScriptorium, and correcting the results so that the software could learn from its mistakes. It is an iterative approach to machine learning, allowing the software to learn after every use and improve step by step. I worked on two stages: segmentation, precisely identifying the locations of text lines in each image; and transcription, ensuring that the image of each Chinese character was converted correctly to its digital equivalent.
Once I familiarised myself with the software and the process, my work was, in many ways, quite straightforward. And yet I learned so much in the course of my activities. Although I had previously engaged in digital humanities training, I only really retained the tools I used repeatedly in my own work. At the Library, not only did I learn the particular package of skills and concepts required for my role, I also learned about the work of my colleagues. I was exposed to a broad range of digital preservation activities and saw how they functioned together in a collaborative and supportive environment. It gave me a much better sense of the fields of digital research and manuscript studies as applied in an institution dedicated to sharing its resources and encouraging engagement of every kind. While there I learned on the job, taking every opportunity to benefit from the understanding of my colleagues; of equal importance, I saw more clearly what skills I should continue to develop and the opportunities they might open up.
One personal highlight was the chance to take the new HTR transcriptions and experiment with tools for their analysis. I was particularly interested in comparing one manuscript with later printed editions of the same text. Apps for such work are also in their developmental phase. Having this role at the British Library gave me some confidence to reach out to the developers of one app, LERA, so that I might trial the software for use on Chinese text. This personal project and that of eScriptorium have both seen good progress and inspired me to pursue this line of work in the future. My experience at the British Library has been fabulous and helps me see more clearly the range of opportunities that might develop out of my PhD.