Shaping the Stein collection’s Dunhuang corpus (2): the items from Cave 17’s ‘miscellaneous’ bundles

In a previous blog post, we looked at the instrumental role played by Wang Yuanlu during the selection of the items from Cave 17. Wang, who directly chose from the small repository what to hand over to Stein for inspection, was very keen to divert his attention from the so-called ‘regular’ bundles, which were composed for the most part of Buddhist sutras in Chinese and Tibetan. During their first ever transaction, which took place between 21 May and 6 June 1907, Wang Yuanlu therefore began by handing over the ‘miscellaneous’ bundles, which he seemed to hold in low estimation. To Stein’s delight, these contained mixed and diverse materials, such as manuscripts in non-Chinese languages, illustrated scrolls, paintings, drawings, ex-votos, textiles, etc. Stein picked out any of the items that jumped out at him as being particularly interesting and made sure to put them aside for ‘further examination’, the phrase that he used to refer to their removal in his transaction with Wang. This

Call for 2022 Chevening Fellowship: Digitised manuscripts from Dunhuang and the application of HTR tools to ancient Chinese texts

Come join us for a great opportunity to work at the British Library and find ways of applying cutting-edge technological solutions to ancient manuscripts! In two weeks, on 2 November 2021 at 12:00 (midday) GMT, the call for applications to a Chevening Fellowship on “Automating the recognition of historical Chinese handwritten texts” will close. It is open to applicants from Mainland China only. If you are interested, or know someone who might be, please make sure you check this link.

For one year, starting in September 2022, the fellow will be part of the Digital Research Team, directly benefitting from their expertise and guidance, and will work closely with the Curators of Chinese collections. This is a rare chance to engage with a range of handwritten historical texts in Chinese from Dunhuang and other associated archaeological sites that have been digitised as part of the International Dunhuang Project. To date, IDP has made images of around 35,000 manuscripts from the British Library collections available online, and this work is still ongoing.

The founding mission of IDP was to rectify the dispersal and general inaccessibility of Dunhuang texts and images by reuniting all these artefacts through the highest quality digital photography, and by pushing the limits of new web technologies to make this material accessible to all. We thus hope that the Chevening Fellowship will pave the way for innovative research through large-scale text analysis, and help us enhance the discoverability of the manuscripts in the Stein collection by opening them up for full-text search.

Thanks to the Lotus Sutra Manuscripts Digitisation Project, almost 800 manuscripts in Chinese from Dunhuang are currently being conserved and digitised to the highest standards by dedicated members of staff, with the aim of making images accessible on the IDP platform. We have so far conserved 555 manuscripts, digitised 401 of them, and made 127 available online. This corpus constitutes an ideal case study: firstly, because the canonical edition of the Lotus Sutra, which is one of the main scriptures in Mahayana Buddhism, is already transcribed; secondly, because the manuscripts themselves present minor variations, such as different handwritings, variant characters and scribal errors.
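To give a sense of why an existing canonical transcription is so useful, here is a minimal sketch in Python of how a manuscript transcription could be compared against it to surface variant characters and omissions. It relies only on the standard library's `difflib`; the function name `find_variants` and the sample strings are hypothetical and purely illustrative, not taken from any item in the collection or from any tool used by the Library.

```python
from difflib import SequenceMatcher


def find_variants(canonical: str, transcription: str):
    """List every place where a transcription departs from the canonical text.

    Each entry is (operation, canonical_segment, manuscript_segment), where
    operation is 'replace', 'delete' or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(None, canonical, transcription, autojunk=False)
    return [
        (tag, canonical[i1:i2], transcription[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative strings only (assumption): a variant form of one character
# and one omitted character, as a copyist might produce.
canonical = "觀世音菩薩普門品"
manuscript = "觀丗音菩薩門品"

for operation, expected, found in find_variants(canonical, manuscript):
    print(operation, repr(expected), "->", repr(found))
```

A 'replace' entry hints at a variant character or a scribal error, while 'delete' and 'insert' point to omitted or added text. In practice the alignment would need to cope with much longer passages and with damaged or missing sections.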

Detail of a scroll dated to 696, containing text from the Guanyin Sutra (Ch. Guanshiyin Jing 觀世音經), which is based on Chapter 25 of the Lotus Sutra. This image shows the colophon, where several alternative character forms, known as characters of Empress Wu, have been used by the copyist. © The British Library, Or.8210/S.217.

The Chevening fellow will focus on this rich content as a starting point to examine approaches, opportunities and possible solutions to automate the transcription of these historical textual collections in Chinese, in order to unlock their huge potential. As explained by Dr. Adi Keinan-Schoonbaert in a recent blog post, the Library has successfully led several initiatives around the application of OCR and HTR tools to materials in a range of languages, increasingly broadening the scope of its work to include non-Western languages.

There are many challenges associated with the application of HTR tools to ancient Chinese texts, such as the reading direction (top to bottom within each column, with columns running from right to left), or the absence of punctuation marks. This has resulted in different approaches to enable transcription. While some approaches work on recognising individual characters within each line of text, others concentrate on each line as a whole, a method which may be especially relevant for cursive text and has been preferred for Japanese calligraphy (see Nagasaki, 2016).
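The reading-direction issue can be illustrated with a small Python sketch. It assumes that a character-level recogniser has already returned each character with the centre of its bounding box in image coordinates (origin at the top left); the `CharBox` type, the `reading_order` function and the `column_gap` threshold are all hypothetical names introduced for this example, and the clustering is deliberately naive.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str
    x: float  # horizontal centre of the character box, in pixels
    y: float  # vertical centre of the character box, in pixels


def reading_order(boxes, column_gap):
    """Order characters top to bottom within columns, columns right to left.

    Boxes are grouped into the same column while consecutive horizontal
    centres (scanning from the right) differ by no more than column_gap.
    """
    columns = []
    for box in sorted(boxes, key=lambda b: -b.x):  # rightmost first
        if columns and abs(columns[-1][-1].x - box.x) <= column_gap:
            columns[-1].append(box)
        else:
            columns.append([box])
    # Within each column, read from the top of the image downwards.
    return "".join(
        box.char for column in columns for box in sorted(column, key=lambda b: b.y)
    )


# Made-up coordinates (assumption): two columns, supplied in arbitrary order.
boxes = [
    CharBox("薩", 61, 31), CharBox("世", 102, 30), CharBox("菩", 60, 12),
    CharBox("音", 98, 50), CharBox("觀", 100, 10),
]
print(reading_order(boxes, column_gap=20))
```

Real manuscripts complicate this picture with interlinear notes, double-column commentary and skewed or damaged columns, which is one reason why line-level approaches can be attractive.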

The fellow will research the current landscape of Chinese handwritten text recognition. They will test our material with new digital tools and techniques, assessing available software options and making recommendations to help support the choice of HTR tools applicable to the Chinese texts from Dunhuang and other sites. The fellow will also help us connect the dots between the many different projects that are going on in various parts of the globe, ensuring that the British Library plays an active part in the dialogue with the main actors in this emerging field, especially in China.
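One common way of comparing HTR tools is the character error rate (CER): the number of character edits needed to turn a tool's output into a reference transcription, divided by the length of the reference. The Python sketch below is only an illustration of that metric, with hypothetical function names and made-up sample strings; it does not describe the evaluation method the fellow will actually adopt.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only (assumption): one wrong character, one missing.
print(character_error_rate("觀世音菩薩", "觀丗音菩"))
```

A lower CER means the output is closer to the reference, although variant characters that are faithful to the manuscript may be counted as errors when the reference is a normalised canonical text.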

There is still time to apply, so hurry up!