Paper 62

2018 Washington conference submission


Creating an IIIF/Transkribus-enabled manuscript community to explore C17th literacy

Colin Greenstreet - MarineLives & Chronoscopic Education (United Kingdom)

Abstract: MarineLives is a user-driven, content-creating project run by volunteers. It is dedicated to the collaborative transcription, linkage and enrichment of High Court of Admiralty manuscripts, 1627-1677, together with thematically related documents from international collections. Our corpus is a rich source of commercial, social, material, legal and linguistic insight. Our current platform is a Semantic MediaWiki (SMW) published under a CC BY 3.0 licence. It is built on semi-diplomatic transcription and a folio-based page structure, which links each full-text transcription directly to its digital image.

Our vision for the next five years is to create a virtual, manuscript-based archive and an associated research community that will foster a culture of collaborative scholarship. We plan to grow our full-text corpus to twenty-five million words and our image collection to 50,000 images. As we scale up our activities, we are looking for new archival and library partners. We are registering a charitable incorporated organisation (Chronoscopic Education).

We are interested in technology tools, in the collaborative potential of technology standards and platforms, and in the communities of interest forming around them. Our approach to technology is pragmatic, cost-sensitive, and holistic. To date we have used low-cost solutions for digitisation, platform technology and server infrastructure. With the formation of Chronoscopic Education, we recognise the need to increase funding, in particular to support our technology ambitions, whilst retaining the spirit of an entrepreneurially run social venture.

A major MarineLives research priority for 2018 and 2019 is the exploration of C17th literacy. Our goal is a technology-enabled manuscript community, focused on the topic of literacy.

We are working with a social historian, Dr Mark Hailwood (Bristol), who is planning a major grant bid in 2019 to explore literacy. We are reaching out to historians, linguists and archivists through our extensive Twitter following [@MarineLivesorg]. By publishing content from our MarineLives wiki under the hashtag #Occupationalsignatures, we are building interest in literacy research amongst academics, archivists and the general public.

We are exploring the potential of IIIF standards, viewers and annotation tools to support our vision. We are developing an IIIF demo of manifests that bring together markes, initials and signatures from multiple IIIF image servers, grouped for example by occupation, type of mark or year range. We will use this demo with historians and computer scientists as the starting point for a robust, flexible specification [Show demo].
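A manifest of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the MarineLives demo itself: it assumes a hypothetical `MarkRecord` type describing one cropped marke, initials or signature image, assumes every contributing image server supports the IIIF Image API 3.0, and emits a IIIF Presentation 3.0-style manifest (a 2018 demo may equally have used Presentation 2.1). The function name `build_manifest` and all URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MarkRecord:
    """Hypothetical record describing one cropped marke, initials or signature."""
    service: str    # IIIF Image API 3.0 base URI on the holding server (assumed)
    width: int      # pixel width of the image
    height: int     # pixel height of the image
    label: str
    occupation: str
    mark_type: str  # e.g. "marke", "initials" or "signature"
    year: int


def build_manifest(
    manifest_id: str,
    label: str,
    records: List[MarkRecord],
    occupation: Optional[str] = None,
    mark_type: Optional[str] = None,
    years: Optional[Tuple[int, int]] = None,
) -> Dict:
    """Select matching records, then wrap each one in its own canvas."""
    selected = [
        r for r in records
        if (occupation is None or r.occupation == occupation)
        and (mark_type is None or r.mark_type == mark_type)
        and (years is None or years[0] <= r.year <= years[1])
    ]
    canvases = []
    for i, r in enumerate(selected, start=1):
        canvas_id = f"{manifest_id}/canvas/{i}"
        image = {
            # Full-size image request in IIIF Image API 3.0 URI syntax
            "id": f"{r.service}/full/max/0/default.jpg",
            "type": "Image",
            "format": "image/jpeg",
            "width": r.width,
            "height": r.height,
            "service": [{"id": r.service, "type": "ImageService3", "profile": "level1"}],
        }
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [r.label]},
            "width": r.width,
            "height": r.height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                # One painting annotation places the remote image on the canvas
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": image,
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }
```

Filtering happens before the manifest is built, so each combination of occupation, mark type and year range yields its own manifest, while the images themselves stay on their home servers and are only referenced by URI.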

In parallel, we are scaling up our use of Transkribus technology for the spatial analysis of manuscripts and for handwriting recognition. As early experimenters with Transkribus, we have committed to increasing our ground truth to 3 million words, targeting a character error rate (CER) below 10%. We are particularly interested in integrating Transkribus spatial analysis of documents (text regions and line regions) with IIIF manifests and annotation capability.
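One way to connect Transkribus layout analysis with IIIF annotation can be sketched as follows. It assumes Transkribus exports PAGE XML in which each `TextLine` carries a `Coords` polygon and a `TextEquiv/Unicode` transcription, and that the IIIF canvas uses the same pixel dimensions as the image Transkribus processed (otherwise coordinates would need scaling). The hypothetical function `lines_to_annotations` reduces each line polygon to a bounding box and emits one IIIF Presentation 3.0-style annotation per line, targeting the canvas region with a `#xywh=` media fragment.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def polygon_to_xywh(points: str) -> Tuple[int, int, int, int]:
    """Turn a PAGE XML points string "x1,y1 x2,y2 ..." into (x, y, w, h)."""
    coords = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def lines_to_annotations(page_xml: str, canvas_id: str) -> List[Dict]:
    """Emit one supplementing annotation per TextLine in a PAGE XML document."""
    root = ET.fromstring(page_xml)
    annotations = []
    # "{*}" matches any namespace, so different PAGE schema versions are accepted
    for n, line in enumerate(root.iterfind(".//{*}TextLine"), start=1):
        coords = line.find("{*}Coords")
        if coords is None or not coords.get("points"):
            continue  # skip lines without a usable polygon
        x, y, w, h = polygon_to_xywh(coords.get("points"))
        unicode_el = line.find("{*}TextEquiv/{*}Unicode")
        text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
        annotations.append({
            "id": f"{canvas_id}/annotation/line-{n}",
            "type": "Annotation",
            "motivation": "supplementing",
            "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
            # Media fragment selecting the line's bounding box on the canvas
            "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
        })
    return annotations
```

Bounding boxes are a simplification chosen for broad viewer support; an SVG selector could preserve the full line polygon where a viewer can render it.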

Finally, we are designing a data study group, in consultation with the Alan Turing Institute and the University of Edinburgh, to scope out tools for the pattern recognition of signatures and markes in historical manuscripts as a basis for identifying sub-populations. This data study group will use images and data sourced from both our IIIF and our Transkribus experiments.

Presentation type: 20-minute presentation (plus 5 minutes for questions)

Topics:

  • IIIF and archival collections,
  • IIIF enabled collaboration,
  • IIIF content communities (museums, manuscripts, newspapers, archival content, etc.)

Keywords:

  • Collaboration,
  • Content community,
  • Literacy,
  • Transkribus,
  • Pattern recognition,
  • Machine learning