A pluggable IIIF content enrichment pipeline

Matt McGrattan - Digirati (United Kingdom)

Presentation type: Presentation

Abstract:

IIIF and the Web Annotation Data Model offer a standards-compliant approach to enriching digital objects. This presentation demonstrates a technique for taking existing IIIF manifests, which may contain only images, and applying a configurable sequence of processing steps. These steps can add transcriptions through OCR and/or handwriting recognition, add tags through natural language processing using standard or project-specific taxonomies and other data sources, and perform customised machine-learning-driven tasks such as facial recognition or image description. New components can be written and plugged into the pipeline, where they are invoked as part of the event-driven workflow. Alternatively, third-party components external to this workflow can call into the pipeline through its public APIs, both to use existing enrichment in their own internal workflows and to enrich IIIF content independently from outside the main processing loop. Any IIIF resource can be run through this pipeline. We’ll look at how this technique could apply to generalised collection enrichment and description activity, as well as to individual research tasks.
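As a rough illustration of the pluggable design described in the abstract, the sketch below runs a configurable list of named steps over each canvas of a IIIF Presentation 3 style manifest and stores the results as Web Annotations. It is not the pipeline presented here: the registry, the step signature, the `transcribe` and `tag` steps, the toy taxonomy, and the example.org identifiers are all hypothetical, and the OCR and NLP calls are replaced by hard-coded stand-ins. It also runs synchronously, whereas the pipeline described above is event-driven.

```python
import copy
from typing import Callable, Dict, List

# Hypothetical step signature: a step receives one canvas and returns new annotations.
Step = Callable[[dict], List[dict]]
STEPS: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that plugs a step into the registry under a configurable name."""
    def wrap(fn: Step) -> Step:
        STEPS[name] = fn
        return fn
    return wrap


def annotation(canvas_id: str, motivation: str, text: str) -> dict:
    """Build a minimal Web Annotation with a textual body targeting a canvas."""
    return {
        "type": "Annotation",
        "motivation": motivation,
        "body": {"type": "TextualBody", "value": text},
        "target": canvas_id,
    }


def existing_text(canvas: dict) -> List[str]:
    """Collect transcription text that earlier steps attached to the canvas."""
    return [
        anno["body"]["value"]
        for page in canvas.get("annotations", [])
        for anno in page.get("items", [])
        if anno.get("motivation") == "supplementing"
    ]


@register("transcribe")
def transcribe(canvas: dict) -> List[dict]:
    # Stand-in for OCR or handwriting recognition; a real step would call a service.
    return [annotation(canvas["id"], "supplementing", "Letter from Dublin, 1842")]


@register("tag")
def tag(canvas: dict) -> List[dict]:
    # Stand-in for NLP tagging: match transcribed words against a toy taxonomy.
    taxonomy = {"dublin", "london"}
    words = {w.strip(",.").lower() for t in existing_text(canvas) for w in t.split()}
    return [annotation(canvas["id"], "tagging", w) for w in sorted(words & taxonomy)]


def enrich(manifest: dict, step_names: List[str]) -> dict:
    """Run the configured steps in order over every canvas; return an enriched copy."""
    result = copy.deepcopy(manifest)
    for canvas in result.get("items", []):
        for name in step_names:
            new_annos = STEPS[name](canvas)
            if not new_annos:
                continue
            page_id = f"{canvas['id']}/annotations/{name}"
            for i, anno in enumerate(new_annos):
                anno["id"] = f"{page_id}/{i}"  # illustrative identifier scheme
            page = {"id": page_id, "type": "AnnotationPage", "items": new_annos}
            canvas.setdefault("annotations", []).append(page)
    return result


# An images-only manifest, reduced to the fields this sketch needs.
manifest = {
    "id": "https://example.org/manifest",
    "type": "Manifest",
    "items": [{"id": "https://example.org/canvas/1", "type": "Canvas"}],
}
enriched = enrich(manifest, ["transcribe", "tag"])
```

Because the steps are looked up by name, the sequence passed to `enrich` acts as the configuration: a new component only needs to be registered and added to that list. Each step's output is kept in its own annotation page, so a later step such as `tag` can build on the transcription produced earlier in the run.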

Topics:

  • Annotation, including full-text or academic use cases,
  • Using IIIF material for Machine Learning and AI,
  • Linked Open “Usable” Data (LOUD) and IIIF,
  • IIIF Implementation Spectrum: large-scale or small-scale projects,
  • Interoperability in IIIF contexts

Keywords:

  • enrichment,
  • OCR,
  • workflow,
  • annotation,
  • services