Alignment of Full Text Database with IIIF images

Kiyonori Nagasaki - The University of Tokyo (Japan), Masahiro Shimoda - The University of Tokyo (Japan)

Presentation type: Presentation

Abstract:

As the number of IIIF-compliant images on the Web has increased, Buddhist researchers increasingly expect to make efficient use of IIIF images of Buddhist scriptures published on sites such as Gallica, Harvard University, and Kyoto University. Building on our previous solution for linking IIIF manifests with bibliographical data for Buddhist studies, we have developed a system that maps characters in a full text database to parts of IIIF images. This presentation reports on the characteristics of the system and its specifications.

Aligning a full text database with IIIF images has at least two effects. One is support for browsing and comparing image-based witnesses alongside the texts. The other is that the results of critical editing can be published together with the evidence contained in the images of the witnesses. The correspondence data therefore comprise positional data for the alignment and a kind of critical apparatus. Because the former can be produced without expert judgment, it is processed by automated methods such as OCR. The latter can also be produced partly by automatic means, but in many cases it requires the judgment of specialists. As the two kinds of data share almost the same format, differing only in the detailed explanation of critical editing, our system stores both kinds of data in the database.

Each alignment record in the system requires the following items:

1. The full text database side (a fragment of a text)

A) The targeted text itself

B) Location information of the text (e.g. 0270a20_00)

C) Language identifier of the text

D) Text id in the full text database

2. The IIIF image side (a part of a IIIF image)

A) IIIF Manifest URI

B) Targeted canvas URI

C) Coordinates of the targeted fragment on the IIIF image

D) Language identifier

E) Attribution in the IIIF Manifest

F) IIIF Image API URI of the part

3. Contributor’s name

4. Time of contribution

5. Type of relationship (alignment / variant phrase / variant character / addition / deletion)

6. Status of the relationship data (before publication / published / obsolete)
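
As a minimal sketch of how one such record might be represented, the Python below models the six items as dataclasses and derives item 2-F from the coordinates, following the IIIF Image API URI pattern {base}/{region}/{size}/{rotation}/{quality}.{format}. The field names, the example.org URIs, the contributor, the timestamp, and the sample text are all assumptions made for illustration and are not the system's actual schema. The "max" size keyword assumes an image service that supports Image API 2.1 or later.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class TextSide:
    text: str        # 1-A: the targeted text itself
    location: str    # 1-B: e.g. "0270a20_00"
    lang: str        # 1-C: language identifier
    text_id: str     # 1-D: text id in the full text database


@dataclass
class ImageSide:
    manifest_uri: str    # 2-A
    canvas_uri: str      # 2-B
    xywh: tuple          # 2-C: (x, y, width, height) in pixels
    lang: str            # 2-D
    attribution: str     # 2-E
    image_service: str   # hypothetical: base URI of the IIIF Image API service

    def canvas_fragment(self) -> str:
        """Canvas URI with an xywh media fragment, as used in IIIF annotations."""
        x, y, w, h = self.xywh
        return f"{self.canvas_uri}#xywh={x},{y},{w},{h}"

    def region_uri(self) -> str:
        """2-F: Image API URI that returns only the targeted region."""
        x, y, w, h = self.xywh
        return f"{self.image_service}/{x},{y},{w},{h}/max/0/default.jpg"


@dataclass
class Alignment:
    text: TextSide
    image: ImageSide
    contributor: str      # 3
    contributed_at: str   # 4: ISO 8601 timestamp (assumed format)
    relation: str         # 5: e.g. "alignment"
    status: str           # 6: e.g. "published"


# Illustrative values only; none of these come from the real database.
record = Alignment(
    text=TextSide("sample text", "0270a20_00", "zh", "T0000"),
    image=ImageSide(
        manifest_uri="https://example.org/iiif/manifest.json",
        canvas_uri="https://example.org/iiif/canvas/p1",
        xywh=(100, 200, 30, 40),
        lang="zh",
        attribution="Example Library",
        image_service="https://example.org/iiif/img1",
    ),
    contributor="A. Contributor",
    contributed_at="2019-01-01T00:00:00Z",
    relation="alignment",
    status="published",
)

print(record.image.region_uri())
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```

Storing the coordinates once and deriving both the canvas fragment and the region URI from them keeps the two references to the image consistent; whether the actual system stores item 2-F or derives it is not stated here.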

The data are stored on a PostgreSQL server and can be retrieved as JSON by querying on any of these items.
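
The retrieval pattern can be sketched as follows. This is not the system's code: it uses an in-memory SQLite database from the Python standard library as a stand-in for the PostgreSQL server so that it runs without setup, and the table name, column names, and sample row are hypothetical. It shows a lookup on an arbitrary item that returns the matching records as JSON.

```python
import json
import sqlite3

# Hypothetical flat column set covering the items of an alignment record.
COLUMNS = (
    "fragment", "location", "text_lang", "text_id",
    "manifest_uri", "canvas_uri", "xywh",
    "contributor", "contributed_at", "relation", "status",
)


def open_store() -> sqlite3.Connection:
    """Create an in-memory table; a stand-in for the real database."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE alignment ({cols})")
    return conn


def add(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one record; missing items are stored as NULL."""
    names = ", ".join(COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO alignment ({names}) VALUES ({marks})",
        [record.get(c) for c in COLUMNS],
    )


def find_as_json(conn: sqlite3.Connection, field: str, value: str) -> str:
    """Return all records whose `field` equals `value`, as a JSON array."""
    # Only known column names may be interpolated into the SQL text;
    # the value itself is always passed as a bound parameter.
    if field not in COLUMNS:
        raise ValueError(f"unknown item: {field}")
    cur = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM alignment WHERE {field} = ?",
        (value,),
    )
    rows = [dict(zip(COLUMNS, row)) for row in cur]
    return json.dumps(rows, ensure_ascii=False)


conn = open_store()
add(conn, {
    "fragment": "sample text",
    "location": "0270a20_00",
    "canvas_uri": "https://example.org/iiif/canvas/p1",
    "xywh": "100,200,30,40",
    "relation": "alignment",
    "status": "published",
})
print(find_as_json(conn, "location", "0270a20_00"))
```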

We developed a system for aligning and viewing the data, as shown in the demonstrations below:

A Short Demonstration Video: http://www.dhii.jp/nagasaki/videos/sat2019_02.mp4

(The Alignment mode with OpenSeadragon)

A Short Demonstration Video: http://www.dhii.jp/nagasaki/videos/sat2019_01.mp4

(The Viewing mode with Mirador, aligning two Gallica images and an image from Kyoto University only by clicking the location data box)

The data are gradually growing through contributions made over the Web by collaborating Buddhist researchers.

The system provides a new environment for Buddhist studies built on IIIF annotation. The data have the potential to support many other types of viewing. We will not only implement these ourselves but also encourage other researchers to do so through campaign activities.

Topics:

  • Annotation, including full-text or academic use cases
  • Interoperability in IIIF contexts
  • Implementations of IIIF outside of North America/Europe

Keywords:

  • A method of linking text and image
  • Buddhist studies
  • Full text database