Presentation type: Workshop
Abstract:
Description of the workshop
Our comprehensive, hands-on Goobi IIIF workshop is designed to give all participants the skills they need to generate valid, standardised metadata from directories of images in less than 20 minutes, using very simple resources and without further help. The metadata can then be exported as METS/MODS (for libraries), LIDO (for museums) or even TEI (for the digital humanities). At the same time, Goobi-to-go automatically generates valid IIIF Presentation manifests. This means the images are immediately available via the IIIF Image API and can be used straight away in any IIIF-compatible client, such as Mirador.
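To illustrate what "generating a IIIF Presentation manifest from a directory of images" involves, here is a minimal sketch of a Presentation API 2.1 manifest built from a list of image files. It is not Goobi-to-go's implementation: the base URLs, the `build_manifest` function, the identifier scheme and the assumption that image dimensions are already known (e.g. read while scanning the directory) are all hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical endpoints; a real installation would use its own base URLs.
PRESENTATION_BASE = "https://example.org/iiif/presentation"
IMAGE_BASE = "https://example.org/iiif/image"


def build_manifest(record_id: str, label: str,
                   images: List[Tuple[str, int, int]]) -> dict:
    """Build a minimal IIIF Presentation API 2.1 manifest.

    `images` holds (filename, width, height) tuples, e.g. gathered from a
    directory of scans; sorting by filename fixes the page order.
    """
    canvases = []
    for page, (filename, width, height) in enumerate(sorted(images), start=1):
        canvas_id = f"{PRESENTATION_BASE}/{record_id}/canvas/p{page}"
        # Hypothetical identifier scheme: record and file name, with the
        # slash URL-encoded because Image API identifiers must not contain "/".
        service_id = f"{IMAGE_BASE}/{record_id}%2F{filename}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": f"p. {page}",
            "width": width,
            "height": height,
            "images": [{
                # The image is "painted" onto the canvas via an annotation.
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{service_id}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": width,
                    "height": height,
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_id,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{PRESENTATION_BASE}/{record_id}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


if __name__ == "__main__":
    scans = [("00000002.tif", 2480, 3508), ("00000001.tif", 2480, 3508)]
    print(json.dumps(build_manifest("ppn123", "Sample volume", scans), indent=2))
```

Because every canvas points to an Image API service, a viewer such as Mirador can load the manifest URL and request tiles or scaled images directly from that service.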
In the second part of the workshop, participants will gain insight into our current developments that combine IIIF interfaces with machine-learning techniques for various purposes, for example image and full-text analysis.
Goobi is an open-source application. For over 14 years, it has united cultural institutions, currently in 17 countries, in a single digitisation community. The community focuses equally on coordinating both simple and complex digitisation workflows (Goobi workflow) and on publishing the digitised results of those projects (Goobi viewer). Alongside standardised interfaces such as OAI-PMH and SRU, IIIF plays a crucial role in ensuring interoperability between published digital collections and external data consumers. With the growing interaction with the burgeoning field of digital humanities, the Goobi community needed to make its numerous portals, and the collections of many different cultural institutions, available in readily consumable form for research. Goobi therefore already supports the IIIF Image API 2.1, the IIIF Presentation API 2.1 and the IIIF Change Discovery API (currently in draft status), as well as Web Annotations and Open Annotations for use in crowdsourcing.
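The IIIF Image API 2.1 addresses an image through a fixed URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, plus `{base}/{identifier}/info.json` for technical metadata. The sketch below only assembles such URLs; the base URL and identifier in the example are hypothetical, and it does not check whether a given server supports the requested parameters.

```python
from urllib.parse import quote


def image_url(base: str, identifier: str, region: str = "full",
              size: str = "full", rotation: str = "0",
              quality: str = "default", fmt: str = "jpg") -> str:
    """Assemble an IIIF Image API 2.1 request URL."""
    # Identifiers must be URI-encoded, including any "/" they contain.
    ident = quote(identifier, safe="")
    return f"{base}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_url(base: str, identifier: str) -> str:
    """URL of the info.json document describing the image and its service."""
    return f"{base}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://example.org/iiif/image"  # hypothetical server
    # Region "x,y,w,h" in pixels; size "500," means "scale to 500 px wide".
    print(image_url(base, "ppn123/00000001.tif", region="0,0,1000,500", size="500,"))
    print(info_url(base, "ppn123/00000001.tif"))
```

Any consumer that knows this pattern, whether a viewer or a research script, can request exactly the region and resolution it needs without downloading the master file.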
Goobi does much more than simply provide data for other systems and consumers: the whole Goobi community is increasingly focused on using IIIF APIs in new scenarios. In particular, several ongoing Goobi development projects use these interfaces to feed machine-learning processes that identify or generate new data. Machine-learning techniques based on pattern recognition and text analysis were already used in the past to recognise publication types and to segment tables of contents. Now that data can be obtained from standardised interfaces, however, these techniques no longer depend solely on files stored locally. Likewise, the Goobi community benefits enormously from the latest OCR developments in Tesseract, although without additional training data these developments cannot yet deliver the desired recognition quality. Whether a development project uses machine learning to generate ground-truth data and extract illustrations, or creates synthetic texts in unusual fonts from text segments and letter coordinates, working without IIIF is no longer a viable option. Over the last two years in particular, it has become clear within the Goobi community that various IIIF interfaces will be needed if newer projects, and the newer research data they produce, are to be viable in the long term.
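As an illustration of how a machine-learning pipeline can draw its input from IIIF interfaces rather than from the local file system, the following sketch walks a Presentation API 2.1 manifest, collects the Image API services, and builds URLs for scaled page images and for cropped regions such as word or line boxes from OCR output. The function names and the example manifest are hypothetical, and the crop helper assumes the box coordinates are already in full-resolution image pixels (if canvas and image sizes differ, they would need rescaling first).

```python
from typing import Dict, Iterator, List, Tuple


def image_services(manifest: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (canvas_id, image_service_id) pairs from a Presentation 2.1 manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for anno in canvas.get("images", []):
                service = anno.get("resource", {}).get("service")
                # Some manifests list several services; take the first one.
                if isinstance(service, list):
                    service = service[0] if service else None
                if service and "@id" in service:
                    yield canvas["@id"], service["@id"].rstrip("/")


def scaled_urls(manifest: Dict, width: int) -> List[str]:
    """Whole-page URLs scaled to a fixed width, e.g. as input for a classifier."""
    return [f"{sid}/full/{width},/0/default.jpg"
            for _, sid in image_services(manifest)]


def crop_url(service_id: str, box: Tuple[int, int, int, int]) -> str:
    """URL for one region (x, y, w, h in full-resolution pixels), e.g. a word box."""
    x, y, w, h = box
    return f"{service_id}/{x},{y},{w},{h}/full/0/default.jpg"


if __name__ == "__main__":
    manifest = {  # hypothetical, heavily trimmed manifest
        "sequences": [{"canvases": [{
            "@id": "https://example.org/iiif/presentation/ppn123/canvas/p1",
            "images": [{"resource": {"service": {
                "@id": "https://example.org/iiif/image/ppn123%2F00000001.tif"}}}],
        }]}]
    }
    for url in scaled_urls(manifest, 512):
        print(url)
    print(crop_url("https://example.org/iiif/image/ppn123%2F00000001.tif",
                   (120, 340, 410, 58)))
```

Requesting crops through the Image API means training data can be assembled on demand from any IIIF-enabled collection, instead of copying and cutting master files locally.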
Topics:
- Annotation, including full-text or academic use cases
- Using IIIF material for Machine Learning and AI
- IIIF Implementation Spectrum: large-scale or small-scale projects
- IIIF communities (3D, archives, museums, manuscripts, newspapers, etc.)
- IIIF generation