NFDI Project "Text+ohd"
Text+ Interfaces to the Interview Collections in Oral-History.Digital
The project “Text+ Interfaces to the Interview Collections in Oral-History.Digital (text+oh.d),” completed in 2025, made interview collections developed in historical research projects accessible within the Text+ infrastructure. Through the development of interfaces and the standards-compliant transformation of transcripts in the cross-collection interview portal Oral-History.Digital (oh.d), extensive corpora of everyday spoken language have been made available for reuse by text- and language-based research communities.
Description
The project had two main results. First, the metadata of interview archives and collections is harvested via an OAI-PMH interface (Open Archives Initiative Protocol for Metadata Harvesting) and cataloged in the Text+ Registry (https://registry.text-plus.org/). Second, the transcripts of the multi-hour interviews are made available as ISO-compliant TEI-XML files (Text Encoding Initiative) to users with appropriate access permissions.
Through interface development and the conversion of transcripts into standardized formats, this data will become easier to search and use across disciplines in the future. In this way, the University Library supports interdisciplinary collaboration between history and linguistics nationwide.
OAI-PMH Interface for Interview Collections
One of the project’s goals was to further develop the existing OAI-PMH interface so that it automatically provides public metadata on interview archives and their collections in the Dublin Core and DataCite formats, and to list them in the Text+ Registry.
The metadata now visible in the registry includes archive and collection descriptions, information on contact persons and project responsibilities, the participating institutions, the number of interviews, the year of publication, terms of use, and information on data protection, as well as a link to the catalog pages in oh.d itself. Additional metadata that is available via the interface but is not currently harvested by Text+ includes topics tagged with GND IDs and the level of curation of the interviews.
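As a rough illustration of how such metadata could be harvested, the following Python sketch builds a standard OAI-PMH ListRecords request and reads Dublin Core fields from a response. The endpoint URL, the record identifier, and the sample field values are assumptions made for this example; the text does not state the actual address of the oh.d interface. Only the verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: the real oh.d OAI-PMH base URL is not given in the text.
BASE_URL = "https://example.org/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return one dict per record with its identifier and Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for child in dc:
                # Strip the "{namespace}" prefix to get the plain field name.
                name = child.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


# Invented sample response, used here instead of a live network request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:collection-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example interview collection</dc:title>
          <dc:date>2025</dc:date>
          <dc:rights>Access after registration</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["fields"])
```

A real harvester would additionally follow the `resumptionToken` that OAI-PMH uses for paging through large result sets; that step is omitted here to keep the sketch short.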
TEI-XML Transformation of Interview Metadata and Transcripts
In addition, provided the necessary permissions are in place, full texts should also be made available in standard-compliant formats via Text+ for reuse. Many transcripts in oh.d contain annotations of facial expressions, gestures, foreign-language expressions, pauses, word breaks, and other linguistic and non-linguistic phenomena. Until the project was implemented, they could only be downloaded as CSV tables along with additional annotations such as keywords, headings, and comments.
In the Text+oh.d project, these transcripts were converted, along with the metadata, into XML files that comply with the TEI-based standard “ISO 24624:2016 Language resource management — Transcription of spoken language” (Hedeland/Schmidt 2022). In this process, the transcripts were tokenized segment by segment, and timecodes, speaker changes, keywords, and transcription symbols were converted into annotation blocks with consistent TEI tags.
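To make the idea of an annotation block more concrete, the following Python sketch tokenizes one transcript segment and wraps it in TEI elements of the kind used for spoken-language transcription (`annotationBlock`, `u`, `seg`, `w`, `pc`, `spanGrp`, `span`). It is a simplified illustration, not the project's transformation routine: the function name, the tokenization rule, the speaker and timeline identifiers, and the sample sentence are all assumptions, and the TEI namespace, the `timeline` element, and transcription symbols such as pauses are left out.

```python
import re
import xml.etree.ElementTree as ET

# Simplified tokenizer (an assumption for this sketch): words, optionally
# joined by hyphens or apostrophes, or single punctuation characters.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def build_annotation_block(speaker_id, start_id, end_id, text, keywords=()):
    """Wrap one transcript segment in an annotationBlock element.

    speaker_id, start_id and end_id are hypothetical identifiers that would
    point to a speaker entry and to timeline anchors elsewhere in the file.
    """
    block = ET.Element(
        "annotationBlock",
        {"who": f"#{speaker_id}", "start": f"#{start_id}", "end": f"#{end_id}"},
    )
    utterance = ET.SubElement(block, "u")
    seg = ET.SubElement(utterance, "seg")
    for token in TOKEN_RE.findall(text):
        # Word tokens become <w>, punctuation becomes <pc>.
        tag = "w" if re.match(r"\w", token) else "pc"
        ET.SubElement(seg, tag).text = token
    if keywords:
        # Keywords are attached as spans covering the whole segment.
        group = ET.SubElement(block, "spanGrp", {"type": "keyword"})
        for keyword in keywords:
            span = ET.SubElement(
                group, "span", {"from": f"#{start_id}", "to": f"#{end_id}"}
            )
            span.text = keyword
    return block


if __name__ == "__main__":
    example = build_annotation_block(
        "SPK1", "T0", "T1", "Well, we left in 1946.", keywords=["migration"]
    )
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the time references as pointers (`#T0`, `#T1`) rather than literal timecodes mirrors the common TEI practice of collecting the actual time values in one central timeline.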
The transformation of transcripts into an ISO-compliant TEI-XML format was specified and tested on three exemplary collections that represent different transcription methods as well as linguistic diversity with translations, and furthermore cover a broad range of topics.
Although the transformation routine was developed based on three collections and tested for them, the TEI-XML download is available for all collections on Oral-History.Digital. The TEI-XML transcripts can be downloaded after registering on the platform and receiving access to the relevant archive.
Duration and Funding
The "Nationale Forschungsdateninfrastruktur" (NFDI) is working to ensure that research data is findable, accessible, interoperable, and reusable in the long term, in accordance with the FAIR principles.
The “Text+” consortium, led by the Leibniz Institute for the German Language (IDS) in Mannheim, brings together expertise and data from universities, academies, and research centers. The focus is on:
- Corpora and text collections
- Lexical resources
- Editions and infrastructures
Every year, “Text+” funds new projects to improve data and services for language and text-oriented sciences.
The “Text+ohd” project was one of five collaborations funded in 2025, running from January 1, 2025, to November 30, 2025. Through its participation in “Text+,” the University Library of the Freie Universität is actively shaping the future of digital research, teaching, and information infrastructures for text- and language-oriented sciences.
Contact
Freie Universität Berlin
University Library
Digital Interview Collections
Project Text+ohd
Garystr. 39
14195 Berlin
Web: https://www.fu-berlin.de/text_ohd
Mail: text+ohd@oral-history.digital

