Hennie Brugman, MPI for Psycholinguistics

Title: Annotated Recordings and Texts in the DoBeS project

Hennie Brugman
Max-Planck-Institute for Psycholinguistics
Nijmegen, The Netherlands

Email: Hennie.Brugman@mpi.nl


Abstract

Based on several years of experience in building annotation tools, and on requirements arising especially from the DoBeS project, a simple but powerful model for linguistic annotation was developed. The model covers multiple tiers (annotation layers) that can either depend on each other or be independent. Within this model, a wide range of linguistic constructs can be represented in terms of a small number of basic components. Examples are complete or partial alignment of text with audio or video data, morphological data, interlinear text, syntactic trees, coreference, discourse structure, and gesture annotation. The paper presents the model and illustrates how it can be used to represent these phenomena.
The model is implemented as a set of Java classes that can be used for building software tools. One of the most prominent tools developed with them is ELAN, a multimedia, multi-tier annotation tool.
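To make the idea of independent and dependent tiers more concrete, the following is a minimal sketch in Java. It is not the project's actual class library: the names TierModelSketch, Tier, Annotation, addAligned and addReferring are hypothetical and chosen only for illustration. The sketch assumes that an annotation is either anchored directly to a media time interval or refers to an annotation on its parent tier, from which it inherits its time span.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all class and method names here are assumptions,
// not the API of the Java classes mentioned in the abstract.
class TierModelSketch {

    /** One annotation layer. A tier without a parent is independent. */
    static final class Tier {
        final String name;
        final Tier parent; // null for an independent tier
        final List<Annotation> annotations = new ArrayList<>();

        Tier(String name, Tier parent) {
            this.name = name;
            this.parent = parent;
        }

        boolean isIndependent() {
            return parent == null;
        }

        /** Adds an annotation anchored directly to a media time interval. */
        Annotation addAligned(String value, long beginMs, long endMs) {
            if (beginMs > endMs) {
                throw new IllegalArgumentException("begin must not exceed end");
            }
            Annotation a = new Annotation(this, value, beginMs, endMs, null);
            annotations.add(a);
            return a;
        }

        /** Adds an annotation that refers to an annotation on the parent tier. */
        Annotation addReferring(String value, Annotation target) {
            if (parent == null || target.tier != parent) {
                throw new IllegalArgumentException("target must lie on the parent tier");
            }
            Annotation a = new Annotation(this, value, -1, -1, target);
            annotations.add(a);
            return a;
        }
    }

    /** A labelled unit that is either time-aligned or refers to another annotation. */
    static final class Annotation {
        final Tier tier;
        final String value;
        final Annotation target; // null when the annotation is time-aligned
        private final long ownBeginMs;
        private final long ownEndMs;

        Annotation(Tier tier, String value, long beginMs, long endMs, Annotation target) {
            this.tier = tier;
            this.value = value;
            this.ownBeginMs = beginMs;
            this.ownEndMs = endMs;
            this.target = target;
        }

        /** Begin time, inherited through the reference chain when not aligned. */
        long beginMs() {
            return target == null ? ownBeginMs : target.beginMs();
        }

        /** End time, inherited through the reference chain when not aligned. */
        long endMs() {
            return target == null ? ownEndMs : target.endMs();
        }
    }

    public static void main(String[] args) {
        Tier utterance = new Tier("utterance", null);
        Annotation u = utterance.addAligned("the dogs bark", 0, 1500);

        Tier gloss = new Tier("gloss", utterance);
        Annotation g = gloss.addReferring("DEF dog-PL bark", u);

        System.out.println(g.value + " spans " + g.beginMs() + "-" + g.endMs() + " ms");
    }
}
```

In this sketch the gloss annotation carries no time information of its own; it resolves its interval through the utterance it refers to. Partial alignment, subdivision of a parent annotation, and tree-like structures would need further components that are not shown here.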
A persistent representation of the model’s constructs is the Eudico Annotation Format (EAF), a relatively simple and human-readable XML format that is used both as the persistent format for ELAN and for archiving annotation data within the DoBeS project. The paper discusses how ELAN makes use of the model’s capabilities and briefly presents EAF.
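As a rough illustration of what reading such an XML representation might look like, the sketch below parses a small, heavily abbreviated EAF-like fragment with the standard Java DOM parser. The element and attribute names in the fragment (TIER, TIER_ID, PARENT_REF, ALIGNABLE_ANNOTATION, REF_ANNOTATION, ANNOTATION_VALUE and so on) are assumptions for the purpose of this example and should be checked against the actual EAF schema; the class EafSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: the fragment is a simplified, assumed EAF-like layout.
class EafSketch {

    static final String SAMPLE =
          "<ANNOTATION_DOCUMENT>"
        + "<TIME_ORDER>"
        + "<TIME_SLOT TIME_SLOT_ID='ts1' TIME_VALUE='0'/>"
        + "<TIME_SLOT TIME_SLOT_ID='ts2' TIME_VALUE='1500'/>"
        + "</TIME_ORDER>"
        + "<TIER TIER_ID='utterance'>"
        + "<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID='a1'"
        + " TIME_SLOT_REF1='ts1' TIME_SLOT_REF2='ts2'>"
        + "<ANNOTATION_VALUE>the dogs bark</ANNOTATION_VALUE>"
        + "</ALIGNABLE_ANNOTATION></ANNOTATION>"
        + "</TIER>"
        + "<TIER TIER_ID='gloss' PARENT_REF='utterance'>"
        + "<ANNOTATION><REF_ANNOTATION ANNOTATION_ID='a2' ANNOTATION_REF='a1'>"
        + "<ANNOTATION_VALUE>DEF dog-PL bark</ANNOTATION_VALUE>"
        + "</REF_ANNOTATION></ANNOTATION>"
        + "</TIER>"
        + "</ANNOTATION_DOCUMENT>";

    /** Parses the XML text into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse annotation document", e);
        }
    }

    /** Maps each tier id to its annotation values, joined by single spaces. */
    static Map<String, String> tierValues(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList tiers = parse(xml).getElementsByTagName("TIER");
        for (int i = 0; i < tiers.getLength(); i++) {
            Element tier = (Element) tiers.item(i);
            NodeList values = tier.getElementsByTagName("ANNOTATION_VALUE");
            StringBuilder joined = new StringBuilder();
            for (int j = 0; j < values.getLength(); j++) {
                if (j > 0) {
                    joined.append(' ');
                }
                joined.append(values.item(j).getTextContent());
            }
            result.put(tier.getAttribute("TIER_ID"), joined.toString());
        }
        return result;
    }

    /** Returns the parent tier id, "" for an independent tier, null if unknown. */
    static String parentOf(String xml, String tierId) {
        NodeList tiers = parse(xml).getElementsByTagName("TIER");
        for (int i = 0; i < tiers.getLength(); i++) {
            Element tier = (Element) tiers.item(i);
            if (tierId.equals(tier.getAttribute("TIER_ID"))) {
                return tier.getAttribute("PARENT_REF");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(tierValues(SAMPLE));
        System.out.println("gloss depends on: " + parentOf(SAMPLE, "gloss"));
    }
}
```

The fragment shows one time-aligned annotation on an independent tier and one referring annotation on a dependent tier, which is the distinction the abstract draws between tiers that stand alone and tiers that depend on others.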