Annotation Tool

The annotation tool, DATT (Discourse Annotation Tool for Turkish), was developed for this project and uses the tags listed in the table below. DATT follows a stand-off annotation methodology and stores the annotation data as XML files.

METU-TDB Tags

Annotated Discourse Elements                     METU-TDB
Explicit connectives                             annotated
Connective modifiers                             annotated
1st and 2nd argument                             annotated
Supporting material                              annotated
Reference elements in supporting material        annotated
Shared elements                                  annotated
Supporting material for shared elements          annotated

As explained in Aktaş et al. (2010), stand-off annotation was preferred over in-line annotation in order to avoid overlapping tags. In stand-off markup, annotations are stored separately from the data. Since the base file is not modified during annotation, all annotators are guaranteed to work on the same version of the data. The text spans of dependency constructions are represented as character offsets from the beginning of the text file. On its own, this is a highly error-prone way of storing annotation data: if the character indexes in the original text file shift, previously recorded offsets no longer point to the annotated text. To compensate for this, we also keep the text of each annotated span for recovery purposes.
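The offset-plus-text idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DATT's implementation: the names Span, make_span, is_valid and recover are hypothetical, and the recovery strategy (pick the occurrence of the stored text nearest to the old offset) is one simple choice among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A stand-off annotation: offsets into the base text plus a copy of the text."""
    start: int  # offset of the first character
    end: int    # offset one past the last character
    text: str   # copy of the annotated text, kept for recovery


def make_span(base: str, start: int, end: int) -> Span:
    """Create a span and store the text it covers in the base file."""
    return Span(start, end, base[start:end])


def is_valid(base: str, span: Span) -> bool:
    """Check that the offsets still point at the stored text."""
    return base[span.start:span.end] == span.text


def recover(base: str, span: Span) -> Optional[Span]:
    """Re-anchor a shifted span using its stored text (hypothetical strategy)."""
    if is_valid(base, span):
        return span
    # Collect every occurrence of the stored text in the (changed) base text.
    positions = []
    pos = base.find(span.text)
    while pos != -1:
        positions.append(pos)
        pos = base.find(span.text, pos + 1)
    if not positions:
        return None  # the annotated text is gone; manual repair is needed
    # Prefer the occurrence closest to where the span used to start.
    best = min(positions, key=lambda p: abs(p - span.start))
    return Span(best, best + len(span.text), span.text)
```

For example, if the connective "ama" is annotated at offsets 10 to 13 in "Ali geldi ama Ayşe gelmedi." and a few characters are later inserted at the start of the file, is_valid reports the mismatch and recover returns a span with the shifted offsets. If the stored text occurs several times, the nearest-occurrence heuristic may pick the wrong one, so a real tool would likely need additional checks.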

Annotation files are well-formed XML files, so new features can easily be added to the annotations. XML libraries available online, such as those for search and post-processing, reduce the implementation effort of adding new features.
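As a rough illustration of how a standard XML library keeps such extensions cheap, the sketch below uses Python's xml.etree.ElementTree to search a stand-off annotation and add a new attribute to it. The element and attribute names (annotations, relation, span, role, start, end, reviewed) are invented for this example and are not DATT's actual schema; add_feature and spans_by_role are likewise hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off annotation for the base text
# "Ali geldi ama Ayşe gelmedi."; NOT the real DATT output format.
SAMPLE = """<annotations>
  <relation connective="ama">
    <span role="conn" start="10" end="13">ama</span>
    <span role="arg1" start="0" end="9">Ali geldi</span>
    <span role="arg2" start="14" end="26">Ayşe gelmedi</span>
  </relation>
</annotations>"""


def add_feature(root, name, value):
    """Set a new attribute on every relation element; return how many were changed."""
    count = 0
    for relation in root.iter("relation"):
        relation.set(name, value)
        count += 1
    return count


def spans_by_role(root, role):
    """Search: return (start, end, text) for every span with the given role."""
    return [
        (int(span.get("start")), int(span.get("end")), span.text)
        for span in root.iter("span")
        if span.get("role") == role
    ]


root = ET.fromstring(SAMPLE)
add_feature(root, "reviewed", "yes")
# Serialising and re-parsing confirms the result is still well-formed XML.
reparsed = ET.fromstring(ET.tostring(root, encoding="unicode"))
```

Because the new feature is just an extra attribute, existing readers that ignore unknown attributes keep working on the extended files.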

An XML output example is shown below:

The Annotation Tool interface is shown below: