Annotating an Oral Corpus using the Text Encoding Initiative. Methodology, Problems, Solutions

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


The objective of this paper is to describe and evaluate the application of the Text Encoding Initiative (TEI) Guidelines to a corpus of oral French, this being the first corpus of oral French where the TEI has been used. The paper explains the purpose of the corpus, both in creating a specialist corpus of néo-contage that will broaden the range of oral corpora available, and, more importantly, in creating a dataset to explore a variety of oral French that has a particularly interesting status in terms of factors such as conception orale/écrite, réalisation médiale and comportement communicatif (Koch and Oesterreicher 2001). The linguistic phenomena to be encoded are both stylistic (speech and thought presentation) and syntactic (negation, detachment, inversion), and all represent areas where previous research has highlighted the significance of factors such as medium, register and discourse type, as well as a host of linguistic factors (syntactic, phonetic, lexical). After a discussion of how a tagset can be designed and applied within the TEI to encode speech and thought presentation, negation, detachment and inversion, the final section of the paper evaluates the benefits and possible drawbacks of the methodology offered by the TEI when applied to a syntactic and stylistic markup of an oral corpus.
Original language: English
Pages (from-to): 103-119
Number of pages: 17
Journal: Journal of French Language Studies
Issue number: 1
Publication status: Published - Mar 2008

