Fénix: a flexible information exchange data model for natural language processing

  1. José M. Gómez
  2. David Tomás
  3. Paloma Moreda
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2014

Issue: 52

Pages: 21-28

Type: Article

Abstract

In this paper we describe Fénix, a data model for exchanging information between Natural Language Processing applications. The proposed format is intended to be flexible enough to cover both current and future data structures employed in Computational Linguistics. The Fénix architecture is divided into four separate layers: conceptual, logical, persistence and physical. This division provides a simple interface that abstracts users from low-level implementation details, such as the programming languages and data storage employed, allowing them to focus on the concepts and processes to be modelled. The architecture is accompanied by a set of programming libraries that facilitate access to and manipulation of the structures created in this framework. We also show how this architecture has already been applied successfully in different research projects.
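The abstract names the four layers but does not describe their interfaces or the Fénix libraries themselves. The following is therefore only a hypothetical Python sketch of the general idea of such a layering: user code works with conceptual objects, a logical layer maps them to a generic nested structure, and a persistence interface hides which physical storage is used. All names here (`Annotation`, `to_logical`, `from_logical`, `Store`, `JsonMemoryStore`) are invented for illustration and are not the Fénix API.

```python
"""Hypothetical four-layer sketch; NOT the actual Fénix API."""
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field, asdict


# Conceptual layer (hypothetical): what the NLP user reasons about.
@dataclass
class Annotation:
    label: str          # e.g. a part-of-speech tag or entity type
    start: int          # character offset where the span begins
    end: int            # character offset where the span ends (exclusive)
    features: dict = field(default_factory=dict)


# Logical layer (hypothetical): a generic, language-neutral structure.
def to_logical(text: str, annotations: list[Annotation]) -> dict:
    return {"text": text, "annotations": [asdict(a) for a in annotations]}


def from_logical(doc: dict) -> tuple[str, list[Annotation]]:
    return doc["text"], [Annotation(**a) for a in doc["annotations"]]


# Persistence layer (hypothetical): an abstract store hiding where data lives.
class Store(ABC):
    @abstractmethod
    def save(self, key: str, doc: dict) -> None: ...

    @abstractmethod
    def load(self, key: str) -> dict: ...


# Physical layer (hypothetical): one concrete choice, JSON strings in memory.
class JsonMemoryStore(Store):
    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def save(self, key: str, doc: dict) -> None:
        self._blobs[key] = json.dumps(doc, ensure_ascii=False)

    def load(self, key: str) -> dict:
        return json.loads(self._blobs[key])


if __name__ == "__main__":
    store: Store = JsonMemoryStore()
    text = "Fénix exchanges data."
    anns = [Annotation("PROPN", 0, 5), Annotation("VERB", 6, 15)]
    store.save("doc1", to_logical(text, anns))
    restored_text, restored = from_logical(store.load("doc1"))
    print(restored_text[restored[0].start:restored[0].end])  # prints "Fénix"
```

In this sketch, replacing `JsonMemoryStore` with, say, a file- or database-backed `Store` subclass would leave the conceptual-layer code untouched, which is the kind of separation the abstract attributes to the layered design.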
