A machine learning method for identifying impersonal constructions and zero pronouns in Spanish

  1. Rello Sánchez, Luz
  2. Suárez García, Pablo
  3. Mitkov, Ruslan
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 281-286

Type: Article

Abstract

This paper presents a machine learning method for classifying subject ellipsis in Spanish as referential or non-referential. According to the literature review carried out, this is the first attempt to identify non-referential impersonal constructions in this language. An evaluation of the system with a training corpus of 6,827 annotated verbs showed that it achieves an accuracy of 87%.

References

  • Bergsma, S., D. Lin, and R. Goebel. 2008. Distributional identification of nonreferential pronouns. In Proceedings of the 46th Annual Meeting of the ACL/HLT-08, pages 10-18.
  • Brucart, J. M. 1999. La elipsis. In I. Bosque and V. Demonte, editors, Gramática descriptiva de la lengua española, volume 2. Espasa-Calpe, Madrid, pages 2787-2863.
  • Chinchor, N. and L. Hirschman. 1997. MUC-7 Coreference task definition (version 3.0). In Proceedings of the MUC-97.
  • Chomsky, N. 1981. Lectures on government and binding. Mouton de Gruyter, Berlin, New York.
  • Cleary, J. G. and L. E. Trigg. 1995. K*: an instance-based learner using an entropic distance measure. In Proceedings of the 12th ICML-95, pages 108-114.
  • Danlos, L. 2005. Automatic recognition of French expletive pronoun occurrences. In Robert Dale, Kam-Fai Wong, Jiang Su, and Oi Yee Kwong, editors, Natural language processing. Proceedings of the 2nd IJCNLP-05, pages 73-78, Berlin, Heidelberg, New York. Springer. Lecture Notes in Computer Science, Vol. 3651.
  • Evans, R. 2001. Applying machine learning: toward an automatic classification of it. Literary and Linguistic Computing, 16(1):45-57.
  • Ferrández, A., A. Palomar, and L. Moreno. 1999. An empirical approach to Spanish anaphora resolution. Machine Translation, 14(3/4):191-216.
  • Ferrández, A. and J. Peral. 2000. A computational approach to zero-pronouns in Spanish. In Proceedings of the 38th Annual Meeting of the ACL-2000, pages 166-172.
  • Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10-18.
  • Han, N. 2004. Korean null pronouns: classification and annotation. In Proceedings of the Workshop on Discourse Annotation. 42nd Annual Meeting of the ACL-04, pages 33-40.
  • Mitkov, R. 2002. Anaphora resolution. Longman, London.
  • Mitkov, R. 2010. Discourse processing. In Alexander Clark, Chris Fox, and Shalom Lappin, editors, The handbook of computational linguistics and natural language processing. Wiley Blackwell, Oxford, pages 599-629.
  • Okumura, M. and K. Tamura. 1996. Zero pronoun resolution in Japanese discourse based on centering theory. In Proceedings of the 16th COLING-96, pages 871-876.
  • Peral, J. and A. Ferrández. 2000. Generation of Spanish zero-pronouns into English. In D. N. Christodoulakis, editor, Natural Language Processing. Proceedings of the 2nd International Conference on NLP-2000. Springer, Berlin, Heidelberg, New York, pages 252-260. Lecture Notes in Computer Science, Vol. 1835.
  • Real Academia Española. 2001. Diccionario de la lengua española. Espasa-Calpe, Madrid, 22nd edition.
  • Real Academia Española. 2009. Nueva gramática de la lengua española. Espasa-Calpe, Madrid.
  • Recasens, M. and E. Hovy. 2009. A deeper look into features for coreference resolution. In Lalitha Devi Sobha, Antonio Branco, and Ruslan Mitkov, editors, Anaphora Processing and Applications. Proceedings of the 7th DAARC-09. Springer, Berlin, Heidelberg, New York, pages 29-42. Lecture Notes in Computer Science, Vol. 5847.
  • Rello, L. 2010. Elliphant: A machine learning method for identifying subject ellipsis and impersonal constructions in Spanish. Master's thesis, University of Wolverhampton, UK.
  • Rello, L. and I. Illisei. 2009. A rule-based approach to the identification of Spanish zero pronouns. In Student Research Workshop. RANLP-09, pages 209-214.
  • Steinberger, J., M. Poesio, M. A. Kabadjov, and K. Ježek. 2007. Two uses of anaphora resolution in summarization. Information Processing and Management, 43(6):1663-1680.
  • Tapanainen, P. and T. Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on ANLP-97, pages 64-71.
  • Witten, I. H. and E. Frank. 2005. Data mining: practical machine learning tools and techniques. Morgan Kaufmann, London, 2nd edition.
  • Yeh, C. and Y. Chen. 2003. Zero anaphora resolution in Chinese with partial parsing based on centering theory. In Proceedings of the International Conference on NLPKE-03, pages 683-688.
  • Zhao, S. and H. T. Ng. 2007. Identification and resolution of Chinese zero pronouns: a machine learning approach. In Proceedings of the 2007 Joint Conference on EMNLP/CoNLL-07, pages 541-550.