A simple approach to use bilingual information sources for word alignment

  1. Esplà Gomis, Miquel
  2. Sánchez Martínez, Felipe
  3. Forcada Zubizarreta, Mikel L.
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2012

Issue: 49

Pages: 93-100

Type: Article


Abstract

This article describes a new and simple method for using bilingual sources of information to align words in parallel text segments. The method can be used on the fly, since it requires no training. It can also be used with comparable corpora. We compared the results of our method with those obtained by GIZA++, a widely used word-alignment tool, and obtained quite similar results.

Bibliographic references

  • Al-Onaizan, Y. and K. Knight. 2002. Translating named entities using monolingual and bilingual resources. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 400-408, Philadelphia, Pennsylvania.
  • Brown, P.F., S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.
  • Dagan, I., K.W. Church, and W.A. Gale. 1993. Robust bilingual word alignment for machine aided translation. In Proceedings of the Workshop on Very Large Corpora, pages 1-8, Columbus, USA.
  • Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38.
  • Esplà, M., F. Sánchez-Martínez, and M.L. Forcada. 2011. Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited. In Proceedings of the 15th Conference of the European Association for Machine Translation, pages 81-88, Leuven, Belgium.
  • Esplà-Gomis, M., F. Sánchez-Martínez, and M.L. Forcada. 2011. Using machine translation in computer-aided translation to suggest the target-side words to change. In Proceedings of the 13th Machine Translation Summit, pages 172-179, Xiamen, China.
  • Esplà-Gomis, M., F. Sánchez-Martínez, and M.L. Forcada. 2012. UAlacant: using online machine translation for crosslingual textual entailment. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, pages 472-476, Montreal, Quebec, Canada.
  • Forcada, M.L., M. Ginestí-Rosell, J. Nordfalk, J. O'Regan, S. Ortiz-Rojas, J. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez, and F. Tyers. 2011. Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 25(2):127-144.
  • Fung, P. and K. McKeown. 1997. Finding terminology translations from non-parallel corpora. pages 192-202.
  • Koehn, P. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X, pages 79-86, Phuket, Thailand.
  • Koehn, P., H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL, pages 177-180, Prague, Czech Republic.
  • Koehn, P., F.J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48-54, Edmonton, Canada.
  • Kranias, L. and A. Samiotou. 2004. Automatic translation memory fuzzy match post-editing: A step beyond traditional TM/MT integration. In Proceedings of the 4th International Conference on Language Resources and Evaluation, pages 331-334, Lisbon, Portugal.
  • Lambert, P., A. De Gispert, R. Banchs, and J. Mariño. 2005. Guidelines for word alignment evaluation and manual alignment. Language Resources and Evaluation, 39(4):267-285.
  • Och, F.J. and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19-51.
  • Rapp, R. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pages 519-526, College Park, USA.
  • Schulz, S., K. Markó, E. Sbrissia, P. Nohama, and U. Hahn. 2004. Cognate mapping: a heuristic strategy for the semi-supervised acquisition of a Spanish lexicon from a Portuguese seed lexicon. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland.
  • Vogel, S., H. Ney, and C. Tillmann. 1996. HMM-based word alignment in statistical translation. In Proceedings of the 16th International Conference on Computational Linguistics, pages 836-841, Copenhagen, Denmark.