A simple approach to use bilingual information sources for word alignment

  1. Esplà Gomis, Miquel
  2. Sánchez Martínez, Felipe
  3. Forcada Zubizarreta, Mikel L.
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2012

Issue: 49

Pages: 93-100

Type: Article


In this paper we present a new and simple method for using sources of bilingual information for word alignment between parallel segments of text. This method can be used on the fly, since it does not need to be trained. In addition, it can be applied to comparable corpora. We compare our method to the state-of-the-art tool GIZA++, widely used for word alignment, and we obtain very similar results.
