A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform

  1. Sheikh, Zaid Md Abdul Wahab
  2. Sánchez-Martínez, Felipe
Book:
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation: 2-3 November 2009, Universitat d'Alacant
  1. Pérez-Ortiz, Juan Antonio (coord.)
  2. Sánchez-Martínez, Felipe (coord.)
  3. Tyers, Francis M. (coord.)

Publisher: Universidad de Alicante / Universitat d'Alacant

ISBN: 978-84-613-6188-5

Year of publication: 2009

Type: Book chapter

Abstract

This paper describes the implementation of a second-order hidden Markov model (HMM) based part-of-speech tagger for the Apertium free/open-source rule-based machine translation platform. We describe the part-of-speech (PoS) tagging approach in Apertium and how it is parametrised through a tagger definition file that defines: (1) the set of tags to be used and (2) constraint rules that can be used to forbid certain PoS tag sequences, thus refining the HMM parameters and increasing tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters and compares the resulting tagging accuracy with that of the original first-order HMM-based PoS tagger in Apertium.
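
As an illustration of how forbidding tag sequences can refine the parameters of a second-order HMM, the following minimal Python sketch zeroes the probability of every forbidden transition and renormalises what remains. It is not taken from the paper or from Apertium's code: the function name `apply_forbid_rules`, the dictionary layout, the tag names and the assumption that a rule forbids a pair of consecutive tags are all hypothetical choices made for this example.

```python
from itertools import product


def apply_forbid_rules(trans, tags, forbidden):
    """Return refined second-order transition probabilities.

    trans[(t1, t2)][t3] holds P(t3 | t1, t2) (hypothetical layout).
    forbidden is a set of (t2, t3) pairs that must never occur in sequence.
    """
    refined = {}
    for t1, t2 in product(tags, repeat=2):
        # Zero every transition whose last two tags form a forbidden pair.
        row = {
            t3: 0.0 if (t2, t3) in forbidden else trans[(t1, t2)][t3]
            for t3 in tags
        }
        total = sum(row.values())
        # Renormalise so the remaining probabilities sum to 1 again.
        refined[(t1, t2)] = {
            t3: (p / total if total > 0 else 0.0) for t3, p in row.items()
        }
    return refined


# Toy data (assumed): three tags, uniform transitions, one forbidden pair.
TAGS = ["DET", "NOUN", "VERB"]
uniform = {
    ctx: {t: 1.0 / len(TAGS) for t in TAGS}
    for ctx in product(TAGS, repeat=2)
}
FORBIDDEN = {("DET", "VERB")}  # a determiner may not be followed by a verb

refined = apply_forbid_rules(uniform, TAGS, FORBIDDEN)
```

In this toy setting, the probability mass removed from the forbidden transition is redistributed over the tags that may still follow a determiner. In a training setup, such zeroed entries would typically be applied to the initial parameters, since Baum-Welch re-estimation keeps a transition at zero once its probability is zero.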