A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform
- Sheikh, Zaid Md Abdul Wahab
- Sánchez-Martínez, Felipe
- Pérez-Ortiz, Juan Antonio (coord.)
- Sánchez-Martínez, Felipe (coord.)
- Tyers, Francisc M. (coord.)
Publisher: Universidad de Alicante / Universitat d'Alacant
ISBN: 978-84-613-6188-5
Year of publication: 2009
Type: Book chapter
Abstract
This paper describes the implementation of a part-of-speech (PoS) tagger based on a second-order hidden Markov model (HMM) for Apertium, the free/open-source rule-based machine translation platform. We describe the PoS tagging approach in Apertium and how it is parametrised through a tagger definition file that defines (1) the set of tags to be used and (2) constraint rules that forbid certain PoS tag sequences, thereby refining the HMM parameters and increasing tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters, and compares the resulting tagging accuracy with that of the original first-order HMM-based PoS tagger in Apertium.
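The abstract only summarises the approach. As a rough illustration of how constraint rules interact with a second-order (trigram) HMM, the following Python sketch decodes a tag sequence with the Viterbi algorithm and gives zero probability to any transition whose tag bigram is forbidden. It is not Apertium's implementation: the function name `viterbi_trigram`, the dictionary layout of `trans` and `emit`, the `<s>` padding tag, and the modelling of constraint rules as forbidden tag pairs are all assumptions made for illustration. Parameter estimation with Baum-Welch is not shown; the toy probabilities are made up.

```python
import math

START = "<s>"  # hypothetical padding tag for positions before the sentence


def viterbi_trigram(obs, tags, trans, emit, forbidden=frozenset()):
    """Most probable tag sequence under a second-order HMM (illustrative sketch).

    trans[(t2, t1, t)] = P(t | t2, t1), emit[(t, o)] = P(o | t).
    Any pair (t1, t) in `forbidden` is given probability zero, a simplified
    stand-in for constraint rules that forbid certain tag sequences.
    Returns None if every tag sequence has probability zero.
    """
    neg_inf = float("-inf")

    def logp(p):
        return math.log(p) if p > 0 else neg_inf

    def trans_lp(t2, t1, t):
        if (t1, t) in forbidden:
            return neg_inf
        return logp(trans.get((t2, t1, t), 0.0))

    # In a trigram HMM the Viterbi state is the pair (t_{i-1}, t_i).
    delta = {(START, t): trans_lp(START, START, t) + logp(emit.get((t, obs[0]), 0.0))
             for t in tags}
    back = [{}]  # back[i][(t_{i-1}, t_i)] = best t_{i-2}
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for (t2, t1), score in delta.items():
            for t in tags:
                s = score + trans_lp(t2, t1, t) + logp(emit.get((t, o), 0.0))
                if (t1, t) not in new_delta or s > new_delta[(t1, t)]:
                    new_delta[(t1, t)] = s
                    pointers[(t1, t)] = t2
        delta = new_delta
        back.append(pointers)

    prev, cur = max(delta, key=delta.get)
    if delta[(prev, cur)] == neg_inf:
        return None
    # Follow the back-pointers from the best final state.
    path = [cur]
    for i in range(len(obs) - 1, 0, -1):
        t2 = back[i][(prev, cur)]
        path.append(prev)
        prev, cur = t2, prev
    path.reverse()
    return path


if __name__ == "__main__":
    tags = ["DET", "NOUN", "VERB"]
    trans = {(START, START, "DET"): 1.0,
             (START, "DET", "NOUN"): 0.4, (START, "DET", "VERB"): 0.6}
    emit = {("DET", "the"): 1.0, ("NOUN", "can"): 0.5, ("VERB", "can"): 0.5}
    print(viterbi_trigram(["the", "can"], tags, trans, emit))  # ['DET', 'VERB']
    # A constraint forbidding DET followed by VERB changes the decision.
    print(viterbi_trigram(["the", "can"], tags, trans, emit,
                          forbidden={("DET", "VERB")}))  # ['DET', 'NOUN']
```

In this toy example the unconstrained model prefers `VERB` after a determiner, and the single forbidden pair is enough to redirect the decoder to `NOUN`, which is the kind of correction the constraint rules are meant to provide.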