Feasible lexical selection for rule-based machine translation

Tyers, Francis Morton

Supervisors:

Mikel L. Forcada Zubizarreta (Supervisor)
Felipe Sánchez Martínez (Co-supervisor)

University of defence: Universitat d'Alacant / Universidad de Alicante

Date of defence: 17 July 2013

Examination committee:

Núria Bel Rafecas (Chair)
Juan Antonio Pérez Ortiz (Secretary)
Helena de Medeiros Caseli (Member)

Department:

LENGUAJES Y SISTEMAS INFORMATICOS

Type: Thesis

Teseo: 347167

Abstract

This thesis addresses the problem of lexical selection in rule-based machine translation. Lexical selection is the task of choosing, for a given source-language word, the most adequate translation in the target language from a known set of alternatives. To address the problem, the thesis presents a formalism for lexical selection based on fixed-length context. Rules in this formalism are compiled into a finite-state transducer whose input side represents source-language patterns and whose output side consists of rules that select or remove possible translations in the target language. A best-coverage algorithm is defined which selects the fewest (that is, the longest) rules that cover the input sentence. We show that hand-written rules produced in a short time can improve lexical-selection performance in three out of the four translation systems tested.

Relying on hand-written rules brings us face to face with the knowledge-acquisition bottleneck. To overcome this, we describe a general method of learning lexical-selection rules in the previously defined formalism. The method can be trained in a supervised manner, using a word-aligned parallel corpus, and also in a novel unsupervised manner, using the rest of the modules of the MT system in which the lexical-selection module is to be embedded. The unsupervised training works by first generating all the possible translations of a given source-language sentence that differ in lexical selection, and then scoring them with a target-language model. The scores from the target-language model are normalised so that each translation gets a fractional count corresponding to its share of the probability mass. In training, these fractional counts are used in place of the word-alignment counts of the supervised method.
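The normalisation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say in what form the language model reports its scores, so the sketch assumes natural-log probabilities, and the function name `fractional_counts` and the candidate labels are hypothetical.

```python
import math


def fractional_counts(log_scores):
    """Turn language-model log-probabilities into fractional counts summing to 1.

    Assumption: scores are natural-log probabilities, one per candidate
    translation of the same source sentence.
    """
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_scores.values())
    unnormalised = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(unnormalised.values())
    # Each candidate receives its share of the total probability mass.
    return {t: v / total for t, v in unnormalised.items()}


# Hypothetical scores for three candidate translations of one sentence.
scores = {"candidate_a": -10.0, "candidate_b": -10.0, "candidate_c": -12.0}
counts = fractional_counts(scores)
```

Candidates with equal scores receive equal counts, and a lower-scoring candidate still contributes a small, non-zero count instead of being discarded.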
As preliminary experiments showed that including the whole rule set had a detrimental effect on lexical-selection quality, we define a threshold below which rules are discarded. The threshold is set by tuning on a development corpus. Both methods are evaluated on the same four systems. The supervised-learning method improves lexical-selection quality in three out of the four systems compared with the baseline of choosing the most frequently aligned translation; the unsupervised-learning method improves it in two out of the four.

Examination of the results of the rule-learning methods showed that useful information could be discarded at two points. The first is in the training process, when we apply the threshold and discard possible rules. The second is in the application of the best-coverage algorithm, where we assume that longer rules are better rules. To overcome these two restrictions, we present a well-formed probability model, based on the principle of maximum entropy, for finding the most probable translation. At training time, each rule is assigned a weight, which is learnt from the corpus. At runtime, instead of applying only the best-coverage rules, we apply all of the rules and, for each translation, sum the weights of the matched rules to find the most probable translation. Using the maximum-entropy weighted rules with the supervised-learning method resulted in an improvement over the unweighted rules in three out of four systems; with the unsupervised-learning method, it resulted in a substantial improvement in one out of four systems.

In conclusion, the thesis presents a formalism based on fixed-length context rules and an implementation based on finite-state transducers. It also presents methods of learning these rules from parallel and monolingual corpora, and a method of assigning rule weights based on the principle of maximum entropy. Improvements over the baseline are shown for the majority of the systems tested.
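The weighted-rule selection summarised above (apply every matching rule, sum the weights per translation, pick the most probable) can be illustrated with a minimal sketch. It is not the thesis implementation: the rule representation as a set of required context words, the example words, and all weights are hypothetical, and the fixed-length positional patterns of the actual formalism are simplified away.

```python
import math

# Hypothetical rules: (context words that must be present, translation, weight).
# An empty context set acts as a default rule that always matches.
RULES = [
    (frozenset(), "estación", 0.2),
    (frozenset({"train"}), "estación", 1.5),
    (frozenset({"radio"}), "emisora", 1.1),
    (frozenset({"listen"}), "emisora", 0.9),
]


def select_translation(context, candidates, rules=RULES):
    """Sum the weights of all matching rules and return the best translation.

    Probabilities follow the usual log-linear (maximum-entropy) form:
    p(t | context) is proportional to exp(sum of matching rule weights).
    """
    totals = {c: 0.0 for c in candidates}
    for required, translation, weight in rules:
        # A rule matches when all of its required words occur in the context.
        if translation in totals and required <= context:
            totals[translation] += weight
    z = sum(math.exp(v) for v in totals.values())
    probs = {c: math.exp(v) / z for c, v in totals.items()}
    return max(probs, key=probs.get), probs


best, probs = select_translation({"listen", "radio"}, ["estación", "emisora"])
```

Unlike best-coverage selection, no rule is ignored for being shorter than another: every matching rule contributes its weight, so several weak cues can together outweigh a single default.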