Introduction
It is not always easy to distinguish between
research and development, or between demonstration
system and product, in the area of Machine
Translation (MT). It is useful, however, to
distinguish between fully automatic machine
translation, human-assisted machine translation
and machine-assisted human translation, all
of which fall within the purview of MT. In
this article the focus is exclusively on the
first of these, fully automatic machine translation.
The mid-1980’s on into the early 1990’s
marked the zenith of linguistically inspired,
rule-based approaches to MT. Major projects
included Eurotra (Europe, 1982-1993; Durand,
et al. 1991), the fifth generation initiative
(Japan, 1981-1993; Nagao 1989) and various
small experimental systems in the US (e.g.,
Carnegie Mellon University – Goodman
& Nirenburg 1991; New Mexico State University
– Farwell & Wilks, 1991; University
of Maryland – Dorr 1993 among others)
which culminated in the Pangloss-Mikrokosmos
Spanish-English Knowledge-Based MT system
(Nirenburg 1995). But in no small measure
it was the US government-funded MT initiative
of 1991 through 1995, under which Pangloss
itself was funded, that led to a profound shift in
focus for MT R&D (and not simply for MT
but for all NLP tasks). That same initiative
also funded the development of Candide (Brown,
et al. 1993), a French-English statistics-based
MT system developed at IBM which to this day
is the paradigm for statistics-based approaches.
Equally importantly, the initiative funded
the development of a framework for evaluating
and comparing the performances of different
systems over a “comparable” task
(White & O’Connell 1994). The upshot
of this initiative was that Candide outperformed
Pangloss on the evaluation task and, in fact,
performed almost as well as Systran French-English,
a rule-based system with some 15 years of
development behind it and the leading commercial
system that participated in the evaluation
exercise. Thus, almost more impressive than the
translation results was the fact that those
results were achieved by a relatively small
research team (5 or so members) in a rather
short time (roughly three to five years).
This initiative had a profound effect on MT R&D.
By the year 2000 there were a handful of groups
around the world working on statistical MT
(IBM Yorktown Heights, ISI, CMU, University
of Aachen, and Karlsruhe University). A year
earlier there was a summer workshop for researchers
at Johns Hopkins where the participants were
able to assemble an infrastructure and develop
a respectable working system. By 2002 there
were at least a dozen research groups around
the world working on statistical MT and by
2005 at least one system, from Language Weaver,
a spin-off from the ISI research group in
California, had been installed at the CIA
and incorporated into the workflow of intelligence
analysts working with documents in unfamiliar source
languages.
Current MT R&D
There are two major corpus-based approaches to Machine
Translation that have become the focus of
the research community. Obviously, given what
has been presented thus far, there is a good
deal of energy being expended on improving
the state of the art of statistical systems.
But there is also a fairly high degree of
activity aimed at developing example-based
MT. In addition, as has always been the case
in this field, evaluation is receiving a good
deal of attention.
Statistical MT
A prototypical stochastic system consists of
three statistical models: an alignment model, a translation
(or decoding) model and a target language model
for improving fluency. The first two are built
by looking at parallel texts, lots of parallel
texts, millions of words of parallel text
if possible. The text is generally broken
down into short units (such as sentences).
The alignment model essentially provides statistics
to answer questions such as the following:
given the third word of the source language
string, what is the likelihood its counterpart
is the first word of the target language string,
the second word of the target language string,
the third word, and so on? Given the fourth
word of the source language string, what is
the likelihood that its counterpart is the
first word of the target language string,
the second word, and so on? Then, by simply
counting the number of times each case is
true in a huge corpus and dividing by the
number of strings altogether, the result
is a set of likelihoods for each alignment.
The translation model essentially provides
answers to the following: given a word “s”
(possibly in a specific context), what is
the likelihood that its target language equivalent
(the aligned counterpart) is “x,”
is “y,” is “z,” and
so on? The answer can be provided by simply
looking at the aligned counterparts of all
the occurrences of “s” (possibly
in a specific context) in the source language
corpus, counting how often each counterpart
appears and dividing by the total number of
occurrences of “s.” Now when these two statistical
models are applied to a sequence of words
in a novel source language string, they will
suggest a sequence of words in the target
language. Actually, they will suggest any
number of possible strings having different
words (translations) in different orders (alignments).
To select the most promising, the third model
is applied, the target language model. Looking
only at the target language text, this model
essentially provides statistics to answer
the following question: given words “a”
and “b” in that order, what is
the likelihood that the next word in the sequence
is “c,” what is the likelihood
that the next word is “d,” and
so on for all the words of the language? To
calculate these statistics, every sequence
of “a” followed by “b”
followed by “x” is inspected and,
for each different word “x,” the
number of occurrences is counted and divided by the total
number of “a b x” sequences in
the corpus. Returning to the different translations
suggested by the alignment and translation
models, the target language model is applied
to each suggestion in order to calculate which
is the most probable.
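The counting described above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the function names, the tiny Spanish-English word pairs and the three-sentence corpus are all invented for illustration, the word alignments are assumed to be given already, and no smoothing is applied, so any unseen trigram receives a probability of zero. Real systems estimate these quantities over millions of words and with far more refined techniques.

```python
from collections import Counter, defaultdict

def translation_table(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned word pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(source for source, _ in aligned_pairs)
    table = defaultdict(dict)
    for (source, target), n in pair_counts.items():
        table[source][target] = n / source_counts[source]
    return dict(table)

def trigram_model(sentences):
    """Estimate P(x | a, b) as count('a b x') / count of all 'a b x' sequences."""
    trigram_counts = Counter()
    context_counts = Counter()
    for words in sentences:
        for a, b, x in zip(words, words[1:], words[2:]):
            trigram_counts[(a, b, x)] += 1
            context_counts[(a, b)] += 1
    return {tri: n / context_counts[tri[:2]] for tri, n in trigram_counts.items()}

def fluency(words, model):
    """Product of trigram probabilities; unseen trigrams score 0 (no smoothing)."""
    score = 1.0
    for a, b, x in zip(words, words[1:], words[2:]):
        score *= model.get((a, b, x), 0.0)
    return score

# Hypothetical toy data: word pairs as they might come out of an aligned corpus.
aligned = [("casa", "house"), ("casa", "house"), ("casa", "home"), ("la", "the")]
table = translation_table(aligned)

# Hypothetical target language corpus for the language model.
corpus = [
    ["the", "white", "house"],
    ["the", "white", "house"],
    ["the", "white", "wall"],
]
lm = trigram_model(corpus)

# Candidate strings with different words and orders; the language model picks one.
candidates = [
    ["the", "white", "house"],
    ["the", "white", "wall"],
    ["white", "the", "house"],
]
best = max(candidates, key=lambda words: fluency(words, lm))
print(table["casa"], best)
```

In this toy setting the language model prefers the candidate whose word sequence was seen most often in the target corpus, which is the selection step described in the text.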
Generally this approach works better the larger
the parallel corpus available for training
the models, because the smaller the corpus,
the more likely it is that novel combinations of words
will be encountered for which there is insufficient
statistical information to make choices. On
the other hand, there are so many possibilities
to calculate that, even with modern CPU and storage
capacities, it might take months
or years to actually carry out all the necessary
calculations. Thus we arrive at what is capturing
the interest of statistical MT researchers.
The basic issues are:
• How can a complete calculation
be approximated so that the statistics are
reliable but at the same time the calculation
is possible within constraints of time and
memory?
• How can we improve each of the models
(in particular, what parameters can we use
besides word form, word form sequences and
word form alignments that might improve the
estimates)?
In the former case, the expectation is that
larger corpora (such as the world wide web)
will inevitably lead to improved results.
In addition, there have been experiments with
different alignment techniques which focus
on “segments” of sentences (e.g.,
Deng, et al. 2004) or bilingual parallel “segments”
extracted from non-parallel texts (e.g., Munteanu,
et al. 2004). In the latter case, even such
naïve additions as word stems, part-of-speech
or morphological information (case, number,
noun-adjective agreement, verb-nominal agreement,
etc.) have sometimes led to improved performance
(but not necessarily!). Yamada and Knight
(2001), for instance, describe a technique
for developing syntactic transfer systems
using aligned corpora in which at least one
of the languages is syntactically annotated.
In the near future there is sure to be experimentation
using additional linguistically motivated
morphosyntactic and possibly semantic parameters.
Example-based MT
A prototypical example-based system is also
corpus-based, but it approaches the corpus
with different assumptions and different goals.
In this case the idea is to break the parallel
corpus down into repeating translation units.
These are generally constituent-sized templates, sequences
of words with possible variables interspersed
(such as “fishing license” –
“permiso” or “licencia de
pesca” vs. “driver’s license”
– “carnet” or “permiso
de conducir”), but they include sequences
of constituents as well (such as “level
a building” – “arrasar un
edificio” vs. “level the score”
– “igualar el marcador”
vs. “level charges against” –
“hacer acusaciones en contra”)
and range on up to full sentence templates (such
as “in for a penny, in for a pound”
– “de perdidos, al río”).
These equivalence units are then stored in
a large database which can later be used to
support the translation process or such equivalences
may be detected on the fly as part of the
process of translation. In either case, during
translation, the input text is first matched
against the templates on the source language
side of the example base and, if a match is
found, the corresponding target template is
available for generation. If no match is found,
or if there is untranslated material corresponding
to a template variable, then the text may
be translated using a traditional rule-based
system. In some sense, the equivalences in
the example base are similar to the expressions
recorded in a translation memory, except that they
may be so basic that any translator would
think them too obvious to warrant recording,
and they may be so “literal” (e.g.,
the example base might include “level a building”,
“level a barn”, “level a
skyscraper”, and so on) as not to merit
the time to record them. This approach was
initially developed in the 1980’s by
the Kyoto University MT research group working
on a grammar-based approach to Japanese-English
translation (Nagao 1984). It has three principal
advantages. First, it allows for the treatment
of discontinuous constituents such as “figure
out” in English which often appears
with intervening material as in “figure
the answer out.” Second, it allows the
translation system to deal with idiosyncratic
collocational phenomena (which all of the
examples above reflect). Finally, it gives
the translation system increased potential
to generate natural, fluent and colloquial
target language text. More recently, an
additional advantage has been noted: the approach can be
incorporated more easily into a rule-based
MT system (as opposed to the combination of
rule-based and statistics-based systems).
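The matching step described above (look the input up on the source side of the example base, generate from the corresponding target template, and hand unmatched material or variable fillers to another component) can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the example base, the `{X}` variable notation, the tiny fallback lexicon standing in for a rule-based system, and the function names are assumptions, not a description of any actual example-based system.

```python
import re

# Hypothetical example base: source template -> target template.
# "{X}" marks a variable slot whose filler must be translated separately.
EXAMPLE_BASE = {
    "level the score": "igualar el marcador",
    "level a {X}": "arrasar un {X}",
    "in for a penny, in for a pound": "de perdidos, al río",
}

# Stand-in for the traditional rule-based system used for unmatched material.
FALLBACK_LEXICON = {
    "building": "edificio",
    "barn": "granero",
    "skyscraper": "rascacielos",
}

def fallback_translate(text):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(FALLBACK_LEXICON.get(word, word) for word in text.split())

def template_to_regex(template):
    """Turn a template into a regex in which each {X} matches any material."""
    literal_parts = [re.escape(part) for part in template.split("{X}")]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")

def translate(text):
    """Prefer fully literal templates, then templates with a variable slot."""
    ordered = sorted(EXAMPLE_BASE.items(), key=lambda item: "{X}" in item[0])
    for source, target in ordered:
        match = template_to_regex(source).match(text)
        if match is None:
            continue
        if match.groups():
            # Translate the variable filler with the fallback component.
            return target.replace("{X}", fallback_translate(match.group(1)))
        return target
    # No template matched: fall back entirely.
    return fallback_translate(text)

print(translate("level a skyscraper"))
print(translate("level the score"))
```

Trying literal templates before variable ones is one simple way to let idiosyncratic collocations such as “level the score” take priority over the more general pattern.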
As example-based approaches try to increase the
use of the examples during processing (as
opposed to applying a general grammar-based
MT analyzer/generator) they appear to be slowly
converging with statistical approaches (which
conversely appear to be moving from a word-level
focus to a constituent-level focus). More
recently, interest in the example-based MT approach
has focused on trying to skip the construction
of an example base and instead attempting to
use the parallel corpus directly as a source
of examples during the translation process.
For this exercise to work, corpus alignment
is extremely important, as it is for statistical
approaches, although there is perhaps more
concern for establishing a constituent level
alignment as opposed to a word or string level
alignment (e.g., Owczarzak, et al. 2006).
In addition, there is a good deal of interest
in improving the matching process between
the source text and the source side of the parallel
aligned corpus, along with merging target
language text segments together to form fluent
output (Brown, et al. 2003).
Evaluation
As mentioned earlier, a major outcome of the
US government-funded MT initiative in the
early 1990’s was the development of an
evaluation methodology that could compare
system performance on comparable tasks, the
translation of texts in a common genre (namely,
news articles) and of a similar length. That
methodology, however, relied on human evaluators
who assessed such factors as fidelity and
fluency and as a result was both expensive
and time consuming, especially from the perspective
of MT developers. Consequently, in the late
1990’s an automatic evaluation technique
was developed at IBM which, while not especially
useful as a diagnostic, was shown to correlate
with human judgments of relative quality.
BLEU (Papineni, et al. 2002), as the methodology
is referred to, is a statistical metric which
provides a score based on the number and length
of text segments in an output translation
which match text segments of one or more reference
(human generated) translations. It is widely
used at this point and helps developers by
indicating whether a more recent version of
their system performs better, on a par with
or worse than a prior version as well as telling
them how the performance of their system compares
with others over a common test set.
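To make the idea of scoring by matching text segments concrete, here is a simplified, sentence-level sketch in Python of a BLEU-style score: clipped n-gram precision for segment lengths 1 through 4, combined by a geometric mean and multiplied by a penalty for output that is shorter than the reference. It is an illustration under stated assumptions rather than a reference implementation: the published metric is computed over a whole test corpus, and the function names and the tie-breaking rule for choosing the reference length are choices made here.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Count every contiguous n-word segment in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def modified_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with counts clipped
    to the maximum number of times the n-gram occurs in any one reference."""
    candidate_counts = ngrams(candidate, n)
    if not candidate_counts:
        return 0.0
    max_reference_counts = Counter()
    for reference in references:
        for gram, count in ngrams(reference, n).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)
    clipped = sum(min(count, max_reference_counts[gram])
                  for gram, count in candidate_counts.items())
    return clipped / sum(candidate_counts.values())

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level score; returns 0.0 if any precision is 0."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Assumption: use the reference length closest to the candidate length,
    # preferring the shorter reference on ties.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    geometric_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return brevity_penalty * geometric_mean

references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), references, 1))
print(bleu("the cat is on the mat".split(), references))
```

The clipping step is what prevents a degenerate output that repeats a common word from scoring well: in the first call only two of the seven occurrences of “the” are credited.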
But BLEU has its drawbacks, not the least of which
is the fact that test sets generally have to be developed
by human translators. It is not
very insightful. It does not recognize categories
of errors nor the strengths and weaknesses
in some broad sense of different systems and
so is not useful as a diagnostic tool. It
is entirely focused on throughput, that
is, the relationship of the input and output
texts. As a result, other evaluation methodologies
have been proposed and there has been at least
one effort, FEMTI (King, et al. 2003), to
systematically analyze the objectives of an
evaluation and to suggest a range of metrics
based on objectives.
Integrating fully automatic MT systems into the translation process
Statistical MT systems are fully automatic translators
and cannot actually be integrated into the
work stream of a particular translator. Rather,
they are used to replace the translator. But
the quality, while much improved, is not especially
good. Thus such systems are generally used
to support document filtering for assimilation
tasks such as information analysis, email
and chat, specifically for texts in languages
unknown to a “customer.” In this
case the system provides its translation such
as it may be and the customer must decide
whether the document appears to be worth closer
investigation in which case it is passed to
a human translator. There are some cases of
applying fully automatic systems to dissemination
tasks (e.g., job descriptions) especially
if boring, repetitive, closed domain translation
is involved. In these cases translations are
automatically generated and then passed to
a human, ideally monolingual, post-editor
who produces a fluent version of the translation.
In fact, one area of growing interest in the
MT research community is in developing automatic
(statistical) post-editors. In any case, the
key here is that the task involve domain-limited,
repetitive translation and that the automatic
translations be of sufficiently high quality
to make post-editing more efficient (and less
expensive) than human translation.
Using fully automatic MT for second language learning or translator training
Using translation in second language learning
has been controversial for some time, but for
those who find it useful, fully automatic
MT could conceivably be (and no doubt already
has been) incorporated into online reading,
writing and translation exercises. Whether
the system provides high quality translations
or merely hints at the content of the source
text, it might be used (with due allowance for its quality)
to assist in understanding or producing texts
in the language being acquired as well as
to provide materials which need to be edited
using knowledge of the language to be acquired.
But this topic is outside the author’s area of expertise
and is best left to the interested reader.
One area of potential benefit to both translator
training and MT, however, would be activities
that promote the development of corpora consisting
of multiple translations (in a given target
language) of a given set of source language
documents, especially if the translations were
annotated with linguistic information (morphological,
syntactic and semantic information). For translators
the central activity would be to compare and
contrast translations, identifying, wherever
translations vary, whether the variation is
the result of an error (classification being
useful), a non-meaning-impacting variation
(i.e., essentially paraphrases communicating
the same information content), or meaning
bearing variations permissible within the
set of possible (rational) interpretations
of a text. The resultant corpus would benefit
MT for both training and evaluating MT systems
and presumably benefit developing translators
by sensitizing them to the enormous range
of plausible interpretations (and therefore
translations) a text may have as well as providing
an interesting methodology for evaluating
a translator’s level of proficiency
and improvement over time.
References
Brown, P. F., S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics 19 (2), 263-311.
Brown, R., R. Hutchinson, P. Bennett, J. Carbonell, and P. Jansen. 2003. Reducing Boundary Friction Using Translation-Fragment Overlap. In Proceedings of the Ninth Machine Translation Summit, New Orleans, USA, pp. 24-31.
Deng, Y., S. Kumar, and W. Byrne. 2004. Bitext Chunk Alignment for Statistical Machine Translation. CLSP Tech Report, Johns Hopkins University.
Dorr, B. J. 1993. Machine translation: a view from the lexicon. MIT Press, Cambridge, Mass.
Durand, J., P. Bennett, V. Allegranza, F. Van Eynde, L. Humphreys, P. Schmidt, and E. Steiner. 1991. The Eurotra Linguistic Specifications: an overview. In: Machine Translation 6, Kluwer, Dordrecht, pp. 103-147.
Farwell, D., and Y. Wilks. 1991. ULTRA: A Multilingual Machine Translator. Proceedings of the Machine Translation Summit III, 19-24.
Goodman, K., and S. Nirenburg (eds.). 1991. The KBMT project: a case study in knowledge-based machine translation. San Mateo, CA: Morgan Kaufmann.
King, M., A. Popescu-Belis, and E. Hovy. 2003. FEMTI: creating and using a framework for MT evaluation. In: AMTA (2003), 224-231.
Munteanu, D., A. Fraser, and D. Marcu. 2004. Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora. Proceedings of HLT/NAACL.
Nagao, M. 1984. A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn, A. & Banerji, R. (eds.) Artificial and human intelligence (Amsterdam: North-Holland).
Nagao, M. 1989. Machine translation: how far can it go? (Oxford: Oxford University Press).
Nirenburg, S. (ed.) 1995. The Pangloss Mark III Machine Translation System. A Joint Technical Report by NMSU CRL, USC ISI and CMU CMT. Issued as CMU tech report CMU-CMT-95-145 (Also available as HTML from NMSU).
Owczarzak, K., B. Mellebeek, D. Groves, J. Van Genabith, and A. Way. 2006. Wrapper Syntax for Example-based Machine Translation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, Boston, MA, pp. 148-155.
Papineni, K., S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In: ACL-2002: 40th Annual meeting of the Association for Computational Linguistics, Philadelphia, July 2002; 311-318.
White, J. S., and T. A. O'Connell. 1994. The ARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches. Proceedings of the 1994 Conference, Association for Machine Translation in the Americas.
Yamada, K., and K. Knight. 2001. A syntax-based Statistical Translation Model. Proceedings of ACL, 523-530, Toulouse, France.
December 2004