Changing Syntactic Classes in transfer-based Machine Translation

Yoshihiro Matsuo, Satoshi Shirai and Satoru Ikehara

NTT Communication Science Laboratories
1-2356 Take, Yokosuka-shi, Kanagawa, 238-03 JAPAN
{yosihiro,shirai,ikehara.}@nttkb.ntt.jp


Abstract

Many machine translation systems have two problems: (1) they decompose input sentences too finely, and (2) they are difficult to adapt to a specific domain. The direct parse tree translation method has been proposed as a solution to these problems. One of the most important steps in this method is changing syntactic classes to obtain high quality translation. In this paper, we discuss the advantages of the direct parse tree translation method and the mechanism for changing syntactic classes, an important problem when translating between very different languages such as Japanese and English.



[ NLPRS '95, Vol.1, pp.432-437 (December, 1995). ]





INDEX

1 Introduction
2 Direct Parse Tree Translation
  2.1 Translation process of Direct Parse Tree Translation
  2.2 Advantages of the Direct Parse Tree Translation Method
    2.2.1 High quality translation for Expressions difficult to Translate
    2.2.2 The Role of Additional Translation Process to Conventional MT systems
    2.2.3 Prevention of Undesired Side Effects
3 Changing syntactic classes
  3.1 Adverb to Adjective
  3.2 Noun to Verb
  3.3 Adjective to Adverb
  3.4 Verb or Adjective to Noun
  3.5 Clause to Phrase
4 Translation Examples
5 Conclusion
  References



1 Introduction

Many machine translation systems use the transfer method. Most of these are designed on the principle of compositional semantics: an input sentence is decomposed into many components, such as noun phrases, verb phrases, adverbial phrases, and so on. Each component is translated independently into an expression of the target language, and the results are finally recombined into a sentence in the target language. However, this method fails when the individually translated components do not combine to form a correct sentence in the target language. For translation between very different languages such as Japanese and English, it is very difficult to achieve high quality with this method.

In addition, these machine translation systems are usually composed of many translation phases such as morphological analysis, syntactic analysis, transfer process and target language generation. This separation of the translation processes complicates tuning a machine translation system to a specific domain because of the many distributed processes to be modified.

In order to solve these problems, we have proposed the direct parse tree translation method, which matches a dependency-analyzed Japanese input sentence against parse tree patterns in a dictionary (Matsuo et al., 1994). The method comprises three phases. First, the part of the parse tree of an input sentence which matches the pattern is transferred into the framework of the target language. Second, the leaf components separated from the framework are translated into the target language, ready to be combined with the previously translated framework. Finally, these components are recombined with the translated framework.

One of the most important steps in this method is changing the syntactic classes (parts of speech) in the second process. To translate smoothly from one language to another it is often necessary to change the syntactic classes of elements.

In this paper, we discuss advantages of the direct parse tree translation method (in section 2), and the mechanism for changing syntactic classes (in section 3).




2 Direct Parse Tree Translation




2.1 Translation process of Direct Parse Tree Translation

Recently, the direct parse tree translation method (Matsuo et al., 1994) has been proposed as a method to translate sentences which are difficult to translate literally. This method directly translates parse tree structures into largely changed target language structures without decomposition of the input sentences. This method works with a traditional transfer-based translation system based on compositional semantics, bypassing the normal translation process. Figure 1 shows the translation process of the direct parse tree translation method applied to a Japanese to English machine translation system.

Figure 1: Translation process of Direct Parse Tree Translation

The translation is conducted as follows:

  1. Translation pairs whose structures differ greatly between the source and target languages are prepared by hand in advance in a bilingual parse tree dictionary.

  2. The pattern matching part receives a dependency-analyzed Japanese sentence and compares it with the Japanese parse tree patterns described in the Japanese parse tree dictionary.

  3. If the pattern matches the input tree, the parse tree dividing part divides the tree into a 'trunk part' and some 'leaf parts'.

  4. For the trunk part, since the scattered information is already captured, no further deep analysis is required. The system merely uses the English template which corresponds to the pattern from the Japanese parse tree dictionary.

  5. For each leaf part, the system invokes the relevant semantic analyzer and transfer process needed to generate the English structures. The syntactic classes of leaves are changed when necessary (see section 3).

  6. The English synthesizer receives the English template read from the English template dictionary and the English structures generated by the transfer process, and combines them into a complete English sentence.
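The steps above can be illustrated with a minimal Python sketch. It is not the ALT-J/E implementation: the `Node`, `LeafPattern` and `Rule` classes, the `match` and `translate` functions, the toy lexicon and the example sentence kare-ga kabe-o nuru are all hypothetical, and the sketch assumes that a rule fails when the input tree has children the rule does not mention.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a dependency-analyzed sentence (hypothetical structure)."""
    head: str                 # head word, e.g. "nuru"
    category: str = "verb"    # syntactic category of the node
    particle: str = ""        # case particle attached to the node
    children: list = field(default_factory=list)


@dataclass
class LeafPattern:
    var: str                  # variable used in the English template
    category: str             # required syntactic category
    particles: tuple = ()     # acceptable particles (empty tuple: any)


@dataclass
class Rule:
    head: str                 # head word of the trunk part
    leaves: list              # patterns for the leaf parts
    template: list            # English template: literals and variables


def match(rule, tree):
    """Steps 2-3: return {variable: leaf subtree}, or None if no match."""
    if tree.head != rule.head:
        return None
    bindings = {}
    remaining = list(tree.children)
    for pattern in rule.leaves:
        for child in remaining:
            particle_ok = not pattern.particles or child.particle in pattern.particles
            if child.category == pattern.category and particle_ok:
                bindings[pattern.var] = child
                remaining.remove(child)
                break
        else:
            return None       # a required leaf is missing
    # Assumption of this sketch: unmatched extra children make the rule fail.
    return bindings if not remaining else None


def translate(rule, tree, translate_leaf):
    """Steps 4-6: translate the leaves and fill the English template."""
    bindings = match(rule, tree)
    if bindings is None:
        return None           # caller falls back to the conventional process
    words = [translate_leaf(bindings[slot]) if slot in bindings else slot
             for slot in rule.template]
    return " ".join(words)


rule = Rule(
    head="nuru",
    leaves=[LeafPattern("B", "noun-phrase", ("wa", "ga")),
            LeafPattern("C", "noun-phrase", ("o",))],
    template=["B", "paints", "C"],
)
tree = Node("nuru", children=[
    Node("kare", "noun-phrase", "ga"),
    Node("kabe", "noun-phrase", "o"),
])
lexicon = {"kare": "he", "kabe": "the wall"}   # toy leaf translator
result = translate(rule, tree, lambda leaf: lexicon[leaf.head])
print(result)
```

Returning `None` when the pattern does not match mirrors the bypass design described above: the conventional transfer process remains available for every sentence that no rule covers.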




2.2 Advantages of the Direct Parse Tree Translation Method

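This translation system has the following advantages:

  1. High quality translation for expressions that are difficult to translate.

  2. It acts as an additional translation process alongside conventional MT systems.

  3. Prevention of undesired side effects.

The following sections describe these advantages.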




2.2.1 High quality translation for Expressions difficult to Translate

In a conventional machine translation system, an input sentence is decomposed into simple structures, such as noun phrases or unit sentences (see footnote 1), and the translations of these structures are recombined into a target language sentence. This sometimes causes input sentences to be decomposed in ways that are unexpected and unsuitable for good translation. Once a sentence has been decomposed, the conventional method has no appropriate way to recombine these small structures, and it produces a translation with the same parse tree structure as the input sentence.

To avoid such unexpected decomposition, the rules in the direct parse tree translation method are applied to arbitrary parts of the parse tree of a whole sentence. These rules can capture a large enough part of the parse tree, which represents the semantic units for both the source and target languages.

Thus, high quality translation can be performed for expressions that are difficult to translate with conventional machine translation systems.




2.2.2 The Role of Additional Translation Process to Conventional MT systems

In conventional machine translation systems, the translation process is separated into several phases, such as morphological analysis, syntactic analysis, semantic analysis, transfer, and generation. Therefore, even if new translation conditions for some expressions are found, it is difficult to introduce them into the existing machine translation system, because many modules have to be modified.

For example, when translating documents for a specific domain, it is sometimes necessary to add a new modality feature, tune the semantic attribute propagation rules in the noun phrase analyzer, add a new transfer rule, and so on.

This separation is a disadvantage when adapting a machine translation system to a specific domain.

The direct parse tree translation method meets this requirement, because it needs only one translation rule to produce one newly expected translation. Concentrating the changes in a single translation rule decreases the cost of modifying the machine translation system.




2.2.3 Prevention of Undesired Side Effects

Translation rules need to be applied precisely to the intended expressions. However, in natural language, the same expression structure does not always have the same meaning. There is therefore a danger that translation rules will be applied to unexpected expressions.

In the direct parse tree translation method, translation rules can be written using detailed semantic attributes as well as syntactic categories, and the rules are defined with many selective conditions distributed widely within the parse tree. This ensures that undesired side effects are avoided.
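As a rough illustration of how a semantic attribute can keep a rule from firing on the wrong expression, the Python sketch below checks a leaf against a condition that combines a syntactic category with a semantic attribute. The attribute hierarchy `SEM_PARENT`, the functions `has_sem` and `leaf_matches`, and the dictionary layout are assumptions made for this example, not the rule format of the actual system.

```python
# Hypothetical semantic attribute hierarchy: attribute -> parent attribute.
SEM_PARENT = {
    "blue": "color",
    "red": "color",
    "color": "attribute",
    "speed": "attribute",
}


def has_sem(attr, required):
    """True if attr equals required or is one of its descendants."""
    while attr is not None:
        if attr == required:
            return True
        attr = SEM_PARENT.get(attr)
    return False


def leaf_matches(leaf, condition):
    """Check the syntactic and semantic conditions given for one leaf."""
    if leaf["category"] != condition["category"]:
        return False
    required = condition.get("sem")
    return required is None or has_sem(leaf["sem"], required)


# Condition for a leaf that must be an adverbial phrase denoting a color.
condition = {"category": "adverbial-phrase", "sem": "color"}
aoku = {"word": "aoku", "category": "adverbial-phrase", "sem": "blue"}
hayaku = {"word": "hayaku", "category": "adverbial-phrase", "sem": "speed"}

print(leaf_matches(aoku, condition))    # the color adverb satisfies the rule
print(leaf_matches(hayaku, condition))  # same syntax, but the rule is not applied
```

Both leaves have the same syntactic category, so a purely syntactic rule could not tell them apart; only the semantic condition separates them.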




3 Changing syntactic classes

The direct parse tree translation method translates parse tree structures into largely changed target parse tree structures. Therefore, the leaf parts cut out from the source language parse tree should sometimes be translated into different syntactic classes.

In Japanese to English translation in particular, verb phrases sometimes need to be transformed into noun phrases.

It is said that Japanese uses a higher proportion of verbs and English a higher proportion of nouns when the two are compared (Ando, 1986). This difference requires a mechanism to change Japanese verbs into English nouns and to transform Japanese clauses into English phrases such as noun phrases.

These transformations are not always possible and are not always adequate. For example, it would be unacceptable to change a complex subordinate clause into a noun phrase. In the system shown in Figure 1, when a leaf is difficult to transform, the conventional machine translation method can take over the translation of the whole sentence. Therefore, it is not necessary to transform every expression, and we have designed the syntactic class changer so that it does not force transformations.

To implement these transformations, the transformation process changes either the source language structure or the target language structure. Often the transformation can be carried out in either language. The direct parse tree translation method has transformation processes for both source and target language structures; which one is performed depends on the kind of transformation and the input sentence (Figure 1).

Several transformations are described in the following sections.




3.1 Adverb to Adjective

When Japanese verb phrases are translated into English noun phrases, Japanese adverbs need to be changed into English adjectives. Apart from such phrases, there are many cases in which an English complement adjective corresponds to an adverb in the Japanese expression. For example, "paint ~ blue" corresponds to the Japanese expression ~-o aoku nuru '~-OBJ blue paint', where aoku is an adverb modifying the verb nuru.

In this case, the syntactic class of aoku 'bluely', an adverb, needs to be changed into the adjective aoi 'blue'. However, there is sometimes no English adverb corresponding to the Japanese adverb. Therefore, the system first tries to change the syntactic class in Japanese by conjugating the Japanese word, and then uses the adjective translator. If changing the class within the Japanese expression is impossible, the system uses the adverb translator and tries to change the syntactic class of the translation in the English structure. If both attempts fail, the whole sentence is translated by the conventional transfer method.

(A "nuru"
  (B noun-phrase :particle ("wa" "ga"))
  (C noun-phrase :particle "o")
  (D adverbial-phrase :sem color))
((SUBJ B)
  (VP "paint")
  (OBJ C)
  (CO D :ADJ))

Figure 2: Outline of bilingual rule (adverb to adjective)
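The order of attempts described in this section (source-side change first, then target-side change, then fallback) can be sketched in Python as follows. The four toy dictionaries and the function `adverb_to_adjective` are hypothetical stand-ins for the conjugation process, the adjective translator, the adverb translator and the English-side class changer; the entries other than aoku and aoi are invented for illustration.

```python
# Hypothetical toy dictionaries; a real system would use full lexicons.
JA_ADVERB_TO_ADJECTIVE = {"aoku": "aoi"}         # change by Japanese conjugation
ADJECTIVE_TRANSLATOR = {"aoi": "blue"}           # Japanese adjective -> English adjective
ADVERB_TRANSLATOR = {"hayaku": "quickly"}        # Japanese adverb -> English adverb
EN_ADVERB_TO_ADJECTIVE = {"quickly": "quick"}    # change within the English structure


def adverb_to_adjective(ja_adverb):
    """Return an English adjective for a Japanese adverb, or None."""
    # 1. Try to change the class on the Japanese side, then translate.
    ja_adjective = JA_ADVERB_TO_ADJECTIVE.get(ja_adverb)
    if ja_adjective in ADJECTIVE_TRANSLATOR:
        return ADJECTIVE_TRANSLATOR[ja_adjective]
    # 2. Otherwise translate the adverb and change the class in English.
    en_adverb = ADVERB_TRANSLATOR.get(ja_adverb)
    if en_adverb in EN_ADVERB_TO_ADJECTIVE:
        return EN_ADVERB_TO_ADJECTIVE[en_adverb]
    # 3. Both attempts failed: the caller should translate the whole
    #    sentence with the conventional transfer method.
    return None


print(adverb_to_adjective("aoku"))     # source-side change succeeds
print(adverb_to_adjective("hayaku"))   # target-side change succeeds
print(adverb_to_adjective("zenzen"))   # no change possible
```

Returning `None` rather than forcing a result reflects the design decision stated in section 3 that the syntactic class changer does not force transformations.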




3.2 Noun to Verb

When translating a Japanese functional verb such as okonau 'do' or suru 'do' which has an action noun in its o-case, it is sometimes preferable to use the action noun as a verb (Oku, 1990). In this case, a mechanism to transform the action noun into a verb is required. For example, the translation of kentō-o okonau 'examination-OBJ do' into "examine" requires the transformation of the Japanese noun kentō into the English verb "examine" (Figure 3). When the noun is a derivative of a Japanese nominal verb (sahen-meishi), the system changes the noun to the verb within the Japanese expression and translates it into English. Otherwise, the system translates the Japanese noun into an English noun and then looks up the verb derivative of the English noun in the English dictionary.
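A minimal Python sketch of this two-way choice is given below. The dictionaries and the function `action_noun_to_verb` are hypothetical, and the second entry (hanashiai 'discussion', a noun that is not a sahen-meishi) is an illustrative example that does not come from the system's dictionaries.

```python
# Hypothetical toy dictionaries for illustration only.
SAHEN_NOUNS = {"kentō"}                           # nouns that form a verb with suru
JA_VERB_TRANSLATOR = {"kentō-suru": "examine"}    # Japanese verb -> English verb
JA_NOUN_TRANSLATOR = {"kentō": "examination", "hanashiai": "discussion"}
EN_NOUN_TO_VERB = {"discussion": "discuss"}       # verb derivatives of English nouns


def action_noun_to_verb(ja_noun):
    """Return an English verb for a Japanese action noun, or None."""
    if ja_noun in SAHEN_NOUNS:
        # Change noun to verb within the Japanese expression, then translate.
        return JA_VERB_TRANSLATOR.get(ja_noun + "-suru")
    # Otherwise translate the noun and look up its English verb derivative.
    en_noun = JA_NOUN_TRANSLATOR.get(ja_noun)
    return EN_NOUN_TO_VERB.get(en_noun)


print(action_noun_to_verb("kentō"))      # changed on the Japanese side
print(action_noun_to_verb("hanashiai"))  # changed on the English side
print(action_noun_to_verb("hon"))        # not an action noun: no change
```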




3.3 Adjective to Adverb

In the process shown in the previous section, Japanese nouns are translated into English verbs. Along with this process, the modifiers of the Japanese nouns, namely adjectives, need to be changed into adverbs in English. For example, the translation of shousaina kentō-o okonau 'detailed examination-OBJ do' into "examine in detail" requires a syntactic class change from the nominal adjective shousaina to the prepositional phrase "in detail", an adverbial phrase (Figure 3).

(A "okonau"
  (B noun-phrase :particle ("wa" "ga"))
  (C noun-phrase :particle "o"
    (D noun-phrase :particle "no")
    (E adjective-phrase)))
((SUBJ B)
  (VP C :VERB)
  (OBJ D)
  (ADVP E :ADV))

Figure 3: Outline of bilingual rule (noun to verb and adjective to adverb)




3.4 Verb or Adjective to Noun

The transformation from clauses to phrases, described in the following section, requires the transformation of verbs or adjectives into nouns. The system uses noun derivatives of the English words obtained with the conventional translation method.




3.5 Clause to Phrase

We have shown transformations from one phrase to another phrase of a different syntactic class in the above sections. There are more complicated cases which require syntactic class changes. In Japanese to English translation, changes from a subordinate clause to a noun phrase, gerundial phrase or infinitive are especially required. We will show only the transformation of clauses to noun phrases.

The direct parse tree translation method is currently equipped with 52 transform rules to transform clauses to noun phrases, which are categorized by sentence pattern, kind of case element, kind of modifier, kind of verb, etc. Each rule uses the primitive transform mechanisms described in the above sections. For example, when the rule in Figure 4 is applied to nioi-ga tsuyokat-ta-node kare-wa mezame-ta 'smell-SUBJ strong-PAST-CAUSE he-SUBJ wake-PAST', the subordinate clause is transformed into the noun phrase "strong smell", and "The strong smell woke him up" is obtained.

(A "mezameru"
  (B noun-phrase :particle ("wa" "ga"))
  (C subordinate-clause :particle "node"))
((SUBJ C :NP)
  (VP "wake")
  (OBJ B)
  (ADV "up"))

Figure 4: Outline of bilingual rule (clause to noun-phrase 1)
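The effect of the rule in Figure 4 on the example sentence can be sketched in Python as follows. This covers only one hypothetical transform (a clause with an adjective predicate becomes an adjective plus noun); the dictionaries, the function names and the fixed template string are assumptions made for illustration, and tense and article selection are hard-coded rather than generated.

```python
# Hypothetical toy dictionaries for illustration only.
NOUNS = {"nioi": "smell", "kane": "bell"}
ADJECTIVES = {"tsuyoi": "strong"}
OBJECT_PRONOUNS = {"kare": "him"}


def clause_to_noun_phrase(subject, predicate):
    """Turn a 'subject-ga adjective' clause into 'the adjective noun'."""
    if subject in NOUNS and predicate in ADJECTIVES:
        return f"the {ADJECTIVES[predicate]} {NOUNS[subject]}"
    return None   # this transform does not cover the clause


def apply_rule(clause_subject, clause_predicate, main_subject):
    """Fill the template (SUBJ C :NP) (VP "wake") (OBJ B) (ADV "up")."""
    noun_phrase = clause_to_noun_phrase(clause_subject, clause_predicate)
    obj = OBJECT_PRONOUNS.get(main_subject)
    if noun_phrase is None or obj is None:
        return None   # fall back to the conventional translation method
    sentence = f"{noun_phrase} woke {obj} up"
    return sentence[0].upper() + sentence[1:]


print(apply_rule("nioi", "tsuyoi", "kare"))
print(apply_rule("kane", "naru", "kare"))   # verb predicate: not covered here
```

The second call returns `None` because the clause predicate is a verb rather than an adjective, so a different rule is needed for that kind of clause.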

Exceptional transformations which cannot be achieved by the predefined transform rules can be described with bilingual rules of the direct parse tree translation method. For example, to obtain "The bell woke me up" as the translation of kane-ga nat-ta-node mezame-ta 'bell-SUBJ ring-PAST-CAUSE wake-PAST', the bilingual rule in Figure 5 can be used.

(A "mezameru"
  (B noun-phrase :particle ("wa" "ga"))
  (C "naru" :particle "node"
    (D noun-phrase :particle "ga")))
((SUBJ D)
  (VP "wake")
  (OBJ B)
  (ADV "up"))

Figure 5: Outline of bilingual rule (clause to noun-phrase 2)




4 Translation Examples

The direct parse tree translation method has been implemented in the Japanese to English machine translation system ALT-J/E, together with the syntactic class transformation processes. Some translation examples using the direct parse tree translation method with changing syntactic classes are shown in Table 1. These results show that, compared with the conventional translation method, the translation quality can be greatly improved by the direct parse tree translation method.

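Example 1 (syntactic class change: adverb to adjective)
  Japanese: watashi-wa hon-o sukoshi kat-ta.
  Gloss: I book slightly buy
  Result with the direct parse tree translation: I bought a few books.
  Result without the direct parse tree translation: I slightly bought a book.

Example 2 (syntactic class change: noun to verb)
  Japanese: amerika-wa jidōsya-yunyu-no kinshi-o kime-ta.
  Gloss: America car import prohibition decide
  Result with the direct parse tree translation: The United States decided to prohibit the import of car.
  Result without the direct parse tree translation: The United States decided on the prohibition of the import of car.

Example 3 (syntactic class change: adjective to adverb)
  Japanese: ~ shōsaina kentō-o okonau.
  Gloss: detailed examination do
  Result with the direct parse tree translation: ~ examine in detail.
  Result without the direct parse tree translation: ~ do a detailed examination.

Example 4 (syntactic class change: verb to noun)
  Japanese: kare-wa jōzuni oyogeru.
  Gloss: he well can swim
  Result with the direct parse tree translation: He is good at swimming.
  Result without the direct parse tree translation: He can swim well.

Example 5 (syntactic class change: clause to phrase)
  Japanese: yōsumi-kibun-ga tsuyoi naka ~.
  Gloss: wait-and-see mood strong center
  Result with the direct parse tree translation: ~ under the strong wait-and-see mood.
  Result without the direct parse tree translation: ~ in that wait-and-see mood is strong.

Table 1: Translation Examples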




5 Conclusion

This paper showed three advantages of the direct parse tree translation method: (1) high quality translation can be performed for large parse tree structures which are difficult to translate with the conventional method; (2) it works as an additional translation process alongside conventional machine translation systems, so it is very useful for building a domain-specific translation system; (3) transformation rules can be defined very precisely, so undesired side effects are prevented. In order to realize this method, we pointed out the necessity of five kinds of leaf translation which require syntactic class changes and showed how to implement these functions in existing machine translation systems. The direct parse tree translation method has been implemented in the Japanese to English machine translation system ALT-J/E (Ikehara, 1989), and translation rules are going to be collected. Our next study will be an evaluation of the improvements in translation quality.




References

Ando, S. (1986).
Eigo-no Ronri Nihongo-no Ronri (English Logic and Japanese Logic). Taishukan Shoten. (in Japanese).

Ikehara, S. (1989).
Multi-level Machine Translation Method. Future Computing Systems, 2(3).

Matsuo, Y., Shirai, S., Yokoo, A., and Ikehara, S. (1994).
Direct Parse Tree Translation in cooperation with the Transfer Method. In Proceedings of International Conference on New Methods in Language Processing (NeMLaP), Manchester.

Oku, M. (1990).
A Method for Analyzing Japanese Idioms. In Proceedings of Seoul International Conference on Natural Language Processing (SICONLP/90), Seoul.




Footnote
1 A unit sentence is made up of a verb and case elements.