ALT-J/E
The Automatic Language Translator Japanese to English


[ FORUM Issue 18, University of Queensland, pp.1-8 (December, 1993). ]



INDEX

1 Introduction
  1.1 ALT-J/E's distinguishing features
  1.2 The translation process
  1.3 Pre- and post-editing
2 Basic Functions
  2.1 Correct translation of verbs
  2.2 Correct translation of nouns
  2.3 Translation of the particle と
  2.4 When is an idiomatic expression idiomatic?
  2.5 Context processing: Supplementation
  2.6 Automatic rewriting within the source language
3 Japanese Analysis and Transfer
  3.1 Translation of the particle の
  3.2 Translation of the particle で
  3.3 Apposition
  3.4 Automatic rewriting within the source language
  3.5 Context processing: Supplementation, ellipsis and pronominalisation
4 English Generation
  4.1 Generating number
  4.2 Countability
  4.3 Use of possessive nouns as determiners
  4.4 Adverb positioning
  4.5 Word order in addresses
  References



1 Introduction

ALT-J/E, the Automatic Language Translator-Japanese to English, is an experimental machine translation system being developed at the NTT Network Information Systems Laboratories. The research is aimed at 'communication with translation'. The ultimate goal is speech-to-speech translation over the telephone. In the short term we hope to use the system for translating messages sent over electronic networks. Both these goals require high-quality machine translation with no pre-editing. All of the examples below are of unedited output.




1.1 ALT-J/E's distinguishing features

Precise semantic dictionaries: For semantic analysis ALT-J/E uses a semantic attribute system in which concepts are separated into some 3,000 categories, combining both is-a and has-a relationships. These categories are used to make semantic word dictionaries (400,000 words) and semantic structure dictionaries (15,000 patterns).

Appropriate selection of translations: The semantic dictionaries allow ALT-J/E to use the relationships between words to make fine distinctions between shades of meaning, for example when choosing which of many possible translations to select. In addition, ALT-J/E is able to decide whether an idiomatic expression is being used literally or not.

Automatic rewriting: Complicated Japanese expressions are automatically rewritten into more easily translated ones, mimicking human pre-editing. This improves processing efficiency by reducing syntactic and semantic ambiguities.

Context processing: In addition to the translation of single sentences, ALT-J/E translates at the paragraph level. This means that it can supplement omitted elements across sentences. This is important in Japanese-English machine translation because Japanese often omits a sentence's subject where it is needed in English.

English generation: To correctly generate number and countability for noun phrases and appropriately position adverbs, ALT-J/E uses a combination of detailed lexical distinctions and semantically motivated rules.




1.2 The translation process

The process of translation can be divided into seven parts. First, ALT-J/E splits the Japanese text into morphemes. Second, it analyses the sentence syntactically, often giving multiple possible interpretations. Third, it rewrites complicated Japanese expressions. Fourth, ALT-J/E semantically evaluates the various interpretations. Fifth, syntactic and semantic criteria are used to select the best interpretation. Sixth, the selected information is transferred into English. Finally, the English sentence is adjusted to give the correct inflectional forms.

ALT-J/E runs on a VAX 6620 and is written mainly in LISP. As it is still an experimental system it has not been optimised for speed. Currently the translation speed is around four Japanese characters per second.




1.3 Pre- and post-editing

Currently, to produce a high-quality translation using ALT-J/E, both pre- and post-editing are needed. How much pre- and post-editing is necessary depends on what you are translating. Because the ultimate aim of ALT-J/E is to translate with no editing at all, there has not been a great deal of research done on how much is needed. The current 'best' the system can do was tested by translating newspaper articles. By fine tuning translation rules and dictionaries, 75-85% of sentences can be translated so that they can be understood without access to the source text, and 20-30% translated so that they need no further revision. This implies that to translate all of the sentences so that they can be understood without access to the source would require pre-editing at least 15-25% of the sentences. A further 70-80% of the sentences need to be post-edited to get human quality output (although there should be some improvement from the pre-editing). For text written more simply than newspaper articles, correspondingly less editing is required.
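The editing estimates above are simply the complements of the two reported ranges. As a minimal sketch of that arithmetic in Python (the helper name `remaining` is ours and is not part of ALT-J/E):

```python
def remaining(share_range):
    """Return the complementary percentage range for a (low, high) range."""
    low, high = share_range
    return (100 - high, 100 - low)


understandable = (75, 85)  # % of sentences understandable without the source
no_revision = (20, 30)     # % of sentences needing no further revision

print("pre-editing needed for at least:", remaining(understandable))
print("post-editing needed for:", remaining(no_revision))
```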




2 Basic Functions




2.1 Correct translation of verbs

(1) , , .
The president employs a laborer and a laborer uses a machine but I spend money.

The same Japanese verb is used three times in the source sentence. It should be translated differently depending on what is used: "use PEOPLE" becomes "employ"; "use MONEY or TIME" becomes "spend"; the default translation is simply "use".
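The selection rule described above can be sketched as a lookup keyed on the semantic category of the object. This is only an illustration: the dictionaries below are hypothetical toy data using English glosses, not ALT-J/E's actual semantic dictionaries or rule format.

```python
# Hypothetical semantic categories for a few English glosses (toy data).
SEMANTIC_CATEGORY = {
    "laborer": "PEOPLE",
    "money": "MONEY",
    "time": "TIME",
    "machine": "MACHINE",
}

# Transfer rules for the one Japanese verb glossed as "use":
# semantic category of the object -> English verb.
USE_RULES = {"PEOPLE": "employ", "MONEY": "spend", "TIME": "spend"}


def translate_use(object_noun: str) -> str:
    """Choose an English verb for "use" from the object's semantic category."""
    category = SEMANTIC_CATEGORY.get(object_noun)
    # Fall back to the default translation when no specific rule applies.
    return USE_RULES.get(category, "use")


for noun in ("laborer", "machine", "money"):
    print(noun, "->", translate_use(noun))
```

With this toy data the three objects in sentence (1) select "employ", "use" and "spend" respectively.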




2.2 Correct translation of nouns

(2) , .
An elephant has a long trunk but a pig has a short snout.

In Japanese the same word (literally "nose") is used for both elephants and pigs, but in English it is better to use "trunk" and "snout" respectively. ALT-J/E's Japanese-English transfer dictionary specifies that "elephant's nose" should be translated as "trunk" and "pig's nose" as "snout". This sentence also shows how ALT-J/E is able to generate appropriate English sentence structures. A literal translation of the first clause would be "As for elephants, noses are long", but ALT-J/E recognises that the noses belong to the elephants, and so can correctly translate the clause.




2.3 Translation of the particle と

(3) , .
I went to Osaka and Kyoto and he went to Kyoto with the president.

Both clauses share the same surface structure over three nouns A, B and C. ALT-J/E uses the semantic knowledge stored in its dictionaries to analyse the first clause into A(BC), as B and C are both PLACE NAMES, and the second into (AB)C, as both A and B are PEOPLE.
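The bracketing decision can be illustrated with a small sketch that groups whichever adjacent pair of nouns shares a semantic category. The category table and the function name `bracket` are assumptions made for this example; the real system works on Japanese parse structures, not English glosses.

```python
# Hypothetical semantic categories for the nouns in sentence (3) (toy data).
CATEGORY = {
    "I": "PEOPLE",
    "he": "PEOPLE",
    "president": "PEOPLE",
    "Osaka": "PLACE",
    "Kyoto": "PLACE",
}


def bracket(a: str, b: str, c: str):
    """Group the adjacent pair of nouns that share a semantic category."""
    if CATEGORY[b] == CATEGORY[c]:
        return (a, (b, c))      # A(BC): B and C are coordinated
    if CATEGORY[a] == CATEGORY[b]:
        return ((a, b), c)      # (AB)C: A acts together with B
    return (a, b, c)            # no semantic evidence either way


print(bracket("I", "Osaka", "Kyoto"))
print(bracket("he", "president", "Kyoto"))
```

If all three nouns shared one category this sketch would prefer A(BC); how ALT-J/E resolves that case is not stated in the text.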




2.4 When is an idiomatic expression idiomatic?

(4) , .
I found his weak point and he grasped a cat's tail.

The expression "to grasp someone's tail" in Japanese is used idiomatically to mean "to find someone's weak point". A good translation system must be able to decide whether a phrase is being used idiomatically or not. In this case ALT-J/E's idiom dictionary rules that only AGENTS (PEOPLE or ORGANISATIONS) can have weak points, so in the case of a cat it is judged that the Japanese should be translated literally.
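As a rough sketch of this check, the idiomatic reading can be gated on the semantic category of the tail's owner. The tables and the function name below are hypothetical and use English glosses; they only mirror the AGENT restriction described above.

```python
# Only AGENTS (PEOPLE or ORGANISATIONS) can have weak points.
AGENT_CATEGORIES = {"PEOPLE", "ORGANISATION"}

# Hypothetical toy data: semantic category and possessive form of each owner.
CATEGORY = {"he": "PEOPLE", "cat": "ANIMAL"}
POSSESSIVE = {"he": "his", "cat": "a cat's"}


def translate_grasp_tail(owner: str) -> str:
    """Translate "grasp X's tail" idiomatically only when X is an AGENT."""
    if CATEGORY.get(owner) in AGENT_CATEGORIES:
        return f"find {POSSESSIVE[owner]} weak point"
    # Non-agents cannot have weak points, so translate literally.
    return f"grasp {POSSESSIVE[owner]} tail"


print(translate_grasp_tail("he"))
print(translate_grasp_tail("cat"))
```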




2.5 Context processing: Supplementation

(5)NTT.
NTT will introduce a new model exchange.

(6) , 20.
The new model exchange is equipped with a self checking function and NTT is planning to install 20 systems.

This example shows how ALT-J/E uses the meanings of nouns and verbs to consider relationships between sentences within a paragraph. In translation (6), the noun phrases that are omitted in Japanese and automatically supplied by ALT-J/E are "The new model exchange" and "NTT". The example is taken from a newspaper article.

ALT-J/E processes these two sentences as belonging to the same paragraph. Both the subject and object of the first sentence, "NTT" and "new model exchange", are stored as candidates for supplementation in the following sentences. ALT-J/E looks at the meaning of the verb "introduce" and judges that the thing introduced ("new model exchange") is more likely to be referred to in the following sentences.

In the second sentence two subjects are omitted. In the first clause, "new model exchange" is used as the subject, as it was marked as being the best of the two stored candidates. In the second clause, ALT-J/E judges that an ORGANISATION is more likely to plan than a MACHINE, so "NTT" is supplemented. Note that when a common noun is supplemented it takes the definite article, as it has been referred to before in the text.
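The selection of a supplement can be sketched as follows: candidates from the previous sentence are kept in preference order, and each verb may restrict the semantic category of its subject. The data structures, the `SUBJECT_PREFERENCE` table and the function name are assumptions made for illustration, not ALT-J/E's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    phrase: str
    category: str
    is_common_noun: bool


# Candidates stored from sentence (5), best first: the verb "introduce"
# makes the thing introduced the more likely topic of later sentences.
candidates = [
    Candidate("new model exchange", "MACHINE", True),
    Candidate("NTT", "ORGANISATION", False),
]

# Hypothetical subject preferences per verb; None means no restriction.
SUBJECT_PREFERENCE = {"be equipped with": None, "plan": "ORGANISATION"}


def supplement_subject(verb: str, stored: list) -> str:
    """Pick the best stored candidate compatible with the verb."""
    wanted = SUBJECT_PREFERENCE.get(verb)
    chosen = stored[0]  # default: the best-ranked candidate
    for candidate in stored:
        if wanted is None or candidate.category == wanted:
            chosen = candidate
            break
    # A supplemented common noun has been mentioned before, so it is definite.
    return f"the {chosen.phrase}" if chosen.is_common_noun else chosen.phrase


print(supplement_subject("be equipped with", candidates))
print(supplement_subject("plan", candidates))
```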




2.6 Automatic rewriting within the source language

(7) , .
I went to school by train and he walked to school along a river.

(7') I get on to a train and go to school and it went along a river and he walked and went to school.

In this example the original Japanese sentence has five verbs. Translation of the sentence as it stands gives the unnatural translation shown in (7'). ALT-J/E makes this easier to translate into natural English by rewriting the complicated Japanese expressions to more easily translated ones. For example, "I get on to a train" is rewritten to "by train" before being translated, so that the verb phrase becomes an adverbial phrase. Similarly, ALT-J/E reduces the number of verbs in the second clause by rewriting "go along" into pseudo-Japanese as a 'pseudo-particle' which will be translated as "along". The third rule shown here rewrites [...] "walk and go" to just "walk". The final result is a natural-sounding translation that uses only two verbs.




3 Japanese Analysis and Transfer




3.1 Translation of the particle の

(8) .
One cloud on the tower of Ginkakuji Temple in Kyoto.

This example shows some of the ways that the particle の can be translated. The most basic translation is "of". It is often omitted in translation, for example when it links a classifier, as in the case of "one of cloud", which is translated as "one cloud". Together with the noun "top" it is translated as "on". It can also be translated as "in" when it is used to join two locations.

(15) .
A group of hunters chased a pack of wolves that were chasing a herd of cattle.




3.2 Translation of the particle で

(9) , .
Until I become famous for a movie made with a computer, I had often developed fever from overwork in this studio.

This example shows four different ways that the particle で, generally translated as "by", can be translated. Three of the examples use the semantic structure dictionary, which is based on verbs. For example, the verb translated here as "make" has the pattern "N1 makes N2 with N3", so the particle is translated as "with". The fourth example, "in this studio", uses the fact that a studio is a place to translate the particle as "in".




3.3 Apposition

(10) , N&C .
N & C Software Corp., a software company, developed Atelier Bit, a color printing system using a personal computer, jointly with Unicorn Automation Corp., a systems house.

In this example there are three kinds of apposition between two noun phrases A and B in Japanese. They are all translated into the same pattern in English, "B, an A", which ALT-J/E uses as the standard English form.




3.4 Automatic rewriting within the source language

(11) NTT
NTT will set up a showroom in the second floor and will set up meeting, conference and seminar rooms in the third floor.

In this sentence the verb is omitted from the first clause. This makes the sentence difficult to translate, so it is rewritten in Japanese with the verb supplemented. This can then be translated straightforwardly. Note also how in the English generation stage the compound noun phrase "meeting room and conference room and seminar room" is generated as the more compact "meeting, conference and seminar rooms".
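The compaction of coordinated compound nouns noted above can be sketched as a string operation on the English side. This is a simplified illustration under our own assumptions (the function name is hypothetical and pluralisation is done naively by adding "s"); the text does not describe how ALT-J/E implements this step.

```python
def compact_coordination(phrases):
    """Merge 'X room', 'Y room', 'Z room' into 'X, Y and Z rooms'."""
    words = [p.split() for p in phrases]
    # Only compact when there are several phrases, each with modifier + head.
    if len(words) < 2 or any(len(w) < 2 for w in words):
        return " and ".join(phrases)
    heads = {w[-1] for w in words}
    if len(heads) != 1:
        return " and ".join(phrases)  # different head nouns: leave as is
    head = heads.pop()
    modifiers = [" ".join(w[:-1]) for w in words]
    # Naive plural; a real generator would consult its dictionary.
    return ", ".join(modifiers[:-1]) + " and " + modifiers[-1] + " " + head + "s"


print(compact_coordination(["meeting room", "conference room", "seminar room"]))
```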




3.5 Context processing: Supplementation, ellipsis and pronominalisation

(12) NTT
NTT is selling a new model telephone set in business offices throughout the country.

(13)
The new model telephone set is equipped with an answering machine and is called Howdy Reponse.

(14)
It is popular with young people.

This example shows three of the abilities of ALT-J/E's context processing: supplementation, ellipsis and pronominalisation. The first sentence has nothing omitted, and so can be translated as is. The second sentence omits the subject in both clauses. When translating this sentence ALT-J/E examines the candidates from the first sentence then selects "new model telephone set" to become the first clause's subject. The noun phrase is made definite, as it is the same telephone set as referred to earlier. In the second clause the subject is judged to be the same as the first, and so is omitted in the English translation as it is obvious from the context. The phrase "new model telephone set" is again selected to supplement the subject in the final sentence. In this case, however, as it was also the subject of the last sentence, it can be unambiguously replaced by a pronoun, in this case "it".
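The three outcomes described above (definite noun phrase, ellipsis, pronoun) can be sketched as a decision on what the supplemented subject was in the preceding clause and sentence. The function and its arguments are hypothetical simplifications; in particular the pronoun is fixed to "it" for this example.

```python
def realise_subject(referent, prev_clause_subject, prev_sentence_subject):
    """Decide how a supplemented subject is expressed in English."""
    if referent == prev_clause_subject:
        return ""               # ellipsis: obvious from the previous clause
    if referent == prev_sentence_subject:
        return "it"             # pronominalisation: unambiguous antecedent
    return f"the {referent}"    # supplementation with a definite noun phrase


phone = "new model telephone set"

# Sentence (13), first clause: the previous sentence's subject was NTT.
print(repr(realise_subject(phone, None, "NTT")))
# Sentence (13), second clause: same subject as the first clause.
print(repr(realise_subject(phone, phone, "NTT")))
# Sentence (14): same subject as the previous sentence.
print(repr(realise_subject(phone, None, phone)))
```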




4 English Generation




4.1 Generating number

(16)
That university recruits students from high schools throughout the country.

In Japanese there is no way of telling whether a particular noun is singular or plural by only looking at the noun itself. In English, on the other hand, countable nouns must be either singular or plural. Sentences (15), given in section 3.1, and (16) give examples of four methods that can be used to decide whether to translate a noun as singular or plural.

Two methods are shown in the noun phrase "a group of hunters". First, "group" is translated as singular by default, as there is no specific information about it. A group must, however, consist of more than one member, so there must be more than one hunter, and thus "hunter" is translated as plural. The same arguments are used to translate "a pack of wolves" and "a herd of cattle". Note that for "cattle" the collective noun "cattle" is used instead of the plural "cows" to reflect the common usage.

In sentence (16), "university" is translated as singular by default. Because the meaning of the verb "recruit" is "to get or seek recruits", this implies that more than one student is being recruited, so "students" is translated as plural. Finally, "high school" is translated as plural because it is modified by "throughout the country", and a single high school cannot be spread throughout the country.

Thus four methods have been used to generate number: by default, according to the meaning of the noun phrase the noun is embedded in, according to the meaning of the verb phrase the noun is embedded in, and according to the meaning of the elements modifying the noun phrase itself.
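The four methods can be summarised in a short sketch. The word lists and keyword arguments below are hypothetical stand-ins for the lexical and semantic information the text says ALT-J/E draws on; they cover only the words in sentences (15) and (16).

```python
# Hypothetical toy lexicon covering sentences (15) and (16).
COLLECTIVE_HEADS = {"group", "pack", "herd"}         # must have several members
PLURAL_OBJECT_VERBS = {"recruit"}                    # imply more than one object
DISTRIBUTIVE_MODIFIERS = {"throughout the country"}  # cannot fit a single item


def decide_number(member_of=None, object_of=None, modifier=None):
    """Return 'singular' or 'plural' for a noun, given its context."""
    if member_of in COLLECTIVE_HEADS:
        return "plural"     # meaning of the embedding noun phrase
    if object_of in PLURAL_OBJECT_VERBS:
        return "plural"     # meaning of the embedding verb phrase
    if modifier in DISTRIBUTIVE_MODIFIERS:
        return "plural"     # meaning of the noun's own modifier
    return "singular"       # default


print("group:", decide_number())
print("hunter:", decide_number(member_of="group"))
print("student:", decide_number(object_of="recruit"))
print("high school:", decide_number(modifier="throughout the country"))
```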




4.2 Countability

(17)
While I was checking each computer, my son checked each piece of software.

This example shows how countable and uncountable nouns are treated differently in English. Only countable nouns can be directly modified by denumerators such as "each" and "every". Uncountable nouns have to be individuated by the use of a classifier such as "piece". ALT-J/E's dictionaries store classifiers for uncountable nouns, enabling the system to correctly generate "each piece of software".
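A minimal sketch of this behaviour, assuming a dictionary that maps uncountable nouns to their classifiers (the table below is toy data and the function name is ours):

```python
# Hypothetical dictionary entries: uncountable noun -> classifier.
UNCOUNTABLE_CLASSIFIERS = {"software": "piece", "furniture": "piece"}


def with_each(noun: str) -> str:
    """Attach the denumerator "each", individuating uncountable nouns."""
    classifier = UNCOUNTABLE_CLASSIFIERS.get(noun)
    if classifier is None:
        return f"each {noun}"                 # countable: modify directly
    return f"each {classifier} of {noun}"     # uncountable: use a classifier


print(with_each("computer"))
print(with_each("software"))
```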




4.3 Use of possessive nouns as determiners

(18)
He made his son an engineer and his daughter a doctor.

In English, things that are closely related to people, such as parts of the body and relatives, are normally modified by possessive pronouns. This is not the case in Japanese, where the listener is expected to deduce these relationships from the context. In the absence of more specific information, ALT-J/E handles this by using the subject of the sentence to generate the possessive pronoun. Thus the Japanese word for "son" is translated as "my son" in sentence (17) and as "his son" in sentence (18), where the word for "daughter" is likewise translated as "his daughter".
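The default rule can be sketched as follows. The lists of relational nouns and possessive forms are hypothetical toy data, and the sketch covers only the default case in which no more specific information is available.

```python
# Hypothetical toy data.
POSSESSIVE = {"I": "my", "he": "his", "she": "her"}
RELATIONAL_NOUNS = {"son", "daughter", "hand"}  # relatives and body parts


def add_determiner(noun: str, subject: str) -> str:
    """Use the sentence subject to supply a possessive pronoun by default."""
    if noun in RELATIONAL_NOUNS and subject in POSSESSIVE:
        return f"{POSSESSIVE[subject]} {noun}"
    return noun


print(add_determiner("son", "I"))        # sentence (17)
print(add_determiner("son", "he"))       # sentence (18)
print(add_determiner("daughter", "he"))  # sentence (18)
```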




4.4 Adverb positioning

(19)
I only told a joke but, unexpectedly, he completely believed it and was very glad.

The four adverbs used in the above sentence fill three different roles, and these affect where they are placed in the sentence. The adverb "only" is used to restrict the focus of the following phrase, "unexpectedly" shows the speaker's judgment of the following sentence, while "completely" and "very" amplify the phrases they modify. ALT-J/E distinguishes between 45 different types of adverb.




4.5 Word order in addresses

(20) 1-2356.
Our laboratory is located at 1-2356 Take, Yokosuka-shi, Kanagawa-ken.

Japanese addresses are written from the most general to the most specific--the opposite of English, where they are written from the most specific to the most general. In order to translate addresses correctly, ALT-J/E must first recognise that the compound noun is an address. This is done by recognising that its elements are place names followed by some numbers. The address can then be reordered correctly in the transfer stage.
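The recognition and reordering steps can be sketched as below. The place-name set, the number pattern and the romanised input list are assumptions made for illustration; the input order is inferred from the statement that Japanese addresses run from the most general element to the most specific.

```python
import re

# Hypothetical place-name dictionary (toy data).
PLACE_NAMES = {"Kanagawa-ken", "Yokosuka-shi", "Take"}
NUMBER = re.compile(r"\d+(-\d+)*")


def is_address(elements):
    """True if the elements are place names followed by some numbers."""
    i = 0
    while i < len(elements) and elements[i] in PLACE_NAMES:
        i += 1
    places = i
    while i < len(elements) and NUMBER.fullmatch(elements[i]):
        i += 1
    return places > 0 and i > places and i == len(elements)


def to_english_order(elements):
    """Reorder a recognised address from most specific to most general."""
    if not is_address(elements):
        return None
    places = [e for e in elements if e in PLACE_NAMES]
    numbers = [e for e in elements if NUMBER.fullmatch(e)]
    return " ".join(numbers) + " " + ", ".join(reversed(places))


japanese_order = ["Kanagawa-ken", "Yokosuka-shi", "Take", "1-2356"]
print(to_english_order(japanese_order))
```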

This sentence also gives an example of how the translation of verbs depends on their context. Because ALT-J/E recognises that the address is indeed an address, it is able to translate "is in" as "is located at".




References

1. F. Bond and K. Ogura. "Determination of Whether an English Noun Phrase is Countable or not Using 6 Levels of Lexical Countability" (in Japanese). Transactions of the Information Processing Society of Japan, 1993.

2. F. Bond, K. Ogura, S. Ikehara, and S. Shirai. "Using the Meanings of Verbs to Select the Countability of English Noun Phrases." Proceedings of the 1993 IEICE Fall Conference, 1993.

3. S. Ikehara et al. "An Evaluation Method for MT Systems and Its Application to ALT-J/E." Journal of Japanese Society for Artificial Intelligence, vol. 7, no. 6, 1992.

4. S. Ikehara, M. Miyazaki, S. Shirai, and A. Yokoo. "An Approach to Machine Translation Method based on Constructive Process Theory." REVIEW of the ECL, vol. 37, no. 1, pp. 39-44, 1989.

5. S. Ikehara, S. Shirai, A. Yokoo, and H. Nakaiwa. "Toward an MT System without Pre-Editing: Effects of New Methods in ALT-J/E." Proceedings of MT Summit-III, pp. 101-106, 1989.

6. H. Nakaiwa and S. Ikehara. "Zero Pronoun Resolution in a Japanese to English Machine Translation System using Verbal Semantic Attributes." Proceedings of the 3rd Conference on Applied Natural Language Processing, April 1992.

7. S. Shirai, S. Ikehara, and T. Kawaoka. "Effects of Automatic Rewriting of Source Language Within a Japanese to English MT System." Proceedings of the Fifth International Conference on Theoretical and Methodological Issues in Machine Translation, 1993.

Since graduating from the University of Queensland, Francis Bond has been working as a research engineer at the NTT Network Information Systems Laboratories. Readers with questions or comments can contact Francis at the following address:

Knowledge Systems Laboratory
NTT Network Information Systems Laboratories
1-2356 Take, Yokosuka-shi,
Kanagawa-ken, JAPAN, 238-03

Email: <bond@nttkb.ntt.jp>
Tel: 0468-59-8272 (Japan +81)
Fax: 0468-59-3633 (Japan +81)