Automatically Aligning Japanese & English Newspaper Articles

Takahashi, Y., Shirai, S., Fujinami, S., Ikehara, S., Ueda, H. and Matsusima, H.

NTT Communication Science Lab., Tottori Univ., NTT Advanced Technology Co.
620C 1-2356 Take Yokosuka-Shi Kanagawa.238-03 JAPAN
TEL:+81 468 59 8238/E-mail: {yamato,shirai,fujinami}@nttkb.ntt.jp


Abstract

Bilingual corpora are very useful in natural language processing, but they are difficult to compile.

To build such a corpus, we have developed a method that automatically aligns Japanese and English newspaper articles, using the numerical values and proper nouns in the articles as keywords.
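
The keyword-based idea can be illustrated with a minimal sketch. The abstract does not specify how keywords are extracted or scored, so everything below is an assumption for illustration: the proper-noun lexicon NAME_LEXICON is hypothetical, the Jaccard overlap score and the threshold of 0.5 are stand-ins for whatever measure the method actually uses, and the function names are invented here.

```python
import re
import unicodedata

# Hypothetical bilingual lexicon of proper nouns (Japanese -> English).
# The report does not describe its lexicon; this is illustration only.
NAME_LEXICON = {"東京": "tokyo", "トヨタ": "toyota"}

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def keywords_en(text):
    """Collect numerical values and known proper nouns from English text."""
    numbers = set(NUMBER.findall(text.replace(",", "")))
    lowered = text.lower()
    names = {en for en in NAME_LEXICON.values() if en in lowered}
    return numbers | names


def keywords_ja(text):
    """Collect the same keyword types from Japanese text.

    NFKC normalization maps full-width digits and commas to ASCII,
    so numbers compare equal across the two languages.
    """
    text = unicodedata.normalize("NFKC", text).replace(",", "")
    numbers = set(NUMBER.findall(text))
    names = {en for ja, en in NAME_LEXICON.items() if ja in text}
    return numbers | names


def align(ja_articles, en_articles, threshold=0.5):
    """Pair each Japanese article with its best-scoring English article.

    The score is the Jaccard overlap of the keyword sets (an assumed
    measure). Pairs scoring below the threshold are left unaligned.
    """
    en_keys = [keywords_en(t) for t in en_articles]
    pairs = []
    for i, ja in enumerate(ja_articles):
        ja_keys = keywords_ja(ja)
        best_j, best_score = None, 0.0
        for j, keys in enumerate(en_keys):
            union = ja_keys | keys
            score = len(ja_keys & keys) / len(union) if union else 0.0
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Calling align with a list of Japanese articles and a list of English articles from the same day returns tuples of (Japanese index, English index, score). Numbers are a natural anchor because they survive translation unchanged once digit width and thousands separators are normalized.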

In addition, we have developed a way to evaluate the results. On average, the method automatically and correctly aligns 38 out of 90 pairs of articles per day.



Keywords

Aligning articles, Bilingual corpora, Database, Newspaper articles



[ Technical Report of IEICE, NLC96-17, pp.55-62 (July, 1996). ]