Bilingual corpora are very useful in natural language processing, but they are difficult to compile.
We have developed a method that automatically aligns Japanese and English newspaper articles, using the numerals and proper nouns in the articles as keywords, in order to build such a corpus.
In addition, we have developed a way to evaluate the results. On average, we automatically align 38 of 90 pairs of articles correctly each day.
Aligning articles, Bilingual corpora, Database, Newspaper articles
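
The keyword-based alignment described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the name dictionary `NAME_DICT`, the function names, the overlap score (a count of shared keywords), and the `min_shared` threshold are all assumptions made for illustration. Numerals are extracted with a regular expression after Unicode normalization, and proper nouns are matched through a small hypothetical Japanese-to-English dictionary.

```python
import re
import unicodedata

# Hypothetical bilingual name dictionary (an assumption for illustration only).
NAME_DICT = {"東京": "tokyo", "トヨタ": "toyota", "日銀": "bank of japan"}

NUM_RE = re.compile(r"\d+(?:\.\d+)?")
THOUSANDS_RE = re.compile(r"(?<=\d),(?=\d{3})")


def numerals(text):
    """Return the set of numerals in text as normalized strings."""
    # NFKC folds full-width digits and punctuation into their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop thousands separators so "1,200" and "1200" compare equal.
    text = THOUSANDS_RE.sub("", text)
    return set(NUM_RE.findall(text))


def english_keywords(text):
    """Numerals plus any dictionary names found in an English article."""
    lowered = text.lower()
    return numerals(text) | {en for en in NAME_DICT.values() if en in lowered}


def japanese_keywords(text):
    """Numerals plus dictionary names, mapped to their English forms."""
    return numerals(text) | {en for ja, en in NAME_DICT.items() if ja in text}


def align(ja_articles, en_articles, min_shared=2):
    """Pair each Japanese article with the English article sharing the most keywords.

    Returns (japanese_index, english_index) pairs; an article is left
    unaligned when its best candidate shares fewer than min_shared keywords.
    """
    en_keys = [english_keywords(t) for t in en_articles]
    pairs = []
    for i, ja in enumerate(ja_articles):
        jk = japanese_keywords(ja)
        scores = [len(jk & ek) for ek in en_keys]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= min_shared:
            pairs.append((i, best))
    return pairs


ja_articles = [
    "トヨタは東京で1,200台を発表した。",
    "日銀は金利を０．５％に据え置いた。",
]
en_articles = [
    "The Bank of Japan kept its rate at 0.5%.",
    "Toyota unveiled 1,200 vehicles in Tokyo.",
    "Rain is expected on Friday.",
]
pairs = align(ja_articles, en_articles)
```

The threshold trades coverage for precision: a higher `min_shared` leaves more articles unaligned but makes accidental matches on a single common number less likely.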