Using Pronunciation to Automatically Extract Bilingual Word Pairs

Yoshihiro Matsuo and Satoshi Shirai

NTT Communication Science Laboratories


Abstract

Although domain-specific bilingual dictionaries are important for large-scale machine translation systems, it is difficult to gather word pairs for each domain. Proper nouns are especially hard to collect since they often do not appear in dictionaries. Fortunately, the pronunciation of proper nouns does not vary much between languages, as they are normally realized as loan words.

This paper describes a method for automatically extracting bilingual word pairs using an estimation of their pronunciation. The method can extract 62% of unknown proper noun pairs with a precision of 98% from untagged bilingual corpora.
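
The general idea of matching words by pronunciation can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes that the Japanese words have already been converted to romanized readings, and it substitutes a generic string similarity (Python's difflib ratio) for the paper's pronunciation estimation. The function names, the threshold of 0.6, and the sample words are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case.

    Assumption: difflib's ratio stands in for a real pronunciation model.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def extract_pairs(src_readings, tgt_words, threshold=0.6):
    """Pair each source reading with its most similar target word.

    A pair is kept only if its similarity reaches the (hypothetical) threshold.
    """
    pairs = []
    for src in src_readings:
        best = max(tgt_words, key=lambda t: similarity(src, t), default=None)
        if best is not None and similarity(src, best) >= threshold:
            pairs.append((src, best))
    return pairs


# Hypothetical romanized readings and English words from one aligned sentence pair.
japanese_readings = ["kurinton", "daitoryo"]
english_words = ["President", "Clinton", "visited"]
print(extract_pairs(japanese_readings, english_words))
```

In this sketch the loan word "kurinton" is paired with "Clinton" because the two strings are similar, while "daitoryo" finds no match because it is a native word rather than a loan word, which mirrors the observation that proper nouns keep much of their pronunciation across languages.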



[ IPSJ SIG Notes, 96-NL-116-15, pp.101-106 (November, 1996). ]