A Japanese Dependency Parser Based on a Decision Tree

MASAHIKO HARUNO,+ SATOSHI SHIRAI++ and YOSHIFUMI OOYAMA++

+ATR Human Information Processing Research Laboratories
++NTT Communication Science Laboratories


This paper describes a Japanese dependency parser that uses a decision tree. A Japanese dependency parser generally prepares a modification matrix, each value of which represents how likely one phrase is to modify another. The parser determines the best dependency structure by globally optimizing these values over the whole sentence under several constraints. Our main task is therefore to estimate the modification matrix precisely from corpora. Conventional stochastic dependency parsers define a fixed set of learning features and apply all of them regardless of phrase type. In contrast, our decision-tree-based method automatically selects a sufficient number of significant features according to the phrase type, which lets us make use of a large number of features that may contribute to parsing accuracy. The proposed method was tested on the EDR corpus and yielded significantly better performance (4%) than a conventional statistical dependency parser. In addition, we examined four properties of the system: (1) the relation between parsing accuracy and pruning of the decision tree, (2) the relation between parsing accuracy and the amount of training data, (3) the relation between feature types and parsing accuracy, and (4) parsing accuracy when frequent open-class words and thesaurus categories are additionally used. The results were: (1) weak pruning yielded better performance, (2) decision tree learning for dependency parsing required fifty thousand Japanese sentences, (3) the type of modifier and the modification distance are particularly effective for parsing accuracy, and (4) open-class words and thesaurus categories do not always improve accuracy. These findings may offer important clues for future Japanese parser development and corpus construction.
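The abstract says the parser picks the dependency structure that globally optimizes the modification matrix under several constraints. The following is a minimal sketch of one way to perform that step, not the search used in the paper. It assumes the usual Japanese constraints: every phrase except the last modifies exactly one phrase to its right, and no two dependencies cross. It also assumes the matrix values are probabilities that are combined by product. The function name `best_dependency` and the O(n^3) CKY-style dynamic program are illustrative assumptions. How the decision tree estimates the matrix values is not shown.

```python
import math


def best_dependency(matrix):
    """Exact search for the best head-final, non-crossing dependency structure.

    Hypothetical helper: matrix[i][j] (for j > i) is assumed to be the
    probability that phrase i modifies phrase j; entries with j <= i are ignored.
    Returns (heads, log_score), where heads[i] is the phrase that i modifies
    and the last phrase (the sentence head) gets -1.
    """
    n = len(matrix)
    if n == 0:
        return [], 0.0
    neg_inf = float("-inf")
    # best[i][j]: best log-score of a span i..j whose root is phrase j
    best = [[neg_inf] * n for _ in range(n)]
    # split[i][j]: the leftmost child k of j in the best analysis of span i..j
    split = [[-1] * n for _ in range(n)]
    for i in range(n):
        best[i][i] = 0.0  # a single phrase is a complete subtree

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Span i..j rooted at j = subtree i..k rooted at k, plus the
            # dependency k -> j, plus the rest of the span k+1..j rooted at j.
            for k in range(i, j):
                p = matrix[k][j]
                if p <= 0.0:
                    continue  # impossible modification
                score = best[i][k] + math.log(p) + best[k + 1][j]
                if score > best[i][j]:
                    best[i][j] = score
                    split[i][j] = k

    heads = [-1] * n

    def backtrack(i, j):
        # Recover the head of every phrase from the stored split points.
        if i == j:
            return
        k = split[i][j]
        heads[k] = j
        backtrack(i, k)
        backtrack(k + 1, j)

    if best[0][n - 1] > neg_inf:
        backtrack(0, n - 1)
    return heads, best[0][n - 1]


# Four phrases whose single best pairs (0 -> 2, 1 -> 3) would cross.
m = [
    [0.0, 0.05, 0.9, 0.05],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
heads, score = best_dependency(m)
print(heads)  # [2, 2, 3, -1]
```

In the usage example, the pairing 0 -> 2, 1 -> 3 has the highest raw product (0.81). Those two arcs cross, so the search returns 0 -> 2, 1 -> 2, 2 -> 3 instead (product 0.09). Summing log-probabilities rather than multiplying probabilities keeps long sentences from underflowing.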



[Transactions of Information Processing Society of Japan, Vol. 39, No. 12, pp. 3177-3186 (December 1998).]