This paper addresses the identification of determinative-measure compounds (DMs) in parsing Mandarin Chinese. The number of possible DMs is infinite, so they cannot be listed exhaustively in a lexicon. However, the set of DMs can be described by regular expressions and recognized by a finite automaton. We propose to identify DMs by regular expression before parsing, as part of our morphological module. After investigating a large amount of linguistic data, we find that DMs are formed compositionally and hierarchically from simpler constituents. Based on this finding, we construct grammar rules that combine determinatives and measures, and we build a parser that implements these rules. With this approach, almost all of the unlisted DMs are recognized. However, when the DM recognition procedure is applied on its own, many ambiguous results appear. When it is combined with our word segmentation process, these ambiguities are greatly reduced.
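This paper describes how determinative-measure compounds are handled when parsing Chinese. Like derivational compounds, determinative-measure compounds can keep producing new words, and they are too numerous and varied to be listed one by one in a dictionary. This gives rise to ambiguity during word segmentation or parsing. Compared with other compounds, however, the rules by which determinative-measure compounds are formed are relatively easy to generalize, so these compounds can be recognized before parsing. We find that determinative-measure words are not only compositional but also hierarchically structured. Based on this relation, we formulate combination rules and apply them in the parsing system we designed. The results show that most determinative-measure compounds can be recognized, and that the ambiguity arising during word segmentation is also greatly reduced.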
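To make the regular-expression claim concrete, the following is a minimal sketch of how a small subset of DMs might be matched before parsing. It is not the rule set described in this paper: the character inventories (`DEMONSTRATIVES`, `DIGITS`, `UNITS`, `MEASURES`), the pattern `DM_PATTERN`, and the function `find_dms` are all hypothetical and deliberately tiny, chosen only to illustrate that a determinative followed by a measure can be recognized by a finite-state pattern.

```python
import re

# Hypothetical, deliberately small inventories for illustration only;
# they are assumptions, not the lists used in the paper.
DEMONSTRATIVES = "這那此每某"
DIGITS = "一二兩三四五六七八九"
UNITS = "十百千萬"
MEASURES = ["個", "本", "張", "隻", "公斤", "公里"]

# A number is any non-empty run of digit and unit characters.
# This over-accepts (e.g. "十十"), which is acceptable for a sketch.
NUMBER = f"[{DIGITS}{UNITS}]+"

# Longer measures come first so that multi-character measures win.
MEASURE = "|".join(sorted(MEASURES, key=len, reverse=True))

# A DM here is (demonstrative + optional number) or (number),
# followed by a measure. A bare measure is not matched.
DM_PATTERN = re.compile(
    f"(?:[{DEMONSTRATIVES}](?:{NUMBER})?|{NUMBER})(?:{MEASURE})"
)


def find_dms(text):
    """Return the DM candidates found in text, left to right."""
    return [m.group(0) for m in DM_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_dms("我買了三本書和這兩張票"))
    print(find_dms("那個人走了五十公里"))
```

A pattern like this only proposes candidates. As the abstract notes, recognition applied on its own yields ambiguous results, so in practice the candidates would still have to be reconciled with word segmentation.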