| Author Name | Affiliation |
|---|---|
| PEI Bing-zhen | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com; College of Computer Science and Technology, Guizhou University, Guiyang 550025, China |
| CHEN Xiao-rong | College of Computer Science and Technology, Guizhou University, Guiyang 550025, China |
| HU Yi | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com |
| LU Ru-zhan | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com |

| Abstract: |
| This article proposes a new general-purpose, highly efficient algorithm for extracting domain terminology. The algorithm is domain-independent, applies multiple layers of filters, and combines statistical and rule-based methods. Exploiting both the features of domain terms and characteristics unique to Chinese, it first generates multi-word unit (MWU) candidates and then filters these candidates with multiple strategies. Our test results show that the algorithm is feasible and effective. |
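The abstract describes a two-stage design: generate MWU candidates, then pass them through layered statistical and rule-based filters. The sketch below illustrates that general shape only; it is not the authors' algorithm, whose concrete filters are not given here. Everything in it is an assumption for illustration: the input is taken to be already word-segmented, the names `count_ngrams`, `cohesion`, `extract_terms` and `BOUNDARY_STOPWORDS` are hypothetical, and the three filter layers (a frequency threshold, a boundary function-word rule, and a weakest-split pointwise mutual information score) are stand-ins chosen for brevity.

```python
import math
from collections import Counter
from typing import Callable, Iterable

# Hypothetical stop list: function words assumed unlikely to begin or end a term.
BOUNDARY_STOPWORDS = {"的", "了", "和", "在", "是", "与", "及"}


def count_ngrams(sentences: Iterable[list[str]], max_n: int) -> Counter:
    """Count every contiguous n-gram (1 <= n <= max_n) in word-segmented sentences."""
    counts: Counter = Counter()
    for words in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def cohesion(gram: tuple, counts: Counter, total: int) -> float:
    """Weakest-split PMI: min over split points k of log P(gram) / (P(left) * P(right)).

    Simplification: every probability uses the same denominator (the token count).
    """
    p = counts[gram] / total
    return min(
        math.log(p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total)))
        for k in range(1, len(gram))
    )


def extract_terms(sentences, max_n=4, min_freq=2, min_pmi=1.0):
    """Two-stage sketch: generate MWU candidates, then pass them through filter layers."""
    sentences = [list(s) for s in sentences]
    counts = count_ngrams(sentences, max_n)
    total = sum(len(s) for s in sentences)

    # Stage 1: every n-gram of two or more words is an MWU candidate.
    candidates = [g for g in counts if len(g) >= 2]

    # Stage 2: filter layers applied in order, cheapest first.
    filters: list[Callable[[tuple], bool]] = [
        # statistical layer (assumed): minimum frequency
        lambda g: counts[g] >= min_freq,
        # rule layer (assumed): no function word at either boundary
        lambda g: g[0] not in BOUNDARY_STOPWORDS and g[-1] not in BOUNDARY_STOPWORDS,
        # statistical layer (assumed): internal cohesion
        lambda g: cohesion(g, counts, total) >= min_pmi,
    ]
    for keep in filters:
        candidates = [g for g in candidates if keep(g)]

    # Most frequent first; ties broken lexicographically.
    return sorted(candidates, key=lambda g: (-counts[g], g))


if __name__ == "__main__":
    corpus = [
        ["机器", "翻译", "的", "质量"],
        ["机器", "翻译", "系统"],
        ["统计", "机器", "翻译"],
        ["翻译", "的", "速度"],
    ]
    print(extract_terms(corpus))  # → [('机器', '翻译')]
```

In this toy run the frequency layer keeps only 机器翻译 and 翻译的. The boundary rule then drops 翻译的 because it ends in 的, and 机器翻译 passes the cohesion layer with a score of log(13/4) ≈ 1.18. Representing each layer as a predicate in a list makes it easy to reorder layers or to add more, such as a domain-specificity score against a general corpus.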
| Key words: domain terminology; multi-word unit (MWU); automatic extraction; filter |
| DOI: 10.11916/j.issn.1005-9113.2009.02.029 |
| CLC Number: TP391 |