An improved ML-kNN approach for multi-label text categorization

doi:10.11918/j.issn.0367-6234.2013.11.008

Published in: 2013, Vol. 45, No. 11, pp. 45-49.

Authors: Cheng Shengjun (程圣军), Huang Qingcheng (黄庆成), Liu Jiafeng (刘家锋), Tang Xianglong (唐降龙)
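Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China
Author biographies: Cheng Shengjun (born 1985), male, Ph.D. candidate; Tang Xianglong (born 1960), male, professor, doctoral supervisor.
Funding: National Natural Science Foundation of China (7,8); Natural Science Foundation of Heilongjiang Province (F201021).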



Abstract:

Conventional kNN algorithms ignore correlations between labels when applied to multi-label text categorization. To address this shortcoming, an improved ML-kNN approach for multi-label text categorization is proposed. Because of the particular characteristics of text features, a distance metric based on KL divergence is adopted to better measure the similarity between documents. Based on statistics gathered from the label sets of neighboring documents, a fuzzy maximum a posteriori (MAP) principle is used to infer the label set of each unlabeled document. Unlike ML-kNN, the proposed approach can effectively exploit label correlations to improve classification performance. Experiments on three benchmark datasets under five commonly used multi-label evaluation metrics show that the proposed approach clearly outperforms well-established multi-label learning algorithms such as ML-kNN, Rank-SVM and BoosTexter on multi-label text categorization.


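Cite this article

Cheng Shengjun, Huang Qingcheng, Liu Jiafeng, Tang Xianglong. An improved ML-kNN approach for multi-label text categorization[J]. Journal of Harbin Institute of Technology, 2013, 45(11): 45-49. DOI:10.11918/j.issn.0367-6234.2013.11.008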

Published online: 2013-11-30