External Clustering Validity Indices in MATLAB
Source: cnblogs  Author: 凯鲁嘎吉  Date: 2019/6/12 12:08:05

Author: 凯鲁嘎吉 - cnblogs http://www.cnblogs.com/kailugaji/

For more posts, see the tag: MATLAB clustering

Prerequisite: the ground-truth labels of the data must be known.

1. Normalized mutual information (NMI)

Definition
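
The formula images of the original post did not survive extraction. As a sketch reconstructed from the program in this section (the symbols are assumptions of this note, not necessarily the author's notation), let $p(a)$ be the fraction of samples with label $a$ in partition $A$, $p(b)$ the fraction with label $b$ in partition $B$, and $p(a,b)$ the fraction carrying both labels. With natural logarithms, the program evaluates:

```latex
\begin{align}
I(A;B) &= \sum_{a}\sum_{b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}, \\
H(A) &= -\sum_{a} p(a)\,\log p(a), \qquad
H(B) = -\sum_{b} p(b)\,\log p(b), \\
\mathrm{NMI}(A,B) &= \frac{I(A;B)}{\sqrt{H(A)\,H(B)}}.
\end{align}
```

Other normalizations exist (for example, dividing by the arithmetic mean of the two entropies); the geometric mean shown here is the one that matches `MI / sqrt(H_A*H_B)` in the code.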

 

 

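Program

  function MIhat = nmi(A, B)
  %NMI Normalized mutual information between two label vectors.
  %   A, B: label vectors of equal length N (row or column).
  %   The result is undefined when either partition has a single cluster,
  %   because its entropy is zero.
  if length(A) ~= length(B)
      error('length(A) must equal length(B)');
  end
  A = A(:).';  B = B(:).';          % force row vectors so repmat lines up
  N = length(A);
  A_id = unique(A);
  K_A = length(A_id);
  B_id = unique(B);
  K_B = length(B_id);
  % Mutual information
  % A_occur(k, n) is 1 when sample n carries the k-th distinct label of A
  A_occur = double(repmat(A, K_A, 1) == repmat(A_id', 1, N));
  B_occur = double(repmat(B, K_B, 1) == repmat(B_id', 1, N));
  AB_occur = A_occur * B_occur';    % K_A-by-K_B contingency counts
  P_A = sum(A_occur, 2)' / N;       % marginal distribution of A (1-by-K_A)
  P_B = sum(B_occur, 2)' / N;       % marginal distribution of B (1-by-K_B)
  P_AB = AB_occur / N;              % joint distribution
  % eps keeps log() finite for empty cells; those terms are multiplied by 0
  MImatrix = P_AB .* log(P_AB ./ (P_A' * P_B) + eps);
  MI = sum(MImatrix(:));
  % Entropies
  H_A = -sum(P_A .* log(P_A + eps), 2);
  H_B = -sum(P_B .* log(P_B + eps), 2);
  % Normalized mutual information
  MIhat = MI / sqrt(H_A * H_B);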

Result

  >> A = [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3];
  >> B = [1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3];
  >> MIhat = nmi(A, B)

  MIhat =

      0.3646
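
As a hand check of this value (a worked sketch, figures rounded to four decimals, natural logarithms), the 17 samples give the contingency counts $n_{ab}$ below, with rows indexed by the labels of A and columns by the labels of B. Here $I(A;B)$ denotes the mutual information and $H(A)$, $H(B)$ the entropies of the two labelings:

```latex
\begin{align}
(n_{ab}) &= \begin{pmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 2 & 0 & 3 \end{pmatrix},
\qquad p(a) = \tfrac{1}{17}(6,\,6,\,5), \qquad p(b) = \tfrac{1}{17}(8,\,5,\,4), \\
I(A;B) &\approx 0.3919, \qquad H(A) \approx 1.0951, \qquad H(B) \approx 1.0551, \\
\mathrm{NMI}(A,B) &\approx \frac{0.3919}{\sqrt{1.0951 \times 1.0551}} \approx 0.3646.
\end{align}
```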

2. Rand index

Definition
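
The definition images were lost in extraction here as well. The following is a sketch reconstructed from the program in this section; the symbols $n_{ij}$, $a_i$, $b_j$, $T$, and $n_c$ are notation assumed for this note. Let $n_{ij}$ be the number of samples placed in class $i$ by the first partition and class $j$ by the second, $a_i = \sum_j n_{ij}$ the row sums, $b_j = \sum_i n_{ij}$ the column sums, and $n$ the total number of samples. Counting over all pairs of samples:

```latex
\begin{align}
T &= \binom{n}{2}, \qquad
A = T + \sum_{i,j} n_{ij}^2 - \tfrac{1}{2}\Big(\sum_i a_i^2 + \sum_j b_j^2\Big), \qquad
D = T - A, \\
\mathrm{RI} &= \frac{A}{T}, \qquad
\mathrm{MI} = \frac{D}{T}, \qquad
\mathrm{HI} = \frac{A - D}{T}, \\
n_c &= \frac{n(n^2+1) - (n+1)\sum_i a_i^2 - (n+1)\sum_j b_j^2
      + \frac{2}{n}\Big(\sum_i a_i^2\Big)\Big(\sum_j b_j^2\Big)}{2(n-1)}, \\
\mathrm{AR} &= \frac{A - n_c}{T - n_c}.
\end{align}
```

$A$ counts the pairs on which the two partitions agree (together in both, or apart in both), $D$ the pairs on which they disagree, and $n_c$ is the expected-agreement term the code uses for the chance adjustment. Note that MI in this section is the Mirkin index, not mutual information.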

Program

  function [AR,RI,MI,HI]=RandIndex(c1,c2)
  %RANDINDEX - calculates Rand Indices to compare two partitions
  % ARI=RANDINDEX(c1,c2), where c1,c2 are vectors listing the
  % class membership, returns the "Hubert & Arabie adjusted Rand index".
  % [AR,RI,MI,HI]=RANDINDEX(c1,c2) returns the adjusted Rand index,
  % the unadjusted Rand index, "Mirkin's" index and "Hubert's" index.
  % Labels must be positive integers, since they index the contingency matrix.

  if nargin < 2 || min(size(c1)) > 1 || min(size(c2)) > 1
      error('RandIndex: Requires two vector arguments')
  end

  C=Contingency(c1,c2);   %form contingency matrix

  n=sum(sum(C));
  nis=sum(sum(C,2).^2);   %sum of squares of sums of rows
  njs=sum(sum(C,1).^2);   %sum of squares of sums of columns

  t1=nchoosek(n,2);       %total number of pairs of entities
  t2=sum(sum(C.^2));      %sum over rows & columns of nij^2
  t3=.5*(nis+njs);

  %Expected index (for adjustment)
  nc=(n*(n^2+1)-(n+1)*nis-(n+1)*njs+2*(nis*njs)/n)/(2*(n-1));

  A=t1+t2-t3;             %no. agreements
  D= -t2+t3;              %no. disagreements

  if t1==nc
      AR=0;               %avoid division by zero; if k=1, define Rand = 0
  else
      AR=(A-nc)/(t1-nc);  %adjusted Rand - Hubert & Arabie 1985
  end

  RI=A/t1;                %Rand 1971    %Probability of agreement
  MI=D/t1;                %Mirkin 1970  %p(disagreement)
  HI=(A-D)/t1;            %Hubert 1977  %p(agree)-p(disagree)

  function Cont=Contingency(Mem1,Mem2)
  %CONTINGENCY - counts how often label Mem1(i) co-occurs with label Mem2(i)

  if nargin < 2 || min(size(Mem1)) > 1 || min(size(Mem2)) > 1
      error('Contingency: Requires two vector arguments')
  end

  Cont=zeros(max(Mem1),max(Mem2));

  for i = 1:length(Mem1)
      Cont(Mem1(i),Mem2(i))=Cont(Mem1(i),Mem2(i))+1;
  end

The program computes four clustering agreement measures: the adjusted Rand index, the Rand index, the Mirkin index, and the Hubert index.

Result

  >> A = [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3];
  >> B = [1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3];
  >> [AR,RI,MI,HI]=RandIndex(A,B)

  AR =

      0.2429


  RI =

      0.6765


  MI =

      0.3235


  HI =

      0.3529
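
These four values can be checked by hand. The sketch below follows the quantities computed in the program (`t1`, `t2`, `nis`, `njs`, `nc`), writing $T$ for `t1` and $n_c$ for `nc`; that notation is an assumption of this note. The vectors A and B above yield the contingency matrix with rows $(5,1,0)$, $(1,4,1)$, $(2,0,3)$, so $n = 17$:

```latex
\begin{align}
T &= \binom{17}{2} = 136, \qquad
\sum_{i,j} n_{ij}^2 = 57, \qquad
\sum_i a_i^2 = 6^2 + 6^2 + 5^2 = 97, \qquad
\sum_j b_j^2 = 8^2 + 5^2 + 4^2 = 105, \\
A &= 136 + 57 - \tfrac{1}{2}(97 + 105) = 92, \qquad D = 136 - 92 = 44, \\
n_c &= \frac{17 \cdot 290 - 18 \cdot 97 - 18 \cdot 105 + \frac{2 \cdot 97 \cdot 105}{17}}{2 \cdot 16}
     \approx \frac{2492.2353}{32} \approx 77.8824, \\
\mathrm{AR} &\approx \frac{92 - 77.8824}{136 - 77.8824} \approx 0.2429, \qquad
\mathrm{RI} = \frac{92}{136} \approx 0.6765, \\
\mathrm{MI} &= \frac{44}{136} \approx 0.3235, \qquad
\mathrm{HI} = \frac{92 - 44}{136} \approx 0.3529.
\end{align}
```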

3. References

(simple) Tool for estimating the number of clusters

Mutual information and Normalized Mutual information

Evaluation of clustering

 

Original link: http://www.cnblogs.com/kailugaji/p/11003974.html
