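kmeans is a classic clustering algorithm. newLISP provides built-in functions for it, and as with its other machine-learning functions, the work is split into a train stage and a query stage.
The goal of kmeans is to partition the training data into k clusters, with the k centroids chosen dynamically by the algorithm as it iterates. Below is an example, with my own comments added: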
(set 'data '(
  (6.57 4.96 11.91 0.9) (2.29 4.18 1.06 0.8)
  (8.63 2.51 8.11 0.7) (1.85 1.89 0.11 0.6)
  (7.56 7.93 5.06 0.6) (3.61 7.95 5.11 0.8)
  (7.18 3.46 8.7 0.3) (8.17 6.59 7.49 0.2)
  (5.44 5.9 5.57 0.4) (2.43 2.14 1.59 0.6)
  (2.48 2.26 0.19 0.8) (8.16 3.83 8.93 0.9)
  (8.49 5.31 7.47 0.86) (3.12 3.1 1.4 0.72)
  (6.77 6.04 3.76 0.12) (7.01 4.2 11.9 0.3)
  (6.79 8.72 8.62 0.7) (1.17 4.46 1.02 0.55)
  (2.11 2.14 0.85 0.72) (9.44 2.65 7.37 0.63)))
(println "data length: " (length data))
(kmeans-train data 2 'K) ;; split into two clusters; the training result is stored in context K
(println (symbols 'K)) ;; show the symbols in K
(println K:labels) ;; the label of the cluster each data row belongs to
(println "K:labels length: " (length K:labels))
(set 'labeled-data (transpose (push K:labels (transpose data) -1))) ;; append the cluster label to each data row
(println labeled-data)
(exit)
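Likewise, incremental training should be possible: as long as context K is saved, training can be continued later, because the last (optional) parameter in the kmeans-train prototype accepts an existing result, which as far as I can tell means the centroids from an earlier run.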
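To make that idea concrete, here is a minimal sketch of continuing from an earlier result. It assumes that the optional fourth argument of kmeans-train is a matrix of starting centroids (k rows, one column per feature), which is my reading of the manual rather than something I have tested thoroughly. The names batch1, batch2 and seed are made up for illustration, and the two batches are simply rows taken from the data above.

```newlisp
;; Sketch only: assumes the optional fourth argument of kmeans-train
;; is a matrix of starting centroids (k rows, one column per feature).
(set 'batch1 '((2.29 4.18 1.06 0.8) (1.85 1.89 0.11 0.6)
               (8.63 2.51 8.11 0.7) (7.18 3.46 8.7 0.3)))
(set 'batch2 '((2.43 2.14 1.59 0.6) (2.48 2.26 0.19 0.8)
               (8.16 3.83 8.93 0.9) (8.49 5.31 7.47 0.86)))

(kmeans-train batch1 2 'K)        ;; first run, results are stored in context K
(set 'seed K:centroids)           ;; keep the centroids found so far
(kmeans-train batch2 2 'K seed)   ;; second run starts from those centroids
(println K:centroids)
(println K:labels)                ;; labels from the most recent run
(exit)
```

Note that this seeds the second run with the earlier centroids; it does not merge the two batches, so K:labels should describe only batch2 afterwards. To keep K between separate script runs, (save "K.lsp" 'K) followed later by (load "K.lsp") should do it, since save accepts a context symbol.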
Running the example script prints:
dean@dean-Latitude-3330:~/work/gitlab.com/sdf1/ews/code$ ./test.lsp
data length: 20
(K:centroids K:clusters K:deviations K:labels)
(2 1 2 1 2 2 2 2 2 1 1 2 2 1 2 2 2 1 1 2)
K:labels length: 20
((6.57 4.96 11.91 0.9 2) (2.29 4.18 1.06 0.8 1) (8.630000000000001 2.51 8.109999999999999 0.7 2) (1.85 1.89 0.11 0.6 1) (7.56 7.93 5.06 0.6 2) (3.61 7.95 5.11 0.8 2) (7.18 3.46 8.699999999999999 0.3 2) (8.17 6.59 7.49 0.2 2) (5.44 5.9 5.57 0.4 2) (2.43 2.14 1.59 0.6 1) (2.48 2.26 0.19 0.8 1) (8.16 3.83 8.93 0.9 2) (8.49 5.31 7.47 0.86 2) (3.12 3.1 1.4 0.72 1) (6.77 6.04 3.76 0.12 2) (7.01 4.2 11.9 0.3 2) (6.79 8.720000000000001 8.619999999999999 0.7 2) (1.17 4.46 1.02 0.55 1) (2.11 2.14 0.85 0.72 1) (9.44 2.65 7.37 0.63 2))
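The example only covers the train stage. For the query stage, a minimal sketch could look like the following. It uses kmeans-query, which takes a data vector and a centroid matrix and returns the Euclidean distance to each centroid. The names sample, new-point, dists and nearest are my own, new-point is a made-up observation, and sample is just six rows from the data above so that the snippet runs on its own.

```newlisp
;; Sketch only: trains on a small subset, then queries one new point.
(set 'sample '((2.29 4.18 1.06 0.8) (1.85 1.89 0.11 0.6) (2.43 2.14 1.59 0.6)
               (8.63 2.51 8.11 0.7) (7.18 3.46 8.7 0.3) (8.16 3.83 8.93 0.9)))
(kmeans-train sample 2 'K)

;; new-point is a hypothetical observation, not part of the original data
(set 'new-point '(2.0 3.0 1.0 0.7))
(set 'dists (kmeans-query new-point K:centroids))    ;; one distance per centroid
(set 'nearest (+ 1 (find (apply min dists) dists)))  ;; labels in K:labels start at 1
(println "distances: " dists)
(println "nearest cluster: " nearest)
(exit)
```

Picking the smallest distance is the simplest way to assign a label to a new point. Which cluster ends up numbered 1 or 2 depends on the training run, so the number only has meaning relative to the K:centroids it was computed against.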
Original post: http://blog.csdn.net/csfreebird/article/details/43451563