Self-weighted Multiple Kernel Learning for Graph-based Clustering and Semi-supervised Classification
The paper proposes a novel multiple kernel learning framework for clustering and semi-supervised learning.
Input: kernel matrices $ \{H^i\}_{i=1}^r $, parameters $ \alpha, \beta, \gamma $
Initialize: random matrix $ S $, $ K = \sum_i H^i / r $
Repeat
(1) For each i, update the i-th column of \(S\) by $ S_i = (\gamma I + K) ^{-1} (K_{i,:} - \frac{\alpha G_i}{4} ) $
(2) Calculate \(K\) by $ K = \frac{2S^T - SS^T - I +2\beta \sum_i \omega_i H^i}{2\beta \sum_i \omega_i} $
(3) Update \(\omega\) by (9): $ \omega_i = \frac{1}{2||H^i - K||_F} $
(4) For clustering, calculate \(P\) as the eigenvectors of \(L = D - S\) corresponding to the \(c\) smallest eigenvalues. For SSL, calculate \(P\) according to $ P_u = -L_{uu}^{-1} L_{ul} Y_l $
Until stopping criterion is met.
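Put together, the loop above can be sketched in NumPy (clustering branch of step (4) only). This is an illustrative sketch, not the authors' implementation: the function name, parameter defaults, initial weights, and the vectorized one-solve column update are assumptions.

```python
import numpy as np

def smkl_cluster(Hs, c, alpha=1.0, beta=1.0, gamma=1.0, n_iter=20):
    """Sketch of the SMKL alternating optimization (clustering variant).

    Hs: list of r precomputed (n x n) kernel matrices H^i.
    """
    r, n = len(Hs), Hs[0].shape[0]
    K = sum(Hs) / r                          # K = sum_i H^i / r
    S = np.random.rand(n, n)                 # random similarity graph S
    P = np.zeros((n, c))
    w = np.full(r, 1.0 / r)                  # initial kernel weights (assumption)
    for _ in range(n_iter):
        # (1) column-wise S update, all columns in a single linear solve:
        #     S_i = (gamma*I + K)^{-1} (K_i - alpha * G_i / 4)
        G = np.square(P[:, None, :] - P[None, :, :]).sum(-1)  # G_jk = ||P_j - P_k||^2
        S = np.linalg.solve(gamma * np.eye(n) + K, K - alpha * G / 4)
        S = np.maximum(S, 0)                 # enforce the constraint S >= 0
        # (2) K update
        Hbar = sum(wi * H for wi, H in zip(w, Hs))
        K = (2 * S.T - S @ S.T - np.eye(n) + 2 * beta * Hbar) / (2 * beta * w.sum())
        # (3) weight update: w_i = 1 / (2 ||H^i - K||_F), small eps for safety
        w = np.array([1 / (2 * np.linalg.norm(H - K, 'fro') + 1e-12) for H in Hs])
        # (4) P: eigenvectors of L = D - S for the c smallest eigenvalues
        A = (S + S.T) / 2                    # symmetrize before building the Laplacian
        L = np.diag(A.sum(1)) - A
        P = np.linalg.eigh(L)[1][:, :c]      # eigh returns ascending eigenvalues
    return S, K, P
```

Final clusters would then be obtained by running, e.g., k-means on the rows of $P$, as is standard in spectral clustering.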
Self-weighted Multiple Kernel Learning
$ \min \limits_{S,P,K} T_r(K-2KS + S^TKS) + \gamma ||S||_F^2 + \alpha T_r(P^TLP) + \beta \sum_i \omega_i||H^i - K||_F^2 $
$ s.t. P^T P = I, S \geq 0 $
$ K $: target kernel matrix
$ S $: similarity graph matrix
$ \omega_i $: optimal weight
$ H^i $: the i-th kernel matrix (input)
$ \alpha, \beta,\gamma $: input parameters
$ P$: cluster indicator matrix or label matrix
Fix P, K, optimize S
Use $ \sum_{ij} \frac{1}{2} ||P_{i,:} - P_{j,:}||_2^2 s_{ij} = T_r(P^TLP) $ and set the first derivative with respect to \(S\) to zero, obtaining:
$ S_i = (\gamma I + K) ^{-1} (K_{i,:} - \frac{\alpha G_i}{4} ) $
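As a sanity check, the closed form can be verified numerically: it zeroes the gradient of the per-column objective $f(s) = s^T K s - 2 K_{:,i}^T s + \gamma s^T s + \frac{\alpha}{2} G_{:,i}^T s$ (nonnegativity of $S$ ignored here). All values below are toy data chosen for illustration, not from the paper.

```python
import numpy as np

# Toy problem: random features, a PSD kernel K, and G_jk = ||P_j - P_k||^2.
rng = np.random.default_rng(0)
n, c, alpha, gamma, i = 6, 2, 0.5, 0.1, 0
X = rng.standard_normal((n, 3))
K = X @ X.T                                  # any PSD kernel works
P = rng.standard_normal((n, c))
G = np.square(P[:, None, :] - P[None, :, :]).sum(-1)

# Closed-form column update: S_i = (gamma*I + K)^{-1} (K_{:,i} - alpha*G_i/4)
s_star = np.linalg.solve(gamma * np.eye(n) + K, K[:, i] - alpha * G[:, i] / 4)

# Gradient of the per-column objective at s_star should vanish.
grad = 2 * K @ s_star - 2 * K[:, i] + 2 * gamma * s_star + alpha * G[:, i] / 2
print(np.allclose(grad, 0, atol=1e-8))       # → True
```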
Fix S, P, optimize K
Setting the derivative with respect to \(K\) to zero gives: $ K = \frac{2S^T - SS^T - I +2\beta \sum_i \omega_i H^i}{2\beta \sum_i \omega_i} $
Fix S, K, optimize P
$ \min \limits_P T_r(P^TLP), s.t. P^TP = I $
The solution is given by the eigenvectors of \(L\) corresponding to the \(c\) smallest eigenvalues.
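A small numerical illustration of this eigenvector solution (the toy affinity matrix below is an assumption, not from the paper): the minimum of $T_r(P^TLP)$ over orthonormal $P$ equals the sum of the $c$ smallest eigenvalues of $L$, attained by the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((8, 8)); A = (A + A.T) / 2       # symmetric toy affinity graph S
L = np.diag(A.sum(1)) - A                       # Laplacian L = D - S
c = 3
eigval, eigvec = np.linalg.eigh(L)              # eigenvalues in ascending order
P = eigvec[:, :c]                               # c smallest eigenvectors
# The objective value at this P equals the sum of the c smallest eigenvalues.
print(np.isclose(np.trace(P.T @ L @ P), eigval[:c].sum()))  # → True
```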
Unify the two fundamental components of SSL (graph construction and label inference) into a single framework. Reformulate SMKL for semi-supervised classification as:
$ \min \limits_{S,P,K}T_r(K-2KS + S^TKS) + \gamma ||S||_F^2 + \alpha T_r(P^TLP) + \beta \sum_i \omega_i||H^i - K||_F^2 $
$ s.t. S \geq 0, P_l = Y_l $
$ Y_l = [y_1,y_2,...,y_l]^T $ : label matrix
$ y_i \in R^{c \times 1} $: one-hot; $ y_{ij} = 1 $ means the i-th sample belongs to label j.
$ P = [P_l ; P_u] = [Y_l; P_u] $, where $ P_u $ is the prediction matrix for the unlabeled samples
$ P_u = - L_{uu}^{-1} L_{ul} Y_l $
Predict: $ y_i = \arg \max \limits_j P_{ij} $
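The label-inference step can be sketched as follows: given a learned graph $S$ and one-hot labels $Y_l$ for the first $l$ samples, solve $P_u = -L_{uu}^{-1} L_{ul} Y_l$ and predict by row-wise argmax. The two-cluster toy graph below is an assumption for illustration.

```python
import numpy as np

def infer_labels(S, Y_l):
    """Label propagation on graph S; labeled samples come first."""
    l = Y_l.shape[0]
    A = (S + S.T) / 2                    # symmetrize the learned graph
    L = np.diag(A.sum(1)) - A            # graph Laplacian L = D - S
    L_uu, L_ul = L[l:, l:], L[l:, :l]    # blocks for unlabeled/labeled nodes
    P_u = -np.linalg.solve(L_uu, L_ul @ Y_l)   # P_u = -L_uu^{-1} L_ul Y_l
    return P_u.argmax(1)                 # predict y_i = argmax_j P_ij

# Two triangles of three nodes each; one labeled node per cluster (0 and 1).
S = np.zeros((6, 6))
for a, b in [(0, 2), (0, 3), (2, 3), (1, 4), (1, 5), (4, 5)]:
    S[a, b] = S[b, a] = 1.0
Y_l = np.eye(2)                          # one-hot labels for nodes 0 and 1
print(infer_labels(S, Y_l))              # labels for unlabeled nodes 2-5: [0 0 1 1]
```

Nodes 2 and 3 connect only to the node labeled 0, and nodes 4 and 5 only to the node labeled 1, so propagation recovers the two clusters exactly.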
Existing parameter-weighted multiple kernel learning methods suffer from extra kernel-weight parameters that must be tuned; in contrast, SMKL assigns the weights $ \omega_i $ self-adaptively through the update in step (3).
Original post: https://www.cnblogs.com/chensy86/p/13623621.html