
COMP9313 Week8 Classification and PySpark MLlib

Date: 2020-07-20 15:03:13

https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es

https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L7.pdf

 

Machine Learning:

  1. Construct a model, then use it to predict new data

  2. 

 

Evaluation Metrics:

  Positive/Negative: for labels ∈ {a, b, c, d}, choose a as the positive class; all other labels are negative

  False Positive: not a, but classified as a

  False Negative: a, but classified as b, c, or d

  True Positive: a and classified as a

  

  Precision = tp / (tp + fp)

  Recall = tp / (tp + fn)

  F1 = 2 * precision * recall / (precision + recall)
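The three formulas above can be checked with a few lines of Python. This is only a minimal sketch: the function name `precision_recall_f1` and the sample labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumption, not something stated in the notes.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: define each metric as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels for illustration only; "a" is chosen as the positive class.
y_true = ["a", "a", "b", "c", "a", "d"]
y_pred = ["a", "b", "a", "c", "a", "d"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="a")
print(p, r, f1)
```

Here tp = 2, fp = 1 and fn = 1, so all three values come out as 2/3.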

 

  Micro: each instance's true label is treated as positive; tp, fp, and fn are summed over all class labels before computing F1

  Macro: mean of the per-class F1 scores
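The difference between the two averages is easiest to see on an imbalanced example. The sketch below assumes single-label multi-class predictions; the helper names and the sample data are hypothetical. It uses F1 = 2·tp / (2·tp + fp + fn), which is algebraically the same as the precision/recall form.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 written directly in terms of counts; 0.0 if there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the true label was different
            fn[t] += 1  # the true label t was missed
    # Macro: compute F1 per class label, then take the plain mean.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all class labels, then compute one F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Made-up, imbalanced data: class "a" dominates and "b" is never predicted.
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "c", "c"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(micro, macro)
```

On this data micro F1 is 5/6 (it equals accuracy in the single-label case), while macro F1 drops to 5/9 because the missed class "b" counts as much as the frequent class "a".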

 

Classification:

  1. Preprocessing and Feature Engineering 

    1) bag of words 

    2) remove high-frequency words


 

 

  2.  Train classifier

  3. Evaluate the classifier

    1) split a 'development set' from the training set

    2) k-fold cross-validation, then take avg(accuracy)
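The preprocessing step in the list above (bag of words, then dropping high-frequency words) might look roughly like this. It is a sketch under stated assumptions: whitespace tokenization, and "high-frequency" taken to mean "appears in more than `max_doc_ratio` of the documents". Both the threshold and the function names are hypothetical, not taken from the lecture.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines do more.
    return text.lower().split()


def build_vocabulary(docs, max_doc_ratio=0.8):
    """Keep only words that appear in at most max_doc_ratio of the documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    limit = max_doc_ratio * len(docs)
    return sorted(w for w, df in doc_freq.items() if df <= limit)


def bag_of_words(doc, vocabulary):
    """Map a document to a vector of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocabulary]


# Made-up corpus for illustration.
docs = ["the cat sat on the mat", "the dog chased the cat", "the bird sang"]
vocab = build_vocabulary(docs, max_doc_ratio=0.8)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)
print(vectors[0])
```

The word "the" occurs in every document, so it is dropped from the vocabulary; every remaining word becomes one dimension of the feature vector.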

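The k-fold evaluation described above can be sketched in plain Python. Assumptions: folds are contiguous slices (in practice the data would usually be shuffled first), and the classifier is a hypothetical stand-in that always predicts the majority training label, used here only so the example runs end to end.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(X, y, k, train_fn, predict_fn):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical stand-in classifier: always predicts the majority training label.
def train_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Made-up data for illustration.
X = list(range(9))
y = ["a", "a", "a", "b", "a", "a", "b", "a", "a"]
avg_accuracy = cross_val_accuracy(X, y, 3, train_majority, predict_majority)
print(avg_accuracy)
```

Each example is used for testing exactly once, and the reported score is the mean of the k per-fold accuracies.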

 

 

Text Classification:

  1. Input: document or sentence

  2. Output: class label C ∈ {c1, c2, …}

  3. Classification methods:

    • Naïve Bayes

    • Logistic regression

    • Support-vector machines

    • …

  4. Naïve Bayes

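    1) bag of words -> the features become a d-dimensional vector, with label c

    2) choose the class with the maximum a posteriori (MAP) probability

    3) assume conditional independence; assume word position does not matter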

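The Naïve Bayes steps above can be put together in a small sketch. It is a multinomial model over bag-of-words counts that picks the class maximizing log P(c) + Σ log P(w | c), which is the MAP decision under the conditional-independence assumption. Add-one (Laplace) smoothing via `alpha` and skipping words unseen in training are assumptions added here so the example runs; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(w | c) from tokenized documents."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)  # position is ignored: only counts matter
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        # Assumption: add-alpha smoothing so unseen (word, class) pairs are not zero.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, tokens):
    """Return the class with the highest posterior score (MAP)."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped (an assumption of this sketch).
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)


# Made-up training data for illustration.
train_docs = [
    "good great fun".split(),
    "great acting good plot".split(),
    "boring bad plot".split(),
    "bad awful boring".split(),
]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "good fun plot".split()))
print(predict(model, "awful boring".split()))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.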

 

 

 

PySpark MLlib:

   


Source: https://www.cnblogs.com/ChevisZhang/p/13344371.html
