Hypothesis: \(h_{\theta}(x)=\sum_{i=0}^{m}\theta_i x_i\), where \(x_0=1\)
Cost Function: \(J(\theta)=\frac{1}{2}\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})^2=\frac{1}{2}(X{\theta}-Y)^T(X{\theta}-Y)\)
Two methods for minimizing \(J(\theta)\):
(1) Closed-form solution: \({\theta}=(X^{T}X)^{-1}X^{T}Y\) (see the derivation after this list)
(2) Gradient Descent: repeat \({\theta}:={\theta}-\alpha\frac{\partial}{\partial\theta}J(\theta)={\theta}-\alpha\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}={\theta}-{\alpha}{X^T}(X{\theta}-Y)\)
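The closed-form solution in (1) follows from setting the gradient of \(J(\theta)\) to zero:
\[\nabla_{\theta}J(\theta)=X^{T}(X\theta-Y)=0\;\Rightarrow\;X^{T}X\theta=X^{T}Y\;\Rightarrow\;\theta=(X^{T}X)^{-1}X^{T}Y,\]
assuming \(X^{T}X\) is invertible.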
Tip: normalize the features to accelerate gradient descent, e.g. \(x:=(x-\mu)/(\max-\min)\) (mean normalization) or \(x:=(x-\min)/(\max-\min)\) (min-max scaling); a short sketch follows.
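A minimal sketch of both scalings with NumPy (the variable names here are illustrative, not from the original post):

import numpy as np

x = np.array([2000.0, 2001.0, 2002.0, 2003.0])  # example feature values
span = x.max() - x.min()
x_mean_norm = (x - x.mean()) / span             # mean normalization
x_min_max = (x - x.min()) / span                # min-max scaling into [0, 1]
print(x_mean_norm)
print(x_min_max)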
Python code for this problem:
(1) closed-form solution:
import numpy as np

# Feature: the year; target: the observed value for that year.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
         2008, 2009, 2010, 2011, 2012, 2013]
values = [2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
          5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900]

# Design matrix X: first column all ones (x0 = 1), second column the year.
X = np.mat([np.ones(len(years)), years]).T      # shape (14, 2)
Y = np.mat(values).T                            # shape (14, 1)

# Closed-form (normal equation): theta = (X^T X)^{-1} X^T Y
theta = (X.T * X).I * X.T * Y
print(theta)

theta0, theta1 = theta[0, 0], theta[1, 0]
print(2014 * theta1 + theta0)                   # prediction for year 2014
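As a quick sanity check (not in the original post), the same coefficients can be recovered with NumPy's built-in least-squares solver:

import numpy as np

years = np.arange(2000, 2014, dtype=float)
values = np.array([2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
                   5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900])
A = np.column_stack([np.ones_like(years), years])   # same design matrix as above
theta, *_ = np.linalg.lstsq(A, values, rcond=None)  # minimizes ||A theta - values||^2
print(theta)                                        # should match the normal-equation result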
(2) gradient descent:
import numpy as np

def gradient(theta, X, Y):
    # Vectorized gradient of J(theta): X (theta^T X - Y)^T, with samples as the columns of X.
    return X * (theta.T * X - Y).T

years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
         2008, 2009, 2010, 2011, 2012, 2013]
values = [2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
          5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900]
X = np.mat([np.ones(len(years)), years])    # shape (2, 14): row 0 is x0 = 1, row 1 the year
Y = np.mat(values)                          # shape (1, 14)

# Crude scaling so that gradient descent converges with a fixed step size.
X = X / 2000.0
Y = Y / 12.0

alpha = 0.01
theta = np.mat(np.zeros((2, 1)))
last = 0.0
while True:
    theta -= alpha * gradient(theta, X, Y)
    if abs(last - theta[0, 0]) <= 1e-6:     # stop once theta0 barely changes
        break
    last = theta[0, 0]
print(theta)

# Undo the scaling when predicting for 2014: the model was fit on
# (x / 2000, y / 12), so y = 12 * (theta0 + theta1 * 2014) / 2000.
print(12 * (theta[0, 0] + theta[1, 0] * 2014) / 2000)

Logistic Regression:
Define \(\delta(x)=\frac{1}{1+e^{-x}}\) (the sigmoid function).
\(h_{\theta}(x)=\delta({\theta}^{T}x)=\frac{1}{1+e^{-{\theta}^{T}x}}\)
In fact, \(h_{\theta}(x)\) can be interpreted as the probability that \(y=1\), that is, \(p(y=1\mid x;\theta)\).
\[\mathrm{cost}(h_{\theta}(x),y)=\begin{cases}-\ln(h_{\theta}(x)), & y=1\\-\ln(1-h_{\theta}(x)), & y=0\end{cases}=-y\ln(h_{\theta}(x))-(1-y)\ln(1-h_{\theta}(x))\]
\[J(\theta)=\sum_{i=1}^{n}\mathrm{cost}(h_{\theta}(x^{(i)}),y^{(i)})\]
Since there is no closed-form solution here, minimize \(J(\theta)\) with gradient descent. Repeat: \[{\theta}:={\theta}-{\alpha}\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}\] Note that this update has the same form as the linear-regression update; only the hypothesis \(h_{\theta}\) has changed.
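A minimal sketch of this update rule in NumPy (the toy data below is made up for illustration and is not from the original post):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: column 0 is x0 = 1, column 1 a single feature; y is the binary label.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

alpha = 0.1
theta = np.zeros(2)
for _ in range(10000):
    h = sigmoid(X @ theta)            # h_theta(x^(i)) for every sample
    theta -= alpha * (X.T @ (h - y))  # theta := theta - alpha * sum_i (h_theta(x^(i)) - y^(i)) x^(i)

print(theta)                          # learned parameters
print(sigmoid(X @ theta))             # predicted p(y = 1 | x) for each training sample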
Original post (Chinese): https://www.cnblogs.com/vege-chicken-rainstar/p/11974022.html