Fulvaz PlayGroud

Machine Learning | Week01 & 02

Week 01


Supervised learning

We are given a data set and already know what our correct output should look like.

Regression problem

Predict results with a continuous output.

Classification problem

Predict results in discrete categories.

Unsupervised learning

We have a bunch of data but do not know in advance what the results should look like; we let the algorithm produce (discover) the structure.

Key idea: clustering
Group related data together.

Linear Regression with One Variable

The Hypothesis Function
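For one variable the hypothesis is just a line: $h_\theta(x) = \theta_0 + \theta_1 x$.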

Cost Function

A function that measures how accurate the hypothesis function is.
(https://leanote.com/api/file/getImage?fileId=5667cab8ab6441601e0014be)

This function is called the "squared error function", or mean squared error.

The 1/2 is there for convenience in the later gradient descent computation: the constant produced by differentiating the squared term cancels the 1/2 (the derivative of the square term will cancel out the 1/2 term).
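The formula in the image is $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$. A minimal Octave sketch, assuming X is the design matrix with a leading column of ones, y the targets and theta a column vector (the function name is mine):

function J = computeCost(X, y, theta)
  m = length(y);                       % number of training examples
  predictions = X * theta;             % h_theta(x) for every example
  sqErrors = (predictions - y) .^ 2;   % squared errors
  J = 1 / (2 * m) * sum(sqErrors);     % mean squared error with the 1/2 factor
end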

Vocabulary: taking the derivative (求导).

Gradient Descent

A way to automatically improve our hypothesis function.
Repeat until convergence:
(https://leanote.com/api/file/getImage?fileId=5667cab8ab6441601e0014ba)

:= means assignment, i.e. the = used in programming; mathematics cannot use = directly for this.

There is no need to shrink alpha (the learning rate) over time: as we approach a minimum the derivative gets smaller, so the whole update term automatically becomes smaller.
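In symbols, the rule being repeated is $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for $j = 0, 1$, updating both parameters simultaneously.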

Gradient Descent for Linear Regression

After some behind-the-scenes derivation, we arrive at the simplified formula:
(https://leanote.com/api/file/getImage?fileId=5667cab8ab6441601e0014c4)

The derivation is linked here, but I haven't had the courage to read it.
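A minimal Octave sketch of these updates, assuming x and y are m-by-1 column vectors, theta0 and theta1 start at some initial guess (e.g. 0), and alpha and num_iters are picked by hand (the variable names are mine):

m = length(y);                                      % number of training examples
for iter = 1:num_iters,
  h = theta0 + theta1 * x;                          % current predictions
  temp0 = theta0 - alpha / m * sum(h - y);          % gradient step for theta0
  temp1 = theta1 - alpha / m * sum((h - y) .* x);   % gradient step for theta1
  theta0 = temp0;  theta1 = temp1;                  % simultaneous update
end;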

Linear Algebra Review

Notation and terms:

  • Aij refers to the element in the ith row and jth column of matrix A.

  • A vector with ‘n’ rows is referred to as an ‘n’-dimensional vector

  • vi refers to the element in the ith row of the vector.

  • In general, all our vectors and matrices will be 1-indexed.

  • Matrices are usually denoted by uppercase names while vectors are lowercase.

  • “Scalar” means that an object is a single value, not a vector or matrix.

  • R refers to the set of scalar real numbers

  • Rn refers to the set of n-dimensional vectors of real numbers

Vectors and matrices

Lowercase letters denote vectors.
Uppercase letters denote matrices.

Matrix arithmetic

See the figure:
(https://leanote.com/api/file/getImage?fileId=566839fcab6441616d001916)

Identity matrix

All 1s on the diagonal and 0s everywhere else.
AI = IA = A

Inverse

A^-1
Only a square matrix can have an inverse.

A * A^-1 = I
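A quick Octave check of these identities (the 2x2 matrix here is just an arbitrary invertible example):

A = [1 2; 3 4];        % an invertible square matrix
I = eye(2);            % 2x2 identity matrix
A * I                  % equals A
I * A                  % equals A
Ainv = pinv(A);        % inverse of A
A * Ainv               % approximately the identity matrix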

Week 02: Multivariate Linear Regression


Hypothesis: n features, n+1 parameters (θ0 … θn).
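With the convention $x_0 = 1$, the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$.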

Gradient Descent for Multiple Variables

Feature Scaling

If the features differ from each other by orders of magnitude, gradient descent converges very slowly.

Consider normalizing the features so that it converges faster:
(https://leanote.com/api/file/getImage?fileId=566839fcab6441616d001912)

Try to keep every feature roughly within [-1, 1]; this is just a rule of thumb.

### Mean normalization
(https://leanote.com/api/file/getImage?fileId=566839fcab6441616d001917)
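A minimal Octave sketch, assuming X is an m-by-n matrix of raw features (one row per example); std is used here for the spread, though the range max - min would also match the formula in the image:

mu = mean(X);                  % 1-by-n row vector of feature means
sigma = std(X);                % 1-by-n row vector of feature standard deviations
X_norm = (X - mu) ./ sigma;    % subtract the mean, divide by the spread (broadcasting)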

Learning Rate

If gradient descent is not working, use a smaller alpha.

  • With a sufficiently small alpha, J(theta) decreases on every iteration.
  • But an alpha that is too small converges very slowly.

### How to find alpha
Try a sequence of values and pick the one that works (a sketch follows below):
…, 0.001, …, 0.01, …, 0.1, …, 1
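One way to compare those candidates in Octave; gradientDescent here is a placeholder name for a routine (like the loop sketched earlier) that also records the cost J after every iteration:

alphas = [0.001 0.01 0.1 1];                 % candidate learning rates
num_iters = 50;
for k = 1:length(alphas),
  [theta, J_history] = gradientDescent(X, y, zeros(size(X, 2), 1), alphas(k), num_iters);
  plot(1:num_iters, J_history); hold on;     % J(theta) should fall on every iteration
end;
xlabel('iteration'); ylabel('J(theta)');
legend('0.001', '0.01', '0.1', '1');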

Features and Polynomial Regression

Choose the right model (features).

For example, for a house, one feature width × depth (the size) is better than the two separate features width and depth.

The form of the hypothesis also matters.
For house prices, for instance, an x^3 term works better than x^2, because a quadratic eventually turns downward, and prices dropping as size keeps growing makes no sense.

Choose according to the actual situation.

Worth noting: with higher-degree polynomial terms, feature scaling becomes very important (see the sketch below).
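A minimal Octave sketch of building cubic features from a single size column x (variable names are mine; the scaling step is what keeps x, x^2 and x^3 on comparable scales):

X_poly = [x, x.^2, x.^3];                  % size, size squared, size cubed
mu = mean(X_poly);  sigma = std(X_poly);
X_poly = (X_poly - mu) ./ sigma;           % feature scaling, essential here
X_poly = [ones(length(x), 1), X_poly];     % prepend the intercept column x0 = 1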

Normal Equation

Ways to minimize J:
1. Take the derivatives and set them to zero.
But when theta is not a single real number (it is a vector of parameters), this approach falls flat.

2. Solve for θ with matrices:
$θ=(X^TX)^{-1}X^Ty$

PS: if you use the normal equation, feature scaling is not needed.
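In Octave this is one line (pinv rather than inv, so it still behaves when X^TX is non-invertible; X is assumed to have a leading column of ones):

theta = pinv(X' * X) * X' * y;   % normal equation: no iterations, no alpha to tune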

Pros & cons:
(https://leanote.com/api/file/getImage?fileId=566942e1ab64413e67000579)

### What if $X^TX$ is non-invertible?
Causes:

  1. Redundant features (linearly dependent features).
  2. Too many features (e.g. more features than training examples) — see the sketch below.
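A tiny Octave illustration (the numbers are made up): a duplicated feature makes X^TX singular, yet pinv still returns a usable least-squares solution:

x1 = [1; 2; 3];  y = [2; 4; 6];
X = [ones(3, 1), x1, 2 * x1];      % third column is just a multiple of the second
rank(X' * X)                       % prints 2, less than 3, so X'X has no ordinary inverse
theta = pinv(X' * X) * X' * y;     % pinv still gives a least-squares solution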

Multivariate linear regression

The general formula for computing θ:
(https://leanote.com/api/file/getImage?fileId=566e3d76ab64416467000efc)

Week 02: Octave Tutorial

load a.mat % load the variables stored in a.mat
save a.mat a % save variable a into a.mat

A(3, 2) % element in row 3, column 2
A(2, :) % everything in the second row
A(:, 2) % everything in the second column
A([1 3], :) % rows 1 and 3
A = [A, [100; 101; 102]] % append a column on the right
A(:) % all elements as one column vector
C = [A; B] % B below A
C = [A B] % B to the right of A
eye(n) % n-by-n identity matrix

Computing on Data

A * B % matrix multiplication
A .* B % element-wise multiplication; the dot means an element-level operation
abs(A) % element-wise absolute value
v + ones(length(v), 1) % add 1 to every element of v
A' % transpose(A), the transpose
max(a) % largest element of a
a < 3 % element-wise comparison, gives a 0/1 vector
find(a < 3) % indices of the elements less than 3
A = magic(3) % 3x3 magic square: every row, column and diagonal sums to the same value
sum(a) % add up all the elements of a
prod(a) % multiply all the elements of a
floor(a) % round down
ceil(a) % round up
sum(A, 1) % sum up each column
sum(A, 2) % sum up each row
pinv(A) % (pseudo-)inverse of a matrix

Plotting data

plot
hold on %plot in the same window
xlabel
ylabel
legend
title
print -dpng 'ex.png'
figure(1); plot % plot in a separate window
subplot(1, 2, 1) % create a 1x2 grid and focus on the first element; call subplot again before drawing the next plot
axis([0.5 1 -1 1]) % x range [0.5, 1], y range [-1, 1]
imagesc(A) % display matrix A as a colored image
colorbar
colormap gray

Control statements

% for loop
for i = 1:10,
  disp(i);
end;

while

i = 1;
while i <= 5,
  disp(i);
  i = i + 1;
end;

disp(x) % == print!

### Defining your own function
cd to the directory containing the source file

% name.m (in Octave the file name must match the function name)
function [y1, y2] = name(x)
  y1 = x^2;
  y2 = x^3;

Vectorization

Matrix operations are much faster than a for loop and take far less code (less code, fewer bugs).

### Example
(https://leanote.com/api/file/getImage?fileId=5670404fab64416467001cdf)

### Gradient descent
(https://leanote.com/api/file/getImage?fileId=5670404fab64416467001cdc)
Essentially this turns the derivative computation into matrix operations.
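A sketch of that vectorized update in Octave, assuming theta is an (n+1)-by-1 column vector and X an m-by-(n+1) design matrix with a leading column of ones:

m = length(y);                              % number of training examples
h = X * theta;                              % predictions for every example at once
theta = theta - alpha / m * X' * (h - y);   % simultaneous update of every theta_j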