Linear regression is a good place to begin a machine learning career. Its form and logic are simple, and it is a solid basis from which we can extend to more complicated methods such as nonlinear regression and kernel functions. Linear regression has been in use for more than 200 years and is still a good model for many of the new problems we come across today.
Its form is:
$$y = \mathbf{w}^T \mathbf{x} + b$$
where the vector x and the number y are the input and output respectively, and the weight vector w and the bias b are the parameters of the model.
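As a minimal sketch of this model, the output is a dot product plus a bias. The function name `predict` and the numbers below are made up purely for illustration:

```python
import numpy as np

def predict(x, w, b):
    """Linear model: y = w . x + b for a single input vector x."""
    return float(np.dot(w, x) + b)

# Hypothetical parameters and input, chosen only for illustration
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([3.0, 4.0])

y = predict(x, w, b)  # 2*3 + (-1)*4 + 0.5
```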
From a general point of view on a machine learning problem, the parameters can be regarded as one of two different kinds of object:
An unknown constant
A random variable
The first view comes from frequentist statistics, while the second is the basis of Bayesian statistics.
What we do in this post is estimate the parameters by the least-squares method. The data used here are the weights of newborn babies by day, from the WHO:
View of algebra
Our task is to predict the weight of a newborn baby on a given day after birth. From equations (1) and (2) and the task, we get the error of the ith training point:
$$e_i = y_i - (w x_i + b) \tag{3}$$
where $y_i$ is the target for the ith input $x_i$ from the training set. In our task, both the output and the target are real numbers.
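Then the total error, the sum of squared errors over the whole training set of N points, becomes (note that a loss function, by contrast, is a function of a single pair of input and target):
$$e_{total} = \sum_{i=1}^{N} (y_i - w x_i - b)^2 \tag{4}$$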
Our mission now is to minimize equation (4). Be careful: in the training phase, the unknown variables in equation (4) are w and b, not the inputs and targets. Because it is a convex quadratic function of w and b, it has one and only one minimum point as long as the inputs are not all equal. The necessary condition for the minimum point is that its gradient equals 0.
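This condition can be checked numerically. The sketch below uses made-up training points and assumes `np.polyfit` as a reference least-squares solver; the helper names `total_error` and `numerical_gradient` are hypothetical. It estimates the gradient of the total error by central differences and evaluates it at the fitted parameters:

```python
import numpy as np

def total_error(w, b, x, y):
    """Sum of squared errors of the line y = w*x + b on the training set."""
    return float(np.sum((y - (w * x + b)) ** 2))

def numerical_gradient(w, b, x, y, h=1e-4):
    """Central-difference estimate of [d e_total / d b, d e_total / d w]."""
    d_b = (total_error(w, b + h, x, y) - total_error(w, b - h, x, y)) / (2 * h)
    d_w = (total_error(w + h, b, x, y) - total_error(w - h, b, x, y)) / (2 * h)
    return np.array([d_b, d_w])

# Made-up training points, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A degree-1 polyfit returns the least-squares slope and intercept
w_best, b_best = np.polyfit(x, y, 1)

grad_at_best = numerical_gradient(w_best, b_best, x, y)
error_at_best = total_error(w_best, b_best, x, y)
error_nearby = total_error(w_best + 0.1, b_best - 0.1, x, y)
```

At the fitted parameters the gradient should vanish up to rounding, and moving away from them in any direction should increase the total error.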
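Assuming we have N training points in the sample, we can calculate the gradient $[\partial e_{total}/\partial b,\ \partial e_{total}/\partial w]^T$ and set it to zero:
$$\begin{aligned} \frac{\partial e_{total}}{\partial b} &= -2\sum_{i=1}^{N} (y_i - w x_i - b) = 0 \\ \frac{\partial e_{total}}{\partial w} &= -2\sum_{i=1}^{N} x_i (y_i - w x_i - b) = 0 \end{aligned} \tag{7}$$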
To solve these equations, we use a little trick. The summations of $x_i$, $y_i$, $x_i^2$ and $x_i y_i$ are the obstacle in front of us. We prefer products to summations in equations, so we replace the summations with products of N and the sample means $\bar{x}$ and $\bar{y}$. We have
$$\sum_{i=1}^{N} x_i = N\bar{x} \tag{8}$$
$$\sum_{i=1}^{N} y_i = N\bar{y} \tag{9}$$
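Substituting equations (8) and (9) into equation (7), we get:
$$b = \bar{y} - w\bar{x} \tag{10}$$
for the first part of equation (7). Substituting equation (10) into the second part of equation (7) then gives:
$$w = \frac{\sum_{i=1}^{N} x_i (y_i - \bar{y})}{\sum_{i=1}^{N} x_i (x_i - \bar{x})} \tag{11}$$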
    import pandas as pds
    import numpy as np
    import matplotlib.pyplot as plt

    data_file = pds.read_csv('./data/babys_weights_by_months.csv')
    data_x = np.array(data_file['day'])
    data_y_male = np.array(data_file['male'])
    data_y_female = np.array(data_file['female'])

    # calculate mean of x and y
    data_x_bar = np.mean(data_x)
    data_y_male_bar = np.mean(data_y_male)

    # calculate w using equation 11
    sum_1 = 0
    sum_2 = 0
    for i in range(len(data_x)):
        sum_1 += data_x[i] * (data_y_male[i] - data_y_male_bar)
        sum_2 += data_x[i] * (data_x[i] - data_x_bar)
    w = sum_1 / sum_2

    # calculate b using equation 10
    b = data_y_male_bar - w * data_x_bar

    # plot the fitted line between the first and the last day
    day_0 = data_x[0]
    day_end = data_x[-1]
    days = np.array([day_0, day_end])
    plt.plot(days, days * w + b, c='r')
    plt.scatter(data_file['day'], data_file['male'], c='r', label='male', alpha=0.5)
    plt.xlabel('days')
    plt.ylabel('weight(kg)')
    plt.legend()
    plt.show()
The resulting plot looks like this:
The fitted line has weight w = 0.018442166 and intercept b = 4.160650577.
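The same closed-form solution can be written without an explicit loop. The sketch below is a vectorized version of equations (10) and (11); the function name `fit_line` and the (day, weight) pairs are made up for illustration and are not the WHO data. The result is compared against `np.polyfit` as an independent check:

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares fit of y = w*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    w = np.sum(x * (y - y_bar)) / np.sum(x * (x - x_bar))  # equation (11)
    b = y_bar - w * x_bar                                   # equation (10)
    return w, b

# Made-up (day, weight in kg) pairs, NOT the WHO data
days = np.array([0, 10, 20, 30, 40, 50])
weights = np.array([3.3, 3.6, 4.0, 4.4, 4.7, 5.1])

w, b = fit_line(days, weights)

# Independent reference: degree-1 polynomial fit returns (slope, intercept)
w_ref, b_ref = np.polyfit(days, weights, 1)
```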
View of geometry
Even for the simple example above, with just two parameters, the calculation is already messy. Practical tasks usually have far more parameters, say hundreds or even thousands, and solving for them by hand in this way seems impossible.
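Now let's review the linear relation in equation (2). Suppose we have a training sample consisting of m points:
$$\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\} \tag{16}$$
and that they all follow the same linear relation. Then they can be combined as:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix} \begin{bmatrix} b \\ w \end{bmatrix} \tag{18}$$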
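We use a simplified equation to represent the relation in equation (18):
$$\mathbf{y} = X\mathbf{w} \tag{19}$$
where X is the matrix of inputs with an extra column of ones, and the vector $\mathbf{w}$ now collects both parameters b and w.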
From the linear algebra point of view, equation (19) says that y is in the column space of X. This corresponds to the case where all the points of the set (16) lie on a line. When the points are not on a line, equation (19) does not hold, and what we need to do is find the vector $\hat{\mathbf{y}}$ in the column space of X that is closest to the vector y:
$$\hat{\mathbf{y}} = \arg\min_{\mathbf{v} \in C(X)} \lVert \mathbf{y} - \mathbf{v} \rVert$$
where $C(X)$ denotes the column space of X.
As we know, the projection of y onto the column space of X is the vector in that space with the shortest distance to y.
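This property can be illustrated numerically. The sketch below uses made-up points and assumes `np.linalg.lstsq` to obtain the projection; it checks that the residual is orthogonal to the columns of X and that another, arbitrarily chosen vector in the column space is farther from y:

```python
import numpy as np

# Made-up points that do not lie exactly on one line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.5, 2.9, 4.2])

# Design matrix: a column of ones (for the bias) next to the inputs
X = np.column_stack([np.ones_like(x), x])

# Least-squares coefficients, then the projection of y onto the column space
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

# The residual of a projection is orthogonal to every column of X
orthogonality = X.T @ (y - y_hat)

# Another vector in the column space, built from perturbed coefficients
rng = np.random.default_rng(0)
other = X @ (coef + rng.normal(size=2))

dist_projection = np.linalg.norm(y - y_hat)
dist_other = np.linalg.norm(y - other)
```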
Our mission now is to find w such that:
$$X\mathbf{w} = \hat{\mathbf{y}} \tag{21}$$
where $\hat{\mathbf{y}}$ is the projection of y onto the column space of X.
According to the projection formula in linear algebra:
$$\hat{\mathbf{y}} = X (X^T X)^{-1} X^T \mathbf{y} \tag{22}$$
Then, substituting equation (21) into equation (22) and assuming $(X^T X)^{-1}$ exists:
$$\mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}$$
    def fit(self, x, y):
        x = np.array(x).reshape(-1, 1)
        # add a column which is all 1s to calculate bias of linear function
        x = np.c_[np.ones(x.size).reshape(-1, 1), x]
        y = np.array(y).reshape(-1, 1)
        if self.method == 'OLS':
            w = np.linalg.inv(x.transpose().dot(x)).dot(x.transpose()).dot(y)
            # row 0 pairs with the column of 1s (bias), row 1 with x (weight)
            b = w[0]
            w = w[1]
        return w, b
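The same formula extends to any number of input features. The sketch below is one possible standalone variant, not the method above: the function name `fit_ols` and the synthetic data are assumptions for illustration, and it swaps the explicit inverse for `np.linalg.solve` on the normal equations, which is usually the numerically safer choice:

```python
import numpy as np

def fit_ols(features, targets):
    """Solve the normal equations (X^T X) w = X^T y without forming the inverse."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)
    # Prepend a column of ones so the first parameter is the bias
    X = np.column_stack([np.ones(len(features)), features])
    y = np.asarray(targets, dtype=float)
    params = np.linalg.solve(X.T @ X, X.T @ y)
    return params[1:], params[0]  # weights, bias

# Synthetic, noise-free data generated from known (made-up) parameters
rng = np.random.default_rng(42)
features = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
true_b = 4.0
targets = features @ true_w + true_b

w, b = fit_ols(features, targets)
```

Because the synthetic targets lie exactly in the column space of X, the recovered `w` and `b` should match the generating parameters up to rounding.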