Keywords: linear classification
Recall Linear Regression
In the posts ‘Introduction to Linear Regression’, ‘Simple Linear Regression’ and ‘Polynomial Regression and Features-Extension of Linear Regression’, we discussed the regression task. The goal of regression is to find a function, or hypothesis, such that given an input $x$, the hypothesis makes a prediction $y(x)$ that is as close to the target $t$ as possible. Both the target and the prediction are continuous. They have the properties of numbers:
Consider 3 inputs $x_1$, $x_2$ and $x_3$ and their corresponding targets $t_1$, $t_2$ and $t_3$, where the distance between $t_1$ and $t_2$ is larger than the one between $t_1$ and $t_3$. Then a good predictor should give predictions $y_1$, $y_2$ and $y_3$ where the distance between $y_1$ and $y_2$ is larger than the one between $y_1$ and $y_3$.
Some properties of linear regression (and regression tasks in general) that we should pay attention to:
- The goal of regression is to produce a hypothesis that can give a prediction as close to the target as possible
- The output of the hypothesis and target are continuous numbers and have numerical meanings, like distance, order and so on.
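As a minimal sketch of these properties (assuming NumPy and a made-up toy dataset), a least-squares fit shows that regression outputs are continuous numbers whose order and distances are meaningful:

```python
import numpy as np

# Toy 1-D regression: fit y = w1 * x + w0 by least squares.
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])  # targets lie exactly on t = 2x + 1

# Design matrix with a bias column.
X = np.stack([x, np.ones_like(x)], axis=1)
w, *_ = np.linalg.lstsq(X, t, rcond=None)

# Predictions are continuous numbers: order and distance carry meaning.
y = X @ w
print(w)  # close to [2., 1.]
print(y)  # close to the targets
```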
On the other hand, we meet more classification tasks than regression tasks in daily life. For example, in the supermarket we can easily tell an apple and an orange apart, and we can even judge whether an apple is tasty or not.
Then the goal of classification is clear:
Assign an input $x$ to one of $K$ classes $C_k$, $k = 1, \dots, K$, where $x$ belongs to one and only one class.
The input $x$, like the input of regression, can be a feature vector or a basis function of the features, and can be continuous or discrete. However, the output is different from that of regression. Let’s go back to the example of telling apples, oranges, and pineapples apart. The difference between apple and orange and the difference between apple and pineapple cannot be compared, for the distance (the mathematical name of difference) itself has no meaning here.
In mathematical models, ‘apple’ and ‘orange’ cannot be calculated with directly. So a usual step in a classification task is mapping the target, or label, of an example to a number, like $t = 1$ for the apple and $t = 0$ for the orange.
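This mapping can be sketched in a few lines (the dictionary below is a coding convention chosen for illustration; the numbers carry no quantitative meaning):

```python
# Hypothetical label-to-number mapping for a two-class task.
label_to_t = {"apple": 1, "orange": 0}

labels = ["apple", "orange", "orange", "apple"]
targets = [label_to_t[l] for l in labels]
print(targets)  # [1, 0, 0, 1]
```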
A binary coding scheme is another way to encode targets. For a two-class task, the labels can be coded as the vectors:

$$t = (1, 0)^T \text{ for } C_1, \qquad t = (0, 1)^T \text{ for } C_2$$

It is equivalent to the scalar coding $t \in \{0, 1\}$ above, and for a $K$-class target, the binary coding scheme gives a $K$-dimensional vector $t$ whose $k$-th element is $1$ when the example belongs to class $C_k$ and whose other elements are $0$:

$$t = (0, \dots, 0, 1, 0, \dots, 0)^T$$
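The binary (1-of-$K$) coding above can be sketched as a small helper (the function name and 0-based indexing are my own convention):

```python
import numpy as np

# 1-of-K coding: class C_k maps to a K-vector with a single 1.
def one_hot(k, K):
    """Return the 1-of-K code for class index k (0-based) among K classes."""
    t = np.zeros(K)
    t[k] = 1.0
    return t

# Two-class case: equivalent to the scalar {0, 1} labels.
print(one_hot(0, 2))  # [1. 0.] for C_1
print(one_hot(1, 2))  # [0. 1.] for C_2

# K = 5, an example belonging to class C_2 (index 1).
print(one_hot(1, 5))  # [0. 1. 0. 0. 0.]
```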
The $D$-dimensional input $x$ lies in $\mathbb{R}^D$, which is called the input space. In a classification task, the input points can be separated according to their targets; these parts of the space are called decision regions, and the boundaries between decision regions are called decision boundaries or decision surfaces. When the decision boundaries are linear, the task is called linear classification.
There are roughly two kinds of procedures for classification:
- Discriminant Function: assign an input $x$ to a certain class $C_k$ directly.
- Probabilistic models: we infer the posterior probability $P(C_k|x)$ first and then make a decision based on it.
  - $P(C_k|x)$ can be modeled and calculated directly
  - $P(C_k|x)$ can also be calculated by Bayes’ theorem from the class-conditional density $P(x|C_k)$ and the prior $P(C_k)$
These two routes correspond to the discriminative model and the generative model, respectively.
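As a toy numeric sketch of the generative route (the likelihood and prior values below are assumed for illustration, not learned from data), the posterior follows directly from Bayes’ theorem:

```python
# Bayes' theorem: P(C_k | x) = P(x | C_k) * P(C_k) / P(x)
# Assumed toy values for two classes at one fixed input x.
likelihood = {"C1": 0.5, "C2": 0.2}  # P(x | C_k), class-conditional densities
prior = {"C1": 0.3, "C2": 0.7}       # P(C_k)

evidence = sum(likelihood[c] * prior[c] for c in prior)  # P(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

print(posterior)
# The posteriors sum to 1; we decide for the class with the larger one.
```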
In the regression problem, the output of the linear function:

$$y(x) = w^T x + w_0$$

is an approximation of the target. But in a classification task, we want the output to be the class to which the input belongs. However, the output of the linear function is always continuous. So we wish this value to represent the probability of a given class rather than the class label, say $P(C_1|x)$. To achieve this, a function $f(\cdot)$, which is called an ‘activation function’ in machine learning (its inverse is called a ‘link function’ in statistical learning), is introduced. For example, we can choose a threshold function as the activation function:

$$y(x) = f(w^T x + w_0)$$
where $f(\cdot)$ is the threshold function

$$f(a) = \begin{cases} 1 & a \ge 0 \\ 0 & a < 0 \end{cases}$$

In this case, the decision boundary is $w^T x + w_0 = 0$, and so it is a line (a hyperplane in higher dimensions). Technically, $y(x)$ is no longer linear; it is a generalized linear model. By the way, the input $x$ can be replaced by a basis function $\phi(x)$ as in polynomial regression.
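A minimal sketch of this generalized linear model (the weights are assumed, not learned; learning them is the subject of later posts):

```python
import numpy as np

# Generalized linear model with a threshold activation:
# y(x) = f(w^T x + w0), where f(a) = 1 if a >= 0 else 0.
w = np.array([1.0, -1.0])  # assumed weights
w0 = 0.0                   # assumed bias

def predict(x):
    a = w @ x + w0             # linear part
    return 1 if a >= 0 else 0  # threshold activation f

# Points on either side of the line x1 - x2 = 0 get different classes.
print(predict(np.array([2.0, 1.0])))  # 1
print(predict(np.array([1.0, 2.0])))  # 0
```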
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ↩︎