Keywords: discriminant functions, decision boundary
In the post ‘Least Squares Classification’, we saw that, in a linear classification task, the decision boundary is the line or hyperplane by which we tell two classes apart. If our model is based on the decision boundary or, in other words, partitions inputs by a function and a threshold, the model is a discriminant model, and the decision boundary is formed by that function and threshold.
Now, we are going to talk about what the decision boundaries look like in the $K$-class problem, both when $K = 2$ and when $K > 2$. To illustrate the boundaries, we only consider the 2D (two-dimensional) input vector, which has only two components.
The easiest decision boundary comes from a 2-dimensional input space which is partitioned into 2 classes by a discriminant function and a threshold $t$:

$$y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 \tag{1}$$

whose decision boundary is:

$$\mathbf{w}^T\mathbf{x} + w_0 = t$$

This equation is equal to $\mathbf{w}^T\mathbf{x} + w_0' = 0$ because $t$ is also a constant, so the two constants can be merged into a single bias $w_0' = w_0 - t$. Of course, the 1-dimensional input space is easier than the 2-dimensional one, but its decision boundary is a point, which illustrates nothing.
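As a small sketch of this idea (the weight vector and bias below are made-up illustration values, not estimated from data), a 2D linear discriminant assigns a class by the sign of $y(\mathbf{x})$ once the threshold has been merged into the bias:

```python
import numpy as np

# A hypothetical 2D linear discriminant y(x) = w^T x + w0.
# w and w0 are made-up values for illustration only.
w = np.array([2.0, 1.0])
w0 = -1.0

def y(x):
    """Evaluate the discriminant function at a 2D point x."""
    return w @ x + w0

def classify(x):
    """Assign class C1 if y(x) > 0, else C2; the boundary is y(x) = 0."""
    return "C1" if y(x) > 0 else "C2"

print(classify(np.array([1.0, 1.0])))   # y = 2 + 1 - 1 = 2 > 0, so C1
print(classify(np.array([0.0, 0.0])))   # y = -1 < 0, so C2
```

The point $(1, -1)$ satisfies $2 \cdot 1 + 1 \cdot (-1) - 1 = 0$, so it lies exactly on the boundary line.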
Let’s go back to the line. Some properties of the linear decision boundary are:
- The vector $\mathbf{w}$ is always perpendicular to the line and points toward the region where $y(\mathbf{x}) > 0$.
- The bias $w_0$ decides the location of the boundary relative to the origin.
- The perpendicular distance $r$ from a point $\mathbf{x}$ to the line can be calculated by $r = \dfrac{y(\mathbf{x})}{\lVert\mathbf{w}\rVert}$ where $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$.
Since these three properties are all basic facts about a line, we only roughly prove the third one:
Proof: Let $\mathbf{x}_{\perp}$ be the projection of $\mathbf{x}$ on the line.

Using the first property, $\mathbf{w}$ is perpendicular to the line, so $\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert}$ is the unit normal vector and:

$$\mathbf{x} = \mathbf{x}_{\perp} + r\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert}\tag{2}$$

and we substitute equation (2) into the line function $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$:

$$y(\mathbf{x}) = \mathbf{w}^T\mathbf{x}_{\perp} + w_0 + r\frac{\mathbf{w}^T\mathbf{w}}{\lVert\mathbf{w}\rVert} = 0 + r\lVert\mathbf{w}\rVert$$

where $\mathbf{w}^T\mathbf{x}_{\perp} + w_0 = 0$ because $\mathbf{x}_{\perp}$ lies on the line. So we have:

$$r = \frac{y(\mathbf{x})}{\lVert\mathbf{w}\rVert}$$

Q.E.D.
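The distance formula can be checked numerically. The line $3x_1 + 4x_2 - 5 = 0$ and the test point below are made-up values for illustration:

```python
import numpy as np

# Made-up line 3*x1 + 4*x2 - 5 = 0 and a made-up test point.
w = np.array([3.0, 4.0])
w0 = -5.0

def y(x):
    return w @ x + w0

x = np.array([3.0, 4.0])
r = y(x) / np.linalg.norm(w)     # signed distance from the formula: 20 / 5 = 4

# Cross-check: step back along the unit normal to reach the projection,
# then measure the distance directly.
x_perp = x - r * w / np.linalg.norm(w)
print(abs(y(x_perp)))                  # ~0: x_perp lies on the line
print(np.linalg.norm(x - x_perp), r)   # the direct distance matches r
```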
However, the augmented vectors $\tilde{\mathbf{w}} = (w_0, \mathbf{w}^T)^T$ and $\tilde{\mathbf{x}} = (1, \mathbf{x}^T)^T$ can absorb the bias of the original boundary equation, giving $y(\mathbf{x}) = \tilde{\mathbf{w}}^T\tilde{\mathbf{x}}$. So a $D$-dimensional hyperplane that goes through the origin of the augmented $(D+1)$-dimensional space can take the place of an arbitrary hyperplane in the original $D$-dimensional space.
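This bias-absorption trick can be sketched as follows (the weights are illustrative values):

```python
import numpy as np

# Absorbing the bias: with x~ = (1, x) and w~ = (w0, w),
# y(x) = w^T x + w0 equals w~^T x~, a hyperplane through the origin
# of the augmented space. Weights are made-up illustration values.
w = np.array([2.0, -1.0])
w0 = 0.5

w_tilde = np.concatenate(([w0], w))

def y(x):
    return w @ x + w0

def y_aug(x):
    x_tilde = np.concatenate(([1.0], x))
    return w_tilde @ x_tilde

x = np.array([1.5, 3.0])
print(y(x), y_aug(x))   # the two forms give identical values
```

The augmented boundary $\tilde{\mathbf{w}}^T\tilde{\mathbf{x}} = 0$ contains the origin of the augmented space by construction, since plugging in the zero vector gives zero.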
Things change when we consider more than 2 classes. The boundaries become more complicated, and intuitively we have 3 different strategies for this problem:
The first strategy, 1-versus-rest, needs $K - 1$ classifiers (boundaries) for $K$ classes. Each classifier just decides that one side of its boundary belongs to class $C_k$ and the other side does not belong to $C_k$. So when we have two such boundaries, the condition is like this:
where one region is ambiguous: based on the properties of the decision boundary and the definition of classification in the post ‘From Linear Regression to Linear Classification’, a region cannot belong to both classes simultaneously.
So the first strategy can work for some regions, but there are some ‘black hole’ regions where the input belongs to more than one class and some ‘white hole’ regions where the input belongs to no class (the ambiguous region above could be such a region).
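A minimal sketch of this failure mode, using two hypothetical 1-versus-rest classifiers with made-up weights:

```python
import numpy as np

# Two made-up 1-versus-rest discriminants for classes C1 and C2 in 2D.
# y_k(x) > 0 means "x is in class Ck"; y_k(x) < 0 means "x is not in Ck".
w1, b1 = np.array([1.0, 0.0]), 0.0    # C1-versus-rest: claims x1 > 0
w2, b2 = np.array([0.0, 1.0]), 0.0    # C2-versus-rest: claims x2 > 0

def votes(x):
    """Return the set of classes that claim the point x."""
    claimed = set()
    if w1 @ x + b1 > 0:
        claimed.add("C1")
    if w2 @ x + b2 > 0:
        claimed.add("C2")
    return claimed

print(votes(np.array([1.0, 1.0])))    # {'C1', 'C2'}: a 'black hole' region
print(votes(np.array([-1.0, -1.0])))  # set(): a 'white hole' region
```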
Another kind of multi-class boundary is the combination of several 1-versus-1 linear decision boundaries. Both sides of a decision boundary belong to a certain class, unlike the 1-versus-rest classifier. For a $K$-class task, it needs $K(K-1)/2$ binary discriminant functions.
However, the contradiction still exists: a region in the middle is still claimed by several classes simultaneously.
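The ambiguity can be seen with pairwise majority voting. The three pairwise boundaries below are hypothetical choices for a 3-class problem; in the region where each class wins exactly one pairwise vote, the vote is tied and the point cannot be assigned:

```python
import numpy as np

# Three made-up 1-versus-1 boundaries for classes C1, C2, C3 in 2D.
# Each pairwise discriminant votes for Cj when w @ x + b > 0, else Ck.
pairs = {
    ("C1", "C2"): (np.array([1.0, 0.0]), 0.0),   # x1 > 0 -> C1
    ("C2", "C3"): (np.array([0.0, 1.0]), 0.0),   # x2 > 0 -> C2
    ("C3", "C1"): (np.array([-1.0, 1.0]), 0.0),  # x2 > x1 -> C3
}

def vote_counts(x):
    """Count the pairwise votes each class receives for the point x."""
    counts = {"C1": 0, "C2": 0, "C3": 0}
    for (cj, ck), (w, b) in pairs.items():
        winner = cj if w @ x + b > 0 else ck
        counts[winner] += 1
    return counts

# At (1, 2) every class wins exactly one pairwise vote: ambiguous.
print(vote_counts(np.array([1.0, 2.0])))
# At (2, 1) class C1 wins two votes: unambiguous.
print(vote_counts(np.array([2.0, 1.0])))
```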
For the third strategy, we use a set of $K$ linear functions:

$$y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}, \quad k = 1, 2, \dots, K$$

and an input $\mathbf{x}$ belongs to class $C_k$ when $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \neq k$, where $j, k \in \{1, \dots, K\}$. According to this definition, the decision boundary between class $C_k$ and class $C_j$ is where $y_k(\mathbf{x}) = y_j(\mathbf{x})$. Then the decision hyperplane is defined as:

$$(\mathbf{w}_k - \mathbf{w}_j)^T\mathbf{x} + (w_{k0} - w_{j0}) = 0$$
These decision boundaries separate the input space into singly connected, convex regions.
To see this, choose two points $\mathbf{x}_A$ and $\mathbf{x}_B$ in the region $R_k$ of class $C_k$. An arbitrary point on the line segment between $\mathbf{x}_A$ and $\mathbf{x}_B$ can be written as

$$\hat{\mathbf{x}} = \lambda\mathbf{x}_A + (1 - \lambda)\mathbf{x}_B$$

where $0 \leq \lambda \leq 1$. By the linearity of $y_k(\cdot)$ we have:

$$y_k(\hat{\mathbf{x}}) = \lambda y_k(\mathbf{x}_A) + (1 - \lambda)y_k(\mathbf{x}_B)$$

Because $\mathbf{x}_A$ and $\mathbf{x}_B$ both belong to class $C_k$, we have $y_k(\mathbf{x}_A) > y_j(\mathbf{x}_A)$ and $y_k(\mathbf{x}_B) > y_j(\mathbf{x}_B)$ for all $j \neq k$. Then $y_k(\hat{\mathbf{x}}) > y_j(\hat{\mathbf{x}})$ for all $j \neq k$, so $\hat{\mathbf{x}}$ also belongs to class $C_k$ and the region $R_k$ is convex.
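The argmax decision rule of the third strategy can be sketched as follows. The weight matrix is a made-up example, and the loop checks convexity numerically along one segment, mirroring the proof above:

```python
import numpy as np

# Third strategy: K linear discriminants y_k(x) = w_k^T x + w_k0;
# assign x to the class with the largest y_k(x). Weights are made up.
W = np.array([[ 1.0,  0.0],    # class 0
              [ 0.0,  1.0],    # class 1
              [-1.0, -1.0]])   # class 2
b = np.array([0.0, 0.0, 0.5])

def classify(x):
    """Return argmax_k y_k(x); every input gets exactly one class."""
    return int(np.argmax(W @ x + b))

# Convexity check: every point on the segment between two points of
# the same class stays in that class.
xA, xB = np.array([3.0, 1.0]), np.array([2.0, 0.5])
assert classify(xA) == classify(xB) == 0
for lam in np.linspace(0.0, 1.0, 11):
    assert classify(lam * xA + (1 - lam) * xB) == 0
print("class-0 region is convex along this segment")
```

Note that the argmax rule leaves no ambiguous region (up to ties on the boundaries themselves), which is exactly what the first two strategies lacked.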
The last strategy seems good, and what we should do next is estimate the parameters of the model. The most famous approaches that can be used to estimate the parameters of discriminant functions are:

- Least squares
- Fisher's linear discriminant
- The perceptron algorithm
Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006. ↩︎