Discriminant Functions and Decision Boundary

Keywords: discriminant functions, decision boundary

Discriminant Function in Classification

The discriminant function, or discriminant model, stands in contrast to the generative model. Here we take a look at how discriminant functions behave in linear classification.[1]

In the post ‘Least Squares Classification’, we saw that, in a linear classification task, the decision boundary is a line or hyperplane by which we tell two classes apart. If our model is built directly on the decision boundary, or, in other words, if we partition the inputs with a function and a threshold, the model is a discriminant model and the decision boundary is formed by that function and threshold.

Now we are going to look at what the decision boundaries look like in the $K$-class problem when $K=2$ and when $K>2$. To illustrate the boundaries, we only consider a 2D (two-dimensional) input vector $\boldsymbol{x}$, which has only two components.

Two classes

The simplest case is a 2-dimensional input space partitioned into 2 classes, whose decision boundary is:

$$\boldsymbol{w}^T\boldsymbol{x}+w_0=\text{constant}\tag{1}$$

This equation is equivalent to $\boldsymbol{w}^T\boldsymbol{x}+w_0=0$, because $w_0$ is also a constant and the two constants can be merged. Of course, a 1-dimensional input space is even simpler than a 2-dimensional one, but its decision boundary is just a point, which is not very illustrative.
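
As a quick illustration of equation (1), here is a minimal sketch in Python (the weights $\boldsymbol{w}$ and bias $w_0$ below are made up for illustration) that assigns a 2D input to one of the two classes by the sign of $y(\boldsymbol{x})=\boldsymbol{w}^T\boldsymbol{x}+w_0$:

```python
import numpy as np

# Hypothetical parameters of the linear discriminant y(x) = w^T x + w_0.
w = np.array([2.0, -1.0])
w0 = -0.5

def classify(x):
    """Assign class 1 if y(x) > 0, otherwise class 2."""
    y = w @ x + w0
    return 1 if y > 0 else 2

print(classify(np.array([1.0, 0.0])))  # y = 1.5  -> class 1
print(classify(np.array([0.0, 1.0])))  # y = -1.5 -> class 2
```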

Let’s go back to the line. Some properties of the linear decision boundary are:

  1. The vector $\boldsymbol{w}$ is perpendicular to the line and points toward the region where $y(\boldsymbol{x})>0$.
  2. $w_0$ determines the location of the boundary relative to the origin.
  3. The signed perpendicular distance $r$ from a point $\boldsymbol{x}$ to the line can be calculated by $r=\frac{y(\boldsymbol{x})}{||\boldsymbol{w}||}$, where $y(\boldsymbol{x})=\boldsymbol{w}^T\boldsymbol{x}+w_0$.

Since these three properties are all basic facts about a line, we only sketch a proof of the third one:

Proof: Let $\boldsymbol{x}_{\perp}$ be the projection of $\boldsymbol{x}$ onto the line.

Using the first property, $\boldsymbol{w}$ is perpendicular to the line and $\frac{\boldsymbol{w}}{||\boldsymbol{w}||}$ is the unit vector in its direction, so:

$$\boldsymbol{x}=\boldsymbol{x}_{\perp}+r\frac{\boldsymbol{w}}{||\boldsymbol{w}||}\tag{2}$$

and we substitute equation (2) into the function $y(\boldsymbol{x})=\boldsymbol{w}^T\boldsymbol{x}+w_0$:

$$\begin{aligned} y(\boldsymbol{x})&=\boldsymbol{w}^T\left(\boldsymbol{x}_{\perp}+r\frac{\boldsymbol{w}}{||\boldsymbol{w}||}\right)+w_0\\ &=\boldsymbol{w}^T\boldsymbol{x}_{\perp}+r\frac{\boldsymbol{w}^T\boldsymbol{w}}{||\boldsymbol{w}||}+w_0\\ &=r\frac{\boldsymbol{w}^T\boldsymbol{w}}{||\boldsymbol{w}||}\\ &=r\frac{||\boldsymbol{w}||^2}{||\boldsymbol{w}||} \end{aligned}\tag{3}$$

Because $\boldsymbol{x}_{\perp}$ lies on the line, $\boldsymbol{w}^T\boldsymbol{x}_{\perp}+w_0=0$, which is why these terms vanish in the third line. Since $\frac{||\boldsymbol{w}||^2}{||\boldsymbol{w}||}=||\boldsymbol{w}||$, we have

$$r=\frac{y(\boldsymbol{x})}{||\boldsymbol{w}||}\tag{4}$$

Q.E.D
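
To sanity-check the third property and equation (4), the following sketch (again with made-up values) compares $y(\boldsymbol{x})/||\boldsymbol{w}||$ with the distance obtained by explicitly projecting $\boldsymbol{x}$ onto the boundary:

```python
import numpy as np

# Same hypothetical discriminant y(x) = w^T x + w_0 as above.
w = np.array([2.0, -1.0])
w0 = -0.5
x = np.array([3.0, 1.0])

y = w @ x + w0                 # y(x)
r = y / np.linalg.norm(w)      # signed distance from equation (4)

# Explicit check: drop the perpendicular from x onto the boundary.
x_perp = x - r * w / np.linalg.norm(w)                  # rearranged equation (2)
print(np.isclose(w @ x_perp + w0, 0.0))                 # True: x_perp lies on the line
print(np.isclose(np.linalg.norm(x - x_perp), abs(r)))   # True: |r| is the distance
```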

However, the augmented vectors $\boldsymbol{w}=\begin{bmatrix}w_0\\w_1\\ \vdots\\w_d\end{bmatrix}$ and $\boldsymbol{x}=\begin{bmatrix}1\\x_1\\ \vdots\\x_d\end{bmatrix}$ absorb $w_0$ into the boundary equation, which becomes $\boldsymbol{w}^T\boldsymbol{x}=0$. So a hyperplane through the origin of the $(d+1)$-dimensional augmented space can take the place of an arbitrary hyperplane in the original $d$-dimensional space.
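
A small sketch of this augmentation trick (same hypothetical values as above): prepending a 1 to $\boldsymbol{x}$ and $w_0$ to $\boldsymbol{w}$ gives exactly the same discriminant value, with no separate bias term left in the boundary equation:

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = -0.5
x = np.array([3.0, 1.0])

# Augmented vectors: prepend w_0 to w and 1 to x.
w_aug = np.concatenate(([w0], w))
x_aug = np.concatenate(([1.0], x))

# The discriminant value is unchanged, but the boundary is now w_aug^T x_aug = 0,
# a hyperplane through the origin of the augmented space.
print(np.isclose(w @ x + w0, w_aug @ x_aug))  # True
```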

Multiple Classes

Things change when we consider more than 2 classes. The boundaries become more complicated, and intuitively we have 3 different strategies for this problem:

1-versus-the-rest Classifier

This strategy needs $K-1$ classifiers (boundaries). Classifier $k$ decides that one side of its boundary belongs to class $k$ and the other side does not. So when we have two boundaries, the situation is as follows:

Here the region $R_4$ is problematic: based on the properties of the decision boundary and the definition of classification in the post ‘From Linear Regression to Linear Classification’, region $R_4$ cannot belong to $\mathcal{C}_1$ and $\mathcal{C}_2$ simultaneously.

So the first strategy works for some regions, but there are ‘black hole’ regions where the input $\boldsymbol{x}$ belongs to more than one class and ‘white hole’ regions where the input $\boldsymbol{x}$ belongs to no class (region $R_3$ could be such a region).
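
This ambiguity is easy to reproduce numerically. In the sketch below, the two binary boundaries are hypothetical (not the ones in the original figure); one test point is claimed by both classifiers and another by neither:

```python
import numpy as np

# Two hypothetical 1-versus-the-rest discriminants: y_k(x) > 0 means "class k".
W = np.array([[1.0, 0.0],    # classifier for class 1
              [0.0, 1.0]])   # classifier for class 2
b = np.array([-1.0, -1.0])

def one_vs_rest_claims(x):
    """Return the set of classes whose binary classifier claims the point x."""
    scores = W @ x + b
    return {k + 1 for k, s in enumerate(scores) if s > 0}

print(one_vs_rest_claims(np.array([2.0, 2.0])))  # {1, 2}: claimed by both classes
print(one_vs_rest_claims(np.array([0.0, 0.0])))  # set(): claimed by no class
```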

1-versus-1 Classifier

Another kind of multi-class boundary is the combination of several 1-versus-1 linear decision boundaries. Unlike in the 1-versus-the-rest classifier, both sides of each decision boundary are assigned to a definite class. For a $K$-class task, this needs $K(K-1)/2$ binary discriminant functions.

However, the ambiguity still exists: region $R_4$ belongs to classes $\mathcal{C}_1$, $\mathcal{C}_2$, and $\mathcal{C}_3$ simultaneously.
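
The same kind of ambiguity shows up in a 1-versus-1 voting scheme. In this sketch with $K=3$ and three hypothetical pairwise boundaries, majority voting over the $K(K-1)/2=3$ binary discriminants ends in a three-way tie for some inputs:

```python
import numpy as np
from collections import Counter

# Hypothetical pairwise discriminants for K = 3: a positive value of w^T x + b
# votes for class k, a negative value votes for class j.
pairs = {(1, 2): (np.array([1.0, 0.0]),  0.0),
         (2, 3): (np.array([0.0, 1.0]),  0.0),
         (3, 1): (np.array([1.0, 1.0]), -2.0)}

def one_vs_one_votes(x):
    """Count the votes that the K(K-1)/2 binary classifiers give to each class."""
    votes = Counter()
    for (k, j), (w, b) in pairs.items():
        votes[k if w @ x + b > 0 else j] += 1
    return votes

print(one_vs_one_votes(np.array([2.0, -1.0])))  # Counter({1: 2, 3: 1}): class 1 wins
print(one_vs_one_votes(np.array([1.5, 1.5])))   # three-way tie: one vote each
```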

K Linear Functions

We use a set of $K$ linear functions:

$$\begin{aligned} y_1(\boldsymbol{x})&=\boldsymbol{w}^T_1\boldsymbol{x}+w_{10}\\ y_2(\boldsymbol{x})&=\boldsymbol{w}^T_2\boldsymbol{x}+w_{20}\\ &\vdots \\ y_K(\boldsymbol{x})&=\boldsymbol{w}^T_K\boldsymbol{x}+w_{K0} \end{aligned}\tag{5}$$

and an input belongs to class $k$ when $y_k(\boldsymbol{x})>y_j(\boldsymbol{x})$ for all $j\in \{1,2,\cdots,K\}$ with $j\neq k$. According to this definition, the decision boundary between class $k$ and class $j$ is $y_k(\boldsymbol{x})=y_j(\boldsymbol{x})$, where $k,j\in\{1,2,\cdots,K\}$ and $j\neq k$. Then the decision hyperplane is defined by:

$$(\boldsymbol{w}_k-\boldsymbol{w}_j)^T\boldsymbol{x}+(w_{k0}-w_{j0})=0\tag{6}$$
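
A minimal sketch of this decision rule (with hypothetical weight vectors for $K=3$): each input is assigned to the class with the largest $y_k(\boldsymbol{x})$, so every point receives exactly one label and the ambiguous regions of the previous two strategies disappear:

```python
import numpy as np

# Hypothetical parameters for K = 3 linear functions y_k(x) = w_k^T x + w_k0.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])

def classify(x):
    """Pick the class k whose linear function y_k(x) is largest."""
    scores = W @ x + w0          # evaluates equation (5) for every k at once
    return int(np.argmax(scores)) + 1

# Every input receives exactly one label, even close to the boundaries.
for x in [np.array([2.0, 0.5]), np.array([0.5, 2.0]), np.array([-1.0, -1.0])]:
    print(x, "->", classify(x))  # -> 1, 2, 3 respectively
```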

These decision boundaries separate the input space into $K$ singly connected, convex regions.

Proof:
Choose two points $\boldsymbol{x}_A$ and $\boldsymbol{x}_B$ in the region of class $k$, where $k\in \{1,2,\cdots,K\}$. An arbitrary point on the line segment between $\boldsymbol{x}_A$ and $\boldsymbol{x}_B$ can be written as $\boldsymbol{x}'=\lambda \boldsymbol{x}_A + (1-\lambda)\boldsymbol{x}_B$ where $0\leq\lambda\leq1$. By the linearity of $y_k(\boldsymbol{x})$ we have:

$$y_k(\boldsymbol{x}')=\lambda y_k(\boldsymbol{x}_A) + (1-\lambda)y_k(\boldsymbol{x}_B)\tag{7}$$

Because $\boldsymbol{x}_A$ and $\boldsymbol{x}_B$ belong to class $k$, we have $y_k(\boldsymbol{x}_A)>y_j(\boldsymbol{x}_A)$ and $y_k(\boldsymbol{x}_B)>y_j(\boldsymbol{x}_B)$ for all $j\neq k$. Then $y_k(\boldsymbol{x}')>y_j(\boldsymbol{x}')$ for all $j\neq k$, so $\boldsymbol{x}'$ also belongs to class $k$ and the region of class $k$ is convex.

Q.E.D
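
The convexity argument can also be checked numerically: take two points that the argmax rule assigns to the same class and verify that every point on the segment between them receives the same label. The weights below are the same hypothetical ones used in the previous sketch:

```python
import numpy as np

# Same hypothetical K = 3 discriminants as in the previous sketch.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])

def classify(x):
    return int(np.argmax(W @ x + w0)) + 1

x_A, x_B = np.array([2.0, 0.5]), np.array([3.0, -0.5])
assert classify(x_A) == classify(x_B) == 1

# Every convex combination x' = lam * x_A + (1 - lam) * x_B keeps the same label,
# which is what the convexity of the decision regions predicts.
for lam in np.linspace(0.0, 1.0, 11):
    assert classify(lam * x_A + (1 - lam) * x_B) == 1
print("the whole segment between x_A and x_B lies inside the class-1 region")
```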

The last strategy seems good, and what we should do next is estimate the parameters of the model. The most famous approaches for estimating the parameters of discriminant functions are:

  1. Least square
  2. Fisher’s linear discriminant
  3. Perceptron algorithm

References


  1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ↩︎