
CV_HW5

Due date
2022/12/12
Submitted by
Dept. of Electrical and Computer Engineering, 2017-12433, Kim Hyunwoo

Theory Question

1.1 K-means Clustering (10 points)

a.
[5 pts] Given a dataset $\chi = \{0, 2, 4, 6, 18, 20\}$, initialize the $k$-means clustering algorithm with 2 cluster centers $c_1 = 3$ and $c_2 = 4$. What are the values of $c_1$ and $c_2$ after the first iteration of $k$-means? Also report the values after the second iteration.

In the first iteration, we first determine the membership of each data point. The first cluster is $\{0, 2\}$, the points closer to $c_1 = 3$ than to $c_2 = 4$, and the second cluster is $\{4, 6, 18, 20\}$, the points closer to $c_2 = 4$ than to $c_1 = 3$. Then the new cluster centers are the averages of the points in each cluster:
$$c_1 = (0+2)/2 = 1, \qquad c_2 = (4+6+18+20)/4 = 12$$
Thus, the cluster centers after the first iteration are $c_1 = 1$, $c_2 = 12$. In the same way, the first cluster becomes $\{0, 2, 4, 6\}$, the points closer to $c_1 = 1$ than to $c_2 = 12$, and the second cluster becomes $\{18, 20\}$, the points closer to $c_2 = 12$ than to $c_1 = 1$. The new cluster centers are again the averages of the points in each cluster:
$$c_1 = (0+2+4+6)/4 = 3, \qquad c_2 = (18+20)/2 = 19$$
Thus, the cluster centers after the second iteration are $c_1 = 3$, $c_2 = 19$.
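As a quick sanity check (not part of the required answer), the two iterations above can be reproduced with a few lines of Python; the `kmeans_step` helper below is my own naming.

```python
# Minimal 1-D k-means sketch for the dataset above (my own sanity check).
def kmeans_step(data, centers):
    # Assign each point to the nearest center, then move each center to
    # the mean of its assigned points.
    clusters = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return [sum(c) / len(c) for c in clusters]

data = [0, 2, 4, 6, 18, 20]
centers = [3, 4]
centers = kmeans_step(data, centers)  # -> [1.0, 12.0] after the first iteration
centers = kmeans_step(data, centers)  # -> [3.0, 19.0] after the second iteration
print(centers)
```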
b.
[5 pts] Given the same dataset as in $(a)$, perform greedy initialization to get the initial $k = 3$ centers. Start with $c_1 = 4$. Below is the greedy initialization process.
a.
Choose $c_1$.
b.
Choose the next center $c_i$ to be $\arg\max_{x \in \chi} \{ D(x) \}$.
c.
Repeat step (b) until $k$ centers are chosen, where at any given time, with the current set of cluster centers $\mathcal{C}$,
$$D(x) = \min_{c \in \mathcal{C}} \| x - c \|_2$$
For the first choice of a new center, the current set of cluster centers is $\mathcal{C} = \{4\}$. Thus $D(x)$ is simply $D(x) = \|x - 4\|_2$, and the point in the dataset that maximizes $D(x)$ is $x = 20$ (the farthest point). Therefore, the new cluster center is $20$.
For the second choice of a new center, the current set of cluster centers is $\mathcal{C} = \{4, 20\}$. Thus $D(x) = \min(\|x - 4\|_2, \|x - 20\|_2)$. Since $D(0) = 4$, $D(2) = 2$, $D(4) = 0$, $D(6) = 2$, $D(18) = 2$, $D(20) = 0$, the maximum of $D$ occurs at $x = 0$. Therefore, the new cluster center is $0$. Finally, the greedy initialization of the cluster centers is $\{0, 4, 20\}$.
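The same result can be checked with a short sketch of the greedy (farthest-point) initialization; `data` and `centers` below are my own names for illustration.

```python
# Greedy initialization sketch (my own check), following steps a-c above.
data = [0, 2, 4, 6, 18, 20]
centers = [4]  # start with c_1 = 4
k = 3
while len(centers) < k:
    # D(x): distance from x to its nearest current center
    D = {x: min(abs(x - c) for c in centers) for x in data}
    centers.append(max(data, key=lambda x: D[x]))
print(sorted(centers))  # -> [0, 4, 20]
```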

1.2 Spectral Clustering (20 points)

Let $x_1 = (\frac{\sqrt{3}}{2}, \frac{1}{2})$, $x_2 = (\frac{\sqrt{3}}{2}, -\frac{1}{2})$, $x_3 = (-\frac{\sqrt{3}}{2}, \frac{1}{2})$, $x_4 = (-\frac{\sqrt{3}}{2}, -\frac{1}{2})$. Assume that they form two groups; $x_1$ and $x_2$ are in the same group and $x_3$ and $x_4$ are in the other group. Your job is to perform spectral clustering with two different similarity measures.
For $(a)$ and $(b)$, use the following similarity measure:
$$w_{i,j} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are in the same group} \\ 0 & \text{otherwise} \end{cases}$$
a.
[4 pts] What are the values of $W$ and $L$? ($W$: similarity matrix, $L$: Laplacian matrix)
Since $x_1, x_2$ are in the same group and $x_3, x_4$ are in the same group, $w_{1,1}, w_{1,2}, w_{2,1}, w_{2,2}$ are 1 and $w_{3,3}, w_{3,4}, w_{4,3}, w_{4,4}$ are 1; all other $w_{i,j}$ are 0. Thus, we can obtain the similarity matrix $W$ as below.
$$W = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$
Since $D$ is the diagonal matrix whose diagonal entries are the row sums of $W$, we can compute $D$ as below.
$$D = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$
Then we can finally compute the Laplacian matrix $L = D - W$ as follows.
$$L = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$
b.
[6 pts] Get the first and second eigenvectors of $L$. (You don't need to show the process.) Interpret the results in terms of graph connectivity.

The Laplacian matrix $L$ has two eigenvectors $[1, 1, 0, 0]^\top, [0, 0, 1, 1]^\top$ with eigenvalue 0 and two eigenvectors $[-1, 1, 0, 0]^\top, [0, 0, -1, 1]^\top$ with eigenvalue 2. We choose the two eigenvectors with the smallest eigenvalues, $[1, 1, 0, 0]^\top$ and $[0, 0, 1, 1]^\top$. Stacking these two eigenvectors as columns, the rows fall into two clearly distinct patterns, $[1, 0]$ for $x_1, x_2$ and $[0, 1]$ for $x_3, x_4$, so the graph is disconnected with 2 connected components (the multiplicity of eigenvalue 0 equals the number of connected components), which corresponds to the given similarities.
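For reference, this eigen-decomposition can be reproduced numerically with NumPy. This is a sketch of my own check, not part of the graded answer; note that `numpy.linalg.eigh` may return any orthonormal basis (up to rotation and sign) of the eigenvalue-0 eigenspace.

```python
import numpy as np

# Block similarity matrix from part (a) and its Laplacian.
W = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
print(eigvals)         # -> approximately [0, 0, 2, 2]
print(eigvecs[:, :2])  # spans the indicator vectors of {x1, x2} and {x3, x4}
```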
For $(c)$ and $(d)$, use the following similarity measure,
$$w_{i,j} = \frac{\left\langle x_i, x_j \right\rangle + 1}{2},$$
where $\left\langle \cdot, \cdot \right\rangle$ is an inner product.
c.
[4 pts] What are the values of $W$ and $L$? ($W$: similarity matrix, $L$: Laplacian matrix)
We can compute each $w_{i,j}$ using the given equation to form the similarity matrix $W$ as below.
$$W = \begin{bmatrix} 1 & 0.75 & 0.25 & 0 \\ 0.75 & 1 & 0 & 0.25 \\ 0.25 & 0 & 1 & 0.75 \\ 0 & 0.25 & 0.75 & 1 \end{bmatrix}$$
Since $D$ is the diagonal matrix whose diagonal entries are the row sums of $W$, we can compute $D$ as below.
$$D = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$
Then we can finally compute the Laplacian matrix $L = D - W$ as follows.
$$L = \begin{bmatrix} 1 & -0.75 & -0.25 & 0 \\ -0.75 & 1 & 0 & -0.25 \\ -0.25 & 0 & 1 & -0.75 \\ 0 & -0.25 & -0.75 & 1 \end{bmatrix}$$
d.
[6 pts] Get the first and second eigenvectors of $L$. (You don't need to show the process.) Interpret the results in terms of graph connectivity.
The Laplacian matrix $L$ has eigenvector $[1, 1, 1, 1]^\top$ with eigenvalue 0, eigenvector $[-1, -1, 1, 1]^\top$ with eigenvalue 0.5, eigenvector $[-1, 1, -1, 1]^\top$ with eigenvalue 1.5, and eigenvector $[1, -1, -1, 1]^\top$ with eigenvalue 2. We choose the two eigenvectors with the smallest eigenvalues: $[1, 1, 1, 1]^\top$ and $[-1, -1, 1, 1]^\top$. Since eigenvalue 0 has multiplicity one and its eigenvector $[1, 1, 1, 1]^\top$ is constant, the graph is connected; the second eigenvector $[-1, -1, 1, 1]^\top$ separates $\{x_1, x_2\}$ from $\{x_3, x_4\}$, which corresponds to the given similarities.
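Again as a numerical check of my own (not required by the problem), the inner-product similarity and the spectrum of its Laplacian can be verified with NumPy.

```python
import numpy as np

# The four unit vectors from the problem statement.
X = np.array([[ np.sqrt(3) / 2,  0.5],
              [ np.sqrt(3) / 2, -0.5],
              [-np.sqrt(3) / 2,  0.5],
              [-np.sqrt(3) / 2, -0.5]])
W = (X @ X.T + 1) / 2        # w_ij = (<x_i, x_j> + 1) / 2
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 3))          # -> approximately [0, 0.5, 1.5, 2]
print(np.round(eigvecs[:, :2], 3))   # constant vector, then +/- [-1,-1,1,1]/2
```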

1.3 SVM (20 points)

Suppose that training examples are points in 2-D space. The positive examples are $X_+ = \{(1,1), (-1,-1)\}$. The negative examples are $X_- = \{(1,-1), (-1,1)\}$.
a.
[4 pts] Are the positive examples linearly separable from the negative examples? (i.e., can you draw a line to separate the positive examples from the negative examples?)
Let's consider a line $l: ax + by + c = 0$ in 2-D space and assume that $l$ separates $X_+$ and $X_-$; the argument below covers the cases where $a$ or $b$ equals 0 as well.
We can consider two cases,
a.
$ax_i + by_i + c > 0 \ \ \forall (x_i, y_i) \in X_+$, and $ax_i + by_i + c < 0 \ \ \forall (x_i, y_i) \in X_-$
b.
$ax_i + by_i + c < 0 \ \ \forall (x_i, y_i) \in X_+$, and $ax_i + by_i + c > 0 \ \ \forall (x_i, y_i) \in X_-$
For the first case, $a + b + c > 0$, $-a - b + c > 0$ and $a - b + c < 0$, $-a + b + c < 0$. This leads to a contradiction: from $a + b + c > 0$ and $-a + b + c < 0$ we get $a > 0$, while from $-a - b + c > 0$ and $a - b + c < 0$ we get $a < 0$.
Similarly, for the second case, $a + b + c < 0$, $-a - b + c < 0$ and $a - b + c > 0$, $-a + b + c > 0$. This leads to a contradiction: from $a + b + c < 0$ and $-a + b + c > 0$ we get $a < 0$, while from $-a - b + c < 0$ and $a - b + c > 0$ we get $a > 0$.
Thus the assumption is wrong, and there is no line separating the positive examples from the negative examples; the examples are not linearly separable.
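This conclusion can also be sanity-checked numerically with a simple random search over candidate lines; the search is not a proof, only an illustration, and all names below are my own.

```python
import numpy as np

# XOR-style points: no line a*x + b*y + c = 0 separates them.
pos = [(1, 1), (-1, -1)]
neg = [(1, -1), (-1, 1)]

rng = np.random.default_rng(0)
found = False
for a, b, c in rng.uniform(-5, 5, size=(10_000, 3)):
    # The opposite sign convention is covered too, since (-a, -b, -c)
    # is equally likely to be drawn.
    if all(a * x + b * y + c > 0 for x, y in pos) and \
       all(a * x + b * y + c < 0 for x, y in neg):
        found = True
        break
print(found)  # -> False: no separating line found in the random search
```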
b.
[8 pts] Consider the feature transformation $\phi(x) = [1, x, y, xy]^T$, where $x$ and $y$ are the first and second coordinates of an example. Write down the transformed coordinates of $X_+$ and $X_-$ (i.e., $\phi(X_+)$ and $\phi(X_-)$ for all four examples).
The transformed coordinates of $X_+$ and $X_-$ are as below.
$$\phi(X_+) = \{[1, 1, 1, 1]^T, [1, -1, -1, 1]^T\} \\ \phi(X_-) = \{[1, 1, -1, -1]^T, [1, -1, 1, -1]^T\}$$
c.
[8 pts] Consider the prediction function $y(x) = w^T \phi(x)$. Give the coefficient $w$ of a maximum-margin decision surface separating the positive from the negative examples. (Hint: $w$ is a $[4 \times 1]$ vector whose elements are only 0 or 1.)
We need to find $w$ satisfying the condition below, where $t_i \in \{+1, -1\}$ is the label of example $x_i$.
$$w^* = \arg\min_w \|w\| \quad \text{s.t.} \quad t_i\, w^T \phi(x_i) \ge 1 \ \ \forall i$$
Using the hint that $w$ consists only of 0s and 1s, we minimize $\|w\|$ by minimizing the number of 1s.
First, suppose $w$ contains no 1. Then $w^T \phi(x) = 0$ for all $x$, so the constraint cannot be satisfied.
Next, consider $w = [0, 0, 0, 1]^T$, which has a single 1. Then $w^T[1,1,1,1]^T = 1$ and $w^T[1,-1,-1,1]^T = 1$ for the positive examples, while $w^T[1,1,-1,-1]^T = -1$ and $w^T[1,-1,1,-1]^T = -1$ for the negative examples, so the constraint is satisfied.
Thus, the coefficient $w$ of a maximum-margin decision surface is $[0, 0, 0, 1]^T$.
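A final sketch of my own (not part of the required answer) verifies that this $w$ puts every transformed example exactly on the margin, with the correct sign for each class.

```python
import numpy as np

def phi(p):
    # Feature transform phi(x) = [1, x, y, xy]^T from part (b).
    x, y = p
    return np.array([1, x, y, x * y])

pos = [(1, 1), (-1, -1)]
neg = [(1, -1), (-1, 1)]
w = np.array([0, 0, 0, 1])

print([int(w @ phi(p)) for p in pos])  # -> [1, 1]
print([int(w @ phi(p)) for p in neg])  # -> [-1, -1]
```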