
CV_HW5

Due date
2022/12/12
Submitted by
Dept. of Electrical and Computer Engineering, 2017-12433, Kim Hyunwoo

Theory Question

1.1 K-means Clustering (10 points)

a.
[5 pts] Given a dataset $\chi = \{0, 2, 4, 6, 18, 20\}$, initialize the $k$-means clustering algorithm with 2 cluster centers $c_1 = 3$ and $c_2 = 4$. What are the values of $c_1$ and $c_2$ after the first iteration of $k$-means? Also report the values after the second iteration.

In the first iteration, we first determine the membership of each data point. The first cluster is $\{0, 2\}$, the points closer to $c_1 = 3$ than to $c_2 = 4$, and the second cluster is $\{4, 6, 18, 20\}$, the points closer to $c_2 = 4$ than to $c_1 = 3$. Then the new cluster centers are the averages of the points in each cluster:
$$c_1 = (0+2)/2 = 1, \qquad c_2 = (4+6+18+20)/4 = 12$$
Thus, the cluster centers after the first iteration are $c_1 = 1$, $c_2 = 12$. In the same way, the first cluster becomes $\{0, 2, 4, 6\}$, the points closer to $c_1 = 1$ than to $c_2 = 12$, and the second cluster becomes $\{18, 20\}$, the points closer to $c_2 = 12$ than to $c_1 = 1$. The new cluster centers are again the averages of the points in each cluster:
$$c_1 = (0+2+4+6)/4 = 3, \qquad c_2 = (18+20)/2 = 19$$
Thus, the cluster centers after the second iteration are $c_1 = 3$, $c_2 = 19$.
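As a quick sanity check (not part of the required answer), the two iterations above can be reproduced with a few lines of Python; the `kmeans_step` helper below is my own naming.

```python
# Minimal 1-D k-means sketch for the dataset above (my own sanity check).
def kmeans_step(data, centers):
    # Assign each point to the nearest center, then move each center to
    # the mean of its assigned points.
    clusters = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return [sum(c) / len(c) for c in clusters]

data = [0, 2, 4, 6, 18, 20]
centers = [3, 4]
centers = kmeans_step(data, centers)  # -> [1.0, 12.0] after the first iteration
centers = kmeans_step(data, centers)  # -> [3.0, 19.0] after the second iteration
print(centers)
```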
b.
[5 pts] Given the same dataset as in $(a)$, perform greedy initialization to get the initial $k = 3$ centers. Start with $c_1 = 4$. Below is the greedy initialization process.
a.
Choose $c_1$.
b.
Choose the next center $c_i$ to be $\arg\max_{x \in \chi} \{ D(x) \}$.
c.
Repeat step (b) until $k$ centers are chosen, where at any given time, with the current set of cluster centers $\mathcal{C}$,
$$D(x) = \min_{c \in \mathcal{C}} \| x - c \|_2$$
For the first choice of a new center, the current set of cluster centers is $\mathcal{C} = \{4\}$. Thus $D(x)$ is simply $D(x) = \|x - 4\|_2$, and the point in the dataset that maximizes $D(x)$ is $x = 20$ (the farthest point). Therefore, the new cluster center is $20$.
For the second choice of a new center, the current set of cluster centers is $\mathcal{C} = \{4, 20\}$. Thus $D(x) = \min(\|x - 4\|_2, \|x - 20\|_2)$. Since $D(0) = 4$, $D(2) = 2$, $D(4) = 0$, $D(6) = 2$, $D(18) = 2$, $D(20) = 0$, the maximum of $D$ occurs at $x = 0$. Therefore, the new cluster center is $0$. Finally, the greedy initialization of the cluster centers is $\{0, 4, 20\}$.
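The same result can be checked with a short sketch of the greedy (farthest-point) initialization; `data` and `centers` below are my own names for illustration.

```python
# Greedy initialization sketch (my own check), following steps a-c above.
data = [0, 2, 4, 6, 18, 20]
centers = [4]  # start with c_1 = 4
k = 3
while len(centers) < k:
    # D(x): distance from x to its nearest current center
    D = {x: min(abs(x - c) for c in centers) for x in data}
    centers.append(max(data, key=lambda x: D[x]))
print(sorted(centers))  # -> [0, 4, 20]
```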

1.2 Spectral Clustering (20 points)

Let $x_1 = (\frac{\sqrt{3}}{2}, \frac{1}{2})$, $x_2 = (\frac{\sqrt{3}}{2}, -\frac{1}{2})$, $x_3 = (-\frac{\sqrt{3}}{2}, \frac{1}{2})$, $x_4 = (-\frac{\sqrt{3}}{2}, -\frac{1}{2})$. Assume that they form two groups; $x_1$ and $x_2$ are in the same group and $x_3$ and $x_4$ are in the other group. Your job is to perform spectral clustering with two different similarity measures.
For $(a)$ and $(b)$, use the following similarity measure:
$$w_{i,j} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are in the same group} \\ 0 & \text{otherwise} \end{cases}$$
a.
[4 pts] What are the values of $W$ and $L$? ($W$: similarity matrix, $L$: Laplacian matrix)
Since $x_1, x_2$ are in the same group and $x_3, x_4$ are in the same group, $w_{1,1}, w_{1,2}, w_{2,1}, w_{2,2}$ are 1 and $w_{3,3}, w_{3,4}, w_{4,3}, w_{4,4}$ are 1; all other $w_{i,j}$ are 0. Thus, we can obtain the similarity matrix $W$ as below.
$$W = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$
Since $D$ is the diagonal matrix whose diagonal entries are the row sums of $W$, we can compute $D$ as below.
$$D = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$
Then we can finally compute the Laplacian matrix $L = D - W$ as follows.
$$L = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$
b.
[6 pts] Get the first and second eigenvectors of $L$. (You don't need to show the process.) Interpret the results in terms of graph connectivity.

The Laplacian matrix $L$ has two eigenvectors $[1, 1, 0, 0]^\top, [0, 0, 1, 1]^\top$ with eigenvalue 0 and two eigenvectors $[-1, 1, 0, 0]^\top, [0, 0, -1, 1]^\top$ with eigenvalue 2. We choose the two eigenvectors with the smallest eigenvalues, $[1, 1, 0, 0]^\top$ and $[0, 0, 1, 1]^\top$. Stacking these two eigenvectors as columns, the rows fall into two clearly distinct patterns, $[1, 0]$ for $x_1, x_2$ and $[0, 1]$ for $x_3, x_4$, so the graph is disconnected with 2 connected components (the multiplicity of eigenvalue 0 equals the number of connected components), which corresponds to the given similarities.
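For reference, this eigen-decomposition can be reproduced numerically with NumPy. This is a sketch of my own check, not part of the graded answer; note that `numpy.linalg.eigh` may return any orthonormal basis (up to rotation and sign) of the eigenvalue-0 eigenspace.

```python
import numpy as np

# Block similarity matrix from part (a) and its Laplacian.
W = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
print(eigvals)         # -> approximately [0, 0, 2, 2]
print(eigvecs[:, :2])  # spans the indicator vectors of {x1, x2} and {x3, x4}
```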
For $(c)$ and $(d)$, use the following similarity measure,
$$w_{i,j} = \frac{\left\langle x_i, x_j \right\rangle + 1}{2},$$
where $\left\langle \cdot, \cdot \right\rangle$ is an inner product.
c.
[4 pts] What are the values of $W$ and $L$? ($W$: similarity matrix, $L$: Laplacian matrix)
We can compute each $w_{i,j}$ using the given equation to form the similarity matrix $W$ as below.
$$W = \begin{bmatrix} 1 & 0.75 & 0.25 & 0 \\ 0.75 & 1 & 0 & 0.25 \\ 0.25 & 0 & 1 & 0.75 \\ 0 & 0.25 & 0.75 & 1 \end{bmatrix}$$
Since $D$ is the diagonal matrix whose diagonal entries are the row sums of $W$, we can compute $D$ as below.
$$D = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$
Then we can finally compute the Laplacian matrix $L = D - W$ as follows.
$$L = \begin{bmatrix} 1 & -0.75 & -0.25 & 0 \\ -0.75 & 1 & 0 & -0.25 \\ -0.25 & 0 & 1 & -0.75 \\ 0 & -0.25 & -0.75 & 1 \end{bmatrix}$$
d.
[6 pts] Get the first and second eigenvectors of $L$. (You don't need to show the process.) Interpret the results in terms of graph connectivity.
The Laplacian matrix $L$ has eigenvector $[1, 1, 1, 1]^\top$ with eigenvalue 0, eigenvector $[-1, -1, 1, 1]^\top$ with eigenvalue 0.5, eigenvector $[-1, 1, -1, 1]^\top$ with eigenvalue 1.5, and eigenvector $[1, -1, -1, 1]^\top$ with eigenvalue 2. We choose the two eigenvectors with the smallest eigenvalues: $[1, 1, 1, 1]^\top$ and $[-1, -1, 1, 1]^\top$. Since eigenvalue 0 has multiplicity one and its eigenvector $[1, 1, 1, 1]^\top$ is constant, the graph is connected; the second eigenvector $[-1, -1, 1, 1]^\top$ separates $\{x_1, x_2\}$ from $\{x_3, x_4\}$, which corresponds to the given similarities.
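Again as a numerical check of my own (not required by the problem), the inner-product similarity and the spectrum of its Laplacian can be verified with NumPy.

```python
import numpy as np

# The four unit vectors from the problem statement.
X = np.array([[ np.sqrt(3) / 2,  0.5],
              [ np.sqrt(3) / 2, -0.5],
              [-np.sqrt(3) / 2,  0.5],
              [-np.sqrt(3) / 2, -0.5]])
W = (X @ X.T + 1) / 2        # w_ij = (<x_i, x_j> + 1) / 2
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 3))          # -> approximately [0, 0.5, 1.5, 2]
print(np.round(eigvecs[:, :2], 3))   # constant vector, then +/- [-1,-1,1,1]/2
```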

1.3 SVM (20 points)

Suppose that training examples are points in 2-D space. The positive examples are $X_+ = \{(1,1), (-1,-1)\}$. The negative examples are $X_- = \{(1,-1), (-1,1)\}$.
a.
[4 pts] Are the positive examples linearly separable from the negative examples? (i.e., can you draw a line to separate the positive examples from the negative examples?)
Let's consider a line $l: ax + by + c = 0$ in 2-D space and assume that $l$ separates $X_+$ and $X_-$; the argument below covers the cases where $a$ or $b$ equals 0 as well.
We can consider two cases,
a.
$ax_i + by_i + c > 0 \ \ \forall (x_i, y_i) \in X_+$, and $ax_i + by_i + c < 0 \ \ \forall (x_i, y_i) \in X_-$
b.
$ax_i + by_i + c < 0 \ \ \forall (x_i, y_i) \in X_+$, and $ax_i + by_i + c > 0 \ \ \forall (x_i, y_i) \in X_-$
For the first case, $a + b + c > 0$, $-a - b + c > 0$ and $a - b + c < 0$, $-a + b + c < 0$. This leads to a contradiction: from $a + b + c > 0$ and $-a + b + c < 0$ we get $a > 0$, while from $-a - b + c > 0$ and $a - b + c < 0$ we get $a < 0$.
Similarly, for the second case, $a + b + c < 0$, $-a - b + c < 0$ and $a - b + c > 0$, $-a + b + c > 0$. This leads to a contradiction: from $a + b + c < 0$ and $-a + b + c > 0$ we get $a < 0$, while from $-a - b + c < 0$ and $a - b + c > 0$ we get $a > 0$.
Thus the assumption is wrong, and there is no line separating the positive examples from the negative examples; the examples are not linearly separable.
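This conclusion can also be sanity-checked numerically with a simple random search over candidate lines; the search is not a proof, only an illustration, and all names below are my own.

```python
import numpy as np

# XOR-style points: no line a*x + b*y + c = 0 separates them.
pos = [(1, 1), (-1, -1)]
neg = [(1, -1), (-1, 1)]

rng = np.random.default_rng(0)
found = False
for a, b, c in rng.uniform(-5, 5, size=(10_000, 3)):
    # The opposite sign convention is covered too, since (-a, -b, -c)
    # is equally likely to be drawn.
    if all(a * x + b * y + c > 0 for x, y in pos) and \
       all(a * x + b * y + c < 0 for x, y in neg):
        found = True
        break
print(found)  # -> False: no separating line found in the random search
```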
b.
[8 pts] Consider the feature transformation $\phi(x) = [1, x, y, xy]^T$, where $x$ and $y$ are the first and second coordinates of an example. Write down the transformed coordinates of $X_+$ and $X_-$ (i.e., $\phi(X_+)$ and $\phi(X_-)$ for all four examples).
The transformed coordinates of $X_+$ and $X_-$ are as below.
$$\phi(X_+) = \{[1, 1, 1, 1]^T, [1, -1, -1, 1]^T\} \\ \phi(X_-) = \{[1, 1, -1, -1]^T, [1, -1, 1, -1]^T\}$$
c.
[8 pts] Consider the prediction function $y(x) = w^T \phi(x)$. Give the coefficient $w$ of a maximum-margin decision surface separating the positive from the negative examples. (Hint: $w$ is a $[4 \times 1]$ vector whose elements are only 0 or 1.)
We need to find $w$ satisfying the condition below, where $t_i \in \{+1, -1\}$ is the label of example $x_i$.
$$w^* = \arg\min_w \|w\| \quad \text{s.t.} \quad t_i\, w^T \phi(x_i) \ge 1 \ \ \forall i$$
Using the hint that $w$ consists only of 0s and 1s, we minimize $\|w\|$ by minimizing the number of 1s.
First, suppose $w$ contains no 1. Then $w^T \phi(x) = 0$ for all $x$, so the constraint cannot be satisfied.
Next, consider $w = [0, 0, 0, 1]^T$, which has a single 1. Then $w^T[1,1,1,1]^T = 1$ and $w^T[1,-1,-1,1]^T = 1$ for the positive examples, while $w^T[1,1,-1,-1]^T = -1$ and $w^T[1,-1,1,-1]^T = -1$ for the negative examples, so the constraint is satisfied.
Thus, the coefficient $w$ of a maximum-margin decision surface is $[0, 0, 0, 1]^T$.
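A final sketch of my own (not part of the required answer) verifies that this $w$ puts every transformed example exactly on the margin, with the correct sign for each class.

```python
import numpy as np

def phi(p):
    # Feature transform phi(x) = [1, x, y, xy]^T from part (b).
    x, y = p
    return np.array([1, x, y, x * y])

pos = [(1, 1), (-1, -1)]
neg = [(1, -1), (-1, 1)]
w = np.array([0, 0, 0, 1])

print([int(w @ phi(p)) for p in pos])  # -> [1, 1]
print([int(w @ phi(p)) for p in neg])  # -> [-1, -1]
```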