Theory Question
1.1 K-means Clustering (10 points)
a.
[5 pts] Given a dataset , initialize the K-means clustering algorithm with 2 cluster centers and . What are the values of and after the first iteration of K-means? Also report the values after the second iteration.
In the first iteration, we first find the cluster membership of each data point. The first cluster consists of the points that are closer to than to , and the second cluster consists of the points that are closer to than to . Then we compute the new cluster centers as the averages of the points in each cluster, as shown below.
Thus, the cluster centers after the first iteration are .
In the same way, for the second iteration the first cluster consists of the points that are closer to than to , and the second cluster consists of the points that are closer to than to . Then we again compute the new cluster centers as the averages of the points in each cluster, as shown below.
Thus, the cluster centers after the second iteration are .
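As a sanity check, the two iterations above can be reproduced with a minimal sketch of Lloyd's algorithm. The dataset and initial centers from the problem are not reproduced here, so the values in the sketch are placeholders only.

```python
import numpy as np

# Minimal sketch of two iterations of 2-means (Lloyd's algorithm).
# The data and initial centers below are placeholders, not the assignment's values.
data = np.array([1.0, 2.0, 6.0, 7.0])   # hypothetical 1-D points
centers = np.array([0.0, 10.0])         # hypothetical initial centers

for it in range(2):
    # Membership step: assign each point to its nearest center.
    dists = np.abs(data[:, None] - centers[None, :])
    labels = np.argmin(dists, axis=1)
    # Update step: each center becomes the mean of its assigned points.
    centers = np.array([data[labels == k].mean() for k in range(2)])
    print(f"iteration {it + 1}: centers = {centers}")
```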
b.
[5 pts] Given the same dataset as in part (a), perform greedy initialization to get the initial centers. Start with . Below is the greedy initialization process.
a.
Choose
b.
Choose the next center to be $\arg\max_{x} d(x)$
c.
Repeat step 2 until all centers are chosen,
where, at any given time, with the current set of cluster centers $C$, $d(x) = \min_{c \in C} \lVert x - c \rVert$.
For the first choice of a new center, the current set of cluster centers is . Thus, $d(x)$ is just , and the point in the dataset that maximizes $d(x)$ is (the farthest point). Therefore, the new cluster center is .
For the second choice of a new center, the current set of cluster centers is . Thus, . Since , the maximum occurs at . Therefore, the new cluster center is .
Finally, the greedy initialization gives the cluster centers .
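The selection rule above (pick the point farthest from its nearest already-chosen center) can be written as a short sketch. Again, the data and the starting center are placeholders, since the actual values are not reproduced here.

```python
import numpy as np

# Minimal sketch of greedy (farthest-point) initialization:
# d(x) = min over current centers c of |x - c|, and the next center is argmax_x d(x).
data = np.array([1.0, 2.0, 6.0, 7.0])   # hypothetical 1-D points
centers = [data[0]]                      # assumed starting center
k = 2                                    # number of centers to choose

while len(centers) < k:
    # d(x): distance from each point to its nearest already-chosen center
    d = np.min(np.abs(data[:, None] - np.array(centers)[None, :]), axis=1)
    centers.append(data[np.argmax(d)])   # farthest point becomes the next center

print("greedy initial centers:", centers)
```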
1.2 Spectral Clustering (20 points)
Let . Assume that they form two groups; and are in the same group and and are in the other group. Your job is to perform spectral clustering with two different similarity measures.
For parts (a) and (b), use the following similarity measure.
a.
[4 pts] What are the values of and ? (: similarity matrix, : Laplacian matrix)
Since and are in the same group and and are in the same group, the corresponding similarity entries are 1; otherwise, the entries are 0. Thus, we can obtain the similarity matrix below.
Since is the diagonal matrix whose diagonal entries are the row sums of , we can compute D as below.
Then we can finally compute the Laplacian matrix as the degree matrix minus the similarity matrix, as follows.
b.
[6 pts] Get the first and second eigenvectors of . (You don’t need to show the process.) Interpret the results in terms of graph connectivity.
The Laplacian matrix has two eigenvectors with eigenvalue 0 and two eigenvectors with eigenvalue 2. We can choose the two eigenvectors with the smallest eigenvalue (0) as .
Since the entries (rows) of the two smallest eigenvectors separate cleanly into and , and eigenvalue 0 has multiplicity two, the graph is disconnected, with 2 connected components. (This corresponds to the similarities given.)
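A minimal sketch of this computation is given below. It assumes the two groups are {x1, x2} and {x3, x4} and that same-group similarities (including self-similarities) are 1; these assumptions stand in for the exact values, which are not reproduced above.

```python
import numpy as np

# Sketch of parts (a)-(b) under the assumed grouping {x1, x2} vs {x3, x4}.
S = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
D = np.diag(S.sum(axis=1))   # degree matrix: row sums of S on the diagonal
L = D - S                    # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
print("eigenvalues:", np.round(eigvals, 6))   # 0, 0, 2, 2
print("two smallest eigenvectors:")
print(np.round(eigvecs[:, :2], 3))            # constant within each connected block
```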
For parts (c) and (d), use the following similarity measure,
where denotes an inner product.
c.
[4 pts] What are the values of and ? (: similarity matrix, : Laplacian matrix)
We can compute each entry using the given equation to obtain the similarity matrix below.
Since is the diagonal matrix whose diagonal entries are the row sums of , we can compute D as below.
Then we can finally compute the Laplacian matrix as the degree matrix minus the similarity matrix, as follows.
d.
[6 pts] Get the first and second eigenvectors of . (You don’t need to show the process.) Interpret the results in terms of graph connectivity.
The Laplacian matrix has one eigenvector with eigenvalue 0, one with eigenvalue 0.5, one with eigenvalue 1.5, and one with eigenvalue 2. We can choose the two eigenvectors with the smallest eigenvalues as .
Since only one eigenvector has eigenvalue 0 (the eigenvector ), the graph is connected, i.e., it has a single connected component. (This corresponds to the similarities given.)
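The same workflow applies here; the sketch below uses hypothetical 2-D points and a plain inner-product similarity purely as a placeholder for the measure defined above (which is not reproduced here), and reads connectivity off the multiplicity of the zero eigenvalue.

```python
import numpy as np

# Placeholder data and similarity: s(xi, xj) = <xi, xj> (all non-negative here).
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 2.0]])
S = X @ X.T
D = np.diag(S.sum(axis=1))
L = D - S                                   # unnormalized graph Laplacian

eigvals = np.linalg.eigvalsh(L)
n_components = int(np.sum(np.isclose(eigvals, 0.0, atol=1e-8)))
print("eigenvalues:", np.round(eigvals, 4))
print("connected components:", n_components)   # 1 -> the graph is connected
```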
1.3 SVM (20 points)
Suppose that training examples are points in 2-D space. The positive examples are . The negative examples are .
a.
[4 pts] Are the positive examples linearly separable from the negative examples? (i.e., can you draw a line to separate the positive examples from the negative examples?)
Let's consider a line in 2-D space and assume the line separates and . If or equals 0.
We can consider two cases,
a.
,
b.
,
For the first case, and . This leads to a contradiction, since from and both from .
Similarly, for the second case, and . This also leads to a contradiction, since from and both from .
Thus the assumption is wrong, and there is no line separating the positive examples from the negative examples; that is, the examples are not linearly separable.
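This conclusion can also be checked numerically. The actual four points are not reproduced above, so the sketch below assumes the classic XOR-style configuration as a placeholder and tests the feasibility of the separation constraints with a linear program.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed placeholder points: positives at (1, 1), (-1, -1); negatives at (1, -1), (-1, 1).
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

# Linear separability <=> some (w, b) satisfies y_i * (w . x_i + b) >= 1 for all i.
# Variables z = (w1, w2, b); constraints rewritten as -y_i * (x_i . w + b) <= -1.
A_ub = -(y[:, None] * np.hstack([X, np.ones((4, 1))]))
b_ub = -np.ones(4)
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3, method="highs")
print("linearly separable:", res.success)   # False for the XOR-style placeholder
```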
b.
[8 pts] Consider the feature transformation , where and are the first and second coordinates of an example. Write down the transformed coordinates of and ( and for all four examples).
The transformed coordinates of and are shown below.
c.
[8 pts] Consider the prediction function . Give the coefficient of a maximum-margin decision surface separating the positive examples from the negative examples. (Hint: is a vector whose elements are only 0 or 1.)
We need to find satisfying the equation below.
Using the hint that consists only of 0s and 1s, we need to minimize the number of 1s in order to minimize .
First, suppose there is no 1 in . Then for all , so the constraint cannot be satisfied.
Next, consider with exactly one 1. Then and , so the constraint can be satisfied.
Thus, we can say the coefficient of a maximum-margin decision surface is .
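As an illustration of this argument, the sketch below checks the margin constraints for a single-1 coefficient vector. The feature map and points from parts (b)-(c) are not reproduced above, so it assumes XOR-style points and a product feature phi(x) = (x1, x2, x1*x2) as placeholders.

```python
import numpy as np

# Assumed placeholder points and transform (not the assignment's exact values).
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
phi = np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])   # assumed feature map

# Coefficient vector with a single 1 (on the product coordinate): smallest norm
# among 0/1 vectors, so it gives the maximum margin if it meets the constraints.
w = np.array([0.0, 0.0, 1.0])
margins = y * (phi @ w)
print("margins:", margins)                        # all equal to 1 here
print("constraints satisfied:", bool(np.all(margins >= 1)))
```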