Computer Science 8751
Machine Learning

Programming Assignment 4
K-Means Clustering (30 points)
Due Monday, November 27, 2006

Introduction

K-means is one of the most popular clustering algorithms. In K-means, an initial set of K cluster centers is chosen. Then two steps are repeated: (1) assign each example to the center it is closest to, and (2) move each cluster center closer to the center (mean) of all the examples assigned to it. You should implement a K-means clustering algorithm as described below.
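
The two-step loop above can be sketched roughly as follows. This is only an illustrative outline, not the required design. It assumes numeric feature vectors, squared Euclidean distance, and the common batch variant in which each center moves all the way to the mean of its examples. It also assumes the loop stops once no example changes cluster. The names k_means, nearest_center, squared_distance, max_iters, and seed are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    """Index of the center closest to point (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))

def k_means(points, k, max_iters=100, seed=0):
    """Cluster points (lists of numbers, output feature excluded) into k groups."""
    rng = random.Random(seed)
    # Start each center on a separate, randomly chosen example.
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 1: assign each example to its closest center.
        new_assignments = [nearest_center(p, centers) for p in points]
        if new_assignments == assignments:
            break  # assumed stopping rule: no example changed cluster
        assignments = new_assignments
        # Step 2: move each center to the mean of the examples it represents.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments
```

For example, k_means([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2) should place the first three points in one cluster and the last three in the other.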

Details

Implement K-means clustering using your dataset class from assignment 1. Your code should have inputs:

You should initialize the cluster centers by randomly assigning each one to a separate example. Note that your clustering ignores the output feature when creating the clusters. Once you have finished, you should evaluate how well your clustering works by assigning a "class" to each cluster corresponding to the output value that occurs most often among that cluster's points. You should then use these classes to estimate the "accuracy" of your clusters on a set of test examples.
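
The evaluation step described above might look something like the sketch below. It assumes that centers is a list of cluster centers already produced by the clustering step and that examples are given as separate lists of feature vectors and output values. It also assumes that a test example is predicted to have the class of its nearest center. The names label_clusters and cluster_accuracy are hypothetical, and treating a cluster with no training examples as always wrong is just one possible choice.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))

def label_clusters(centers, train_points, train_labels):
    """Map each cluster index to the output value that occurs most in it."""
    votes = {j: Counter() for j in range(len(centers))}
    for p, y in zip(train_points, train_labels):
        votes[nearest_center(p, centers)][y] += 1
    # Assumption: a cluster with no training examples gets None,
    # so any test example that falls in it counts as an error.
    return {j: (c.most_common(1)[0][0] if c else None)
            for j, c in votes.items()}

def cluster_accuracy(centers, cluster_labels, test_points, test_labels):
    """Fraction of test examples whose nearest cluster's class matches their output."""
    correct = sum(
        cluster_labels[nearest_center(p, centers)] == y
        for p, y in zip(test_points, test_labels)
    )
    return correct / len(test_points)
```

The output values are used only here, after clustering is complete, so they never influence where the centers end up.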

Test your code on the same datasets you used for Program 3.

What to Hand In

You should hand in a documented copy of your code (including your dataset class files). Also create an archive of the code and email it to rmaclin@gmail.com. Make sure to provide a good general description of your code.

In addition, hand in the output and a writeup for all of your testing. Try to present the testing in a way that shows how effective your system is.