CS 8751 (Fall 2006) Program 4

Computer Science 8751
Machine Learning

Programming Assignment 4
K-Means Clustering (30 points)
Due Monday, November 27, 2006

Introduction

The most popular clustering algorithm is K-means clustering. In K-means, an initial set of K cluster centers is chosen. Then a two-step process is repeated: (1) assign each example to the center it is closest to, and (2) move each cluster center closer to the mean of the examples assigned to it. You should implement a K-means clustering algorithm as described below.

Details

Implement K-means clustering using your dataset class from assignment 1. Your code should take the following inputs:

train_name - the prefix for the training data
test_name - the prefix for the testing data
k - how many clusters to use
learning_rate - the fraction by which to move each feature of a cluster center towards the average of the values it corresponds to (this value should be greater than 0 and less than or equal to 1; if it is 1 you simply move each cluster center to the average feature values of its examples)
num_steps - how many times to repeat the assign then move process
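
The assign-then-move process with these inputs might be sketched as follows. This is only an illustration, not a required design: the function names, the plain list-of-lists data layout, the optional seed argument, and the use of squared Euclidean distance are all assumptions (the assignment does not name a distance measure), and your own code should read its data through your dataset class.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance (an assumed choice of distance measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(point, centers):
    # Index of the center closest to the given point.
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))

def kmeans(points, k, learning_rate, num_steps, seed=None):
    # points: list of feature vectors (output feature already left out).
    rng = random.Random(seed)
    # Start each center on a separate, randomly chosen example.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(num_steps):
        # (1) Assign each example to its closest center.
        assignments = [nearest_center(p, centers) for p in points]
        # (2) Move each center towards the mean of its examples.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if not members:
                continue  # leave a center with no examples where it is
            for d in range(len(centers[j])):
                mean_d = sum(p[d] for p in members) / len(members)
                centers[j][d] += learning_rate * (mean_d - centers[j][d])
    return centers
```

With learning_rate set to 1 the update line reduces to moving the center exactly onto the mean of its examples; smaller values move it only part of the way on each step.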

You should initialize all of the cluster centers by randomly assigning them to separate examples. Note that your clustering ignores the output feature when creating the clusters. Once you have finished, you should evaluate how well your clustering works by assigning a "class" to each cluster corresponding to the output value that occurs most often among that cluster's points. You should then use this to estimate the "accuracy" of your clusters on a set of test examples.
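
The evaluation step described above could look roughly like the sketch below. It assumes you have already computed, for every training and test example, the index of its closest cluster center; the function names are hypothetical, and giving an empty cluster the label None is an assumption, since the assignment does not say how to handle that case.

```python
from collections import Counter

def label_clusters(train_assignments, train_outputs, k):
    # Count how often each output value occurs in each cluster.
    counts = [Counter() for _ in range(k)]
    for cluster, output in zip(train_assignments, train_outputs):
        counts[cluster][output] += 1
    # Each cluster's "class" is its most frequent output value.
    # A cluster with no examples gets None (an assumed convention).
    return [c.most_common(1)[0][0] if c else None for c in counts]

def cluster_accuracy(test_assignments, test_outputs, cluster_labels):
    # Fraction of test examples whose cluster's class matches their output.
    correct = sum(1 for cluster, output in zip(test_assignments, test_outputs)
                  if cluster_labels[cluster] == output)
    return correct / len(test_outputs)
```

Note that the output values are used only here, after clustering is finished; they play no part in choosing or moving the centers.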

Test your code on the same datasets from Program 3.

What to Hand In

You should hand in a documented copy of your code (including your dataset class files). Also create an archive of the code and email it to rmaclin@gmail.com. Make sure to provide a good general description of your code.

In addition hand in output and a writeup for all of your testing. Try to present the testing in a way to show how effective your system is.
