Assignment 2 -- due Thursday, September 19 at the beginning of lab
CS 5521 Fall Semester, 2013
25 Points

Topics: Growth of Functions, Comparison of Sorts

The assignment consists of two parts. In the first part, you solve problems from the text. In the second part, you implement insertion sort and merge sort and run them to find the "cross-over" point where merge sort becomes faster than insertion sort.

Part 1: Growth of Functions (10 points)

Do the following Exercises from the text:

Part 2: Comparing run times for sorting (15 points)

The run time of merge sort is independent of the initial order of its input (or at least it should be). The run time of insertion sort, on the other hand, depends on the input: in the best case, when the array is already sorted, the run time is Θ(n); in the worst case, when the array is reverse-sorted, the run time is Θ(n^2); in the average case, when each new number is inserted about halfway down the sorted part of the array, the run time is also Θ(n^2), but the constant multiplier is about 1/2 of what it is for the worst case. There are two subparts to this part: in the first, determine the "cross-over" value of n where merge sort first beats insertion sort; in the second, verify what I said above, that insertion sort is twice as fast in the average case as it is in the worst case.
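The factor of 1/2 can be seen from a rough count of element shifts. In the sketch below, T(n) is an assumed symbol for the number of shifts insertion sort performs on n elements; lower-order work such as comparisons that end the inner loop is ignored. Inserting element j moves at most j-1 elements in the worst case, and about half that many when it lands halfway down the sorted part:

```latex
\begin{align*}
T_{\text{worst}}(n) &\approx \sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}, \\
T_{\text{avg}}(n)   &\approx \sum_{j=2}^{n} \frac{j-1}{2} = \frac{n(n-1)}{4}, \\
\frac{T_{\text{worst}}(n)}{T_{\text{avg}}(n)} &\approx 2 .
\end{align*}
```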

Subpart A: Comparison of average case run times for insertion and merge sort (12 points)

Implement insertion sort and merge sort as two routines. Have the user enter n, the number of integers to be sorted. Rather than generating n random numbers with a random number generator, have your program generate the "average case" situation described in the lab: put the numbers 1, 2, 3, ..., n/2 in that order into A at every other index, and put the numbers n, n-1, n-2, ..., n/2 + 1 in that order into the remaining indices. You may assume that n is even, or that n is odd, whichever is easier.
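One possible way to build this data is sketched below. The function name fill_average_case is hypothetical, and the sketch assumes 0-based indexing, an even n, and that the ascending numbers start at the first index; none of these choices is required by the assignment.

```c
/* Fill a[0..n-1] with the interleaved "average case" pattern.
 * Assumptions of this sketch: n is even, indexing is 0-based, and
 * the ascending half starts at index 0. */
void fill_average_case(int *a, int n)
{
    int half = n / 2;
    for (int k = 0; k < half; k++) {
        a[2 * k]     = k + 1;   /* 1, 2, ..., n/2 at even indices      */
        a[2 * k + 1] = n - k;   /* n, n-1, ..., n/2 + 1 at odd indices */
    }
}
```

For n = 6 this produces 1 6 2 5 3 4.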

To time your insertion sort and merge sort routines, you can use the clock() function in time.h (so you must #include <time.h> at the top of your program). The function clock() returns a value of type clock_t that holds the processor time the program has used so far, measured in clock ticks; divide by CLOCKS_PER_SEC to convert it to seconds. Unfortunately, on some systems the value advances only in coarse steps, for example hundredths of a second (see below for a fix to this problem). Thus to get timings of sections of code, you could do something like:

clock_t tstart, tend, ttotal ;
tstart = clock() ;

/* code to be timed */

tend = clock() ;
ttotal = tend - tstart ;
printf( "time = %ld\n", (long) ttotal ) ;
Note that clock_t is an arithmetic type whose exact definition varies from system to system (it is often a long integer), so it is safest to cast it to a known type before printing. I'm sure this can also be done in C++ (and maybe       cout << ttotal << endl ;       would work, but I'm not sure of the details -- please let me know of any subtle details you encounter if you use C++).

But there is a problem with getting accurate timings. Most small programs, such as insertion sort and merge sort, run so fast that the times reported are very small -- perhaps less than a hundredth of a second even when sorting 100 numbers. To cure this problem and magnify the results, put your array initialization and the call to your sort routine in a for loop that executes numIterations ( = 100) times, and then divide your total time by numIterations (you may even have to make numIterations = 1000, or 10000, ...).
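The repeat-and-divide idea could be wrapped in a small helper, sketched here. The names seconds_per_call and count_call are hypothetical, and passing the work as a function pointer is just one way to organize it; an explicit loop around your own code does the same job.

```c
#include <time.h>

/* Hypothetical helper (not part of the assignment): calls work(arg)
 * `iterations` times and returns the mean processor time per call,
 * in seconds. */
double seconds_per_call(void (*work)(void *), void *arg, int iterations)
{
    clock_t tstart = clock();
    for (int i = 0; i < iterations; i++)
        work(arg);
    clock_t tend = clock();
    return (double)(tend - tstart) / CLOCKS_PER_SEC / iterations;
}

/* Example callback for illustration only: counts how often it runs. */
void count_call(void *arg)
{
    long *counter = arg;
    (*counter)++;
}
```

A call such as seconds_per_call(count_call, &calls, 1000), with calls declared as a long, runs the callback 1000 times and reports the average time of one call. If the reported time is still zero, the iteration count is probably too small for the resolution of clock() on that machine.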

Also, for merge sort, to avoid allocating and de-allocating memory on each call to the merge() subroutine, you can make the temporary arrays L and R global (or pass their names, i.e. pointers, as parameters to merge() and merge-sort()).
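A minimal sketch of the parameter-passing variant follows. It assumes 0-based inclusive bounds and uses explicit bounds checks in the merge loop rather than sentinel values, so it may differ from the version in the text. The caller allocates L and R once, each with room for at least n/2 + 1 integers.

```c
/* Merge the sorted runs a[lo..mid] and a[mid+1..hi] (inclusive bounds)
 * using caller-supplied scratch arrays L and R. */
void merge(int *a, int lo, int mid, int hi, int *L, int *R)
{
    int n1 = mid - lo + 1;   /* length of the left run  */
    int n2 = hi - mid;       /* length of the right run */
    for (int i = 0; i < n1; i++) L[i] = a[lo + i];
    for (int j = 0; j < n2; j++) R[j] = a[mid + 1 + j];

    int i = 0, j = 0, k = lo;
    while (i < n1 && j < n2)
        a[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < n1) a[k++] = L[i++];   /* leftover left elements  */
    while (j < n2) a[k++] = R[j++];   /* leftover right elements */
}

/* Sort a[lo..hi]; the scratch arrays are reused at every level, which
 * is safe because merge() runs only after both recursive calls return. */
void merge_sort(int *a, int lo, int hi, int *L, int *R)
{
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid, L, R);
    merge_sort(a, mid + 1, hi, L, R);
    merge(a, lo, mid, hi, L, R);
}
```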

Finally, run your two programs, timing them, for different values of n, until you find the "cross-over" value of n for which merge sort first beats insertion sort. You might try to do a sort of "binary search" for that value -- i.e. try n = 100, and if insertion wins, try n = 1000. Assuming that merge wins at n = 1000, try n = 500, etc. I think that the cross-over value is somewhere between 10 and 1000. Note that different people might get different answers, due to differences in how they code the algorithms.
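The search itself could be automated along the following lines. Both function names are hypothetical. The sketch assumes that insertion sort wins at lo, merge sort wins at hi, and the winner changes exactly once in between; real timings are noisy, so that assumption may fail near the cross-over and the result should be checked by hand.

```c
/* Returns the smallest n in (lo, hi] for which merge_wins(n) is nonzero.
 * Assumes merge_wins(lo) is zero, merge_wins(hi) is nonzero, and the
 * answer flips from zero to nonzero exactly once in between. */
int find_crossover(int lo, int hi, int (*merge_wins)(int n))
{
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (merge_wins(mid))
            hi = mid;    /* cross-over is at mid or below */
        else
            lo = mid;    /* cross-over is above mid       */
    }
    return hi;
}

/* Stand-in predicate for demonstration only; the threshold 37 is made
 * up. A real predicate would time both sorts at size n and compare. */
int demo_merge_wins(int n)
{
    return n >= 37;
}
```

With the stand-in predicate, find_crossover(10, 1000, demo_merge_wins) returns 37.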

Pseudocode:

Generate average-case data in the array OrigData[]

(1) Time the following loop to see how much time copying takes
For i = 1 to NumIterations
   copy OrigData[] to A[]       // requires an inner loop 

(2) Time insertion sort:
For i = 1 to NumIterations
   copy OrigData[] to A[]
   InsertionSort( A, 1, n )

(3) Time merge sort:
For i = 1 to NumIterations
   copy OrigData[] to A[]
   MergeSort( A, 1, n )
To get the actual times for insertion sort and merge sort, subtract the copying time (1) from the times of segments (2) and (3).
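Segments (1) and (2) and the subtraction might look like this in C. All function names are hypothetical, indexing is 0-based, and memcpy stands in for the explicit inner copy loop of the pseudocode.

```c
#include <string.h>
#include <time.h>

/* Sort a[0..n-1] into ascending order. */
void insertion_sort(int *a, int n)
{
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {   /* shift larger elements right */
            a[i + 1] = a[i];
            i--;
        }
        a[i + 1] = key;
    }
}

/* Segment (1): clock ticks spent copying orig into a, repeated. */
clock_t ticks_copy_only(const int *orig, int *a, int n, int iterations)
{
    clock_t tstart = clock();
    for (int i = 0; i < iterations; i++)
        memcpy(a, orig, (size_t)n * sizeof a[0]);
    return clock() - tstart;
}

/* Segment (2): clock ticks spent copying and then sorting, repeated. */
clock_t ticks_copy_and_sort(const int *orig, int *a, int n, int iterations)
{
    clock_t tstart = clock();
    for (int i = 0; i < iterations; i++) {
        memcpy(a, orig, (size_t)n * sizeof a[0]);
        insertion_sort(a, n);
    }
    return clock() - tstart;
}

/* Net time of one sort in seconds, after removing the copying time. */
double net_seconds_per_sort(clock_t sort_ticks, clock_t copy_ticks,
                            int iterations)
{
    return ((double)sort_ticks - (double)copy_ticks)
           / CLOCKS_PER_SEC / iterations;
}
```

One caveat: an optimizing compiler may remove copies whose results are never read, which would make segment (1) look cheaper than it is, so it is worth checking that the copy-only time is plausible.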

Subpart B: The worst/average case factor of two (3 points)

Add the following code to your program:
Generate worst-case data in the array OrigData[]

(4) Time insertion sort:
For i = 1 to NumIterations
   copy OrigData[] to A[]
   InsertionSort( A, 1, n )
To generate worst-case data, fill the array OrigData[] with the numbers n, n-1, n-2, ..., 2, 1. Then time a run of your program for the largest value of n that you used in Subpart A -- the run time for (4) should be roughly twice as long as the run time for (2) (after subtracting the copying time from both). Report the two run times and the worst case / average case ratio.
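As a cross-check that does not depend on the clock, one could count element shifts instead of measuring time. This is a supplement, not a replacement for the timing runs the assignment asks for. The function names below are hypothetical and indexing is 0-based.

```c
/* Fill a[0..n-1] with n, n-1, ..., 2, 1 (reverse-sorted input). */
void fill_worst_case(int *a, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = n - i;
}

/* Insertion sort that also returns how many element shifts it made. */
long insertion_sort_count_shifts(int *a, int n)
{
    long shifts = 0;
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {
            a[i + 1] = a[i];   /* one shift */
            i--;
            shifts++;
        }
        a[i + 1] = key;
    }
    return shifts;
}
```

On reverse-sorted input of size n the count is n(n-1)/2. On the interleaved data of Subpart A, assuming n is even and the number 1 sits at the first index, it is (n/2)(n/2 - 1). For n = 1000 that gives 499500 versus 249500 shifts, a ratio of about 2.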

What To Hand In For Part 2: The results of your timing runs and the code for each of your sort routines. For subpart A, show timing runs for n = "cross-over" value, and for n-10 and n+10. For subpart B, report the two run times and the worst case / average case ratio.


Page URL: http://www.d.umn.edu/~ddunham/cs5521f13/assignments/a2/assignment.html
Page Author: Doug Dunham
Last Modified: Thursday, 03-Oct-2013 21:24:20 CDT
Comments to: ddunham@d.umn.edu