Choropleth Mapping
In this exercise, you will prepare a choropleth map illustrating 2004 gas consumption per 100,000 people for the United States. The data source is the same as in the last lab. First, calculate gas consumption per 100,000 people; then calculate the class limits for 4 classes according to each of the following classification schemes:
1) Natural breaks
Determine class intervals with the natural breaks method only after examining the data thoroughly. Rank the data in descending order and scan for the largest arithmetic gaps between adjacent values. These gaps serve as class delimiters. Each class limit is the value halfway between the two adjacent observations that fall in different classes.
These limits are:
648.6 - 778.7
778.8 - 918.3
918.4 - 1521.3
1521.4 - 1573.1
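The gap-scanning procedure above can be sketched in Python. This is only an illustration under stated assumptions: the function name largest_gap_breaks and the sample values are hypothetical, not part of the lab data, and the sketch ranks in ascending order, which yields the same gaps as the descending ranking described above.

```python
def largest_gap_breaks(values, n_classes):
    """Return class limits at the midpoints of the largest gaps."""
    ranked = sorted(values)
    # Pair each gap size with the index of its lower neighbour.
    gaps = [(ranked[i + 1] - ranked[i], i) for i in range(len(ranked) - 1)]
    # Keep the (n_classes - 1) largest gaps, then restore ranking order.
    chosen = sorted(i for _, i in sorted(gaps, reverse=True)[: n_classes - 1])
    # Each limit lies halfway between the two values that straddle the gap.
    return [(ranked[i] + ranked[i + 1]) / 2 for i in chosen]


# Hypothetical sample values, for illustration only.
sample = [10, 12, 13, 30, 32, 55, 57, 90]
print(largest_gap_breaks(sample, 4))  # [21.5, 43.5, 73.5]
```

Scanning by eye, as the exercise asks, lets you weigh ties and near-ties between gaps; the sketch simply takes the largest ones.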
2) Quantiles
By dividing the data into quantiles (specifically quartiles for this exercise), you force the same number of observations to fall within each class (excluding remainders). To divide the data into quartiles:
a) Rank the data, from high to low
b) Calculate the upper limits using the following equation:
UCLn = nk
where UCLn is the upper limit of the nth class
n = the number of the class for which the upper limit is desired
k = the number of observations/number of classes;
c) Treat the value calculated for UCLn as a position in the ranking: count up from the low end and mark the first class boundary immediately after the UCLnth observation. Continue counting to mark the subsequent class boundaries.
These limits are:
648.6 - 1074.4
1074.5 - 1212.1
1212.2 - 1288.6
1288.7 - 1573.1
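A minimal sketch of the counting rule UCLn = nk, assuming that a fractional position is rounded down (the exercise does not say how to treat remainders). The function name quantile_classes and the sample values are hypothetical.

```python
def quantile_classes(values, n_classes):
    """Split ranked values into classes of (nearly) equal size."""
    ranked = sorted(values)  # count up from the low end
    classes = []
    start = 0
    for n in range(1, n_classes + 1):
        # UCLn = n * k with k = observations / classes, rounded down.
        end = n * len(ranked) // n_classes
        classes.append(ranked[start:end])
        start = end
    return classes


# Hypothetical sample values, for illustration only.
sample = [10, 12, 13, 30, 32, 55, 57, 90]
for members in quantile_classes(sample, 4):
    print(members[0], "-", members[-1], "holds", len(members), "observations")
```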
3) Equal Steps based on the data range
Using this method, you will force the same ranges of data values to fall within each class, yet you cannot guarantee that each class will hold the same number of observations. To determine the class limits:
a) Calculate the range of data, R
R = H - L
where H equals the highest observation and L equals the lowest observation;
b) Find the common difference (spread) for each class, CD, where:
CD = R/number of classes;
c) Determine the upper class limits, UCLn where:
UCLn = L + n(CD), n = class number;
These limits are:
648.6 - 879.7
879.8 - 1110.8
1110.9 - 1341.9
1342.0 - 1573.1
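The three formulas above translate directly into a few lines of Python. In this sketch the function name equal_step_limits is hypothetical; the inputs 648.6 and 1573.1 are the lowest and highest observations implied by the limits listed above.

```python
def equal_step_limits(low, high, n_classes):
    """Return the upper class limits UCLn = L + n * CD."""
    data_range = high - low                  # R = H - L
    common_diff = data_range / n_classes     # CD = R / number of classes
    return [low + n * common_diff for n in range(1, n_classes + 1)]


for limit in equal_step_limits(648.6, 1573.1, 4):
    print(round(limit, 3))
```

The computed upper limits (about 879.725, 1110.85, 1341.975 and 1573.1) agree with the listed limits to within rounding at one decimal place.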
d) Repeat this procedure, this time omitting Wyoming and South Carolina. After the class limits have been calculated, add WY and SC to the highest class;
These limits are:
648.6 - 829.7
829.8 - 1010.8
1010.9 - 1191.9
1192.0 - 1573.1
4) Measures of central tendency
You will now use the mean and standard deviation of the data to determine class limits.
4.1) Standard Deviation
a) Calculate the mean and standard deviation for the gas consumption data. Form class limits as follows:
Class 1: below mean - 1 SD
Class 2: mean - 1 SD to mean
Class 3: mean to mean + 1 SD
Class 4: above mean + 1 SD
b) If these data are normally distributed, one would expect classes 2 and 3 to each contain about 34% of the observations and classes 1 and 4 to each contain about 16%. For the US, then, we may test the normality of the data by predicting how many observations should fall within each class. Classes 2 and 3 should each contain 17 (rounding down) states, and classes 1 and 4 should each contain 8 (rounding down) states. How does the distribution of the gas consumption data compare?
Class 1: 5 states (9 when omitting WY and SC from calculations)
Class 2: 23 states (18 when omitting WY and SC from calculations)
Class 3: 18 states (17 when omitting WY and SC from calculations)
Class 4: 5 states (5 when omitting WY and SC from calculations)
c) The limits for this method are:
648.6 - 1003.2
1003.3 - 1178.8
1178.9 - 1354.4
1354.5 - 1573.1
d) Calculate the mean and standard deviation excluding the values for WY and SC. Form four classes using your calculated parameters as before.
The limits for this method are:
648.6 - 1000.7
1000.8 - 1164.8
1164.9 - 1328.9
1329.0 - 1573.1
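The standard deviation scheme, together with the count of observations per class used for the normality check, can be sketched as follows. This assumes the population standard deviation (statistics.pstdev); the exercise does not say whether the population or sample formula is intended. The function names and sample values are hypothetical.

```python
import statistics


def sd_class_limits(values):
    """Return the three internal limits: mean - SD, mean, mean + SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return [mean - sd, mean, mean + sd]


def count_per_class(values, limits):
    """Count observations per class; a value equal to a limit stays in the lower class."""
    counts = [0] * (len(limits) + 1)
    for v in values:
        # The class index is the number of limits the value exceeds.
        counts[sum(v > limit for limit in limits)] += 1
    return counts


# Hypothetical sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
limits = sd_class_limits(sample)
print(limits)
print(count_per_class(sample, limits))
```

Comparing the observed counts with the expected 16% / 34% / 34% / 16% split gives the informal normality check described in step b).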
4.2) Nested Means
We will now use the method of nested means to classify the data.
a) To find the class divisions, first take the mean of the entire data set. Using the mean as the first subdivision of the data, take the means of the observations above it and of the observations below it. Use these three means as class limits;
b) The resulting class limits are:
648.6 - 1030.8
1030.9 - 1178.8
1178.9 - 1300.4
1300.5 - 1573.1
c) Now perform these calculations excluding the values for WY and SC. The resulting class limits are:
648.6 - 1024.1
1024.2 - 1164.8
1164.9 - 1279.4
1279.5 - 1573.1
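A short sketch of the nested means calculation, assuming that an observation exactly equal to the overall mean is placed in the lower group (the exercise does not specify this). The function name nested_means_limits and the sample values are hypothetical.

```python
import statistics


def nested_means_limits(values):
    """Return [mean of lower group, overall mean, mean of upper group]."""
    overall = statistics.mean(values)
    # Assumption: values equal to the overall mean go in the lower group.
    lower = [v for v in values if v <= overall]
    upper = [v for v in values if v > overall]
    return [statistics.mean(lower), overall, statistics.mean(upper)]


# Hypothetical sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(nested_means_limits(sample))
```

Because the split is made at the mean rather than the median, the two groups need not hold the same number of observations.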
5) Jenks Optimization Method
The goal of this method is to form classes that are internally homogeneous while ensuring heterogeneity among classes. Applying it in full to this dataset would be time consuming: it entails numerous calculations and iterations to maximize within-class homogeneity and between-class heterogeneity. Instead, we will use the Jenks goodness-of-fit measure to assess the appropriateness of the 8 classifications you have just performed. Here is the procedure:
a) Compute the mean of the entire dataset, and calculate the sum of the squared deviations of each observation X from that mean: SUM[(mean - X)^2]
This amount is called the squared deviations, array mean (SDAM);
b) Use the class boundaries calculated in parts 1-4 for this exercise. Compute the class means;
c) Calculate the deviation of each X in each class from its class mean (class mean - X), and square these;
d) Sum the squared differences in the previous step for each classification. This is the squared deviations, class means (SDCM);
e) Compute goodness of variance fit (GVF)
where GVF = (SDAM - SDCM)/SDAM
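Steps a) through e) can be sketched in Python as follows. The function names, the convention that each class includes its upper limit, and the sample values are assumptions made for illustration; the last upper limit must be at least the largest observation so that every value is assigned to a class.

```python
def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((mean - x) ** 2 for x in values)


def gvf(values, upper_limits):
    """Goodness of variance fit: (SDAM - SDCM) / SDAM."""
    sdam = sum_sq_dev(values)
    # Assign each value to the first class whose upper limit it does not exceed.
    classes = [[] for _ in upper_limits]
    for x in values:
        for i, limit in enumerate(upper_limits):
            if x <= limit:
                classes[i].append(x)
                break
    # SDCM: squared deviations from the class means, summed over all classes.
    sdcm = sum(sum_sq_dev(members) for members in classes if members)
    return (sdam - sdcm) / sdam


# Hypothetical sample values and limits, for illustration only.
sample = [10, 12, 13, 30, 32, 55, 57, 90]
print(round(gvf(sample, [21.5, 43.5, 73.5, 90]), 3))
```

GVF ranges from 0 (a single class containing every observation) to 1 (no variation left within any class), so higher values indicate a better classification.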
GVF      Method
0.587    Natural Breaks
0.785    Quantiles
0.868    Equal Steps
0.901    Equal Steps (omitting WY and SC)
0.864    Standard Deviation
0.908    Standard Deviation (omitting WY and SC)
0.847    Nested Means
0.896    Nested Means (omitting WY and SC)
Given this set of data, the standard deviation method (omitting WY and SC) has the highest goodness of variance fit (0.908), meaning that of the eight schemes tested it is the most suitable for portraying the data on a 4-class choropleth map.